Document 14681180

advertisement
International Journal of Advancements in Research & Technology, Volume 2, Issue 4, April-2013
ISSN 2278-7763
1
357
Retrieving Content Based Images with Query point technique
based on K-mean Clustering
Miss Dipti S. Zambre, Miss Sonal P. Patil
1
Research Fellow of Department of Computer Science, G.H.R.I.E.M., Jalgaon, India; 2Department of Computer Science, G.H.R.I.E.M., Jalgaon,
India
Email: 1zambredipti@gmail.com, 2sonalpatil3@gmail.com
ABSTRACT
This paper provides an overview of the technical achievements in the research area of k-means clustering in content-based image retrieval (CBIR). K-means clustering is a powerful technique in CBIR systems, in order to improve the performance of CBIR
effectively. It is an open research area to the researcher to reduce the local maximum trapping and number of iteration to find
target image. The paper covers the current state of art of the research in k-means clustering in CBIR and query point method for
the system.
Keywords : Clustering, K-means clustering, content based image retrieval, feature extraction.
1 INTRODUCTION
Image retrieval is the process of browsing, searching and retrieving images from a large database of digital images. The
collection of images in the web are growing larger and becoming more diverse. Retrieving images from such large collections is a challenging problem. One of the main problems they
highlighted was the difficulty of locating a desired image in a
large and varied collection. While it is perfectly possible to
identify a desired image from a small collection simply by
browsing, more effective techniques are needed with collections containing thousands of items. To search for images, a
user may provide query terms such as keyword, image
file/link, or click on some image, and the system will return
images "similar" to the query. The similarity used for search
criteria could be meta tags, color distribution in images, region/shape attributes, etc. Unfortunately, image retrieval systems have not kept pace with the collections they are searching. The shortcomings of these systems are due both to the
image representations they use and to their methods of accessing those representations to find images. The problems of image retrieval are becoming widely recognized, and the search
for solutions an increasingly active area for research and development.
In recent years, with large scale storing of images the need
to have an efficient method of image searching and retrieval
has increased. It can simplify many tasks in many application
areas such as biomedicine, forensics, artificial intelligence,
military, education, web image searching etc. while dealing
with this sort of information like organizing and searching
large volumes of images in databases, users can have difficulties, as the current commercial database systems are designed
for textual data, it is not well suited and compatible for digital
images. Therefore there is a need for an efficient way for image retrieval.
There are different ways to retrieve the images in CBIR including QBIC [1], Photobook [2], MARS [3], NeTra [4],
Copyright © 2013 SciResPub.
PicHunter [5], Blobworld [6], VisualSEEK [7].
In a typical CBIR system, low-level visual image features
(e.g., color, texture, and shape) are automatically extracted for
image descriptions and indexing purposes. To search for desirable images, a user presents an image as an example of
similarity, and the system returns a set of similar images based
on the extracted features. In CBIR systems with relevance
feedback (RF), a user can mark returned images as positive or
negative, which are then fed back into the systems as a new,
refined query for the next round of retrieval. The process is
repeated until the user is satisfied with the query result. Relevance feedback uses the user’s knowledge for the marking
images, but user’s knowledge may be accurate or inaccurate. It
is not 100% accurate knowledge. Due to this we will get more
number of irrelevant images and inaccuracy. To avoid that
inaccuracy we are not integrating relevance feedback in proposed system.
To overcome the inaccuracy we are proposing k-means
clustering technique which can be applied for scalable image
retrieval from large databases. Image classification or categorization has often been treated as a pre-processing step for
speeding-up image retrieval in large databases and improving
accuracy, or for performing automatic image annotation. Image clustering inherently depends on a similarity measure,
image categorization has been performed by varied methods
that neither require nor make use of similarity metrics. Image
categorization is often followed by a step of similarity measurement, restricted to those images in a large database that
belong to the same visual class as predicted for the query. In
such cases, the retrieval process is intertwined, whereby categorization and similarity matching steps together form the
retrieval process. Similar arguments hold for clustering as
well, due to which, in many cases, it is also a fundamental
“early” step in image retrieval.
IJOART
358
International Journal of Advancements in Research & Technology, Volume 2, Issue 4, April-2013
ISSN 2278-7763
2 PROBLEM DEFINITIONS
Many CBIR systems have been developed for efficient and
effective image retrieval from a large multimedia database. In
a typical CBIR system, low-level visual image features (e.g.,
color, texture, and shape) are automatically extracted for image descriptions and indexing purposes. To search for desirable images, a user presents an image as an example of similarity, and the system returns a set of similar images based on the
extracted features. There are two general types of image
search: target search and category search. The goal of target
search is to find a specific (target) image, such as a registered
logo, a historical photograph, or a particular painting. The
goal of category search is to retrieve a given semantic class or
genre of images, such as scenery images or skyscrapers. These
strategy leads to the following disadvantages:
Local Maximum Trap Problem: In local maximum trap there is
no guarantee that the target can be found.
Slow convergence: In slow convergence problem more number
of iterations is required to lead to the target image.
The proposed methodology will be used to retrieve images
in the database that are nearest to query image. The methodology efficiently retrieves image, base on CBIR concept, in
which a clustering approach is used to efficiently prune the
search space and initializing it to minimum bounding box in
order to lead to target image with minimum number of iterations and overcome the problem of local maximum trap and
slow convergence.
2
images. So that it reduces the storage required and hence the
system becomes faster and effective in CBIR. Once the features are extracted, they are stored in the database for future
use. The degree to which a computer can extract meaningful
information from the image is the most powerful key to the
advancement of intelligent image interpreting systems. One of
the biggest advantages of feature extraction is that, it significantly reduces the information (compared to the original image) to represent an image for understanding the content of
that image. The most commonly used features are color, texture, and shape.
3 SYSTEM ARCHITECTURE
Fig .3.1 shows the general scheme of image retrieval from a
database using K-mean clustering. Image retrieval is according to the similarities between the query image and images in
image database, ignoring the similarities between images in
image database. The paper applies the clustering algorithm to
further explore the similarities between images in image database for reducing the image retrieval space.
To retrieve the image from the database, we first extract feature vectors from images (the features can be shape, color,
texture etc), then store feature vectors into another database
for future use. When given query image, we similarly extract
its feature vectors, and match those features with database
image features. If the distance between two images feature
vectors is small enough; we consider the corresponding image
in the database similar to the query.
The system could be any real value symmetric image retrieval system. First, extract the image features of each image
in image database and apply the clustering algorithm to analysis the similarities of images in the database for constructing
the images clustering database, then, input the query image,
extracting its features and comparing the similarities between
features of it and those of images in image clustering database,
and output the best matching results.
A. FEATURE EXTRACTION
Feature extraction is the basis of content based image retrieval.
It involves extracting the meaningful information from the
Copyright © 2013 SciResPub.
Fig 3.1: System Architecture
a. Colour Features
Color is linked to the chromatic part of an image. Color represents one of the most widely used visual features in CBIR systems. First a color space is used to represent color images.
Typically, RGB space where the gray level intensity is
represented as the sum of red, green and blue gray level intensities [8].In image retrieval a histogram is employed to represent the distribution of colors in image [9]. A color histogram
provides allotment of colors which is achieved by damaging
image color and plus how many numbers of pixels fit into every color. For the whole collection every image’s color histogram is examined and saved in the database. Retrieval of
those images has been done in the matching process whose
color allotment matches to the example query very much [10].
IJOART
359
International Journal of Advancements in Research & Technology, Volume 2, Issue 4, April-2013
ISSN 2278-7763
b. Texture Features
Texture is another important property of images. It refers to
the visual patterns that have property of homogeneity or arrangement that do not result from the presence of only a single color or intensity. By dissimilarity in brightness with high
frequencies in the image spectrum, textures are characterized.
While making a distinction between areas of the mages with
same color, these features are very useful. Measuring of image
texture such as the degree of coarseness, contrast, directionality, line likeness, regularity and roughness can be calculated
using second-order statistics [10].
c. Shape Features
By either the global form of the shape or local elements of its
boundary, shape features can be differentiated. Global form of
the shape: like the area, the extension and the major axis orientation. Local elements of its boundary: like corners, characteristics points or curvature elements. The degree of similarity
between two shapes is evaluated through standard mathematical distances measures, like Euclidian distance between two
pairs [11].
B. Similarity Measures
In similarity measure, the query image feature vector and database image feature vector are compared using the distance
metric. The images are ranked based on the distance value. It
is proposed in [12], the detailed comparison of nine different
metrics such as Manhattan, weighted mean-variance, Euclidean, Chebychev, Mahanobis etc distance for texture image retrieval with empirical evaluation. They found that Canberra
and Bray-Curtis distance metrics performed exceptionally well
than all other distance metrics.
4 K-MEANS CLUSTERING
Clustering algorithm has been widely used in computer vision
such as image segmentation and database organization. KMeans is a clustering method based on the optimization of an
overall measure of clustering quality is known for its efficiency in producing accurate results in image retrieval. In K-means
clustering algorithms, group the images into clusters based on
the color content, the clustered images apply to K-Means, so
that we can get better favoured image results.
After clustering and selecting the cluster centers, the
given query image is first compared with all the cluster centers. The clusters are ranked according to their similarity with
the query. Then the query image is compared directly with the
images in these clusters. Thus, the number of comparisons is
reduced considerably from comparing the query with all the
images in the database. This process iterates until the criterion
function converges. Thus, the retrieval will be very accurate.
The number of similarity comparisons required depends on
the sizes of the clusters and the number of clusters being examined. It leads to the better performance.
Copyright © 2013 SciResPub.
3
The purpose of clustering is to group images whose feature
vectors are similar by similarity judgment standard; meanwhile to separate the dissimilar images. Clustering algorithms
can be broadly divided into two groups: hierarchical and
partitional.
Hierarchical clustering algorithms recursively find nested
clusters either in agglomerative mode (starting with each data
point in its own cluster and merging the most similar pair of
clusters successively to form a cluster hierarchy) or in divisive
(top-down) mode (starting with all the data points in one cluster and recursively dividing each cluster into smaller clusters).
Compared to hierarchical clustering algorithms, partitional
clustering algorithms find all the clusters simultaneously as a
partition of the data and do not impose a hierarchical structure. Input to a hierarchical algorithm is an n*n similarity matrix, where n is the number of objects to be clustered. On the
other hand, a partitional algorithm can use either an n*d pattern matrix, where n objects are embedded in a d-dimensional
feature space, or an n*n similarity matrix. Note that a similarity matrix can be easily derived from a pattern matrix, but ordination methods such as multi-dimensional scaling (MDS)
are needed to derive a pattern matrix from a similarity matrix.
The most popular and the simplest partitional algorithm is Kmeans. Since partitional algorithms are preferred in pattern
recognition due to the nature of available data, K-means has a
rich and diverse history as it was independently discovered in
different scientific fields, it is one of the most widely used algorithms for clustering. Ease of implementation, simplicity,
efficiency, and empirical success are the main reasons for its
popularity. In the paper, we apply k-means algorithm to analysis images similarities in the database [13].
5 QUERY POINT MOVEMENT TECHNIQUE
It is essentially tries to improve the estimate of the ‘ideal query
point’ by moving it towards good example points and away
from bad example points. Query is represented by a single
point in a feature space and refinement process attempts to
move that point toward the direction where relevant points
were located. The frequently used technique to iteratively improve this estimation is Rocchio’s formula [14].
Query processing technique evaluate three types of queries
(i.e., sampling queries, constrained sampling queries, and constrained k-NN queries) based on our four target search methods. The query cost is the sum of disk seek (including cylinder
seek and rotation), data transfer, and CPU time, in which seek
time dominates the total query cost. Query processing technique, designed to minimize the disk I/O cost [15].
6 CONCLUSIONS
Image retrieval algorithms always use the similarity between
the query image and images in image database. However,
they ignore the similarities between images in image database.
In this paper we addressed this problem by introducing a clustering approach for image retrieval by finding image similarity clustering to reduce the images retrieving space.
IJOART
360
International Journal of Advancements in Research & Technology, Volume 2, Issue 4, April-2013
ISSN 2278-7763
K-means clustering, which also can improve the efficiency
of image retrieving and evidently promote retrieval precision.
This k-means algorithm independent on the feature extraction
algorithm is used as a post-processing step in retrieval. The
improvements in selecting neighbourhood vertices of the retrieval results from tradition image retrieval system in image
feature space could also improve the recall rate.
We can conclude from the results that the proposed system
achieve high retrieval performance, reduce the local maximum
trapping and number of iteration. Due to which the retrieval
speed is increased.
4
[15] Danzhou Liu,Kein A. Hua, Khanh Vu, Ning Yu, “ Fast Query Point Movement Techniques for Large CBIR Systems”, IEEE Trans on Know and data
engg, vol.21 no.5,May 2009.
REFERENCES
[1]
M. Flickner, H.S. Sawhney, J. Ashley, Q. Huang, B. Dom, M. Gorkani, J.
Hafner, D. Lee, D. Petkovic, D. Steele, and P. Yanker,“Query by Image and
Video Content: The QBIC System,”Computer, vol. 28, no. 9, pp. 23-32, Sept.
1995.
[2]
A. Pentland, R.W. Picard, and S. Sclaroff, “Photobook: Content-Based Manipulation for Image Databases,” Int’l J. Computer Vision,vol. 18, no. 3, pp.
233-254, 1996.
[3]
M. Ortega- Binderberger and S. Mehrotra, “Relevance FeedbackTechniques
in the MARS Image Retrieval Systems,”Multimedia Systems, vol. 9, no. 6, pp.
535-547, 2004.
[4]
W.Y. Ma and B. Manjunath, “Netra: A Toolbox for NavigatingLarge Image
Databases,” Proc. IEEE Int’l Conf. Image Processing (ICIP ’97), pp. 568-571,
1997.
[5]
I.J. Cox, M.L. Miller, T.P. Minka, T.V. Papathomas, and P.N. Yianilos, “The
Bayesian Image Retrieval System, PicHunter: Theory, Implementation, and
Psychophysical Experiments,” IEEE Trans. Image Processing, vol. 9, no. 1, pp.
20-37, 2000.
[6]
C. Carson, S. Belongie, H. Greenspan, and J. Malik, “Blobworld: Image Segmentation Using Expectation-Maximization and ItsApplication to Image
Querying,” IEEE Trans. Pattern Analysis and Machine Intelligence, vol. 24,
no. 8, pp. 1026-1038, Aug. 2002.
[7]
J.R. Smith and S.-F. Chang, “VisualSEEk: A Fully Automated Content-Based
Image Query System,” Proc. Fourth ACM Int’l Conf. Multimedia (MULTIMEDIA ’96), pp. 87-98, 1996.
[8]
A.C. She and T. S. Huang, "Segmentation of road scene using color and fractal
based texture classification," in Proc.ICIP, Austin,Nov 1994.
[9]
J. R Smith and S.F.Chang, "A fully automated content based image query
systems," in Proc. ACM Multimedia 96, 1996.
[10] C. Carson, S. Belongie, H.Greenspan and J.malik "Image segmentation using
expectation maximization and its application to image querying" IEEE Trans.
Pattern Anal. Mach. Intell, 2002, p.1026-1038
[11] M.Borowski, L.Brocker, S. Heisterkamp, J. Loffler, “Structuring the visual
content of Digital Libraries Using CBIR Systems”, IEEE, pp.288-293,2000.
[12] Manesh Kokare, B.N. Chatterji and P.K Biswas, “Comparision of Similarity
Metrics for Texture Image Retrieval” in IEEE TENCON, pp. 571-575, 2003.
[13] Hong Liu 1, and Xiaohong Yu 2, “Application Research of kmeans Clustering
Algorithm in Image Retrieval System”, in Proc ISCSCT, China, Dec. 2009
[14] Rocchio JJ, “Relevance Feedback in Information Retrieval” in the SMART
Retrieval system, pp. 313-323, prentice Hall. 1971.
Copyright © 2013 SciResPub.
IJOART
Download