International Journal of Engineering Trends and Technology (IJETT) – Volume 9 Number 13 - Mar 2014 (ISSN: 2231-5381, http://www.ijettjournal.org)

A Combined Approach for Image Annotation based on Feature Extraction and Meta Data Techniques

D.N.D. Harini (1), D. Lalitha Bhaskari (2)
(1) Research Scholar, Dept. of CS&SE, AUCE (A), Andhra University, Visakhapatnam, Andhra Pradesh, India.
(2) Professor, Dept. of CS&SE, AUCE (A), Andhra University, Visakhapatnam, Andhra Pradesh, India.

Abstract— During the past few years, image annotation has attracted a lot of attention among researchers. In the proposed scheme, several important features are computed offline for each image in the database, and appropriate annotations (keywords) are embedded into the respective images using watermarking. Building on the authors' earlier work on image retrieval, images similar to a query are retrieved, and the keywords watermarked into them are used to annotate the query image. The Least Significant Bit (LSB) watermarking technique is used only for annotating the images, not for security. The experiments use a database of 1400 images divided into 14 classes to demonstrate the effectiveness of the proposed scheme. The methodology takes any image as input, retrieves similar images and annotates the input image based on those similar images. The experimental results show that the proposed methodology is effective and practicable, and that retrieval performance is not affected by the watermarking procedure.

Keywords—Feature extraction, image retrieval, classification, annotation, Least Significant Bit (LSB) watermarking.

I. INTRODUCTION
In recent years, rapid advances in digital cameras and image processing tools have led to huge growth in the collection and archiving of varied image databases, which has created the need for efficient image retrieval systems.
Although keyword-based tools exist, they do not meet user requirements because they are language dependent. A language-independent tool therefore has to be built on the properties of the images themselves. These properties [1], also termed features, include for example color, shape, texture and the spatial location of shapes. The extracted features should carry enough information about the image to retrieve as many similar images as possible. Efficient image database retrieval can be achieved by a system that automatically extracts relevant features directly from the images stored in the database; the retrieved images should then be classified and annotated. Classification is one of the most complex tasks performed by such a system and requires large computational effort.

In this paper, the novel idea is to use watermarking (data hiding) for the purpose of image annotation. The framework is divided into three phases. The first phase builds the image database that is used as the training database: a few keywords relevant to each image are embedded into it using one of the existing watermarking techniques. For this work, the LSB watermarking technique is used to embed the keywords because of its ease of operation. The second phase performs image retrieval using color percentages, GLCM, wavelets, PCA and Relevance Feedback [2, 3]. This phase carries out the general operations such as segmentation, feature extraction and data compression, and produces a set of images similar to the query image. Finally, the third phase extracts the watermarks (keywords) from the images obtained in the second phase, and annotation is done based on these extracted keywords.
Each phase is described in Section II, followed by a brief discussion of the different data sets used. The methodology adopted is presented in Section III. Results and observations are discussed in Section IV, followed by conclusions and references.

II. RELATED INFORMATION

A. Image Retrieval
In the present digital era, huge volumes of multimedia data, including a vast number of digital images, are available over the internet. Automated retrieval from this mass of images is complicated by the fact that many images lack adequate textual descriptions. Retrieval of images through visual content analysis is therefore an exciting and worthwhile research challenge. Although the most common features considered for image retrieval are color, shape and texture, earlier researchers have proposed many different approaches [1]. The work in this paper extends the authors' earlier work [2, 3], which computes color percentages from color histograms, evaluates texture features with the Grey Level Co-occurrence Matrix (GLCM), uses Haar wavelets for shape, applies Principal Component Analysis (PCA) for dimensionality reduction, and uses Relevance Feedback (RF) to improve the efficiency of image retrieval.

B. Image Classification & Annotation
Image classification, the grouping of images into meaningful categories using low-level visual features, is an ever-challenging task in image annotation and is very useful for image/video tagging and retrieval. Image classification [1] aims to find a description that best describes the images in one class and distinguishes them from the images of all other classes. It is the task of assigning objects to one of several predefined categories and is widely used in mining image information, especially spatial information, from image databases. Automatic image annotation (AIA) has emerged in recent years and attempts to replace a large amount of manual annotation effort. AIA is an extension of image recognition: the input to an AIA method is an image, and the output is a set of words (also referred to as classes) from a given dictionary that describe the input image as well as possible.

C. Watermarking
Watermarking [4, 8, 9, 10] is the practice of hiding a message about an image, audio clip, video clip or other media work within that work itself. The aim of digital watermarking is to embed information into multimedia data to provide a security service or simply for labeling. The embedded information is called the watermark. In general, watermarks are either visible or invisible. A visible watermark can be seen on the image directly without any extraction process, whereas an invisible watermark cannot be seen. The novelty of this work is that invisible watermarking is used to hide information that is later used to annotate images.

D. Applicability of Watermarking for Image Annotation
The approach and novelty of this paper lie mainly in applying watermarking to image annotation. The idea is to embed relevant keywords into every image in the training database using a Least Significant Bit (LSB) watermarking technique, and then to extract the keywords for annotation. Researchers have developed numerous steganography and watermarking techniques for embedding messages into an image in both the spatial and frequency domains [5, 6]. The simplest of these is the LSB technique, in which each bit of the data (here, the keywords) is embedded into the least significant bit of a pixel value in the cover image. Although this method is simple to implement, it is not recommended for applications where security is a major concern. Since the main aims here are easy embedding, retrieval and classification rather than security, the LSB watermarking technique is adequate, and it proved to be effective.

E. Least Significant Bit (LSB) Technique
This is one of the earliest and simplest steganography methods. In this method, the embedding process replaces the least significant bits (LSBs) of the pixels in the cover (original) image, in sequence, with the bits of the message (keyword or data). The method followed in this work is a slight modification of the LSB algorithm, explained below.

Embedding of the Keyword:
Embedding of the keyword starts from the middle of the given image dimensions. For an image of size 1024 X 768, the starting position (x, y) is calculated as (384, 512), where x = 768/2 and y = 1024/2. From this position, each bit of the keyword replaces the LSB of the R, G and B channels of each pixel in turn, as follows:

Bit   Color channel   Pixel
1     R               (x, y)
2     G               (x, y)
3     B               (x, y)
4     R               (x+1, y+1)
5     G               (x+1, y+1)
6     B               (x+1, y+1)
7     R               (x+2, y+2)
8     G               (x+2, y+2)

The pixel in which the next bits of the keyword are embedded is obtained by incrementing x and y. As an example, the keyword 'tiger' is converted to its binary equivalent 0010111010010110111001101010011001001110 (each character's 8-bit ASCII code written least significant bit first; for example, 't' = 01110100 is written as 00101110). The middle value is calculated as 187, as shown in Fig 1.

Fig 1: Calculating the middle value.

The pixels in which the following bits of the keyword are embedded are shown as red cells in Fig 2.
Fig 2: Calculating increment values

Extraction of the Keyword:
To extract the keyword, read the LSBs of the R, G and B channel values starting from the center pixel, incrementing x and y after each pixel, until all the bits have been extracted. In the example above, take the least significant bits of the R, G and B values of pixel (384, 512), then increment x and y and repeat the procedure.

F. Datasets used
The images considered in this paper are taken from the well-known Wang database [7], plus a few (500) images from other sources. The total image database is divided into 14 datasets, as shown in Table 1.

Table 1. Details of the image database

S. No.   Dataset     Number of Images
1        Beach       100
2        Buildings   100
3        Bus         100
4        Elephant    100
5        Food        130
6        Horse       100
7        Kangaroo    100
8        Mountain    100
9        People      100
10       Dinosaur    100
11       Roses       100
12       Sky         100
13       Tiger       100
14       Bear        100
Total No. of Images  1400

III. PROPOSED METHODOLOGY
This section presents the proposed methodology for efficient annotation of images. The work progresses in three phases. In the first phase, the image dataset is prepared for training by watermarking. In the second phase, low-level features (color and texture) and high-level features (wavelets and PCA) are extracted for efficient and refined image retrieval. The third phase extracts the embedded keywords (watermarks) from the retrieved similar images, and the query image is classified based on these keywords. The three phases are explained in detail below.

Fig 3: Work Flow of Different Phases (Database, Embedding of keywords, Watermarked Database)

Phase 1 (Embedding the Keywords)
Embedding of the keywords is done using the LSB watermarking technique [4]. Each bit of the keyword is embedded into the least significant bit of the R, G and B channels of successive pixels, starting from the center pixel of the image, as described in Section II.E. The image is not visibly modified, because replacing the least significant bit changes a channel value by at most one. As security is not of prime importance in this work, adopting the LSB watermarking technique is acceptable.
Input: Images taken from the database
Output: Watermarked database

Phase 2 (Image retrieval based on low and high level features)
The watermarked database from Phase 1 is used for training. This is done by computing low-level features, namely the different color components and GLCM texture features, and high-level features, namely wavelets for shape detection and PCA for dimensionality reduction. After the features are calculated, images similar to the query image are retrieved.
Input: Watermarked database
Output: Set of similar images

Fig 4: Proposed Architecture (Query Image; Compute Color and Texture Features; Compute Wavelets and PCA Features; RF; Feature Extraction: Query Image Features, Watermarked Image Features; Similarity Matching; Similar Images; Extract watermarks; Annotation)

Phase 3 (Extraction of Watermarks and Annotation)
In this phase, the embedded keywords are extracted from each of the similar images obtained in Phase 2. The query image is annotated with the keyword that occurs most often among the extracted keywords.
Input: Set of similar images retrieved
Output: Annotation of the query image

IV. RESULTS AND DISCUSSIONS
Image annotation and image retrieval are essential image understanding tasks that computers require in order to see and interpret the visual world. Both tasks aim to relate low-level image features to a semantic concept of similarity. To show the effectiveness of the proposed method, the results of five positive and two negative test cases are shown here.

In our experiments, a total of 1400 images in the 14 categories listed in Table 1 are used. Figures 5.1 and 5.2 are snapshots of the image database and of the watermarked database, respectively. Figures 6.1, 7.1, 8.1, 9.1, 10.1, 11.1 and 12.1 are the seven query images. Figures 6.2, 7.2, 8.2, 9.2 and 10.2 are the similar images obtained during Phase 2, and Figures 6.3, 7.3, 8.3, 9.3 and 10.3 are the annotated images obtained during Phase 3. Figures 11.2, 11.3, 12.2 and 12.3 are the negative cases.

Fig 5.1: Snapshots of image database
Fig 5.2: Snapshot of watermarked database during Phase 1
Fig 6.1: Query image
Fig 6.2: Similar images retrieved based on Phase 2
Fig 6.3: Annotated image
Fig 7.1: Query image
Fig 7.2: Similar images retrieved based on Phase 2
Fig 7.3: Annotated image
Fig 8.1: Query image
Fig 8.2: Similar images retrieved based on Phase 2
Fig 9.2: Similar images retrieved based on Phase 2
Fig 9.3: Annotated image
Fig 10.1: Query image
Fig 10.2: Similar images retrieved based on Phase 2
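Going from each set of retrieved images (Figs 6.2-12.2) to the annotated image (Figs 6.3-12.3) is the Phase 3 majority count. The sketch below illustrates it under loudly stated assumptions: each database image is reduced to a precomputed feature vector that stands in for the color, GLCM, wavelet and PCA features of Phase 2 (which are not reproduced here), similarity is plain Euclidean distance, k is a free parameter, and ties are broken in favor of the closest image. None of these choices is specified in the paper. The keyword attached to each entry plays the role of the watermark extracted from that image, and the toy database is hypothetical.

```python
import math
from collections import Counter


def retrieve_similar(query_vec, database, k):
    """Phase 2 stand-in: return the k (feature_vector, keyword) entries nearest to the query.

    Euclidean distance is an assumed similarity measure; the vectors stand in for the
    color, GLCM, wavelet and PCA features described in the paper.
    """
    return sorted(database, key=lambda entry: math.dist(query_vec, entry[0]))[:k]


def annotate(query_vec, database, k=5):
    """Phase 3: label the query with the keyword that occurs most often among the k retrieved images."""
    keywords = [keyword for _, keyword in retrieve_similar(query_vec, database, k)]
    # Counter.most_common keeps equal counts in first-seen order, so a tie goes to the
    # keyword of the closest retrieved image (an assumed tie-break rule).
    return Counter(keywords).most_common(1)[0][0]


# Hypothetical toy database: (feature vector, keyword extracted from the watermark).
database = [
    ((0.10, 0.20), "tiger"),
    ((0.12, 0.18), "tiger"),
    ((0.11, 0.25), "tiger"),
    ((0.20, 0.22), "elephant"),
    ((0.90, 0.80), "beach"),
    ((0.85, 0.75), "sky"),
]

print(annotate((0.11, 0.21), database, k=3))  # tiger
```

Because the vote only counts what retrieval returns, a query whose neighbors are dominated by a visually similar class (for example bear images retrieved for an elephant query) is labeled with that class. This illustrates why the overall accuracy is bounded by the retrieval step rather than by the watermark extraction.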
Fig 8.3: Annotated images
Fig 9.1: Query image
Fig 10.3: Annotated image
Fig 11.1: Query image
Fig 11.2: Similar images retrieved based on Phase 2
Fig 11.3: Annotated image
Fig 12.1: Query image
Fig 12.2: Similar images retrieved based on Phase 2
Fig 12.3: Annotated image

V. CONCLUSIONS
The proposed idea of using watermarking for image annotation proved to be very efficient: the watermarking-based annotation step on its own was 100% successful. However, because of the limitations of the similar-image retrieval process, the system as a whole does not reach this accuracy. Since the methodology annotates a query image from the similar images retrieved for it, the overall results are 90% successful. The results obtained are satisfactory, and the methodology proved to be efficient except for a few class combinations: (i) elephant and bear, (ii) beach and sky, (iii) people and food and (iv) bus and people. This combined approach of image retrieval and watermark-based image annotation is promising for future research in image understanding.

VI. REFERENCES
1. D.N.D. Harini and D. Lalitha Bhaskari, "Image Mining Issues and Methods Related to Image Retrieval System", International Journal of Advanced Research in Computer Science, Vol. 2, No. 4, July-August 2011, ISSN 0976-5697.
2. D.N.D. Harini and D. Lalitha Bhaskari, "Identification of Leaf Diseases in Tomato Plant Based on Wavelets and PCA", 2011 World Congress on Information and Communication Technologies, IEEE, 2011, pp. 1398-1403.
3. D.N.D. Harini and D. Lalitha Bhaskari, "Image Retrieval System Based on Feature Extraction and Relevance Feedback", CUBE 2012, Pune, Maharashtra, India, September 3-5, 2012, ACM.
4. D. Lalitha Bhaskari, P.S. Avadhani and A. Damodaram, "Watermark Insertion Algorithm implementation using Auxiliary carry and LSB methods", International Conference on Systemics, Cybernetics and Informatics, Pentagram Research Center, Hyderabad, India, January 3-5, 2006.
5. D. Lalitha Bhaskari, P.S. Avadhani and A. Damodaram, "A Combinatorial Approach for Information Hiding Using Steganography And Gödelization Techniques", International Journal of Systemics, Cybernetics and Informatics (IJSCI), January 2009, pp. 21-24, ISSN 0973-4864.
6. P.S. Avadhani and D. Lalitha Bhaskari, "A Blind Scheme Watermarking Technique Using Gödelization Technique for RGB images under spatial domain", International Conference on i-Warfare (ICIW-2010), Dayton, Ohio, USA, April 8-9, 2010.
7. http://www-db.stanford.edu/~wangz/image.vary.jpg.tar
8. Abdullah Bamatraf, Mohd. Najib B. Mohd Salleh and Rosziati Ibrahim, "Digital Watermarking Algorithm Using LSB", 2010 International Conference on Computer Applications and Industrial Electronics (ICCAIE 2010), Kuala Lumpur, Malaysia, December 5-7, 2010, IEEE.
9. Jong Yun Jun, Kunho Kim, Jae-Pil Heo and Sung-eui Yoon, "IRIW: Image Retrieval based Image Watermarking for Large-Scale Image Databases", IWDW 2011, LNCS 7128, pp. 126-141, Springer-Verlag Berlin Heidelberg, 2012.
10. Jindong Xu, Wen-hua Qin and Meng-ying Ni, "A New Scheme of Image Retrieval Based Upon Digital Watermarking", 2008 International Symposium on Computer Science and Computational Technology, IEEE, 2008, DOI 10.1109/ISCSCT.2008.52.