International Journal of Application or Innovation in Engineering & Management (IJAIEM)
Web Site: www.ijaiem.org  Email: editor@ijaiem.org
Volume 4, Issue 4, April 2015  ISSN 2319-4847

A Survey on Reinforced Similarity Integration in Heterogeneous Image-Rich Information Networks

Swati Todalbagi1, Jagruti Parmekar2, Shalini Pansari3, Prof. Priyadarshani V. Kalokhe4
1,2,3,4 Alard College of Engineering, University of Pune

ABSTRACT
Web-scale image search engines (e.g., Google Image Search, Bing Image Search) rely mostly on surrounding text features. It is difficult for them to interpret a user's search intention from query keywords alone, which leads to ambiguous and noisy search results that are far from satisfactory. Visual information is therefore important for resolving the ambiguity in text-based image retrieval. In this paper, we propose a novel Internet image search approach that requires the user only to click on one query image with minimum effort; images are then retrieved from a pool based on both visual and textual content. A system is designed for performing image retrieval in image-rich information networks, and minimum order K-SimRank is proposed to significantly improve the speed of SimRank, one of the most popular algorithms for computing node similarity in information networks.

Keywords: Image Retrieval, Information Network, Ranking

1. INTRODUCTION
In this paper, we study the problem of performing image retrieval in image-rich information networks. Social image sharing websites, such as Flickr and Facebook, host billions of user-submitted images, and users interact with these images through social annotations and interest groups, thus forming image-rich information networks. Take Flickr as an example: images are tagged by users, and image owners contribute images to topic groups, forming the information network shown in Figure 1. Figure 2 shows another network of Amazon products with product images.

Figure 1. Flickr image information network

Searching images in such large information networks is very useful but also challenging: user annotations are noisy and incomplete, and there are many similar interest or product groups. For keyword-based retrieval, we need to find similar annotations to avoid missing relevant images. WordNet does not work for such noisy terms, while Google Distance is too general. For content-based image retrieval, traditional methods [1] rely only on image features (or the surrounding text) and do not consider the network structure. In addition, these tasks are traditionally treated separately. By forming an image information network, however, we can solve them simultaneously within a general framework.

Figure 2. Amazon image information network

SimRank [2] is one of the most popular algorithms for evaluating object similarity in information networks. It calculates the similarity between objects based on the intuition that two objects are similar if they are linked to by similar objects in the network. SimRank has two disadvantages: (1) it is expensive to compute and does not scale to large datasets;
and (2) it measures object similarity solely by link information, whereas in image-rich information networks object similarity can also be estimated from image content features. To address these two problems, an efficient approach called minimum order K-SimRank is proposed to significantly improve the speed of SimRank.

2. RELATED WORK
Conducting information retrieval in such large image-rich information networks is a very useful but also very challenging task, because many kinds of information are involved: text, image features, users, groups, and, most importantly, the network structure. In text-based retrieval, estimating the similarity of the words in the context is useful for returning more relevant images. WordNet manually groups words into synonym sets, Google Distance [3] computes word similarity from co-occurrence in search results, and Flickr Distance [4] considers visual relationships. In image content-based retrieval, most methods (such as Google's VisualRank [5]) and systems [6], [7], [8], [9], [10] compute image similarity based on image content features. Hybrid approaches combine text features and image content features [11], [12], [13], [14]. Most commercial image search engines use textual similarity to return semantically relevant images and then use visual similarity to find visually relevant images. Integration-based approaches [12], [13], [14] use linear or nonlinear combinations of the textual and visual features. However, existing works cannot handle the link structure. In this paper, we propose an image-rich information network model in which the similarities between nodes of the same type and of different types can be better estimated based on their mutual impact under the network structure.

3. PROPOSED SYSTEM
The proposed system has a four-layer architecture, as shown in Figure 3. The bottom layer contains an image warehouse and the information extraction and image feature analysis engines. The lower intermediate layer builds the image-rich information network and performs link- and content-based similarity ranking. The upper intermediate layer is the functional module layer, which implements the major function modules, including the ranking information derived from the information network analysis. The top layer is a user-friendly interface, which interacts with users, responds to their requests, and collects feedback.

Figure 3. System Architecture

4. ALGORITHM USED
In a homogeneous network, the SimRank similarity score between two objects o and o' is defined as

    s(o, o') = C / (|L(o)| |L(o')|) * sum_{i=1..|L(o)|} sum_{j=1..|L(o')|} s(L_i(o), L_j(o'))        (1)

where C is a constant between 0 and 1, L(o) is the set of nodes that link to object o, and L_i(o) is the i-th object in L(o); s(o, o) = 1, and s(o, o') = 0 when L(o) or L(o') is empty. To compute SimRank in a network G of N nodes, O(N²) space is required to store the similarity scores for all pairs of objects. Let P be the time required to compute Equation 1; the time complexity is then O(N²P) per iteration. Among the several algorithms proposed for fast SimRank computation, the pruning approach discussed in [4] is adopted: for each object, the top K (K << N) potentially similar objects are chosen as candidates, and SimRank is computed only for these candidates, reducing the time complexity to O(NKP) and the space complexity to O(NK). This approach is called K-SimRank.
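To make the pruning concrete, the following is a minimal sketch of a K-SimRank iteration under assumed data structures; it is not the authors' implementation. The dictionaries in_links and candidates, the helper pair, and the constants C, K, and ITERATIONS are illustrative names, and how the top-K candidates per object are selected is left open here.

C = 0.8          # decay constant, 0 < C < 1 (illustrative value)
K = 50           # candidates kept per object, K << N (illustrative value)
ITERATIONS = 5   # number of SimRank iterations (illustrative value)

def pair(a, b):
    """Order-independent key for a pair of objects."""
    return frozenset((a, b))

def k_simrank(in_links, candidates):
    """SimRank restricted to top-K candidate pairs (K-SimRank sketch).

    in_links[o]   : the set L(o) of objects that link to o
                    (every object is assumed to appear as a key)
    candidates[o] : up to K objects considered potentially similar to o
    Returns a dict mapping pair(o, o') to the similarity score s(o, o').
    """
    sim = {pair(o, o): 1.0 for o in in_links}          # s(o, o) = 1

    for _ in range(ITERATIONS):
        new_sim = {pair(o, o): 1.0 for o in in_links}
        for o in in_links:
            # Only the K candidate pairs of o are recomputed; this pruning
            # reduces the per-iteration cost from O(N^2 P) to O(N K P).
            for o2 in candidates.get(o, ()):
                if o == o2 or not in_links[o] or not in_links[o2]:
                    continue                           # s = 0 without in-links
                total = sum(sim.get(pair(a, b), 0.0)
                            for a in in_links[o]
                            for b in in_links[o2])
                new_sim[pair(o, o2)] = (C * total /
                                        (len(in_links[o]) * len(in_links[o2])))
        sim = new_sim
    return sim

In a toy setting, in_links would map each image, tag, and group node to the nodes linking to it, and candidates would hold, for each image, the K images most likely to be similar (chosen, for example, by shared tags or textual similarity).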
This strategy is suitable for image retrieval at large scale, where most images are not similar to the query and there is no need to estimate their similarity over and over again. In K-SimRank, the time complexity of P is O(|L(o)| |L(o')| log K), where log K is the cost of deciding whether L_j(o') is a candidate of object L_i(o). A more efficient approach is used. Denote by T(c) the set of top K similar candidates of object c. Among L(o) and L(o'), denote by L_max the one with the larger cardinality and by L_min the smaller one. The method works as follows:
1. Starting with L_min, for each object c ∈ L_min, sum up the scores obtained in step 2 or step 3.
2. If K < |L_max|, then for each object d ∈ T(c), search within L_max to decide whether d ∈ L_max. If yes, add the score s(c, d); otherwise, add 0.
3. Otherwise, for each object d ∈ L_max, search within T(c) to decide whether d ∈ T(c). If yes, add the score s(c, d); otherwise, add 0.
The above procedure reduces the time complexity of P to O(|L_min| K log|L_max|) when K < |L_max|, or O(|L_min| |L_max| log K) otherwise, which is the combination with minimum cost, achieved by automatically choosing the optimal order of computation. This approach is called minimum order K-SimRank (an illustrative sketch of this ordering rule is given below, before the references).

5. SYSTEM SCREENSHOTS
1. Context-Based Search (SCBR)
Screenshots 1 and 2 follow the SimRank algorithm. The user enters text and the system returns relevant results (approximately 100 images). If the user wants more relevant results, the user clicks on one image; that image acts as a feedback image and becomes the input to the CBR algorithm, which returns more relevant results. The accurate results the user wants then appear (approximately 20 images).

Screenshot 1. Results of the SimRank algorithm
Screenshot 2. Selecting a single image as feedback for CBR
Screenshot 3. CBR results related to the feedback image

2. Content-Based Search (CBR)
Screenshot 4. Content-based search
Screenshot 5. Content-based search, selecting an image from the database as input
Screenshot 6. CBR results related to the given input image

6. CONCLUSION
In this paper, a novel and efficient way of finding similar objects (such as photos and products) is presented by modeling major social sharing and e-commerce websites as image-rich information networks. The minimum order K-SimRank algorithm is proposed, which efficiently computes weighted link-based similarity in weighted heterogeneous image-rich information networks. In future work, under the concept of heterogeneous image-rich information networks, one can study how such network structures may benefit various image mining and computer vision tasks, such as image categorization, image segmentation, tag annotation, and collaborative filtering.
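For completeness, the sketch below illustrates the minimum order rule from Section 4 for a single pair score. It is a minimal illustration under assumed data structures, not the authors' implementation: in_links and top_candidates play the roles of L(·) and T(·), and Python hash-based set and dict lookups stand in for the sorted-list searches that account for the log|L_max| and log K factors in the complexity analysis.

def pair_score(o, o2, in_links, top_candidates, C=0.8):
    """One K-SimRank score for the pair (o, o') using the minimum order rule.

    in_links[x]       : L(x), the objects that link to x
    top_candidates[c] : dict mapping each of c's top-K candidates d to the
                        current score s(c, d), i.e. T(c)
    """
    L_o, L_o2 = in_links[o], in_links[o2]
    if not L_o or not L_o2:
        return 0.0

    # Step 1: let the smaller in-link set (L_min) drive the outer loop.
    L_min, L_max = (L_o, L_o2) if len(L_o) <= len(L_o2) else (L_o2, L_o)
    L_max_set = set(L_max)

    total = 0.0
    for c in L_min:
        T_c = top_candidates.get(c, {})
        if len(T_c) < len(L_max):
            # Step 2: K < |L_max| -- scan T(c) and test membership in L_max.
            total += sum(score for d, score in T_c.items() if d in L_max_set)
        else:
            # Step 3: otherwise scan L_max and look each object up in T(c).
            total += sum(T_c.get(d, 0.0) for d in L_max)

    return C * total / (len(L_o) * len(L_o2))

The ordering choice matters because the outer loop always runs over the smaller in-link set, and the inner search always runs over whichever of T(c) and L_max is cheaper to scan against.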
REFERENCES
[1] X. Jin, J. Luo, J. Yu, G. Wang, D. Joshi, and J. Han, "Reinforced Similarity Integration in Image-Rich Information Networks," IEEE Trans. Knowledge and Data Eng., Feb. 2013.
[2] G. Jeh and J. Widom, "SimRank: A Measure of Structural-Context Similarity," Proc. Int'l Conf. Knowledge Discovery and Data Mining, 2002.
[3] R.L. Cilibrasi and P.M.B. Vitanyi, "The Google Similarity Distance," IEEE Trans. Knowledge and Data Eng., Mar. 2007.
[4] L. Wu, X.-S. Hua, N. Yu, W.-Y. Ma, and S. Li, "Flickr Distance," Proc. 16th ACM Int'l Conf. Multimedia, 2008.
[5] Y. Jing and S. Baluja, "VisualRank: Applying PageRank to Large-Scale Image Search," IEEE Trans. Pattern Analysis and Machine Intelligence, Nov. 2008.
[6] R.C. Veltkamp and M. Tanase, "Content-Based Image Retrieval Systems: A Survey," technical report, Dept. of Computing Science, Utrecht Univ., 2002.
[7] H. Tamura and N. Yokoya, "Image Database Systems: A Survey," Pattern Recognition, 1984.
[8] W.I. Grosky, "Multimedia Information Systems," IEEE MultiMedia, 1994.
[9] V.N. Gudivada and V.V. Raghavan, "Content-Based Image Retrieval Systems," Computer, Sept. 1995.
[10] Y. Rui, T.S. Huang, and S.-F. Chang, "Image Retrieval: Current Techniques, Promising Directions, and Open Issues," J. Visual Comm. and Image Representation, 1999.
[11] R. Datta, D. Joshi, J. Li, and J.Z. Wang, "Image Retrieval: Ideas, Influences, and Trends of the New Age," ACM Computing Surveys, Apr. 2008.
[12] T. Deselaers and H. Müller, "Combining Textual- and Content-Based Image Retrieval, Tutorial," Proc. 19th Int'l Conf. Pattern Recognition (ICPR '08), http://thomas.deselaers.de/teaching/files/tutorial_icpr08/04-combination.pdf, 2008.
[13] S. Sclaroff, M. La Cascia, and S. Sethi, "Unifying Textual and Visual Cues for Content-Based Image Retrieval on the World Wide Web," Computer Vision and Image Understanding, July 1999.
[14] Z. Ye, X. Huang, Q. Hu, and H. Lin, "An Integrated Approach for Medical Image Retrieval through Combining Textual and Visual Features," Proc. 10th Int'l Conf. Cross-Language Evaluation Forum: Multimedia Experiments, 2010.