International Journal of Research In Science & Engineering
Volume: 1 Issue: 1
e-ISSN: 2394-8299 p-ISSN: 2394-8280

CLOUD SEARCH SERVICES: MULTI KEYWORD RANKING AND SYNONYM SEARCH ON ENCRYPTED DATA

Nanjesh S1, Ancy Thomas2, Dr. Prashanth C M3

Post Graduate Student, Dept of CS&E, Sapthagiri College of Engineering, Bangalore. snanjesh@gmail.com
Mrs. Ancy Thomas, Asst. Prof., Dept of CS&E, SCE Bangalore, India. ancythomas@sapthagiri.edu.in
Dr. Prashanth C M, Prof & HOD, Dept of CS&E, SCE Bangalore, India. hodcse@sapthagiri.edu.in

ABSTRACT

Smart electronic devices use the cloud for easy and reliable storage. A wide range of cloud services is offered to consumers on the premise that effective and efficient cloud search is available. A consumer always wants to get the most relevant data or products, and achieving this in a pay-as-you-use cloud environment is a difficult task. When a user uploads data (photos, mails, personal records) to the cloud, the data is encrypted before it is outsourced. Traditional keyword search mechanisms are then useless on the outsourced data. Therefore, an efficient and effective technique for searching encrypted data is crucial. It can be achieved using Multi Keyword Ranking, wherein distinct keywords are picked from a document. Synonyms for the picked keywords are then added, which gives more accurate results. WordNet 3.0 is used to automatically extract the synonyms for the picked keywords. Even if the user searches in everyday language or without knowing the exact keywords, he will be able to retrieve the appropriate information. After the synonyms are extracted, an index for the document is prepared. This index contains the document ID and the keywords present in that document.
In order to rank the results, the Jaccard coefficient is used, and the search results are displayed according to the calculated coefficient value. The user is thus able to search even on encrypted cloud data.

Keywords: Cloud computing, Multi keyword ranking, Synonym search.

1. INTRODUCTION

In the last few years, the consumer-centric cloud computing paradigm has emerged from the development of smart electronic devices combined with emerging cloud computing technologies. Many smartphone manufacturers provide their users with specific cloud services: Apple has its own cloud named “iCloud”, while Samsung devices have been bundled with “Dropbox” and Nokia devices with “OneDrive”. These are consumer-centric clouds, and they work on the “pay-as-you-use” cloud computing paradigm. Nowadays, even ordinary users are aware of cloud storage due to rapid advancements in smartphone technology. As almost all smartphones have direct access to cloud storage provided by the manufacturer, many users store sensitive personal data in the cloud. The data is encrypted before it is outsourced to the cloud. As the data is in encrypted form, traditional keyword search techniques are useless. Existing search approaches on encrypted cloud data support only exact or fuzzy keyword search.

Many consumer electronic devices with support for high-speed computing, combined with the emerging cloud computing paradigm, provide a variety of services to consumers. Cabarcos P.A. et al. [1] proposed a novel middleware architecture that allows sessions initiated on one device to be transferred seamlessly to another in a cloud computing environment. Similarly, Diaz-Sanchez D. et al. [2] proposed Media Cloud, a new cloud computing middleware that can be used in set-top boxes for classifying, delivering, and searching media in a home network and in the cloud. Seung G. L.
et al. [3] proposed a personalized DTV program recommender system under a cloud computing environment. However, efficient and effective cloud search must be ensured for all these services before they are made available to consumers. Finding the most relevant data or products is highly desirable in the “pay-as-you-use” cloud computing paradigm.

Consumer-centric cloud computing provides high-quality applications and services on demand from a shared pool of configurable computing resources [4]. Users can face some problems here. Because the Cloud Service Provider (CSP) has full control over the outsourced data, unauthorized operations on that data are possible. This is why the user encrypts the data before sending it to the cloud. But encryption makes traditional keyword search useless. Authorized cloud consumers want to retrieve only the data they are interested in rather than all of it, so the simple approach of downloading all the data and decrypting it locally is impractical.

Multi-keyword search and other sophisticated mechanisms such as ranked search are not supported by current search mechanisms. Ranked search returns the most desired results to the customer; because only the most relevant data is fetched back, it also reduces network traffic. Multi-keyword search gives more accurate results than single-keyword search, which often gives coarse results. It is also possible that the cloud customer gives a search input that is a synonym of the actual keyword. Existing searchable encryption schemes support only exact or fuzzy keyword search.
A practically efficient and flexible searchable encryption scheme is required that supports both synonym-based search and multi-keyword ranked search. Multi-keyword search can be addressed using the Vector Space Model (VSM) [6]. Every document is represented as a vector in which each dimension value is the Term Frequency (TF) weight of the corresponding keyword. A new vector is generated for every query, in which each dimension value is the Inverse Document Frequency (IDF) weight. Search efficiency can be improved by using a tree-based index structure (a balanced binary tree). Term Frequency Inverse Document Frequency (TFIDF) can be incorporated into an enhanced semantic feature extraction method, E-TFIDF. The E-TFIDF algorithm can be used to extract the most representative keywords from the outsourced text documents, which can improve the accuracy of search results.

1.1. Related work

A. Searchable Encryption
As the data stored in the cloud is encrypted for safety, searching it with traditional keyword search methods is impossible. A searchable encryption scheme provides a way to “encrypt” a search index so that its contents are hidden except to a party that is given appropriate tokens. Consider a search index generated over a collection of files (this could be a full-text index or just a keyword index). Using a searchable encryption scheme, the index is encrypted in such a way that a party holding a token for a keyword can retrieve pointers to the encrypted files that contain the keyword.

B. Symmetric Searchable Encryption (SSE)
SSE is appropriate in any setting where the party that searches over the data is also the one who generates it. It is also called Single Writer/Single Reader (SWSR).

C. Asymmetric Searchable Encryption (ASE)
ASE is appropriate in any setting where the party searching over the data is different from the party that generates it. It is also called Many Writer/Single Reader (MWSR).
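The token-based lookup described under Searchable Encryption can be illustrated with a minimal sketch. It is not the construction of any particular scheme: it assumes a keyed hash (HMAC-SHA256) as the token function, and the names `make_token`, `build_token_index`, `search`, the key, and the sample documents are all hypothetical. For brevity the file pointers are stored in the clear, whereas a real scheme would also protect them.

```python
import hashlib
import hmac
from collections import defaultdict


def make_token(key: bytes, keyword: str) -> str:
    """Derive a deterministic search token for a keyword (assumed: HMAC-SHA256)."""
    normalized = keyword.strip().lower().encode("utf-8")
    return hmac.new(key, normalized, hashlib.sha256).hexdigest()


def build_token_index(key: bytes, doc_keywords: dict) -> dict:
    """Map each keyword token to the sorted IDs of the files containing that keyword."""
    index = defaultdict(set)
    for doc_id, keywords in doc_keywords.items():
        for keyword in keywords:
            index[make_token(key, keyword)].add(doc_id)
    return {token: sorted(ids) for token, ids in index.items()}


def search(index: dict, token: str) -> list:
    """Server-side lookup: the server sees only the token, never the keyword."""
    return index.get(token, [])


# Hypothetical data: the key stays with the data owner, the index goes to the server.
SECRET_KEY = b"demo-key-held-by-the-data-owner"
DOC_KEYWORDS = {
    "doc1": ["cloud", "storage"],
    "doc2": ["cloud", "encryption"],
}
TOKEN_INDEX = build_token_index(SECRET_KEY, DOC_KEYWORDS)
```

A party holding the key can compute `make_token(SECRET_KEY, "cloud")` and hand it to the server, which returns the matching file IDs. A token computed for a different word, even a synonym, matches nothing, because the lookup is an exact match on the token.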
The disadvantage of searchable encryption is that it supports only exact keyword search and is therefore generally not well suited to the cloud computing environment. A few techniques for searching over encrypted data are given below:

METHODS | DESCRIPTION | REMARKS
Symmetric key cryptography | Each word in a file is encrypted using a two-layered encryption function. | Computationally complex because two-layered encryption is used. Single keyword search.
Public encryption keyword search | Uses 4 algorithms (KeyGen, PEKS, Trapdoor and Test) for searching. | Fails regarding access policy and dictionary attacks. Supports only comparison and subset queries. Single keyword search.
Hidden vector encryption | Supports conjunctive query search. Setup, Encrypt, GenToken and Query are the 4 algorithms used. | Fails for disjunctive queries. Doesn’t support a one-upload, many-download policy. Supports only single keyword search.
Attribute based encryption | Follows a one-upload, many-download policy, which is not supported in HVE and PEKS. | Provides the best quality for searching over encrypted data and is faster in access, but doesn’t support synonym search.
Privacy preserving keyword search | Multi-round protocol between server and user on a single keyword. Each document contains an index; the keyword index is encrypted using pseudorandom bits. | Fails when multiple keywords are used. Synonym search is not supported.
Authorized private keyword search | Deals with multi-keyword search. | Doesn’t prevent keyword attacks. Fails to detect synonyms.

Table-1: A survey on encrypted data access techniques

2. METHODOLOGY

Searching the encrypted data on the cloud can be done in the following steps:

Document Encryption
Feature/Keyword Extraction
Document Index Generator
Document Index Encryption
Multi Keyword Ranking

Document Encryption: The documents to be uploaded to the cloud are first selected. To keep them secure when outsourced, they are then encrypted. The Advanced Encryption Standard (AES) is used for encryption.

Feature/Keyword Extraction: Distinct keywords are picked from each document by computing the Term Frequency (TF) and the Inverse Document Frequency (IDF). A stop list is used to exclude common words.

Document Index Generator: Once the distinct keywords have been picked from all the documents, they are placed in a separate page called the index, based on their TFIDF value (confidence value). Suitable synonyms for the picked keywords are added.

Document Index Encryption: After the document index is generated, it is also encrypted using a suitable encryption method.

Multi Keyword Ranking: When the user issues a query, the multi-keyword ranking search technique is used to rank the top k results.

Fig-1: System architecture

Figure 1 shows the detailed architecture of the system. A dataset may contain about 1000 documents. These documents are encrypted using the Advanced Encryption Standard (AES).
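The feature extraction and index generation steps can be sketched as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: TF is taken as the term count divided by the number of terms in the document, and IDF as the plain ratio of the total number of documents to the number of documents containing the term, which is how this section defines them (no logarithm). The stop list, the synonym table (which the paper draws from WordNet 3.0), the `top_n` cut-off, and the sample documents are hypothetical.

```python
from collections import Counter

# Hypothetical stop list and synonym table, for illustration only.
STOP_WORDS = {"the", "is", "in", "a", "of", "and", "to", "on"}
SYNONYMS = {"encrypted": ["ciphered"], "search": ["lookup"]}


def tokenize(text):
    """Lowercase, split on whitespace, strip punctuation, and drop stop words."""
    words = (w.strip(".,;:!?()\"'").lower() for w in text.split())
    return [w for w in words if w and w not in STOP_WORDS]


def confidence_values(docs):
    """Return {doc_id: {term: TF * IDF}} using the ratio definitions of TF and IDF."""
    tokens = {doc_id: tokenize(text) for doc_id, text in docs.items()}
    doc_freq = Counter()
    for words in tokens.values():
        doc_freq.update(set(words))  # count each term once per document
    total_docs = len(docs)
    scores = {}
    for doc_id, words in tokens.items():
        counts = Counter(words)
        scores[doc_id] = {
            term: (count / len(words)) * (total_docs / doc_freq[term])
            for term, count in counts.items()
        }
    return scores


def build_index(docs, top_n=2):
    """Keep the top_n highest-confidence keywords per document and attach synonyms."""
    index = {}
    for doc_id, term_scores in confidence_values(docs).items():
        # Sort by descending confidence; ties are broken alphabetically.
        ranked = sorted(term_scores.items(), key=lambda item: (-item[1], item[0]))
        index[doc_id] = {
            term: {"confidence": score, "synonyms": SYNONYMS.get(term, [])}
            for term, score in ranked[:top_n]
        }
    return index


DOCS = {
    "doc1": "The cloud stores encrypted photos and encrypted mails",
    "doc2": "Search on the cloud is ranked",
}
INDEX = build_index(DOCS)
```

In this toy collection, "cloud" appears in both documents, so its IDF is the lowest possible and it drops out of the top keywords, while "encrypted" (frequent in one document only) ranks first for `doc1`. The resulting index would then be encrypted before upload, a step omitted here.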
Meanwhile, the Term Frequency (TF) of each word is computed and keywords are extracted from the document:

$$\mathrm{TF} = \frac{\text{Number of occurrences of the term}}{\text{Total number of terms in the document}}$$

Later, the Inverse Document Frequency (IDF) is calculated as

$$\mathrm{IDF} = \frac{\text{Total number of documents}}{\text{Number of documents in which the term has occurred}}$$

TF*IDF gives the confidence value, and the index is prepared based on it. Synonyms for the keywords in the index are added. This index is encrypted using AES and stored in the cloud storage along with the encrypted documents.

In the secure query processing stage, the user queries for the document he/she wants. A query index is generated. This query index is encrypted, and a search is performed using multi-keyword ranking search. The top k results are shown to the user, who can then decrypt the one he/she was looking for.

3. CONCLUSION

Based on the number of documents uploaded, a number of keywords are picked, and the confidence value of each is then computed. The highest confidence value corresponds to the most desirable keyword.

Table 1: Test cases

Figure 2: Keywords picked

Based on Table 1, various operations were performed: a variable number of documents were uploaded to the cloud, various keywords/features were extracted, and different confidence values were obtained. The results obtained are shown above. Figure 2 shows the distinct keywords that were picked and the confidence values recorded.

Conclusion Remarks: It has been observed that tagging keywords with synonyms gives more accurate results. At the same time, manually giving synonyms for each unique keyword picked is tedious and takes a lot of time.
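The effect of synonym tagging on ranked retrieval can be illustrated with a small sketch of Jaccard-based ranking, the ranking measure named in the abstract. It is a simplified illustration rather than the authors' implementation: it works on plaintext keyword sets, whereas in the described system both the index and the query are encrypted, and it assumes that synonyms are mapped to a single index keyword before comparison. The synonym table, the index contents, and the function names are hypothetical.

```python
# Hypothetical synonym table and plaintext index, for illustration only.
SYNONYMS = {"picture": "photo", "image": "photo", "mail": "email", "letter": "email"}
INDEX = {
    "doc1": {"photo", "holiday", "beach"},
    "doc2": {"email", "invoice", "payment"},
    "doc3": {"photo", "email", "family"},
}


def normalize(keywords):
    """Lowercase each keyword and map known synonyms to their index keyword."""
    return {SYNONYMS.get(k.lower(), k.lower()) for k in keywords}


def jaccard(a, b):
    """Size of the intersection divided by size of the union; 0.0 if both are empty."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def ranked_search(query_keywords, index, k=2):
    """Return the top-k (doc_id, score) pairs with a non-zero score, best first."""
    query = normalize(query_keywords)
    scored = [(doc_id, jaccard(query, kws)) for doc_id, kws in index.items()]
    hits = [(doc_id, score) for doc_id, score in scored if score > 0]
    # Sort by descending score; ties are broken by document ID.
    return sorted(hits, key=lambda pair: (-pair[1], pair[0]))[:k]
```

For the query `["picture", "mail"]`, neither word appears literally in the index, yet after synonym mapping `doc3` shares both keywords with the query and is ranked first. Without the synonym table the same query would return no results.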
ACKNOWLEDGEMENT

I am thankful to Mrs. Ancy Thomas, Assistant Professor, Dept of CS&E, for her advice and support in writing this paper. I express my deep gratitude to Dr. Prashanth C M, Head of Department (CS&E), who gave me his valuable suggestions and directions at every stage. I would also like to thank all the anonymous referees for their reviews, which significantly improved the presentation of this paper. I take this opportunity to express my sincere thanks to all staff members of the CS&E department of SCE for their valuable suggestions.

REFERENCES

[1] P.A. Cabarcos, F.A. Mendoza, R.S. Guerrero, A.M. Lopez, and D. Diaz-Sanchez, “SuSSo: seamless and ubiquitous single sign-on for cloud service continuity across devices,” IEEE Trans. Consumer Electron., vol. 58, no. 4, pp. 1425-1433, 2012.
[2] D. Diaz-Sanchez, F. Almenarez, A. Marin, D. Proserpio, and P.A. Cabarcos, “Media cloud: an open cloud computing middleware for content management,” IEEE Trans. Consumer Electron., vol. 57, no. 2, pp. 970-978, 2011.
[3] S. G. Lee, D. Lee, and S. Lee, “Personalized DTV program recommendation system under a cloud computing environment,” IEEE Trans. Consumer Electron., vol. 56, no. 2, pp. 1034-1042, 2010.
[4] L. M. Vaquero, L. Rodero-Merino, J. Caceres, and M. Lindner, “A break in the clouds: towards a cloud definition,” ACM SIGCOMM Comput. Commun. Rev., vol. 39, no. 1, pp. 50-55, 2009.
[5] S. Kamara and K. Lauter, “Cryptographic cloud storage,” FC 2010 Workshops, LNCS 6054, pp. 136-149, Jan. 2010.
[6] S. Evangeline Sharon et al., “Keyword based search over encrypted data.”
[7] J. Li, Q. Wang, C. Wang, N. Cao, K. Ren, and W. Lou, “Fuzzy keyword search over encrypted data in cloud computing,” Proceedings of IEEE INFOCOM’10 Mini-Conference, San Diego, CA, USA, pp. 1-5, Mar. 2010.
[8] N. Cao, C. Wang, M. Li, K. Ren, and W.
Lou, “Privacy-preserving multi-keyword ranked search over encrypted cloud data,” Proceedings of IEEE INFOCOM 2011, pp. 829-837, 2011.
[9] C. Wang, N. Cao, J. Li, K. Ren, and W. Lou, “Secure ranked keyword search over encrypted cloud data,” Proceedings of IEEE 30th International Conference on Distributed Computing Systems (ICDCS), pp. 253-262, 2010.

IJRISE | www.ijrise.org | editor@ijrise.org