International Journal of Research In Science & Engineering
Volume: 1 Issue: 1
e-ISSN: 2394-8299
p-ISSN: 2394-8280
CLOUD SEARCH SERVICES: MULTI KEYWORD RANKING AND
SYNONYM SEARCH ON ENCRYPTED DATA
Nanjesh S1, Ancy Thomas2, Dr. Prashanth C M3
Post Graduate Student, Dept of CS&E, Sapthagiri College of Engineering, Bangalore. snanjesh@gmail.com
Mrs. Ancy Thomas, Asst. Prof , Dept of CS&E , SCE Bangalore, India. ancythomas@sapthagiri.edu.in
Dr. Prashanth C M, Prof & HOD, Dept of CS&E, SCE Bangalore, India. hodcse@sapthagiri.edu.in
ABSTRACT
Smart electronic devices use the cloud for easy and reliable storage. A wide range of cloud services is
offered to consumers on the premise that an effective and efficient cloud search service is available.
Consumers want to retrieve the most relevant data or products, and achieving this in a pay-as-you-use cloud
environment is difficult. When a user uploads data (photos, mails, personal records) to the cloud, the
data is encrypted before it is outsourced. Traditional keyword search mechanisms are therefore useless on
the outsourced data, and an efficient and effective technique for searching encrypted data becomes crucial.
This can be achieved using Multi Keyword Ranking, in which distinct keywords are picked from each document.
Synonyms for the picked keywords are then added, which gives more accurate results. WordNet 3.0 is used to
extract the synonyms automatically. As a result, even a user who searches in everyday language, or without
knowing the exact keywords, can retrieve the appropriate information. After the synonyms are extracted, an
index is prepared for each document. This index contains the document ID and the keywords present in that
document. The results are ranked using the Jaccard coefficient and displayed in order of the calculated
coefficient value. The user is thus able to search even on encrypted cloud data.
Keywords: Cloud computing, Multi keyword ranking, Synonym search.
1. INTRODUCTION
In the last few years, the consumer-centric cloud computing paradigm has emerged from the development of smart
electronic systems combined with emerging cloud computing technologies. Many smartphone manufacturers
provide their users with specific cloud services: Apple has its own cloud named “iCloud”,
Samsung has “Drop Box” and Nokia has “One Drive”. These are consumer-centric clouds that work on the
“pay-as-you-use” cloud computing paradigm. Owing to rapid advances in smartphone technology, even ordinary
users are now aware of cloud storage. As almost all smartphones have direct access to cloud storage provided
by the manufacturer, many users store sensitive personal data in the cloud. The data is encrypted before it is
outsourced to the cloud. Because the data is in encrypted form, traditional keyword search techniques are useless.
Existing search approaches on encrypted cloud data support only exact or fuzzy keyword search.
Many consumer electronic devices with support for high-speed computing, combined with the emerging
cloud computing paradigm, provide a variety of services to consumers. Cabarcos P.A. et al. [1] proposed a
novel middleware architecture that allows sessions initiated on one device to transfer seamlessly to another in a
cloud computing environment. Similarly, Diaz-Sanchez D. et al. [2] proposed a new cloud computing
middleware, Media Cloud, which can be used in set-top boxes for classifying, delivering, and searching media in a
home network and in the cloud. Seung G. L. et al. [3] proposed a personalized DTV program recommender system
under a cloud computing environment.
However, efficient and effective cloud search must be ensured for all these services before they are made available
to consumers. Finding the most relevant data or products is highly desirable in the “pay-as-you-use” cloud
computing paradigm.
Consumer-centric cloud computing provides high-quality applications and services to consumers on demand
from a shared pool of configurable computing resources [4]. Users can, however, face some problems.
Because the Cloud Service Provider (CSP) has full control over the outsourced data, unauthorized operations
on that data are possible. This is why the user encrypts the data before sending it to
the cloud. But encryption makes traditional keyword search useless. Authorized cloud consumers want to
search only for the data that interests them rather than all the data, so the naive approach of
downloading all the data and decrypting it locally is impractical.
Current search mechanisms do not support multi-keyword search or other sophisticated mechanisms such as
ranked search. With ranked search the customer obtains the most desired results, and because only the
most relevant data is fetched back, network traffic is reduced.
Multi-keyword search gives more accurate results than single-keyword search, which often gives coarse
results. It is also possible that the cloud customer enters a search term that is a synonym of the actual
keyword. Existing searchable encryption schemes support only exact or fuzzy keyword search.
A practically efficient and flexible searchable scheme is required that supports both synonym-based
search and multi-keyword ranked search. Multi-keyword search can be addressed using the Vector Space Model
(VSM) [6]. Every document is represented as a vector in which each dimension holds the Term Frequency
(TF) weight of the corresponding keyword. A new vector is generated for every query, with the same number of
dimensions, in which each dimension holds the Inverse Document Frequency (IDF) weight. Search
efficiency can be improved by using a tree-based index structure (a balanced binary tree). Term Frequency-Inverse
Document Frequency (TFIDF) can be incorporated into an enhanced semantic feature extraction method, E-TFIDF.
The E-TFIDF algorithm can be used to extract the most representative keywords from the outsourced text documents,
which improves the accuracy of the search results.
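As a rough illustration of the vector space model described above, the following Python sketch represents each document as a vector of TF weights and the query as a vector of IDF weights, and scores each document by the inner product of the two. The sample documents, the function names and the plain-ratio form of IDF are assumptions made for this example. It works on plaintext and leaves out both the encryption and the tree-based index.

```python
from collections import Counter

def tf_vector(tokens, vocabulary):
    """TF weight of every vocabulary term in one document."""
    counts = Counter(tokens)
    total = len(tokens)
    return [counts[term] / total for term in vocabulary]

def idf_query_vector(query_terms, docs, vocabulary):
    """IDF weight for each query term, 0 for every other dimension."""
    n_docs = len(docs)
    vector = []
    for term in vocabulary:
        doc_freq = sum(1 for tokens in docs.values() if term in tokens)
        if term in query_terms and doc_freq > 0:
            vector.append(n_docs / doc_freq)  # plain ratio, no logarithm (assumption)
        else:
            vector.append(0.0)
    return vector

def rank(query_terms, docs):
    """Score every document by the inner product of its TF vector and the query vector."""
    vocabulary = sorted({t for tokens in docs.values() for t in tokens})
    q = idf_query_vector(set(query_terms), docs, vocabulary)
    scores = {}
    for doc_id, tokens in docs.items():
        d = tf_vector(tokens, vocabulary)
        scores[doc_id] = sum(dw * qw for dw, qw in zip(d, q))
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)

# Hypothetical sample corpus: document ID -> list of extracted terms.
docs = {
    "d1": ["cloud", "storage", "cloud", "search"],
    "d2": ["encrypted", "search", "index"],
    "d3": ["photo", "album"],
}
ranking = rank(["cloud", "search"], docs)
```

Here `ranking` lists the documents from most to least relevant for the query terms "cloud" and "search"; a document that contains neither term receives a score of zero.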
1.1. Related work
A. Searchable Encryption
Because data stored in the cloud is encrypted for safety, searching it with the
traditional keyword search method is impossible. A searchable encryption scheme provides a way to
“encrypt” a search index so that its contents are hidden except from a party that is given appropriate tokens.
Consider a search index generated over a collection of files (this could be a full-text index or just a keyword
index). Using a searchable encryption scheme, the index is encrypted in such a way that a party holding a token
for a keyword can retrieve pointers to the encrypted files that contain that keyword.
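To make the token idea concrete, here is a minimal Python sketch that hides the keywords of an index behind keyed hashes (HMAC-SHA256 from the standard library). It is a simplified illustration, not a scheme taken from the cited literature: the names `make_token`, `build_encrypted_index` and `search`, the sample key and the file identifiers are all assumptions, and a real searchable encryption scheme would also protect the file pointers and the access pattern.

```python
import hashlib
import hmac

def make_token(secret_key: bytes, keyword: str) -> str:
    """Derive a search token for a keyword; only holders of the key can compute it."""
    message = keyword.lower().encode("utf-8")
    return hmac.new(secret_key, message, hashlib.sha256).hexdigest()

def build_encrypted_index(secret_key, keyword_index):
    """Replace every plaintext keyword in the index with its token."""
    return {
        make_token(secret_key, keyword): sorted(files)
        for keyword, files in keyword_index.items()
    }

def search(encrypted_index, token):
    """Server side: look up a token without learning the underlying keyword."""
    return encrypted_index.get(token, [])

key = b"hypothetical-demo-key"  # assumed; a real key would be generated randomly
plain_index = {"invoice": {"file1.enc", "file3.enc"}, "photo": {"file2.enc"}}
enc_index = build_encrypted_index(key, plain_index)

hits = search(enc_index, make_token(key, "invoice"))           # token from the right key
misses = search(enc_index, make_token(b"wrong-key", "invoice"))  # token from another key
```

A party holding the key obtains the pointers to the matching encrypted files, while a token derived from a different key matches nothing and the stored index contains no plaintext keywords.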
B. Symmetric Searchable Encryption (SSE)
SSE is appropriate in any setting where the party that searches over the data is also the one that generates it. It
is also called Single Writer/Single Reader (SWSR).
C. Asymmetric Searchable Encryption (ASE)
ASE is appropriate in any setting where the party searching over the data is different from the party that generates it.
It is also called Many Writer/Single Reader (MWSR).
The disadvantage of searchable encryption is that it supports only exact keyword search and is generally not
suitable for a cloud computing environment.
A few techniques for searching over encrypted data are given below:
METHODS | DESCRIPTION | REMARKS
Symmetric key cryptography | Each word in a file is encrypted using a 2-layered encryption function. | Computationally complex because 2-layered encryption is used. Single keyword search.
Public encryption keyword search | Uses 4 algorithms (KeyGen, PEKS, Trapdoor and Test) for searching. | Fails regarding access policy and dictionary attacks. Supports only comparison and subset queries. Single keyword search.
Hidden vector encryption | Supports conjunctive query search. Setup, Encrypt, GenToken and Query are the 4 algorithms used. | Fails for disjunctive queries. Doesn’t support a one-upload-many-download policy. Supports only single keyword search.
Attribute based encryption | Follows a one-upload-many-download policy, which is not supported in HVE and PEKS. | Provides the best quality for searching over encrypted data and is faster in accessing, but doesn’t support synonym search.
Privacy preserving keyword search | Multi-round protocol between server and user on a single keyword. Each document contains an index. The keyword index is encrypted using pseudorandom bits. | Fails when multiple keywords are used. Synonym search is not supported.
Authorized private keyword search | Deals with multi keyword search. | Doesn’t prevent keyword attack. Fails to detect synonym words.
Table-1: A survey on Encrypted data access techniques
2. METHODOLOGY
Searching the encrypted data on cloud can be done in the following steps:
- Document Encryption
- Feature/Keyword Extraction
- Document Index Generator
- Document Index Encryption
- Multi Keyword Ranking
Document Encryption: The documents to be uploaded to the cloud are first selected. To keep them
secure when outsourced, they are then encrypted using the Advanced Encryption Standard [AES].
Feature/Keyword Extraction: Distinct keywords are picked from each document by computing the Term Frequency
[TF] and the Inverse Document Frequency [IDF]. A stop list is used to exclude common words.
Document Index Generator: Once the distinct keywords are picked from all the documents, they are placed in a
separate page called the index, based on their TFIDF value [confidence value]. Suitable synonyms for the picked
keywords are added.
Document Index Encryption: After the document index is generated, it is also encrypted using a suitable encryption
method.
Multi Keyword Ranking: When the user submits a query, the multi keyword ranking search technique is used to rank
the top k results.
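The keyword extraction and index generation steps can be sketched in Python as follows. This is a minimal plaintext illustration under stated assumptions: the stop list and the `SYNONYMS` table are small hand-written stand-ins (the abstract names WordNet 3.0 as the synonym source), keywords are chosen by raw frequency rather than by TFIDF to keep the example short, and the encryption of the documents and of the index is left out.

```python
import re
from collections import Counter

# Assumed, deliberately tiny stop list of common words to ignore.
STOP_LIST = {"a", "an", "and", "the", "of", "to", "is", "in", "on", "my"}

# Assumed hand-written synonym table standing in for a WordNet lookup.
SYNONYMS = {"photo": ["picture", "image"], "mail": ["letter"]}

def extract_keywords(text, top_n=3):
    """Return the top_n most frequent words that are not in the stop list."""
    words = [w for w in re.findall(r"[a-z]+", text.lower()) if w not in STOP_LIST]
    return [word for word, _ in Counter(words).most_common(top_n)]

def build_index(documents, top_n=3):
    """Map each document ID to its keywords plus the synonyms of those keywords."""
    index = {}
    for doc_id, text in documents.items():
        terms = set()
        for keyword in extract_keywords(text, top_n):
            terms.add(keyword)
            terms.update(SYNONYMS.get(keyword, []))
        index[doc_id] = terms
    return index

# Hypothetical sample documents.
documents = {
    "doc1": "The photo of the beach and a photo of the city",
    "doc2": "Mail to the bank: the mail is in the outbox",
}
index = build_index(documents)
```

Because the synonyms are stored alongside the keywords, a later query for "picture" can reach `doc1` even though that word never occurs in the document itself.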
Fig-1: System architecture
Figure 1 shows the detailed architecture of the system. A dataset may contain about 1000 documents. These
documents are encrypted using the Advanced Encryption Standard [AES]. In the meantime, the Term
Frequency [TF] of each word is computed and keywords are extracted from the document.
$$\mathrm{TF} = \frac{\text{Number of occurrences of the term}}{\text{Total number of terms in the document}}$$
Later, the Inverse Document Frequency [IDF] is calculated as
$$\mathrm{IDF} = \frac{\text{Total number of documents}}{\text{Number of documents in which the term has occurred}}$$
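As a worked illustration of the two ratios above, the sketch below computes TF, IDF and their product for a small hand-made corpus. The corpus and the function names are assumptions made for the example. The IDF is the plain ratio given here, whereas many implementations take its logarithm.

```python
def term_frequency(term, tokens):
    """Occurrences of the term divided by the total number of terms in the document."""
    return tokens.count(term) / len(tokens)

def inverse_document_frequency(term, corpus):
    """Total number of documents divided by the number of documents containing the term."""
    containing = sum(1 for tokens in corpus if term in tokens)
    if containing == 0:
        raise ValueError(f"term {term!r} does not occur in the corpus")
    return len(corpus) / containing

# Hypothetical corpus of three already-tokenized documents.
corpus = [
    ["cloud", "data", "cloud", "search"],
    ["data", "index"],
    ["data", "encryption", "key", "data"],
]

# "cloud" is frequent in the first document and rare in the corpus.
tf_cloud = term_frequency("cloud", corpus[0])
idf_cloud = inverse_document_frequency("cloud", corpus)
product_cloud = tf_cloud * idf_cloud

# "data" occurs in every document, so its IDF is the minimum value of 1.
tf_data = term_frequency("data", corpus[0])
idf_data = inverse_document_frequency("data", corpus)
product_data = tf_data * idf_data
```

For the first document, "cloud" has TF = 2/4 and IDF = 3/1, while "data" has TF = 1/4 and IDF = 3/3, so "cloud" obtains the larger product and is the more distinctive keyword of that document.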
The product TF*IDF gives the confidence value, and the index is prepared on the basis of this value. Synonyms are
added for the keywords in the index. The index is encrypted using AES and stored in the cloud alongside the
encrypted documents. In the secure query processing stage, the user queries for the desired document. A query
index is generated and encrypted, and a search is performed using multi keyword ranking search.
The top k results are shown to the user, who can then decrypt the one he/she was looking for.
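The ranking step can be illustrated with the Jaccard coefficient that the abstract names as the ranking measure: for a query keyword set Q and a document keyword set D, J(Q, D) = |Q ∩ D| / |Q ∪ D|. The Python sketch below ranks documents by this value and returns the top k. It works on plaintext keyword sets for readability. The index contents, the document IDs and the function names are assumptions, and the comparison over encrypted indexes is not shown.

```python
def jaccard(a, b):
    """Jaccard coefficient of two sets: size of intersection over size of union."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

def top_k(query_terms, index, k=2):
    """Return up to k (doc_id, score) pairs with a non-zero score, best first."""
    query = set(query_terms)
    scored = [(doc_id, jaccard(query, terms)) for doc_id, terms in index.items()]
    scored = [item for item in scored if item[1] > 0]
    # Sort by descending score; break ties by document ID for a stable order.
    scored.sort(key=lambda item: (-item[1], item[0]))
    return scored[:k]

# Hypothetical index: document ID -> keywords together with their synonyms.
index = {
    "doc1": {"photo", "picture", "beach"},
    "doc2": {"mail", "letter", "bank"},
    "doc3": {"picture", "beach", "holiday", "hotel"},
}
results = top_k(["picture", "beach"], index, k=2)
```

In this example `doc1` shares two of three terms with the query (score 2/3) and `doc3` shares two of four (score 1/2), so `doc1` is ranked first, and `doc2`, which shares no term, is not returned.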
3. CONCLUSION
Depending on the number of documents uploaded, a number of keywords are picked and the confidence value of each
is then computed. The highest confidence value corresponds to the most desirable keyword.
Table 1: Test cases
Figure 2: Keywords picked
Based on Table 1, various operations were performed: varying numbers of documents were uploaded to the
cloud, keywords/features were extracted, and different confidence values were obtained. The results obtained
are shown above. Figure 2 shows the distinct keywords that were picked and the confidence values
recorded.
Concluding Remarks: It has been observed that tagging keywords with synonyms gives
more accurate results. At the same time, manually supplying synonyms for each unique keyword picked is tedious
and time-consuming.
ACKNOWLEDGEMENT
I am thankful to Mrs. Ancy Thomas, Assistant Professor, Dept of CS&E, for her advice and support in writing
this paper. I express my deep gratitude to Dr. Prashanth C M, Head of Department (CS&E), who gave me
valuable suggestions and direction at every stage. I also thank the anonymous referees for their
reviews, which significantly improved the presentation of this paper. I take this opportunity to express my sincere
thanks to all staff members of the CS&E department of SCE for their valuable suggestions.
REFERENCES
[1] P.A. Cabarcos, F.A. Mendoza, R.S. Guerrero, A.M. Lopez, and D. Diaz-Sanchez, “SuSSo: seamless and
ubiquitous single sign-on for cloud service continuity across devices,” IEEE Trans. Consumer Electron., vol. 58, no.
4, pp. 1425-1433, 2012.
[2] D. Diaz-Sanchez, F. Almenarez, A. Marin, D. Proserpio, and P.A. Cabarcos, “Media cloud: an open cloud
computing middleware for content management,” IEEE Trans. Consumer Electron., vol. 57, no. 2, pp. 970-978,
2011.
[3] S. G. Lee, D. Lee, and S. Lee, “Personalized DTV program recommendation system under a cloud computing
environment,” IEEE Trans. Consumer Electron., vol. 56, no. 2, pp. 1034-1042, 2010.
[4] L. M. Vaquero, L. Rodero-Merino, J. Caceres, and M. Lindner, “A break in the clouds: towards a cloud
definition,” ACM SIGCOMM Comput. Commun. Rev., vol. 39, no. 1, pp. 50-55, 2009.
[5] S. Kamara and K. Lauter, “Cryptographic cloud storage,” FC 2010 Workshops, LNCS 6054, pp. 136-149, Jan.
2010.
[6] S. Evangeline Sharon et al., “Keyword based search over encrypted data.”
[7] J. Li, Q. Wang, C. Wang, N. Cao, K. Ren, and W. Lou, “Fuzzy keyword search over encrypted data in cloud
computing,” Proceedings of IEEE INFOCOM’10 Mini-Conference, San Diego, CA, USA, pp. 1-5, Mar. 2010.
[8] N. Cao, C. Wang, M. Li, K. Ren, and W. Lou, “Privacy-preserving multi-keyword
ranked search over encrypted cloud data,” Proceedings of IEEE INFOCOM 2011, pp. 829-837, 2011.
[9] C. Wang, N. Cao, J. Li, K. Ren, and W. Lou, “Secure ranked keyword search over encrypted cloud data,”
Proceedings of IEEE 30th International Conference on Distributed Computing Systems (ICDCS), pp. 253-262,
2010.