An Efficient and Multi Keyword Search in Service Oriented Architecture

International Journal of Engineering Trends and Technology (IJETT) – Volume 17 Number 2 – Nov 2014
R. Venkataramana¹, M.V. Satya NagaRaju²
¹M.Tech Student, ²Associate Professor
¹,²Department of CSE, GIITS Engineering College, Visakhapatnam, Andhra Pradesh, India.
Abstract: Retrieving the top-k results for a multi-keyword query over outsourced data files remains an open research issue, because data outsourced to the cloud is usually encrypted for confidentiality. In this paper we propose an efficient top-k retrieval scheme over outsourced files, delivered through a service-oriented application. The scheme computes a file relevance score for the input keywords, protects the keywords with symmetric-key encryption, and performs all computation at the server end rather than the client end, ranking the documents by their file relevance scores.
I. INTRODUCTION
In cloud computing, cloud services enable users to store their data on a cloud server and to share it with other users. As a result, a growing amount of information is centralized on cloud servers. To protect privacy, the data is encrypted before it is stored, and encryption gives users assurance of confidentiality. This assurance has increased the use of cloud services, but it also creates a problem: more and more users rely on the cloud to work with outsourced data.
As the number of users and the volume of data exchanged grow, the number of malicious users grows as well. Cloud services therefore introduced authentication, so that only authenticated users can retrieve information from the cloud, which limits malicious users. Researchers then introduced keyword-based search, one of the most popular ways of retrieving only the information that matches a given keyword. Plain keyword search, however, cannot be applied directly to encrypted cloud data, so researchers turned to techniques that allow keywords to be searched over encrypted data. One such technique is Boolean keyword search, which extracts the files relevant to the query and can be applied directly in cloud services. Its limitation is that for every search the user has to wait for pre-processing before receiving the query result.
Building on this earlier work, novel ranking-based techniques were established. Rank-based search orders the relevant files in response to the search query, but these techniques were applied to plain-text search only. Researchers then worked on searchable encryption systems, which allow secure ranked search and have achieved the best results in retrieving encrypted data from the cloud. The next main concern is hiding the user's access and search patterns, so that only encapsulated data is exposed rather than the data itself.
Recent research shows some confusion among three different models for searching with privacy: searching on private-key encrypted data (the subject of this work), searching on public-key encrypted data, and private information retrieval. Common to all three models is a server (sometimes called the database) that stores data, and a user who wishes to access, search, or manipulate the data while revealing as little as possible to the server. There are, however, important differences between these three settings.
A natural way to prevent data leakage is to exclude the cloud service from ranking and entrust all of the work to the user. However, the limited computational power on the user side and the high computational complexity stand in the way of information security. The problem of secure multi-keyword top-k retrieval over encrypted cloud data thus becomes how to make the cloud do more of the work during retrieval without leaking information.
II. RELATED WORK
Homomorphic encryption is a type of encryption that allows particular types of computation to be carried out on ciphertext. The result is itself encrypted and, when decrypted, matches the result of the same operations performed on the plaintext. This is a desirable feature in modern communication system architectures, because it allows different services to be chained together without exposing the data to each of those services. There are many partially homomorphic cryptosystems, but their efficiency is limited.
Some examples are shown below.
In the following, x denotes the message and E(x) its encryption.
Unpadded RSA:
With modulus m and exponent e, the ciphertext of a message x is
$E(x) = x^e \bmod m$.
The homomorphic property is
$E(x_1) \cdot E(x_2) = x_1^e x_2^e \bmod m = (x_1 x_2)^e \bmod m = E(x_1 \cdot x_2)$.
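The multiplicative property of unpadded RSA can be checked numerically. The following is a minimal sketch, not part of the proposed system: the primes, exponent, and function names are illustrative assumptions, and the parameters are far too small to be secure.

```python
# Toy unpadded RSA. All parameters are illustrative assumptions and insecure.
p, q = 61, 53
m = p * q                            # modulus m = 3233
e = 17                               # public exponent
d = pow(e, -1, (p - 1) * (q - 1))    # private exponent (needs Python 3.8+)

def encrypt(x: int) -> int:
    """E(x) = x^e mod m."""
    return pow(x, e, m)

def decrypt(c: int) -> int:
    """Invert E using the private exponent d."""
    return pow(c, d, m)

x1, x2 = 7, 12
# Multiply the two ciphertexts without ever decrypting them.
product_cipher = (encrypt(x1) * encrypt(x2)) % m

# The product of ciphertexts is the ciphertext of the product.
assert product_cipher == encrypt(x1 * x2)
assert decrypt(product_cipher) == (x1 * x2) % m
```

The check relies only on modular exponentiation, which is why the property holds for any x1 and x2 whose product is reduced modulo m.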
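ElGamal:
In a cyclic group G of order q with generator g, the public key is (G, q, g, h), where $h = g^x$ and x is the private key. The ciphertext of a message m is
$E(m) = (g^r, m \cdot h^r)$ for a random $r \in \{0, 1, \ldots, q-1\}$.
The homomorphic property is
$E(m_1) \cdot E(m_2) = (g^{r_1}, m_1 h^{r_1})(g^{r_2}, m_2 h^{r_2}) = (g^{r_1 + r_2}, (m_1 m_2) h^{r_1 + r_2}) = E(m_1 \cdot m_2)$.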
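As a small illustration of the ElGamal property, the sketch below multiplies two ciphertexts component-wise and decrypts the result. The group (the order-11 subgroup of the integers modulo 23 generated by 4), the private key, and the function names are assumptions chosen only to keep the arithmetic readable.

```python
import random

# Toy ElGamal parameters (illustrative assumptions, insecure).
p = 23            # prime modulus of the ambient group
q = 11            # order of the subgroup generated by g
g = 4             # generator of that subgroup
x = 6             # private key
h = pow(g, x, p)  # public key component h = g^x

def encrypt(msg: int):
    """E(msg) = (g^r, msg * h^r) for a fresh random r."""
    r = random.randrange(q)
    return (pow(g, r, p), (msg * pow(h, r, p)) % p)

def decrypt(cipher) -> int:
    """Recover msg = c2 / c1^x in the group."""
    c1, c2 = cipher
    shared = pow(c1, x, p)
    return (c2 * pow(shared, -1, p)) % p

def multiply(ca, cb):
    """Component-wise product of two ciphertexts."""
    return ((ca[0] * cb[0]) % p, (ca[1] * cb[1]) % p)

m1, m2 = 3, 5
combined = multiply(encrypt(m1), encrypt(m2))
assert decrypt(combined) == (m1 * m2) % p
```

Because each encryption draws fresh randomness, the combined ciphertext differs from run to run, while its decryption is always the product of the two messages.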
Benaloh:
With modulus m, base g, and block size c, the ciphertext of a message x is
$E(x) = g^x r^c \bmod m$ for a random $r \in \{0, 1, \ldots, m-1\}$.
The homomorphic property is
$E(x_1) \cdot E(x_2) = (g^{x_1} r_1^c)(g^{x_2} r_2^c) \bmod m = g^{x_1 + x_2} (r_1 r_2)^c \bmod m = E(x_1 + x_2 \bmod c)$.
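Paillier:
With modulus m and base g, the ciphertext of a message x is
$E(x) = g^x r^m \bmod m^2$ for a random $r \in \{0, 1, \ldots, m-1\}$.
The homomorphic property is
$E(x_1) \cdot E(x_2) = (g^{x_1} r_1^m)(g^{x_2} r_2^m) \bmod m^2 = g^{x_1 + x_2} (r_1 r_2)^m \bmod m^2 = E(x_1 + x_2 \bmod m)$.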
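The additive property of Paillier can be illustrated in the same way. This is a minimal sketch under stated assumptions: the primes 11 and 13, the common choice g = m + 1, and the helper names are not taken from the text, and the parameters are insecure.

```python
import math
import random

# Toy Paillier parameters (illustrative assumptions, insecure).
p, q = 11, 13
m = p * q                      # modulus m = 143
m2 = m * m                     # ciphertexts live modulo m^2
g = m + 1                      # a standard simple choice of base
lam = (p - 1) * (q - 1) // math.gcd(p - 1, q - 1)   # lcm(p-1, q-1)

def L(u: int) -> int:
    """L(u) = (u - 1) / m, defined for u congruent to 1 modulo m."""
    return (u - 1) // m

mu = pow(L(pow(g, lam, m2)), -1, m)   # decryption constant

def encrypt(x: int) -> int:
    """E(x) = g^x * r^m mod m^2 with random r coprime to m."""
    while True:
        r = random.randrange(1, m)
        if math.gcd(r, m) == 1:
            break
    return (pow(g, x, m2) * pow(r, m, m2)) % m2

def decrypt(c: int) -> int:
    return (L(pow(c, lam, m2)) * mu) % m

x1, x2 = 20, 35
# Multiplying ciphertexts adds the underlying plaintexts.
combined = (encrypt(x1) * encrypt(x2)) % m2
assert decrypt(combined) == (x1 + x2) % m
```

An additive scheme of this kind is what allows a server to accumulate encrypted scores without seeing them, although the system proposed in this paper does not use it.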
Each of the examples above allows homomorphic computation of only a single operation on plaintexts. A cryptosystem that supports both addition and multiplication is referred to as fully homomorphic encryption and is far more powerful. With such a scheme any circuit can be evaluated homomorphically, which makes it possible to construct programs that run on encryptions of their inputs and produce an encryption of their output. The existence of an efficient fully homomorphic encryption scheme would have great practical implications for outsourcing private computations, for instance in cloud computing.
The homomorphic part of a fully homomorphic encryption scheme can also be described in terms of category theory. If C is the category whose objects are integers (i.e., finite streams of data) and whose morphisms include addition and multiplication, then the encryption operation of a fully homomorphic encryption scheme is an endofunctor of C. The categorical approach allows for a generalization beyond the ring structure (composition of addition and multiplication) of the integers. If the morphisms of some wide supercategory of C include the primitive recursive functions or even all computable functions, then any encryption operation which qualifies as an endofunctor of this supercategory is "more fully" homomorphic, since additional operations on encrypted data (for example conditionals and loops) are possible. The utility of fully homomorphic encryption has long been recognized. The problem of constructing such a scheme was first proposed within a year of the development of RSA. A solution proved more elusive; for more than 30 years, it was unclear whether fully homomorphic encryption was even possible.
Craig Gentry, using lattice-based cryptography, showed the first fully homomorphic encryption scheme, as announced by IBM on June 25, 2009. His scheme supports evaluation of arbitrary-depth circuits. His construction starts from a somewhat homomorphic encryption scheme using ideal lattices that is limited to evaluating low-degree polynomials over encrypted data. (It is limited because each ciphertext is noisy in some sense, and this noise grows as one adds and multiplies ciphertexts, until ultimately the noise makes the resulting ciphertext indecipherable.) He then shows how to modify this scheme to make it bootstrappable: by modifying the somewhat homomorphic scheme slightly, it can actually evaluate its own decryption circuit, a self-referential property. Finally, he shows that any bootstrappable somewhat homomorphic encryption scheme can be converted into a fully homomorphic encryption scheme through a recursive self-embedding. In the particular case of Gentry's ideal-lattice-based somewhat homomorphic scheme, this bootstrapping procedure effectively "refreshes" the ciphertext by reducing its associated noise so that it can be used thereafter in more additions and multiplications without resulting in an indecipherable ciphertext. Gentry based the security of his scheme on the assumed hardness of two problems: certain worst-case problems over ideal lattices, and the sparse (or low-weight) subset sum problem.
Regarding performance, ciphertexts in Gentry's scheme remain compact insofar as their lengths do not depend at all on the complexity of the function that is evaluated over the encrypted data. The computation time depends only linearly on the number of operations performed. However, the scheme is impractical for many applications, because ciphertext size and computation time increase sharply as the security level increases. To obtain $2^k$ security against known attacks, the computation time and ciphertext size are high-degree polynomials in k.
III. PROPOSED WORK
In this work we propose an empirical model of secure multi-keyword search based on a simple and efficient technique. The data owner (DO) reads the documents to be uploaded to the server and preprocesses them to remove unnecessary or irrelevant information. The DO then extracts each individual keyword, applies a cryptographic algorithm to the extracted keywords, and computes the number of occurrences of each keyword, i.e. its term frequency. Finally, the resulting base index table is uploaded to the server. A user can then submit a query and retrieve rank-ordered results from the cloud service provider.
In the proposed technique, the DO preprocesses the documents before outsourcing them to the server. The DO has a set of data files C = (F1, F2, ..., Fn). Each file is first preprocessed by eliminating unnecessary and irrelevant keywords. The DO then generates a base index table by extracting the unique keywords W = (w1, w2, ..., wm) from the files and encrypting them with the IDEA algorithm. The cipher keywords and their frequencies are maintained in the table and uploaded to the service provider's database.
Cloud storage is a pay-per-use service, so the DO does not know where the data is physically stored or how it is transferred to the server. Data confidentiality and integrity are therefore prime concerns when storing data in the cloud, while the end user expects relevant and interesting results from the server.
The base index table has three attributes: cipher keyword, term frequency of the keyword, and file ID. It is generated from the extracted keywords and uploaded to the cloud server along with the files. A sample base index table is shown in Figure 1.
Algorithm for base index table generation
1. Read the document or file.
2. Preprocess the document.
3. Extract the distinct keywords and convert each to a cipher keyword with the IDEA algorithm.
4. Compute the term frequency (TF), i.e. the number of occurrences of a keyword in a document, and the inverse document frequency (IDF), which is derived from the number of documents in the collection that contain the keyword.
5. Generate the base index table and upload the files to the server.
Keyword     Cipher Keyword    Term Frequency    File ID
Mobile      $%^&*(            4                 Abc.html
Apple       *(!!~*^%          3                 Hello.docx
Elephant    ##||%^$%&         1                 Hello.docx
Paper       $%^$%^            2                 Main.txt
Figure 1: Base Table
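The base index generation steps above can be sketched as follows. This is a rough illustration under loud assumptions: the Python standard library has no IDEA cipher, so a keyed HMAC-SHA256 token stands in for the cipher keyword (it is deterministic but, unlike IDEA, not reversible), and the stop-word list, function names, and row layout are hypothetical.

```python
import hashlib
import hmac
import re
from collections import Counter

# Hypothetical stop-word list used for preprocessing.
STOP_WORDS = {"the", "a", "an", "and", "of", "to", "in", "is"}

def cipher_keyword(keyword: str, key: bytes) -> str:
    """Stand-in for IDEA encryption: a deterministic keyed token."""
    return hmac.new(key, keyword.encode("utf-8"), hashlib.sha256).hexdigest()

def build_base_index(files: dict, key: bytes) -> list:
    """Return rows of (cipher keyword, term frequency, file ID)."""
    rows = []
    for file_id, text in files.items():
        # Steps 1-2: read and preprocess (lowercase, drop stop words).
        words = [w for w in re.findall(r"[a-z0-9]+", text.lower())
                 if w not in STOP_WORDS]
        # Steps 3-4: distinct keywords with their term frequencies.
        for keyword, tf in Counter(words).items():
            rows.append({
                "cipher_keyword": cipher_keyword(keyword, key),
                "term_frequency": tf,
                "file_id": file_id,
            })
    # Step 5: the rows would now be uploaded to the server.
    return rows

key = b"demo-key"
index = build_base_index({"Main.txt": "The paper is in the paper tray"}, key)
```

Because the token is deterministic for a given key, the server can match a query token against the table without learning the plaintext keyword. A real deployment of the scheme described here would use the IDEA cipher instead.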
Multi Keyword Ranking:
In a real-time environment we cannot assume that the input query is always a single keyword; some queries contain multiple keywords. The end user sends the input query to the service provider. The service provider processes the request, computes the file relevance score (document weight) from the frequencies of the keywords, and returns the results in decreasing order of document weight.
Fscore = TF * IDF
The file relevance score is thus the product of the term frequency and the inverse document frequency.
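A worked instance may help make the score concrete. The text does not state how IDF is computed or how scores combine across keywords, so the logarithmic IDF and the summation over the query keywords below are assumptions, as are the symbols N (number of files), n_w (number of files containing keyword w), and Q (the query).

```latex
\[
\mathrm{IDF}(w) = \ln\frac{N}{n_w},
\qquad
\mathrm{Fscore}(F, Q) = \sum_{w \in Q} \mathrm{TF}(w, F)\cdot \mathrm{IDF}(w)
\]
% Using the sample rows of Figure 1 with N = 3 files, every keyword
% occurs in exactly one file, so IDF(w) = ln 3 \approx 1.0986.
\[
\mathrm{Fscore}(\texttt{Hello.docx}, \{\text{Apple}, \text{Elephant}\})
  = (3 + 1)\ln 3 \approx 4.394
\]
\[
\mathrm{Fscore}(\texttt{Main.txt}, \{\text{Paper}\})
  = 2\ln 3 \approx 2.197
\]
```

Under these assumptions a query for Apple, Elephant, and Paper would rank Hello.docx ahead of Main.txt.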
Web service:
A web service is one technology for building a service-oriented architecture (SOA) with a three-tier architecture. It minimizes duplication of operations by maintaining the business logic at one specific location (a centralized server). The main goals of the service-oriented architecture are language interoperability (i.e. clients written in different standard languages can communicate with the same service) and a reduced chance of damage from the client end.
A data cache is a mechanism that improves performance at the user end and reduces overhead at the server end by storing frequently accessed results for future retrieval. When a user submits an input query that has been requested before, the cache reduces execution time: the round trip for the request and response is avoided, and the server incurs no additional overhead to process the same keywords again. Search results retrieved from the web server are stored in the data cache before they are forwarded to the user, so from the next search onwards the results for that query are served from cache storage instead of the web server.
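The caching behaviour described above can be sketched as a thin wrapper around the search call. The class name, the normalization of keywords, and the call counter are hypothetical choices for illustration; the text does not describe how the cache is keyed or when entries expire.

```python
class QueryCache:
    """Serve repeated queries from memory instead of the search backend."""

    def __init__(self, search_backend):
        self._backend = search_backend   # callable taking a tuple of keywords
        self._store = {}                 # normalized query -> cached results
        self.backend_calls = 0           # counts real round trips

    @staticmethod
    def _normalize(keywords):
        # Treat queries with the same keywords as identical, regardless
        # of order, case, or surrounding whitespace (an assumption).
        return tuple(sorted(k.strip().lower() for k in keywords))

    def search(self, keywords):
        key = self._normalize(keywords)
        if key not in self._store:
            # Cache miss: one round trip to the server, then remember it.
            self.backend_calls += 1
            self._store[key] = self._backend(key)
        return self._store[key]

def fake_backend(keywords):
    """Placeholder for the web service call."""
    return ["results for " + "+".join(keywords)]

cache = QueryCache(fake_backend)
first = cache.search(["Apple", "mobile"])
second = cache.search(["mobile", "apple "])   # same query, served from cache
```

This sketch never evicts or invalidates entries, so cached results would become stale if the base index table changed; a production cache would need an expiry policy.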
[Figure: web service architecture diagram. Labels: Database; Business Logic; WSDL with SOAP protocol; UI (VB.Net); UI (Java); UI (Android)]
Search Implementation:
The user requests from the DO the key required for secure search. After receiving the key from the data owner, the user authenticates with a user ID and the key. An authorized user can then search: the cipher keywords are decrypted, and the overall term frequency and inverse document frequency are computed with respect to the input keywords to obtain the file relevance score.
Step1: The end user requests a key from the data owner.
Step2: The end user receives a secure key for encrypting the search keywords at the server.
Step3: The user searches with an input query (single or multiple keywords) and the key.
Step4: The server authorizes the user with the credentials and processes the input query.
Step5: The service extracts the query-related information from the base index table.
Step6: The service computes the file relevance scores (document weights) based on the frequency of the keywords:
File_relevance_Scores[j] = Convert.ToDecimal((1.0 / termsinfile[j]) * (1 + Math.Log(termfreqs[j])) * Math.Log(1 + (filecount / numberoffiles)));
Files are retrieved based on these file relevance scores.
Step7: The service returns the search results in decreasing order of the file relevance scores of the documents.
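Steps 6 and 7 can be illustrated with a short Python port of the scoring expression followed by a top-k selection. The argument names mirror the identifiers in the source expression; the text does not say which of the last two is the collection size, so the sketch passes them through unchanged. The per-file statistics are hypothetical values, not results from the paper.

```python
import math

def file_relevance_score(terms_in_file, term_freq, file_count, number_of_files):
    """Port of the Step6 expression; math.log is the natural logarithm."""
    return ((1.0 / terms_in_file)
            * (1 + math.log(term_freq))
            * math.log(1 + file_count / number_of_files))

def top_k(candidates, k):
    """Return the k (file_id, score) pairs with the highest scores."""
    scored = [(file_id, file_relevance_score(*stats))
              for file_id, stats in candidates.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]

# Hypothetical per-file statistics:
# (terms_in_file, term_freq, file_count, number_of_files)
candidates = {
    "Abc.html": (100, 4, 3, 1),
    "Hello.docx": (50, 3, 3, 1),
    "Main.txt": (10, 2, 3, 1),
}
results = top_k(candidates, k=2)
```

The leading factor 1 / terms_in_file normalizes by document length, which is why the short Main.txt outranks the longer files here even though its term frequency is lower.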
[Figure 3: Architecture. Labels: Data Owner, Base Index table, Server, Rank Oriented Results, Request Key, Key, Query, Cache, Implementation, User]
With the above approach only authorized users can search. It prevents unauthorized access from personal and private search engines and maintains data integrity and confidentiality.
IV. CONCLUSION
We conclude this work with an efficient and secure multi-keyword search implementation. The base index table reduces time complexity because the documents are preprocessed, ranking returns the top search results for an input query through the file relevance score, and the web service provides language interoperability. Our experimental results show a more secure and efficient search implementation than the traditional approach.
ISSN: 2231-5381
BIOGRAPHIES
R. Venkataramana is pursuing his M.Tech in Computer Science at GIITS Engineering College, Visakhapatnam, Andhra Pradesh. His areas of interest are data mining and network security.
M.V. Satya NagaRaju received the M.Tech degree in Computer Science from Jawaharlal Nehru Technological University, Kakinada in 2011. He is currently working as Associate Professor and H.O.D. in the Department of Computer Science at GIITS Engineering College, Visakhapatnam, Andhra Pradesh, India. He has 6 years of teaching experience and has published many papers in international conferences.
http://www.ijettjournal.org