Rank Oriented Search Results for user queries over Web services Asapu Gowthami, KattamuriSantoshJshansi

advertisement
International Journal of Engineering Trends and Technology (IJETT) – Volume 16 Number 1 – Oct 2014
Rank Oriented Search Results for user queries over
Web services
1
Asapu Gowthami,2KattamuriSantoshJshansi
1
1,2
Final MtechStudent ,2Assistant Professor
Dept of CSE, MVGR college of Engineering, Chintalavalasa,A.P
Abstract:Searching top k multi keywords from the out sourced
data files is still an interesting research issue because out sourced
data over cloud can be encrypted for confidentiality .In this paper
we are proposing an efficient top k retrieval from out sourced file
through service oriented application by computing the file
relevance score for input multi keywords and symmetric key
encryption and every manipulation comes from the server end
instead of client end every the ranking of the documents based on
file relevance scores.
I.INTRODUCTION
In could computing, the cloud services enable
users to store the data in cloud server. It also enables share
their data with other users also. Based on the cloud services
more amount of information is centralized into cloud
servers. To secure the data privacy, the data is encrypted
and store in cloud servers. By using encryption techniques
cloud services providing confidentiality and privacy in
gives assurance to data in cloud. After storing the
encrypted data in cloud increases the usage of cloud
services. But it also a problem that more number of users is
using the cloud services for utilizing the outsourced
data[1][2].
Increasing the scalability of the users and the more
data exchanging the data there is increasing of malicious
users also. So then cloud services introduced authentication
before utilizing the services. After authentication of users
only retrieve the information from the cloud, this because
to limit the malicious users. Then search researches
introduced keyword based search that is most popular ways
that retrieve information which matches to given keyword
only. But this type of searching restricts the searching in
encrypted data in cloud. Then researchers focused on
encrypted data searching techniques. In these methods
allows keywords to search over encrypted data. There are
some techniques such as Boolean keyword search which
extract the files relevant to the searched query. This
technique applied directly in cloud services. But it has
some limitations for every search users have to wait for
pre-processing to get the query result[3].
ISSN: 2231-5381
Based on the previous work there is an establishment of the
novel techniques based on ranking. In this rank based
searching techniques rank order the relevance file in
response of the search query. This technique applied on
plain text searching inly. Then researchers work for
searchable encryption system which allows the secured
ranked search. Searchable encryption achieved best results
in retrieving the encryption data in cloud. Then next main
concept is user’s access and search patterns that are hiding
whole data but shows encapsulated data only[4].
In latest researches there is some confusion related to three
different methods for searching with privacy: searching on
private-key encrypted data which is the subject of this work
searching on public-key encrypted data and private
information retrieval. The common property to all three
models is a server sometimes called the database that stores
data and user that wish to access or search or manipulate
the data while revealing as little as possible to the server.
There are, however, important differences between these
three settings they are searching on private key, searching
on public key and private information retrieval[5].
For avoiding the cloud service from including in ranking
and believe in all the work to the user is anormal way to
prevent data leakage. Any way the limited calculative
power on the user side and more computational complexity
precludes information security. The situation ofsecure
multi-keyword topk retrieval on encrypted cloud data thus
leads to how to make the clouddo more work during the
process of retrieval without information leakage.
II. RELATED WORK
Homomorphic encryption is type of encryption that allows
particular types of calculations to worked on cipher text. It
results an encrypted text and at the time of decryption it
matches the output of the operations on the plain text. This
is new feature in modern communication system topology.
In this encryption it allows the linking together of various
services without showing the data to each of these services.
http://www.ijettjournal.org
Page 23
International Journal of Engineering Trends and Technology (IJETT) – Volume 16 Number 1 – Oct 2014
There are many partial homomorphic cryptographic
techniques but those are low efficient methods[6][7].
Here some examples shown below:
The message is x and encrypted message is E(X).
Unpadded RSA:
Consider modulo m and exponent e and then the cipher is
given by
E(x)=xe mod m. The homomorphic property is
E(x1).E(x2)=x1e x2e mod m
Elgamal:
In this Group g and the public key id(g,q,g,h) and h=gx and
x is the private key
then the cipher message is
r
r
E(x)=(g ,m.h ) for different values r belongs to {0,1,….,m1}.
For this the homomorphic property is E(b1).E(b2)=xb1
r12xb2r22=E(b1 Ex-OR b2)
Benaloh:
In this modulo is m and the base g with a block length c
then the cipher is E()=gxrc mod m for random values r
belongs to {0,1,…..m-1}
Then
the
homomorphic
E(x1).E(x2)=(gx1r1c)(gx2r2c)
property
is
Paillier:
In this the modulus is m and base is g then the cipher is
E(x)=gxrm mod m2 r belongs to {0,1,….m-1}. Then for this
the homomorphic property is E(x1).E(x2)=(gx1 r1m).(gx2
r2m)
Every example shown above allows the homomorphic
calculation og single operation on plaintexts. A
cryptographic technique allows both addition and
multiplication operations are referred as homomorphic
encryption and these are very strong and secure. By this
methods any circuit will solve and allows to construct
efficient programs which may be execute their encryption
technique. The presence of fully homomorphic encryption
have great practical situations in outsourcing of secret
calculations in the case of cloud computing.
ISSN: 2231-5381
The homomorphic part of a fully homomorphic encryption
scheme can also be described in terms of category theory.
If C is the category whose objects are integers (i.e., finite
streams of data) and whose morphisms include addition
and multiplication, then the encryption operation of a fully
homomorphic encryption scheme C. The categorical
approach allows for a generalization beyond the ring
structure (composition of addition and multiplication) of
the integers. If the morphisms of some wide super category
of C include the primitive recursive functions or even all
computable functions, then any encryption operation which
qualifies as an end of this super category is "more fully"
homomorphic since additional operations on encrypted data
(for example conditionals and loops) are possible. The
utility of fully homomorphic encryption has been long
recognized. The problem of constructing such a scheme
was first proposed within a year of the development of
RSA. A solution proved more elusive; for more than 30
years, it was unclear whether fully homomorphic
encryption was even possible[8][9].
Craig Gentry using lattice-based cryptography showed the
first fully homomorphic encryption scheme as announced
by IBM on June 25, 2009.His scheme supports evaluations
of arbitrary depth circuits. His construction starts from a
somewhat homomorphic encryption scheme using ideal
lattices that is limited to evaluating low-degree
polynomials over encrypted data. (It is limited because
each ciphertext is noisy in some sense, and this noise grows
as one adds and multiplies ciphertexts, until ultimately the
noise makes the resulting ciphertext indecipherable.) He
then shows how to modify this scheme to make it bootstrappable—in particular, he shows that by modifying the
somewhat homomorphic scheme slightly, it can actually
evaluate its own decryption circuit, a self-referential
property. Finally, he shows that any boots-trappable
somewhat homomorphic encryption scheme can be
converted into a fully homomorphic encryption through a
recursive self-embedding. In the particular case of Gentry's
ideal-lattice-based somewhat homomorphic scheme, this
bootstrapping procedure effectively "refreshes" the
ciphertext by reducing its associated noise so that it can be
used thereafter in more additions and multiplications
without resulting in an indecipherable ciphertext. Gentry
based the security of his scheme on the assumed hardness
of two problems: certain worst-case problems over ideal
lattices and the sparse (or low-weight) subset sum problem.
Regarding performance, ciphertexts in Gentry's
scheme remain compact insofar as their lengths do not
http://www.ijettjournal.org
Page 24
International Journal of Engineering Trends and Technology (IJETT) – Volume 16 Number 1 – Oct 2014
the files and generates a base index table by extracting the
unique keywords W = (w1;w2; :::;wm) from the files and
encrypts them with IDEA algorithm and those cipher
keywords and frequency of the cipher keyword can be
maintained and uploaded to service provider database.
depend at all on the complexity of the function that is
evaluated over the encrypted data. The computational time
only depends linearly on the number of operations
performed. However, the scheme is impractical for many
applications, because ciphertext size and computation time
increase sharply as one increases the security level. To
obtain 2k security against known attacks, the computation
time and ciphertext size are high-degree polynomials in
k[10].
Algorithm for Base Index table generation
1. Read thedocument or file
2. Preprocess the document
III. PROPOSED WORK
3. Extract distinct keywords and convert to cipher keyword
with IDEA algorithm
In this current research work we have proposed an
empirical model of multi keyword secure search with
simple and efficient technique. Data owner (DO) reads the
documents to uploads into server and preprocess the
documents to remove unnecessary or irrelevant information
from the documents and extracts each individual keyword
and applies cryptographic algorithm over extracted
keywords and computes the occurrences of the keywords
i.e. Term frequency and finally, this basic index table can
be uploaded to server then user searches for specific query
and retrieves rank oriented results from cloud service
provider.
3. Compute term frequency (TF) (i.e. number of
occurrences of a keyword in a document) and inverse
document frequency (IDF) (i.e. number occurrence of a
keyword in all the documents).
4. Generate base index table (Index table) and upload files to
server.
Cloud is a pay and use service ,SO DO does not
know the actual data drives are there and how the data
transferring to server. Data confidentiality and integrity are
prime concern while storing data over cloud and end user
expects relevant and interesting resultsfrom the server.
Base index table contained cipher keyword, frequency of
keyword and field. Base index table can be generated with
three attributes along with extracted file and uploads to the
cloud server. The following table shows sample baseindex
table as follows.
In our proposed technique DO uploads the data or
files to the server , before out sourcing the data at server ,
DO preprocess the documents .DO has a set of data files C
= (F1; F2….,Fn) ,Initially it can be preprocessed by
eliminating the unnecessary and irrelevant keywords from
Keyword
Cipher Keyword
Term Frequency
File ID
Mobile
Apple
Elephant
Paper
Figure 1: Base Table
$%^&*(
*(!!~*^%
##||%^$%&
$%^$%^
4
3
1
2
Abc.html
Hello.docx
Hello.docx
Main.txt
Multi Keyword Ranking:
In real time environment we cannot expect or assume the
input query is always a simple and single keyword.some
queries may contains multiple keywords. End user requests
to the service provider with input query, Service provider
process the request and compute file relevance score or
document weights in terms of terms of frequency of the
keywords and forwards results based on the order or
weight of the documents
Fscore= TF * IDF
ISSN: 2231-5381
The above computation shows the file relevance score
which is computed by the term frequency and inverse
document frequency.
Web service:
Web service is one of technology to createSOA (service
oriented architecture) with three tier architecture, it
minimizes duplication of operations by maintain the
business logic at specific one location (centralized server).
The main goal of the service oriented architecture is
language interoperability (i.e. any standard language can
http://www.ijettjournal.org
Page 25
International Journal of Engineering Trends and Technology (IJETT) – Volume 16 Number 1 – Oct 2014
communicate with other language even though both are
different languages) and minimizes the damage chances
from client end.
Data Cacheis a mechanism which increases the
performance from user end and reduces over head from
server end and stores frequently access results for future
retrieval when user requested for same input query it
reduces execution time i.e. (round trip over the input
request and responsetime from server during the user input
query can be minimized in terms of time complexity and
minimizes additional overhead on server to process the
same input keyword. If any user request with same input
query which is requested before, query
need not to
process by server again and no need of a round trip ,
because previoussearch results retrieved from the web
server before forwarded to user and it can be stored in data
cache ,nextsearch onwards input query results retrieved
from cachestorage instead of web server.
Database
Business
Logic
Wsdl with Soap protocol
UI (VB.Net)
UI (Java)
Fig2: Web service Architecture
Search Implementation:
User should request to DO, for key which is required for
secure search, after successful receive of key from data
ISSN: 2231-5381
UI (Android)
owner, user authenticates with credentials of user id and
key. An authorized user can search by decrypting the
cipher keyword and compute overall term frequency and
inverse document fr4equencies with respect to input
keyword for file relevance score.
http://www.ijettjournal.org
Page 26
International Journal of Engineering Trends and Technology (IJETT) – Volume 16 Number 1 – Oct 2014
Data Owner
Base Index table
Server
Rank Oriented Results
Request Key
Key
Query
Cache
Implementation
User
Figure 3: Architecture
Files can be retrieved based on the our novel file relevance
scores
Step1: End User requests for key from Data owner
Step2: End user receives secure key for encryption of
search keyword at server.
Step3: User searches with input query (single or multiple
keywords) and key
Step4: Serverauthorize the user with credentials and
process input query
Step5: Service extracts query based information from base
index table
Step6: Computes file relevance score or document weights
based on frequency of keywords
File_relevance_Scores[j] = Convert.ToDecimal((1
termsinfile[j]) * (1 + Math.Log(termfreqs[j]))
Math.Log(1 + (filecount / numberoffiles)));
/
*
Step7: Return search results based on decreasing order of
their file relevance score of the documents.
By the above approach only authorized user can
search and prevents the unauthorized access from personal
and private search engines and maintains data integrity and
confidentiality.
IV.CONCLUSION
Base index table reduces the time complexity by preprocessed documents, Ranking gives top search results for
input query through file relevance score ,web service
maintains the language interoperability .our experimental
results shows secure and efficient search implementation
than the traditional approach.
REFERENCES
1. ARMBRUST, M., AND ET AL. Above the clouds: A Berkeley viewof
cloud
computing.
Tech.
Rep.
UCB/EECS-2009-28,
EECS
Department,U.C. Berkeley, Feb 2009.
[2] M. Arrington, “Gmail disaster: Reports of mass email
deletions,”http://www.techcrunch.com/2006/12/28/gmail-disasterreportsof-mass-email-deletions/,December 2006.
[3] Amazon.com, “Amazon s3 availability event: July 20,
2008,”http://status.aws.amazon.com/s3-20080720.html, 2008.
[4] RAWA News, “Massive information leak shakes Washington over
Afghan
war,”http://www.rawa.org/temp/runews/2010/08/20/massiveinformation-leak-shakeswashington-over-afghan-war.html, 2010
[5] AHN, “Romney hits Obama for security information
leakage,”http://gantdaily.com/2012/07/25/romney-hits-obama-forsecurity- information-leakage/,2012
[6] Cloud Security Alliance, “Top threats to cloud computing,”
http://www.cloudsecurity alliance.org, 2010.
[7] C. Leslie, “NSA has massive database of Americans’ phone
calls,”http://usatoday30.usatoday.com/news
/washington/2006-05-10/.
[8] R. Curtmola, J. A. Garay, S. Kamara, and R. Ostrovsky, “Searchable
symmetric encryption:improveddefinitions and efficient constructions,” in
Proc. of ACM CCS, 2006.
[9] C. Wang, N. Cao, J. Li, K. Ren, and W. Lou, “Secure ranked keyword
search over encryptedcloud data,” in Proc. of ICDCS, 2010.
[10] S. Zerr, D. Olmedilla, W. Nejdl, and W. Siberski, “Zerber+r: Top-k
retrieval from aconfidential index,” in Proc. of EDBT, 2009.
We have concluded our current research with
efficient and secure multi- keyword search implementation.
ISSN: 2231-5381
http://www.ijettjournal.org
Page 27
Download