International Journal of Engineering Trends and Technology (IJETT) – Volume 17 Number 2 – Nov 2014, ISSN: 2231-5381

An Efficient Multi-Keyword Search in Service Oriented Architecture

R. Venkataramana (M.Tech Student), M.V. Satya NagaRaju (Associate Professor)
Department of CSE, GIITS Engineering College, Visakhapatnam, Andhra Pradesh, India.

Abstract: Retrieving the top-k results for a multi-keyword query over outsourced data files remains an interesting research issue, because data outsourced to the cloud is typically encrypted for confidentiality. In this paper we propose an efficient top-k retrieval scheme for outsourced files, exposed through a service oriented application: the server computes a file relevance score for the input keywords, keywords are protected with symmetric-key encryption, and all computation, including the ranking of documents by relevance score, is performed at the server end rather than the client end.

I. INTRODUCTION

In cloud computing, cloud services enable users to store their data on cloud servers and to share it with other users. As a result, large amounts of information are centralized in cloud servers. To protect data privacy, the data is encrypted before being stored; encryption lets cloud services provide confidentiality and privacy assurances for data in the cloud, which in turn increases the use of cloud services. However, as the number of users and the volume of exchanged data grow, the number of malicious users grows as well, so cloud services introduced authentication before the services can be used: only authenticated users may retrieve information from the cloud, which limits malicious users. Researchers then introduced keyword-based search, one of the most popular ways to retrieve only the information that matches a given keyword.
However, this type of search does not work directly on encrypted data in the cloud, so researchers turned to techniques for searching over encrypted data, which allow keyword search on ciphertexts. One such technique is Boolean keyword search, which extracts the files relevant to the searched query and can be applied directly in cloud services, but it has limitations: for every search, users must wait for pre-processing to obtain the query result. Building on this work, novel ranking-based techniques were established; these rank-based search techniques order the relevant files in response to the search query, but they applied to plaintext search only. Researchers then worked on searchable encryption systems that allow secure ranked search; searchable encryption achieved the best results for retrieving encrypted data in the cloud. The next main concept is the user's access and search patterns: the whole data set is hidden, and only encapsulated data is shown. In recent research there has been some confusion among three different methods for searching with privacy: searching on private-key encrypted data (the subject of this work), searching on public-key encrypted data, and private information retrieval. The property common to all three models is a server (sometimes called the database) that stores data, and a user who wishes to access, search, or manipulate the data while revealing as little as possible to the server; there are, however, important differences between these three settings. Keeping the cloud service out of the ranking and entrusting all the work to the user is a natural way to prevent data leakage; however, the limited computational power on the user side and the high computational complexity make this impractical while preserving information security.
The problem of secure multi-keyword top-k retrieval on encrypted cloud data thus becomes: how can the cloud be made to do more of the work during retrieval without information leakage?

II. RELATED WORK

Homomorphic encryption is a type of encryption that allows particular types of computation to be carried out on ciphertext. It produces an encrypted result which, when decrypted, matches the result of the same operations performed on the plaintext. This is a desirable feature in modern communication system topologies, since it allows various services to be linked together without exposing the data to any of them. There are many partially homomorphic cryptographic techniques, though they are of low efficiency. Some examples are shown below, where the message is x and the encrypted message is E(x).

Unpadded RSA: With modulus m and exponent e, the ciphertext is E(x) = x^e mod m. The homomorphic property is

  E(x1) · E(x2) = x1^e · x2^e mod m = (x1 · x2)^e mod m = E(x1 · x2)

ElGamal: In a cyclic group G of order q with generator g, the public key is (G, q, g, h), where h = g^x and x is the private key. The ciphertext is E(m) = (g^r, m · h^r) for a random r in {0, 1, …, q−1}. The homomorphic property is

  E(m1) · E(m2) = (g^(r1+r2), m1 · m2 · h^(r1+r2)) = E(m1 · m2)

Goldwasser–Micali: For a single bit b, the ciphertext is E(b) = x^b · r^2 mod m, where x is a quadratic non-residue and r is random. The homomorphic property is

  E(b1) · E(b2) = x^b1 · r1^2 · x^b2 · r2^2 = E(b1 XOR b2)

Benaloh: With modulus m, base g, and block size c, the ciphertext is E(x) = g^x · r^c mod m for a random r in {0, 1, …, m−1}. The homomorphic property is

  E(x1) · E(x2) = (g^x1 · r1^c)(g^x2 · r2^c) = g^(x1+x2) · (r1 · r2)^c = E(x1 + x2 mod c)

Paillier: With modulus m and base g, the ciphertext is E(x) = g^x · r^m mod m^2 for a random r in {0, 1, …, m−1}. The homomorphic property is

  E(x1) · E(x2) = (g^x1 · r1^m)(g^x2 · r2^m) = g^(x1+x2) · (r1 · r2)^m = E(x1 + x2)

Each of the examples above allows the homomorphic computation of only a single operation on plaintexts.
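The multiplicative property of unpadded RSA described above can be checked directly with modular arithmetic. The key values below are illustrative textbook toy numbers, not from the paper, and unpadded RSA is insecure in practice; this is a sketch for intuition only.

```python
# Toy demonstration of the multiplicative homomorphism of unpadded
# ("textbook") RSA: E(x1) * E(x2) mod m == E(x1 * x2).

def rsa_encrypt(x, e, m):
    """E(x) = x^e mod m."""
    return pow(x, e, m)

# Illustrative toy key: m = p*q = 61*53 = 3233, public exponent e = 17.
m, e = 3233, 17

x1, x2 = 7, 12
c1 = rsa_encrypt(x1, e, m)
c2 = rsa_encrypt(x2, e, m)

# The product of the ciphertexts is a valid encryption of the product
# of the plaintexts (reduced mod m).
assert (c1 * c2) % m == rsa_encrypt((x1 * x2) % m, e, m)
print("homomorphic property holds")
```

The same pattern (multiply ciphertexts, decrypt to get a combined plaintext) underlies each of the partially homomorphic schemes listed above; only the operation carried over (multiplication, XOR, or addition) differs.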
A cryptographic technique that allows both addition and multiplication operations is referred to as fully homomorphic encryption, and such schemes are far more powerful. With these methods any circuit can be evaluated homomorphically, which makes it possible to construct programs that run on their inputs in encrypted form. The existence of fully homomorphic encryption would have great practical implications for the outsourcing of private computations, for instance in the case of cloud computing. The homomorphic part of a fully homomorphic encryption scheme can also be described in terms of category theory. If C is the category whose objects are integers (i.e., finite streams of data) and whose morphisms include addition and multiplication, then the encryption operation of a fully homomorphic encryption scheme is an end of C. The categorical approach allows for a generalization beyond the ring structure (composition of addition and multiplication) of the integers: if the morphisms of some wide super-category of C include the primitive recursive functions, or even all computable functions, then any encryption operation which qualifies as an end of this super-category is "more fully" homomorphic, since additional operations on encrypted data (for example conditionals and loops) become possible. The utility of fully homomorphic encryption has long been recognized; the problem of constructing such a scheme was first proposed within a year of the development of RSA. A solution proved more elusive, and for more than 30 years it was unclear whether fully homomorphic encryption was even possible. Craig Gentry, using lattice-based cryptography, showed the first fully homomorphic encryption scheme, as announced by IBM on June 25, 2009. His scheme supports evaluation of circuits of arbitrary depth. His construction starts from a somewhat homomorphic encryption scheme using ideal lattices that is limited to evaluating low-degree polynomials over encrypted data.
(It is limited because each ciphertext is noisy in some sense, and this noise grows as one adds and multiplies ciphertexts, until ultimately the noise makes the resulting ciphertext indecipherable.) He then shows how to modify this scheme to make it bootstrappable; in particular, he shows that by modifying the somewhat homomorphic scheme slightly, it can actually evaluate its own decryption circuit, a self-referential property. Finally, he shows that any bootstrappable somewhat homomorphic encryption scheme can be converted into a fully homomorphic encryption scheme through a recursive self-embedding. In the particular case of Gentry's ideal-lattice-based somewhat homomorphic scheme, this bootstrapping procedure effectively "refreshes" the ciphertext by reducing its associated noise, so that it can be used thereafter in more additions and multiplications without becoming indecipherable. Gentry based the security of his scheme on the assumed hardness of two problems: certain worst-case problems over ideal lattices, and the sparse (or low-weight) subset sum problem. Regarding performance, ciphertexts in Gentry's scheme remain compact insofar as their lengths do not depend at all on the complexity of the function that is evaluated over the encrypted data; the computational time depends only linearly on the number of operations performed. However, the scheme is impractical for many applications, because ciphertext size and computation time increase sharply as the security level is raised: to obtain 2^k security against known attacks, the computation time and ciphertext size are high-degree polynomials in k.

III. PROPOSED WORK

In this current research work we propose an empirical model of multi-keyword secure search using a simple and efficient technique.
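The noise behaviour described above can be made concrete with a toy "somewhat homomorphic" scheme over the integers. This simplified construction is for intuition only: it is not Gentry's actual scheme and it is not secure; the key size and noise bounds are illustrative assumptions.

```python
# Toy somewhat homomorphic encryption of single bits over the integers:
# c = q*p + 2*r + m, where p is the secret odd key, q is random, and
# r is a small random noise term. Decryption is (c mod p) mod 2 and is
# correct only while the accumulated noise stays below p/2 -- each
# addition and (especially) multiplication grows the noise.
import random

p = 10007  # secret odd key (toy size)

def encrypt(m, noise=5):
    """Encrypt bit m with a small random noise term r < noise."""
    q = random.randrange(1, 50)
    r = random.randrange(0, noise)
    return q * p + 2 * r + m

def decrypt(c):
    """(c mod p) mod 2 recovers the bit while the noise is small."""
    return (c % p) % 2

c1, c2 = encrypt(1), encrypt(0)
assert decrypt(c1 + c2) == 1 ^ 0  # addition acts as XOR on the bits
assert decrypt(c1 * c2) == 1 & 0  # multiplication acts as AND

# After enough multiplications the noise term exceeds p and decryption
# fails; this is the limitation that Gentry's bootstrapping "refreshes".
```

The point of the sketch is only to show why a somewhat homomorphic scheme supports a limited number of operations, which is the limitation bootstrapping removes.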
The data owner (DO) reads the documents to be uploaded to the server, preprocesses them to remove unnecessary or irrelevant information, extracts each individual keyword, applies a cryptographic algorithm over the extracted keywords, and computes the number of occurrences of each keyword, i.e. the term frequency. Finally, this base index table is uploaded to the server; the user then searches with a specific query and retrieves rank-oriented results from the cloud service provider.

In our proposed technique the DO uploads the data files to the server. Before outsourcing the data, the DO preprocesses the documents. The DO has a set of data files C = (F1, F2, …, Fn). The files are first preprocessed by eliminating unnecessary and irrelevant keywords; a base index table is then generated by extracting the distinct keywords W = (w1, w2, …, wm) from the files and encrypting them with the IDEA algorithm. The cipher keywords and the frequency of each cipher keyword are maintained and uploaded to the service provider's database.

Cloud is a pay-per-use service, so the DO does not know where the actual data drives are or how the data is transferred to the server. Data confidentiality and integrity are prime concerns while storing data in the cloud, and the end user expects relevant and interesting results from the server. The base index table contains the cipher keyword, the frequency of the keyword, and the file field; it is generated with these three attributes along with the extracted files and uploaded to the cloud server. A sample base index table is shown below.

Algorithm for base index table generation:
1. Read the document or file.
2. Preprocess the document.
3. Extract distinct keywords and convert each to a cipher keyword with the IDEA algorithm.
4. Compute the term frequency (TF), i.e. the number of occurrences of a keyword in a document, and the inverse document frequency (IDF), a measure of how rare the keyword is across all the documents.
5. Generate the base index table and upload the files to the server.

  Keyword    Cipher Keyword   Term Frequency   File ID
  Mobile     $%^&*(           4                Abc.html
  Apple      *(!!~*^%         3                Hello.docx
  Elephant   ##||%^$%&        1                Hello.docx
  Paper      $%^$%^           2                Main.txt

Figure 1: Base index table

Multi-Keyword Ranking: In a real-time environment we cannot expect or assume that the input query is always a simple, single keyword; some queries contain multiple keywords. The end user sends the input query to the service provider; the service provider processes the request, computes the file relevance score (document weight) in terms of the frequencies of the keywords, and forwards results based on the order of the document weights:

  Fscore = TF * IDF

This computation gives the file relevance score, computed from the term frequency and the inverse document frequency.

Web service: A web service is one technology for building an SOA (service oriented architecture) with a three-tier architecture. It minimizes duplication of operations by keeping the business logic at one specific location (a centralized server). The main goals of service oriented architecture are language interoperability (i.e. any standard language can communicate with another language, even though the two are different) and minimizing the chance of damage from the client end.

The data cache is a mechanism that increases performance on the user end and reduces overhead on the server end. It stores frequently accessed results for future retrieval: when a user submits the same input query again, the round trip (the input request and the response time from the server) is minimized in terms of time complexity, execution time is reduced, and the additional overhead on the server of processing the same input keyword again is avoided.
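The five steps above can be sketched as follows. Since the paper's IDEA keys and parameters are not specified, a keyed HMAC stands in for IDEA encryption of keywords here; the function names, stopword list, and file contents are illustrative assumptions, not the authors' implementation.

```python
# Sketch of base index table generation (Steps 1-5 above):
# preprocess each file, extract distinct keywords, "encrypt" them
# (HMAC stand-in for IDEA), and record per-file term frequencies.
import hashlib
import hmac
import re
from collections import Counter

SECRET_KEY = b"data-owner-key"  # placeholder for the DO's symmetric key

STOPWORDS = {"the", "a", "an", "is", "of", "and", "to", "in"}

def cipher_keyword(word):
    """Deterministic stand-in for IDEA-encrypting a keyword."""
    return hmac.new(SECRET_KEY, word.encode(), hashlib.sha256).hexdigest()[:12]

def preprocess(text):
    """Steps 1-2: lowercase, tokenize, and drop irrelevant keywords."""
    words = re.findall(r"[a-z]+", text.lower())
    return [w for w in words if w not in STOPWORDS]

def build_base_index(files):
    """Steps 3-5: rows of (cipher keyword, term frequency, file id)."""
    index = []
    for file_id, text in files.items():
        for word, tf in Counter(preprocess(text)).items():
            index.append((cipher_keyword(word), tf, file_id))
    return index

files = {"Abc.html": "mobile mobile mobile mobile phone",
         "Hello.docx": "apple apple apple elephant"}
for row in build_base_index(files):
    print(row)
```

Only the cipher keywords, frequencies, and file ids would be uploaded to the service provider, matching the three attributes of the base index table above.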
If a user submits an input query that has been requested before, the query need not be processed by the server again and no round trip is needed: the results of the previous search, retrieved from the web server before being forwarded to the user, are stored in the data cache, and from the next search onward the results for that input query are retrieved from cache storage instead of from the web server.

Figure 2: Three-tier architecture — UI clients (VB.Net, Java, Android) communicate via WSDL with the SOAP protocol to the business logic, which accesses the database.

Search Implementation:

The user requests from the DO the key required for secure search. After successfully receiving the key from the data owner, the user authenticates with user id and key credentials. An authorized user can then search: the cipher keywords are decrypted and the overall term frequencies and inverse document frequencies are computed with respect to the input keywords to obtain the file relevance score, and files are retrieved based on their relevance scores.

Step 1: The end user requests the key from the data owner.
Step 2: The end user receives the secure key used to encrypt the search keywords for the server.
Step 3: The user searches with the input query (single or multiple keywords) and the key.
Step 4: The server authorizes the user with the credentials and processes the input query.
Step 5: The service extracts query-based information from the base index table.
Step 6: The service computes the file relevance score (document weight) based on the keyword frequencies:

  File_relevance_Scores[j] = Convert.ToDecimal((1 / termsinfile[j]) * (1 + Math.Log(termfreqs[j])) * Math.Log(1 + (filecount / numberoffiles)));

Step 7: The service returns the search results in decreasing order of their file relevance score.
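The server-side steps above, together with the data cache, can be sketched as follows. The index layout, variable names (terms_in_file, files_with_term, and so on), and the cache keyed on the keyword set are illustrative assumptions; the score mirrors the relevance formula quoted in Step 6.

```python
# Sketch of Steps 5-7 plus the query cache: score each file that
# contains a query keyword, rank by total score, and memoize results
# so a repeated query skips recomputation.
import math

def relevance_score(term_freq, terms_in_file, file_count, files_with_term):
    """(1/|d|) * (1 + log tf) * log(1 + N/df), as in Step 6 above."""
    return (1.0 / terms_in_file) * (1 + math.log(term_freq)) \
        * math.log(1 + file_count / files_with_term)

query_cache = {}  # frozenset of keywords -> ranked file ids

def search(keywords, index, file_lengths, file_count):
    """index maps keyword -> {file_id: tf}; returns file ids, best first."""
    key = frozenset(keywords)
    if key in query_cache:       # cache hit: no server-side recomputation
        return query_cache[key]
    scores = {}
    for kw in keywords:
        postings = index.get(kw, {})
        for file_id, tf in postings.items():
            scores[file_id] = scores.get(file_id, 0.0) + relevance_score(
                tf, file_lengths[file_id], file_count, len(postings))
    ranked = sorted(scores, key=scores.get, reverse=True)
    query_cache[key] = ranked    # store for repeated queries
    return ranked

index = {"mobile": {"Abc.html": 4}, "apple": {"Hello.docx": 3, "Abc.html": 1}}
lengths = {"Abc.html": 5, "Hello.docx": 4}
print(search(["mobile", "apple"], index, lengths, 2))
```

In the paper's setting the index would hold cipher keywords and the lookup would follow Step 4's authorization; the caching and ranking logic is unchanged by that detail.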
Figure 3: Architecture — the data owner uploads the base index table to the server; the user requests the key from the data owner and submits queries to the server, which returns rank-oriented results through a cache implementation.

With the above approach only authorized users can search, unauthorized access from personal and private search engines is prevented, and data integrity and confidentiality are maintained.

IV. CONCLUSION

We have concluded our current research with an efficient and secure multi-keyword search implementation. The base index table over preprocessed documents reduces the time complexity, ranking gives the top search results for the input query through the file relevance score, and the web service maintains language interoperability. Our experimental results show a more secure and efficient search implementation than the traditional approach.

BIOGRAPHIES

R. Venkataramana is pursuing his M.Tech in Computer Science at GIITS Engineering College, Visakhapatnam, Andhra Pradesh. His areas of interest are data mining and network security.

M.V. Satya NagaRaju received the M.Tech degree in Computer Science from Jawaharlal Nehru Technological University, Kakinada in 2011. He is currently working as Associate Professor and H.O.D. in the Department of Computer Science at GIITS Engineering College, Visakhapatnam, Andhra Pradesh, India. He has 6 years of teaching experience and has published many papers in international conferences.