International Journal of Engineering Trends and Technology (IJETT) – Volume 27 Number 3 - September 2015

Secrecy for Anonymous Users of Crowdsourcing Databases

Pyla Jyothi (1), Ch. Nagabhushana Rao (2)
(1) Final M.Tech Student, (2) Associate Professor
Department of Computer Science & Engineering, Dadi Institute of Engineering & Technology, Anakapalle 531 002, A.P.

Abstract: Privacy is the major concern when storing data in, and retrieving data from, a crowdsourcing database: pushing crowdsourcing techniques into the database engine usually requires publishing some sensitive data to anonymous human workers. In this paper we propose a novel model for authentication, a group key protocol, and data confidentiality. Authentication and group key generation are implemented by the authentication group key protocol, and data confidentiality is maintained by a cryptographic algorithm. Our proposed work is more efficient, with lower time complexity, than traditional approaches.

I. INTRODUCTION

k-anonymity is a property possessed by certain anonymized data. The goal is: "Given person-specific, field-structured data, produce a release of the data with scientific guarantees that the individuals who are the subjects of the data cannot be re-identified while the data remain practically useful." [1][2] A release of data is said to have the k-anonymity property if the information for each person contained in the release cannot be distinguished from at least k-1 other individuals whose information also appears in the release.

Society is experiencing exponential growth in the number and variety of data collections containing person-specific information as computer technology, network connectivity, and disk storage become increasingly affordable. Data holders, operating autonomously and with limited knowledge, are left with the difficulty of releasing information that does not compromise privacy, confidentiality, or national interests. In many cases the survival of the database itself depends on the data holder's ability to produce anonymous data, because not releasing such information at all may diminish the need for the data, while, on the other hand, failing to provide proper protection within a release may create circumstances that harm the public or others. [3]

Earlier work presented basic protection models termed null-map, k-map, and wrong-map, which provide protection by ensuring that released information maps to no, k, or incorrect entities, respectively. Determining how many individuals each released tuple actually matches requires combining the released data with externally available data and analyzing other possible attacks. Making such a determination directly can be an extremely difficult task for the data holder who releases the data. Although the data holder can be expected to know which data in the private table PT also appear externally, and therefore what constitutes a quasi-identifier, the specific values contained in external data cannot be assumed. This work therefore seeks to protect the data by enforcing a slightly different constraint on released data, termed the k-anonymity requirement. This is a special case of k-map protection where k is enforced on the released data. [4]

The concepts of k-anonymity and of a quasi-identifier are straightforward, as the small check sketched below illustrates.
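To make the property concrete, the following minimal Python sketch (not part of the original proposal; the table layout, attribute names, and values are assumed purely for illustration) checks whether a release satisfies k-anonymity for a chosen quasi-identifier:

    from collections import Counter

    def is_k_anonymous(table, quasi_identifier, k):
        # Count how many tuples share each combination of
        # quasi-identifier values in the released table.
        counts = Counter(tuple(row[a] for a in quasi_identifier)
                         for row in table)
        # k-anonymity holds if every combination occurs >= k times.
        return all(c >= k for c in counts.values())

    # Hypothetical release in which (zip, age) is the quasi-identifier.
    release = [
        {"zip": "531**", "age": "30-39", "disease": "flu"},
        {"zip": "531**", "age": "30-39", "disease": "asthma"},
        {"zip": "530**", "age": "40-49", "disease": "flu"},
        {"zip": "530**", "age": "40-49", "disease": "diabetes"},
    ]
    print(is_k_anonymous(release, ["zip", "age"], 2))   # True: k = 2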
Nevertheless, care must be taken to state precisely what is meant. [5] provides a detailed discussion of k-anonymity; a brief summary is given in this section as background for the upcoming presentation of generalization and suppression. Unless otherwise stated, the term data refers to person-specific information that is conceptually organized as a table of rows (or records) and columns (or fields). Each row is termed a tuple. Tuples within a table are not necessarily unique. Each column is called an attribute and denotes a semantic category of information, that is, a set of possible values; therefore, an attribute is also a domain. Attributes within a table are unique. So, by observing a table, each row is an ordered n-tuple of values <d1, d2, ..., dn> such that each value dj is in the domain of the j-th column, for j = 1, 2, ..., n, where n is the number of columns. This corresponds to relational database concepts [7].

II. RELATED WORK

The concept of k-anonymity [6] tries to capture, for the private table PT to be released, one of the main requirements followed by the community and by companies releasing data: the released data should be indistinguishably related to no fewer than a certain number of respondents. The set of attributes that is contained in the private table and is also externally available, and can therefore be exploited for linking, is known as the quasi-identifier. This requirement is translated in [8] into the k-anonymity requirement below, which demands that every released tuple cannot be related to fewer than k respondents.

Requirement 1: Each release of data must be such that every combination of values of quasi-identifiers can be indistinctly matched to at least k respondents.

Since it is considered impossible, or highly impractical and limiting, to make assumptions about which datasets are available to outside intruders or curious data recipients for linking, k-anonymity essentially takes a safe approach by requiring that, in the released table itself, the respondents be indistinguishable with respect to the set of quasi-identifying attributes. To guarantee Requirement 1, k-anonymity requires each quasi-identifier value combination in the released table to have at least k occurrences.

The method of k-anonymity requires the initial identification of the quasi-identifier. The quasi-identifier depends on the external information available to the recipient, since this determines her linking ability; not all possible external tables are available to every possible data recipient, so different quasi-identifiers may exist for a given table. For simplicity, the original k-anonymity proposal [6] assumes that the private table PT has a single quasi-identifier composed of all attributes in PT that can be externally available, and that PT contains at most one tuple for each respondent. Therefore, although identifying the correct quasi-identifier for a private table can be a difficult task, it is assumed that the quasi-identifier has been properly recognized and defined. [7][8]

Among the techniques proposed for providing anonymity in the release of microdata, the k-anonymity proposal focuses on two in particular: generalization and suppression, which, unlike other existing techniques such as scrambling or swapping, preserve the truthfulness of the information. We now illustrate their specific definition and use in the context of k-anonymity. Generalization consists in substituting the values of a given attribute with more general values. To this purpose, the notion of domain (i.e., the set of values that an attribute can assume) is extended to capture the generalization process by assuming the existence of a set of generalized domains. The set of original domains together with their generalizations is referred to as Dom. Each generalized domain contains generalized values, and there exists a mapping between each domain and its generalizations.
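As a concrete, purely illustrative instance of such a mapping (the attribute and hierarchy are assumed, not taken from the paper), the sketch below generalizes a ZIP code attribute through a chain of generalized domains Z0, Z1, Z2, where each step suppresses one more trailing digit:

    def generalize_zip(value, level):
        # Map a value of the ground domain Z0 into the generalized
        # domain Z<level> by masking its trailing `level` digits.
        return value if level == 0 else value[:-level] + "*" * level

    for level in range(3):                  # Z0 -> Z1 -> Z2
        print(level, generalize_zip("53102", level))
    # 0 53102
    # 1 5310*
    # 2 531**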
The existing techniques do not provide privacy for the crowdsourcing database, so the anonymous human workers can easily obtain its contents; nor do they provide adequate security or efficiency for the crowdsourcing database. To overcome these problems we implement the proposed system.

III. PROPOSED SYSTEM

In this paper we propose an empirical model of privacy-preserving data retrieval and storage for crowdsourcing databases, since we must ensure the privacy of sensitive data shown to anonymous human workers. The model, or architecture, is divided into two stages: one is authentication and group key generation, and the other maintains data confidentiality. For the latter we use a block cipher encryption algorithm. Before encrypting and decrypting data, the users generate a group key. The following algorithm shows the authentication process, the generation of the group key, and the implementation of the protocol.

[Figure: message flow between the node users and the group key manager: 1. send(nonce); 2. response(points); 3. share points; 4. subset of P points and sig; 5. auth and secret key; 6. enc and dec of crowd database.]

User Authentication Schema and Key Generation:

1. Each user sends a request to the group key manager.
2. The group key manager sends a response containing the id of each user.
3. Each user sends a nonce to the group key manager; the nonce value (Ni) is generated randomly.
4. After retrieving the nonces of all users, the group key manager generates a point (Xi, Yi) for each user based on a random challenge, and sends the points to the individual users.
5. Each user retrieves his or her point and generates a share point (Xi, Yi^Ni) using the point and the random challenge.
6. Each user sends the generated share point to the group key manager.
7. Using the share points, the group key manager generates signatures with a hash function; those signatures are sent to the individual users.
8. After retrieving the signatures, each user again generates the signature and compares the two; the users for whom both are equal are verified users.
9. After completion of the verification process, the group key manager generates a secret key and divides the secret into n parts, such that any subset of at least k parts can reconstruct the polynomial function. Before generating the polynomial function, the group key manager chooses random numbers as its coefficients. The polynomial function (a code sketch of this splitting follows the list) is

   f(x) = secret + a1*x + a2*x^2

10. The group key manager then sends the subset parts to the individual users.
11. The users retrieve those parts, regenerate the polynomial function, and each user obtains the same secret.
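Steps 9-11 correspond to Shamir's (k, n) threshold secret sharing. A minimal sketch follows; it works over the plain integers to match the worked example below, whereas a hardened implementation would work modulo a prime:

    import random

    def make_shares(secret, n, k, coeffs=None):
        # Build a degree-(k-1) polynomial whose constant term is the
        # secret; any k of the n points (x, f(x)) reconstruct it.
        if coeffs is None:
            coeffs = [random.randint(1, 10**6) for _ in range(k - 1)]
        poly = [secret] + list(coeffs)
        f = lambda x: sum(c * x**i for i, c in enumerate(poly))
        return [(x, f(x)) for x in range(1, n + 1)]

    # Fixing the coefficients to the example values a1=166, a2=94:
    print(make_shares(1234, 6, 3, coeffs=[166, 94]))
    # [(1, 1494), (2, 1942), (3, 2578), (4, 3402), (5, 4414), (6, 5614)]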
Example:
• Let S = 1234, n = 6, and k = 3, and obtain random integers a1 = 166 and a2 = 94, so that

  f(x) = 1234 + 166x + 94x^2

• The secret share points are (1, 1494), (2, 1942), (3, 2578), (4, 3402), (5, 4414), (6, 5614).
• We give each participant a different single point (both x and f(x)).

Reconstruction (using Lagrange polynomials):
• In order to reconstruct the secret, any 3 points are enough.
• Let us consider (x0, y0) = (2, 1942), (x1, y1) = (4, 3402), (x2, y2) = (5, 4414). Then

  l0(x) = ((x - x1)/(x0 - x1)) ((x - x2)/(x0 - x2)) = ((x - 4)/(2 - 4)) ((x - 5)/(2 - 5)) = (1/6)x^2 - (3/2)x + 10/3
  l1(x) = ((x - x0)/(x1 - x0)) ((x - x2)/(x1 - x2)) = ((x - 2)/(4 - 2)) ((x - 5)/(4 - 5)) = -(1/2)x^2 + (7/2)x - 5
  l2(x) = ((x - x0)/(x2 - x0)) ((x - x1)/(x2 - x1)) = ((x - 2)/(5 - 2)) ((x - 4)/(5 - 4)) = (1/3)x^2 - 2x + 8/3

  f(x) = sum_j yj lj(x)
       = 1942((1/6)x^2 - (3/2)x + 10/3) + 3402(-(1/2)x^2 + (7/2)x - 5) + 4414((1/3)x^2 - 2x + 8/3)
       = 1234 + 166x + 94x^2

The group key manager receives the registration requests from all the users, generates a verification share and forwards it to all the requesting users for authentication purposes, generates the key using the key generation process, and forwards the points so that the key can be extracted from the equation generated by the verification points. The key generation protocol takes the verification shares and the key as input to construct the Lagrange polynomial equation f(x), which passes through (0, key) and the verification points; the group key manager then forwards the points to the data owners. The data owners reconstruct the key from the verification points and check the authentication code sent by the group key manager.
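The reconstruction above can be transcribed directly (again only a sketch; exact rational arithmetic via Python's fractions module stands in for the finite-field arithmetic a production scheme would use), recovering the secret as f(0):

    from fractions import Fraction

    def reconstruct_secret(points):
        # Lagrange interpolation evaluated at x = 0; the constant
        # term of the interpolated polynomial is the secret.
        secret = Fraction(0)
        for xj, yj in points:
            lj0 = Fraction(1)
            for xm, _ in points:
                if xm != xj:
                    lj0 *= Fraction(-xm, xj - xm)   # l_j(0)
            secret += yj * lj0
        return int(secret)

    # Any k = 3 of the shares suffice:
    print(reconstruct_secret([(2, 1942), (4, 3402), (5, 4414)]))  # 1234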
After the completion of authentication and secret key establishment, each user stores data in the database engine. Before storing data in the database engine, each user encrypts it using the block cipher encryption algorithm; the procedure is as follows. The block cipher encryption algorithm is a 64-bit symmetric block cipher with a variable-length key. The algorithm operates in two parts: (i) a key expansion part and (ii) a data encryption part. The role of the key expansion part is to convert a key of at most 448 bits into several subkey arrays totaling 4168 bytes. Data encryption occurs via a 16-round Feistel network. It is suitable only for applications where the key does not change often, such as a communications link or an automatic file encryptor. It is significantly faster than most encryption algorithms when implemented on 32-bit microprocessors with large data caches.

The nature of encryption algorithms is that, once any significant amount of security analysis has been done, it is very undesirable to change the algorithm for performance reasons, thereby invalidating the results of the analysis. Thus, it is imperative to consider both security and performance together during the design phase. While it is impossible to take all future computer architectures into consideration, an understanding of general optimization guidelines, combined with exploratory software implementation on existing architectures to calibrate performance, should help achieve higher speed in future encryption algorithms.

Subkey Expansion: The block cipher encryption algorithm uses a large number of subkeys. These keys must be precomputed before any data encryption or decryption. The P-array consists of 18 32-bit subkeys: P1, P2, ..., P18. There are four 32-bit S-boxes with 256 entries each: S1,0, S1,1, ..., S1,255; S2,0, S2,1, ..., S2,255; S3,0, S3,1, ..., S3,255; S4,0, S4,1, ..., S4,255.

Pseudocode of the BF (Blowfish) encryption algorithm (16 rounds; the input is a 64-bit data element x):

    divide x into two 32-bit halves xL, xR
    for i = 1 to 16:
        xL = xL XOR Pi
        xR = F(xL) XOR xR
        swap xL and xR
    swap xL and xR          (undo the last swap)
    xR = xR XOR P17
    xL = xL XOR P18
    recombine xL and xR to get the ciphertext

Decryption is exactly the same as encryption, except that P1, P2, ..., P18 are used in the reverse order. Implementations of Blowfish that require the fastest speeds should unroll the loop and ensure that all subkeys are stored in cache.

IV. CONCLUSION

We conclude our current research work with an efficient authentication and privacy-preserving technique for crowdsourced databases. The group key protocol handles authentication and group key generation; after key generation, individual users can encrypt or decrypt data with the Blowfish cryptographic algorithm, and the encrypted data is then stored in the database. Any user who retrieves that data and decrypts it with the secret obtains the original data. Our implementation results show better performance than other traditional approaches.

REFERENCES
[1] A. Feng et al., "CrowdDB: Query processing with the VLDB crowd," Proc. VLDB, vol. 4, no. 12, pp. 1387–1390, 2011.
[2] A. Marcus, E. Wu, S. Madden, and R. C. Miller, "Crowdsourced databases: Query processing with people," in Proc. 5th CIDR, 2011, pp. 211–214.
[3] A. Marcus, E. Wu, D. R. Karger, S. Madden, and R. C. Miller, "Demonstration of Qurk: A query processor for human operators," in Proc. ACM SIGMOD, Athens, Greece, 2011, pp. 1315–1318.
[4] X. Liu et al., "CDAS: A crowdsourcing data analytics system," Proc. VLDB, vol. 5, no. 10, pp. 1040–1051, 2012.
[5] L. Sweeney, "k-anonymity: A model for protecting privacy," Int. J. Uncertain. Fuzz. Knowl. Based Syst., vol. 10, no. 5, pp. 557–570, 2002.
[6] K. LeFevre, D. J. DeWitt, and R. Ramakrishnan, "Incognito: Efficient full-domain k-anonymity," in Proc. SIGMOD Conf., Baltimore, MD, USA, 2005, pp. 49–60.
[7] A. Meyerson and R. Williams, "On the complexity of optimal k-anonymity," in Proc. 23rd ACM PODS, New York, NY, USA, 2004, pp. 223–228.
[8] K. LeFevre, D. J. DeWitt, and R. Ramakrishnan, "Mondrian multidimensional k-anonymity," in Proc. 22nd ICDE, Washington, DC, USA, 2006, p. 25.
[9] R. J. Bayardo, Jr. and R. Agrawal, "Data privacy through optimal k-anonymization," in Proc. 21st ICDE, Washington, DC, USA, 2005, pp. 217–228.
[10] V. S. Iyengar, "Transforming data to satisfy privacy constraints," in Proc. 8th ACM SIGKDD Int. Conf. KDD, Edmonton, AB, Canada, 2002, pp. 279–288.

BIOGRAPHIES

Pyla Jyothi is currently pursuing an M.Tech. in the Department of Computer Science and Engineering, DIET College, Visakhapatnam, Andhra Pradesh. Her areas of interest are Computer Networks, DWDM, and Artificial Intelligence.

Ch. Nagabhushana Rao is working as an Associate Professor in the Department of Computer Science & Engineering, Dadi Institute of Engineering & Technology, Anakapalle-531 002, A.P. His areas of interest are Computer Networks, DWDM, and Artificial Intelligence.