International Journal of Engineering Trends and Technology (IJETT) – Volume 16 Number 6 – Oct 2014
ISSN: 2231-5381, http://www.ijettjournal.org, pages 305-310

An Improved Privacy Preserving Mining over Centralized Databases

Ch. Ajay Kumar (1), K. Prasada Rao (2)
(1) M.Tech Student, (2) Sr. Assistant Professor
(1,2) Department of CSE, Aditya Institute of Technology and Management (AITAM), Tekkali, Srikakulam, Andhra Pradesh

Abstract: Secure association rule mining over horizontally partitioned databases is a long-standing research issue in the field of knowledge and data engineering. In horizontal partitioning, databases from various data holders (players) are integrated so that association rule mining can be applied to the integrated database. In this paper we propose a privacy preserving mining approach that uses an improved Lagrange polynomial equation for secure key generation together with a Binary Matrix approach for mining.

Index Terms: Association Rule Mining, Binary Matrix, Lagrange's polynomial.

I. INTRODUCTION

The existing protocols for secure multiparty function evaluation run in unbounded "distributed time", that is, they use an unbounded number of rounds of communication [1]. Even though the interaction for each gate can be implemented in a way that requires only a constant number of rounds, the total number of rounds is still linear in the depth of the underlying circuit. For many concrete computations the resulting number of rounds would be prohibitive, because in distributed computation the number of rounds is generally the most valuable resource.

Secure function evaluation consists of evaluating a function in a distributed way so as to satisfy both the correctness and the privacy constraints. This task is made particularly difficult by the fact that some of the players may be maliciously faulty and may cooperate in order to disrupt the correctness and the privacy of the computation. Secure function evaluation arises in two main settings. The first is fault-tolerant computation.
In this setting [2], correctness is the main issue: we insist that the values a distributed system returns are correct, no matter how some components of the system fail. However, even if one is interested solely in correctness, privacy helps to achieve it: if one wants to maliciously influence the outcome of an election, say, it is helpful to know who plans to vote for whom. The second setting is protocol design, to which secure function computation is central because the correctness and privacy of any protocol can be reduced to it. Here, as people may be behind their computers, both correctness and privacy matter.

Secure function evaluation [3] can be stated as follows. Assume we have n parties, 1, ..., n; each party i has a private input $x_i$ known only to that party. The parties want to correctly evaluate a given function f on their inputs, that is, to compute $y = f(x_1, \ldots, x_n)$, while maintaining the privacy of their own inputs. That is, they do not want to reveal more than the value y implicitly reveals.

Bar-Ilan and Beaver were the first to investigate reducing the round complexity of secure function evaluation. They exhibited a non-cryptographic method that always saves a logarithmic factor of rounds (logarithmic in the total length of the players' inputs), while the total amount of communication grows only by a polynomial factor. Alternatively, they show that the number of rounds can be reduced to a constant, but at the expense of an exponential blowup in the message sizes. We insist that the total amount of communication be polynomially bounded. While their result shows that the depth of a circuit is not a lower bound on the number of rounds necessary for securely evaluating it, the savings are far from substantial in a general setting.

II. RELATED WORK

In traditional association rule mining, companies hand their data to an analyst, who finds the patterns or association rules that exist between the items.
Although this makes it possible to perform sophisticated analysis on very large volumes of data in a cost-effective way, the data-mining-as-a-service paradigm raises several serious security issues. One of the main issues is that the server has access to the owner's valuable data and may learn sensitive information from it, which results in a loss of corporate privacy. Traditional distributed algorithms are based on Apriori; the main disadvantage of this approach is that it requires multiple database scans and candidate set generation.

Association rule mining is one of the most important and best-researched methods of data mining. It aims to extract interesting correlations, frequent patterns, associations or informal structures among sets of items in transaction databases or other data repositories. Association rules are widely used in areas such as telecommunication networks, market and risk management, and inventory control [1]. Different association mining methods and algorithms are briefly introduced and compared below.

The task of association rule mining is to find the association rules that satisfy the predefined minimum support and confidence in a database [3]. The problem is decomposed into two subproblems. The first is to find the itemsets whose occurrences exceed a predefined threshold in the database; these are known as frequent or large itemsets. The second is to generate association rules from those large itemsets under the constraint of minimal confidence [2].

Two main approaches to using multiple processors have emerged: distributed memory, in which each processor has a private memory [6], and shared memory, in which all processors access a common memory. The shared memory architecture has many desirable properties.
Each processor has direct and equal access to all memory in the system [4]. In the distributed memory architecture, each processor has its own local memory that can be accessed directly only by that processor. A parallel application can be divided into a number of subtasks that are executed in parallel on separate processors in the system. However, the performance of a parallel application on a distributed system depends mainly on how the tasks comprising the application are allocated to the available processors [5].

Among the many data mining models, including association rules, clustering and classification, association rule mining is the most widely applied. The Apriori algorithm is the most representative algorithm for association rule mining, and many modified versions of it focus on improving its efficiency and accuracy.

III. PROPOSED WORK

We propose a privacy preserving mining approach based on a Binary Matrix. Constructing the Binary Matrix reduces the problem of multiple database scans and candidate set generation. Data can be integrated from multiple data holders (players). For secure transmission in this distributed partitioning, we implement an improved Lagrange polynomial approach for secure key generation; the generated key is used to encrypt the data of the data holders with the triple DES algorithm.

Fig 1: Horizontal partitioning architecture. Each data holder sends its cipher pattern through the encoder/decoder to the centralized server.

Every data holder (player) maintains its own transactions or patterns. In horizontal partitioning, every data holder encrypts its patterns locally and forwards them to the centralized server. At the centralized server, the received patterns are decrypted with the decoder and forwarded to the Binary Matrix stage, which extracts the frequent patterns from them.
For the experiments we establish connections between the nodes and the central location (key generation center) through network (socket) programming. The key is generated with the improved Lagrange polynomial equation and distributed to the users. Every node participates in the key generation process and retrieves the key by reconstruction. Each node encrypts its dataset using triple DES with the key generated from the Lagrange polynomial equation. All encrypted datasets are forwarded to the centralized location, decrypted with the same symmetric key and passed on to the mining process.

The group key manager receives the registration requests from all the users, generates a verification share and forwards it to every requesting user for authentication. It then generates the key through the key generation process and forwards the points needed to extract the key from the equation generated from the verification points. The key generation protocol takes the verification shares and the key as input and constructs the Lagrange polynomial f(x) that passes through (0, key) and the verification points; the group key manager then forwards the points to the data owners. The data owners reconstruct the key from the verification points and check the authentication code sent by the group key manager.

When a new user wants to download a file, the user does not need to contact the other data owners to decrypt it. The user connects to the group key manager, which updates the group key, decrypts the files with the previous key, encrypts them again with the new key and distributes the new key to all the data owners.

Key Generation process

The goal is to divide a secret S (e.g., a safe combination) into n pieces of data $D_1, \ldots, D_n$ in such a way that:
1. Knowledge of any k or more pieces $D_i$ makes S easily computable.
2. Knowledge of any k-1 or fewer pieces $D_i$ leaves S completely undetermined (in the sense that all its possible values are equally likely).

This scheme is called a (k, n) threshold scheme. If k = n, all participants are required to reconstruct the secret.

Example
• Let the secret key be S = 1234.
• Take n = 6 and k = 3 and choose random integers $a_1 = 166$ and $a_2 = 94$, so that
$$f(x) = 1234 + 166x + 94x^2$$
• Secret share points: $D_0 = (1, 1494)$, $D_1 = (2, 1942)$, $D_2 = (3, 2578)$, $D_3 = (4, 3402)$, $D_4 = (5, 4414)$, $D_5 = (6, 5614)$

We give each participant a different single point (both x and f(x)). Because we use $D_{x-1}$ instead of $D_x$, the points start from (1, f(1)) and not from (0, f(0)). This is necessary because a participant holding (0, f(0)) would know the secret directly (S = f(0)).

Reconstruction
• Any 3 points are enough to reconstruct the secret.
• Let us consider $(x_0, y_0) = (2, 1942)$, $(x_1, y_1) = (4, 3402)$, $(x_2, y_2) = (5, 4414)$.

Using Lagrange basis polynomials:
$$\ell_0(x) = \frac{x - x_1}{x_0 - x_1} \cdot \frac{x - x_2}{x_0 - x_2} = \frac{x - 4}{2 - 4} \cdot \frac{x - 5}{2 - 5} = \frac{1}{6}x^2 - \frac{3}{2}x + \frac{10}{3}$$
$$\ell_1(x) = \frac{x - x_0}{x_1 - x_0} \cdot \frac{x - x_2}{x_1 - x_2} = \frac{x - 2}{4 - 2} \cdot \frac{x - 5}{4 - 5} = -\frac{1}{2}x^2 + \frac{7}{2}x - 5$$
$$\ell_2(x) = \frac{x - x_0}{x_2 - x_0} \cdot \frac{x - x_1}{x_2 - x_1} = \frac{x - 2}{5 - 2} \cdot \frac{x - 4}{5 - 4} = \frac{1}{3}x^2 - 2x + \frac{8}{3}$$
$$f(x) = \sum_{j=0}^{2} y_j \, \ell_j(x) = 1942\left(\frac{1}{6}x^2 - \frac{3}{2}x + \frac{10}{3}\right) + 3402\left(-\frac{1}{2}x^2 + \frac{7}{2}x - 5\right) + 4414\left(\frac{1}{3}x^2 - 2x + \frac{8}{3}\right)$$
$$f(x) = 1234 + 166x + 94x^2$$

Recall that the secret is the free coefficient, which means that S = 1234.

The data owner initiates the request by sending a random challenge to the group key manager. In response, the group key manager sends a secret share; the data owner authenticates it and forwards the verification share. The group key manager receives the verification shares, generates the key using the Lagrange polynomial equation and forwards the points to the data owners so that they can regenerate the key.
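The (k, n) threshold example can be sketched in Python as follows. This is a minimal illustration, assuming exact rational arithmetic over the integers as in the worked example; a deployed scheme would normally work modulo a large prime, which the text does not specify. The function names `make_shares` and `reconstruct_secret` are illustrative and are not taken from the paper's implementation.

```python
from fractions import Fraction


def make_shares(secret, coefficients, n):
    """Evaluate f(x) = secret + a1*x + a2*x^2 + ... at x = 1..n.

    The shares start at x = 1 because (0, f(0)) would reveal the secret.
    """
    shares = []
    for x in range(1, n + 1):
        y = secret + sum(a * x ** (i + 1) for i, a in enumerate(coefficients))
        shares.append((x, y))
    return shares


def reconstruct_secret(points):
    """Lagrange interpolation evaluated at x = 0, which yields f(0) = S."""
    secret = Fraction(0)
    for j, (xj, yj) in enumerate(points):
        basis = Fraction(1)
        for m, (xm, _) in enumerate(points):
            if m != j:
                # Factor (0 - x_m) / (x_j - x_m) of the j-th basis polynomial.
                basis *= Fraction(0 - xm, xj - xm)
        secret += yj * basis
    return secret


# Values from the example: S = 1234, a1 = 166, a2 = 94, n = 6, k = 3.
shares = make_shares(secret=1234, coefficients=[166, 94], n=6)
print(shares)

# Reconstruct from the three points used in the text: x = 2, 4 and 5.
print(reconstruct_secret([shares[1], shares[3], shares[4]]))
```

Evaluating the polynomial directly gives f(3) = 1234 + 166·3 + 94·9 = 2578. Any three of the six shares return the same secret, whereas two points do not determine a quadratic polynomial.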
Fig 2: Key generation protocol between the node users and the group key manager. Messages: 1. Request (Rch), 2. Response (Sshare), 3. Vshare, 4. Points (subset of P).
Rch: random challenge
Sshare: secret share
Vshare: verification share
P = {p1, p2, ..., pn}: points for construction of Lagrange's equation

Binary Matrix:

The server or service provider performs association rule mining on the cipher database to find the maximum frequent itemsets. We therefore present an algorithm that mines maximum frequent itemsets based on a Binary Matrix of frequent length-1 itemsets. The main idea of the algorithm is to create a Binary Matrix with the frequent length-1 itemsets as row headings and the transaction record IDs as column headings. The matrix contains only two values, '1' and '0', which indicate whether or not the transaction record contains the corresponding frequent length-1 itemset. It is then necessary to calculate the number of 1 values in each column and to count the columns that have the same number of 1 values.

Algorithm for Binary Matrix construction:
Step 1: While (true) // patterns available
Step 2: Read the individual pattern Pi, whose items are separated by a special character.
Step 3: Construct an empty matrix with i rows and j columns, where 'i' is the item and 'j' is the transaction ID.
Step 4: Set intersection (i, j) = 1 if item 'i' is present in transaction 'j'; otherwise set it to 0.
Step 5: Repeat steps 2 to 4.

We can now extract the frequent patterns from the matrix. To extract the frequent 1-itemsets, count the number of ones in the row of each item across all transaction columns; if the count reaches the minimum threshold value, treat the item as frequent, otherwise ignore it. Continue the same process for 2-itemsets: check whether both items have '1' in the same transaction column and, if so, increment the count; continue until all transactions have been checked. If the total count is greater than or equal to the threshold value, treat the itemset as frequent. The extraction procedure is:

Step 1: Read the itemsets {I1, I2, ..., In} and initialize counter := 0, final counter := 0
Step 2: for i := 0; i < n; i++
          for j := 0; j < trans_size; j++
            if intersection (i, j) == 1
              counter := counter + 1
          Next
          if counter == Ii.size() then add to item list
        Next
Step 3: Set the minimum threshold value (t)
Step 4: for k := 0; k < itemlist_size; k++
          if item_list[k].count >= t then add to frequent item list
        Next
Step 5: Return the frequent pattern list

EXPERIMENTAL ANALYSIS

For the experimental analysis we implemented the approach in Java and considered a set of transactions for mining frequent patterns. Consider the following sample transactions:

Transaction   Pattern
T1            a,b,c,d
T2            a,c,e
T3            b,d
T4            b,c,e
T5            c,d,e
T6            a,b,c,d,e

Fig 3: Transaction Table

The Binary Matrix is constructed from the presence of each item in each transaction. The first transaction contains "a,b,c,d", so the positions of these items are set to '1' for the first transaction and the remaining position is set to '0'. The second transaction is "a,c,e", so the corresponding item positions are set to '1' for the second transaction. The process continues until all transactions have been processed.

Frequent patterns generation

The frequent 1-itemsets are generated first, by counting the occurrences of each individual item over all transactions. Consider item 'a': counting the '1's in the row of item 'a' gives a total of 3, because 'a' occurs in transactions 1, 2 and 6. If the count is equal to or greater than the minimum threshold value or support count (2 in our example), the item is treated as frequent.

To find the frequent 2-itemsets, 3-itemsets or n-itemsets, we follow the same procedure until no further frequent itemsets are found.
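The matrix construction and support counting described in this section can be sketched in Python as follows. This is an illustrative sketch, not the authors' Java implementation: the function names `build_binary_matrix`, `support` and `frequent_itemsets` are hypothetical, and the candidate generation (all combinations of the items that remain frequent at the previous level) is a simplifying assumption rather than a step stated in the text.

```python
from itertools import combinations

# Sample transactions from the transaction table (Fig 3).
transactions = {
    "T1": ["a", "b", "c", "d"],
    "T2": ["a", "c", "e"],
    "T3": ["b", "d"],
    "T4": ["b", "c", "e"],
    "T5": ["c", "d", "e"],
    "T6": ["a", "b", "c", "d", "e"],
}


def build_binary_matrix(transactions):
    """Return {item: [0/1 per transaction]}: items as rows, transaction IDs as columns."""
    items = sorted({item for pattern in transactions.values() for item in pattern})
    return {
        item: [1 if item in pattern else 0 for pattern in transactions.values()]
        for item in items
    }


def support(matrix, itemset):
    """Count the transaction columns in which every item of the itemset has a 1."""
    rows = [matrix[item] for item in itemset]
    return sum(1 for column in zip(*rows) if all(column))


def frequent_itemsets(matrix, min_support):
    """Level-wise search that reads only the matrix, never the raw transactions."""
    frequent = {}
    size = 1
    candidates = [(item,) for item in matrix]
    while candidates:
        level = {}
        for itemset in candidates:
            count = support(matrix, itemset)
            if count >= min_support:
                level[itemset] = count
        frequent.update(level)
        # Assumed candidate step: combine the items still frequent at this level.
        remaining = sorted({item for itemset in level for item in itemset})
        size += 1
        candidates = list(combinations(remaining, size))
    return frequent


matrix = build_binary_matrix(transactions)
print(matrix["a"])                   # row of item 'a'
print(support(matrix, ("a",)))       # 3: transactions 1, 2 and 6
print(support(matrix, ("a", "b")))   # 2: transactions 1 and 6
result = frequent_itemsets(matrix, min_support=2)
print(result)
```

Because every support count is derived from the rows of the matrix, the transactions are read only once, when the matrix is built.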
For example, consider the 2-itemset {a,b}. Check the entries of both 'a' and 'b' in each transaction column; a transaction adds 1 to the count only when both entries are set to '1'. In the table, transactions 1 and 6 contain '1' in the rows of both a and b, so the count is 2. {a,b} is therefore a frequent itemset, because our minimum support count is 2. The remaining frequent patterns can be found by the same process.

Itemset   Transaction IDs
          1  2  3  4  5  6
a         1  1  0  0  0  1
b         1  0  1  1  0  1
c         1  1  0  1  1  1
d         1  0  1  0  1  1
e         0  1  0  1  1  1

Association rules generation:

Frequent itemsets are not the same as association rules; one more step is required to derive the rules. To obtain an association between entities or items, i.e. $A \Rightarrow B$, we need support($A \cup B$) and support(A). All the information required to compute the confidence has already been recorded during itemset generation.

For each frequent itemset X,
  For each proper nonempty subset A of X,
    Let B = X - A
    $A \Rightarrow B$ is an association rule if confidence($A \Rightarrow B$) ≥ minconf, where
$$\text{support}(A \Rightarrow B) = \text{support}(A \cup B) = \text{support}(X)$$
$$\text{confidence}(A \Rightarrow B) = \frac{\text{support}(A \cup B)}{\text{support}(A)}$$

IV. CONCLUSION

We conclude our research work with an efficient approach to secure frequent pattern mining over horizontal databases. A secure key is generated through an efficient and improved Lagrange polynomial equation, and the cipher data is received and decrypted by the centralized server, which then finds the frequent patterns accurately and efficiently.

REFERENCES

[1] D. Beaver (Harvard University). The Round Complexity of Secure Protocols.
[2] D.W.L. Cheung, V.T.Y. Ng, A.W.C. Fu, and Y. Fu. Efficient mining of association rules in distributed databases. IEEE Trans. Knowl. Data Eng., 8(6):911–922, 1996.
[3] R. Agrawal and R. Srikant. Privacy-preserving data mining. In SIGMOD Conference, pages 439–450, 2000.
[4] M. Bellare, R. Canetti, and H. Krawczyk. Keying hash functions for message authentication. In Crypto, pages 1–15, 1996.
[5] A. Ben-David, N. Nisan, and B. Pinkas. FairplayMP - A system for secure multi-party computation. In CCS, pages 257–266, 2008.
[6] J.C. Benaloh. Secret sharing homomorphisms: Keeping shares of a secret. In Crypto, pages 251–260, 1986.
[7] J. Brickell and V. Shmatikov. Privacy-preserving graph algorithms in the semi-honest model. In ASIACRYPT, pages 236–252, 2005.
[8] D.W.L. Cheung, J. Han, V.T.Y. Ng, A.W.C. Fu, and Y. Fu. A fast distributed algorithm for mining association rules. In PDIS, pages 31–42, 1996.
[9] D.W.L. Cheung, V.T.Y. Ng, A.W.C. Fu, and Y. Fu. Efficient mining of association rules in distributed databases. IEEE Trans. Knowl. Data Eng., 8(6):911–922, 1996.
[10] T. ElGamal. A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Transactions on Information Theory, 31:469–472, 1985.

BIOGRAPHIES

Chintada Ajay Kumar completed his B.Tech degree in Computer Science and Engineering and is pursuing his M.Tech degree in the Department of Computer Science and Engineering at Aditya Institute of Technology and Management (AITAM), Tekkali, A.P., India. His areas of interest are Data Mining and Computer Networks.

K. Prasada Rao completed his B.Tech in 2004 and his M.Tech in 2009. He is pursuing his Ph.D. at Acharya Nagarjuna University. He is working as Sr. Assistant Professor in the Department of Computer Science and Engineering at Aditya Institute of Technology and Management (AITAM), Tekkali, A.P., India. His areas of interest are Data Mining and Computer Networks.