International Journal of Engineering Trends and Technology (IJETT) – Volume 16 Number 7 – Oct 2014

Privacy Mining of Association Rules in Transversal Shared Databases

Tammineni Krushnam Raju1, Ramesh Kumar Behara2
1Final M.Tech Student, 2Assistant Professor
Dept of CSE, Sarada Institute of Science, Technology And Management (SISTAM), Srikakulam, Andhra Pradesh

Abstract: Association rule learning is a popular and well-researched method for discovering interesting relations between variables in large databases. It is intended to identify strong rules in databases using different measures of interestingness. In this paper we propose a scheme for horizontally distributed databases that protects the mined patterns. To secure the transaction item sets, we use the AES cryptographic algorithm for encryption and decryption. The shared key is generated using the Shamir secret sharing technique, and frequent item sets are found using the Apriori algorithm. With these techniques we aim to provide security and to reduce the time needed to find the frequent items.

I. INTRODUCTION

In distributed networks or open environments, nodes communicate with each other openly to transmit data, and secure mining is an active research area. Work on privacy-preserving techniques covers classification, association rule mining, and clustering. Privacy preservation commonly relies on randomization and perturbation, and it can be achieved in two ways. The first is the cryptographic approach, in which real data sets are converted to unrealized data sets by encoding them. The second is imputation, in which fake values are inserted among the real data and are filtered out by rules during mining [1][2]. Clustering is the process of grouping similar objects based on the distance (for numerical data) or similarity (for categorical data) between data objects.
In a distributed environment, data holders (players) maintain individual data sets, and every node or vertex is connected to the others by an edge, along with its quasi-identifiers [3]. In association rule mining, frequent patterns are first generated step by step according to their support count, until no further frequent item set is available. Association rules between entities are then derived from all frequent item sets, keeping only those that satisfy the minimum support and confidence thresholds.

ISSN: 2231-5381

Most existing text clustering algorithms are designed for central execution. They require that clustering be performed on a dedicated node, and they are not suitable for deployment over large-scale distributed networks. Therefore, specialized algorithms for distributed and P2P clustering have been developed, such as [6], [7], [8], [9]. However, these approaches are either limited to a small number of nodes or focus on low-dimensional data only. For distributed environments, a privacy-preserving distributed clustering algorithm was proposed by S. Jha, L. Kruger, and P. McDaniel, in which data are clustered by grouping similar objects and transmitted securely through protocols [4]. A perturbation method of string transformation using geometric techniques has also been proposed for privacy-preserving clustering [10].

II. RELATED WORK

In traditional association rule mining, companies give their data to an analyst, who finds the patterns or association rules that exist between the items. Although this achieves sophisticated analysis of tremendous volumes of data in a cost-effective way, the data-mining-as-a-service paradigm raises several serious security issues. One of the main issues is that the server has access to the owner's valuable data and may learn sensitive information from it, which results in a loss of corporate privacy.
The traditional distributed algorithm is based on Apriori. Its main disadvantages are multiple database scans and repeated candidate set generation.

Association rule mining is one of the most important and well-researched methods of data mining. It aims to extract interesting correlations, frequent patterns, associations, or causal structures among sets of items in transaction databases or other data repositories. Association rules are widely used in areas such as telecommunication networks, market and risk management, and inventory control [1]. Different association mining methods and algorithms are briefly introduced and compared afterwards. Association rule mining finds the association rules that satisfy a predefined minimum support and confidence in a database [3].

http://www.ijettjournal.org Page 315

The problem is decomposed into two sub-problems. The first is to discover the item sets whose number of occurrences in the database exceeds a predefined threshold; these item sets are known as frequent or large item sets. The second is to generate association rules from those large item sets under the constraint of minimal confidence [2].

Two main approaches to using multiple processors have emerged: distributed memory, in which each processor has a private memory [6], and shared memory, in which all processors access a common memory. Shared-memory architecture has many desirable properties: each processor has direct and equal access to all memory in the system [4]. In a distributed-memory architecture, each processor has its own local memory that can be accessed directly only by that processor.
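The two sub-problems described in this section (finding frequent item sets, then deriving rules that meet a minimum confidence) can be sketched in Python. This is a minimal illustration of a level-wise, Apriori-style search and not the authors' implementation: the function names `apriori` and `derive_rules`, the sample transactions, and both thresholds are assumptions chosen for the example, and support is counted as an absolute number of transactions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Return {itemset: support count} for every itemset meeting min_support."""
    transactions = [frozenset(t) for t in transactions]
    # Level 1 candidates: every single item seen in the data
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent = {}
    k = 1
    while candidates:
        # One pass over the database per level to count each candidate
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        k += 1
        # Join step: unions of frequent (k-1)-itemsets that have exactly k items
        joined = {a | b for a in level for b in level if len(a | b) == k}
        # Prune step: keep a candidate only if all its (k-1)-subsets are frequent
        candidates = {
            c for c in joined
            if all(frozenset(s) in level for s in combinations(c, k - 1))
        }
    return frequent


def derive_rules(frequent, min_confidence):
    """Return (lhs, rhs, confidence) for rules lhs -> rhs meeting min_confidence."""
    found = []
    for itemset, count in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in combinations(itemset, r):
                lhs = frozenset(lhs)
                # confidence = support(lhs and rhs together) / support(lhs)
                confidence = count / frequent[lhs]
                if confidence >= min_confidence:
                    found.append((lhs, itemset - lhs, confidence))
    return found


# Hypothetical sample data, for illustration only
transactions = [
    {"bread", "milk"},
    {"bread", "butter"},
    {"bread", "milk", "butter"},
    {"milk"},
]
frequent = apriori(transactions, min_support=2)
found = derive_rules(frequent, min_confidence=0.6)
```

With this sample, {milk, butter} occurs in only one transaction, so it is discarded at level 2 and the three-item candidate is pruned without another database scan. That pruning is the search-space reduction usually attributed to the Apriori property.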
A parallel application can be divided into a number of subtasks that are executed in parallel on separate processors in the system. However, the performance of a parallel application on a distributed system depends mainly on how the tasks comprising the application are allocated to the available processors [5]. Classification models are the most commonly applied method. The Apriori algorithm is the most representative algorithm for association rule mining, and many modified algorithms focus on improving its efficiency and accuracy.

III. PROPOSED SYSTEM

The proposed system finds frequent items and also secures the transaction item sets through a cryptographic algorithm. It uses a horizontally distributed database and finds the frequent items across the different databases. We use the Apriori algorithm to find frequent item sets. Before the frequent item sets are found, users send their item sets to a third party in cipher form, produced with the AES algorithm. AES requires one secret key, which is generated with Shamir secret sharing. By combining these two techniques, the proposed system is intended to be more secure and to reduce the time needed to find frequent item sets.

[Figure: system architecture. Data Holder 1, Data Holder 2, ..., Data Holder N each perform (1) authentication and key generation, (2) read item sets and encrypt them with the AES algorithm, and (3) send the transaction item sets to the centralized server, which (4) finds frequent item sets with the Apriori algorithm.]

Authentication of Users: In this module we authenticate users before they send the transaction database. Authentication is done with an authenticated group key transfer protocol run by a key generation center (KGC). The process is as follows.

• The initiator sends a key generation request to the KGC with a list of group members $M = \{m_1, m_2, \dots, m_n\}$.
• The KGC broadcasts the list of all participating members $M = \{m_1, m_2, \dots, m_n\}$ as a response.
• Each group member $m_i$ generates a random challenge $\mathrm{Rcha}_i$ and sends it to the KGC.
• The KGC randomly selects a group key $k$ and generates an interpolating polynomial $f(x)$ of degree $t$ that passes through the $t+1$ points $(0, k)$ and $(a_i, b_i + \mathrm{Rcha}_i)$ for $i = 1, 2, \dots, t$.
• The KGC also generates additional points $P_1, \dots, P_t$ and computes $\mathit{auth} = h(k; m_1, m_2, \dots, m_n; \mathrm{Rcha}_1; \dots; \mathrm{Rcha}_t; P_1; \dots; P_t)$, where $h$ is a hash function.
• After generating the authentication code, the KGC sends it to all group members.
• Each group member calculates its shared secret, retrieves the additional points from the KGC, and computes the polynomial function. From the polynomial, the group member recovers the group key.
• Before accepting the recovered group key, each group member checks the authentication code by computing $\mathit{auth} = h(k; m_1, m_2, \dots, m_n; \mathrm{Rcha}_1; \dots; \mathrm{Rcha}_t; P_1; \dots; P_t)$ itself. If both authentication codes are the same, the group member is authenticated.

Shared secret key: In this module, after completing authentication, each user recovers the shared secret key from the polynomial function. The shared secret key is generated using Shamir secret sharing and the Lagrange polynomial equation. After the secret key is generated, each user encrypts and decrypts the transaction item sets using this key and any cryptographic technique.
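The share generation and Lagrange recovery used for the shared secret key can be sketched in Python as follows. This is a hedged illustration and not the authors' code: the function names `make_shares` and `recover_secret` are assumptions, and the coefficients 1234, 166 and 94 are taken from the worked example in this section. The sketch uses exact rational arithmetic to mirror the paper's hand calculation; practical Shamir schemes normally work modulo a large prime instead.

```python
from fractions import Fraction


def make_shares(secret, coeffs, n):
    """Evaluate f(x) = secret + coeffs[0]*x + coeffs[1]*x^2 + ... at x = 1..n."""
    def f(x):
        return secret + sum(c * x ** (i + 1) for i, c in enumerate(coeffs))
    # x = 0 is never handed out, because f(0) is the secret itself
    return [(x, f(x)) for x in range(1, n + 1)]


def recover_secret(points):
    """Lagrange interpolation evaluated at x = 0, which yields f(0)."""
    total = Fraction(0)
    for j, (xj, yj) in enumerate(points):
        basis = Fraction(1)
        for m, (xm, _) in enumerate(points):
            if m != j:
                # Factor (0 - x_m) / (x_j - x_m) of the j-th basis polynomial
                basis *= Fraction(0 - xm, xj - xm)
        total += yj * basis
    return int(total)


# Degree-2 polynomial, so any k = 3 of the n = 6 shares are enough
shares = make_shares(1234, [166, 94], n=6)
subset = [shares[1], shares[3], shares[4]]  # the shares at x = 2, 4, 5
print(subset)                  # [(2, 1942), (4, 3402), (5, 4414)]
print(recover_secret(subset))  # 1234
```

Any other three of the six shares recover the same value, while two shares alone do not determine the constant term of a degree-2 polynomial.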
Key Generation Process:

Example:
• Let us consider a secret key $S = 1234$.
• Let $n = 6$ be the total number of points and $k = 3$ the minimum number of secret shares needed. Consider any two random numbers $a = 166$ and $b = 94$; then $f(x) = 1234 + 166x + 94x^2$.
• The points that satisfy the equation, i.e. the secret shares, are $D_1 = (1, 1494)$, $D_2 = (2, 1942)$, $D_3 = (3, 2578)$, $D_4 = (4, 3402)$, $D_5 = (5, 4414)$, $D_6 = (6, 5614)$.

The centralized server forwards one specific point (both $x$ and $y$) to each participant. Because the point at $x = 0$ is never distributed, the points start from $(1, f(1))$ and not $(0, f(0))$. This is required because anyone holding $(0, f(0))$ would also know the secret key, since $S = f(0)$.

Re-construction of secret key:
• To reconstruct the secret key, any three points are enough.
• Let us consider $(x_0, y_0) = (2, 1942)$, $(x_1, y_1) = (4, 3402)$, $(x_2, y_2) = (5, 4414)$.

Using Lagrange polynomials:

$$\ell_0(x) = \frac{x - x_1}{x_0 - x_1} \cdot \frac{x - x_2}{x_0 - x_2} = \frac{x - 4}{2 - 4} \cdot \frac{x - 5}{2 - 5} = \frac{1}{6}x^2 - \frac{3}{2}x + \frac{10}{3}$$

$$\ell_1(x) = \frac{x - x_0}{x_1 - x_0} \cdot \frac{x - x_2}{x_1 - x_2} = \frac{x - 2}{4 - 2} \cdot \frac{x - 5}{4 - 5} = -\frac{1}{2}x^2 + \frac{7}{2}x - 5$$

$$\ell_2(x) = \frac{x - x_0}{x_2 - x_0} \cdot \frac{x - x_1}{x_2 - x_1} = \frac{x - 2}{5 - 2} \cdot \frac{x - 4}{5 - 4} = \frac{1}{3}x^2 - 2x + \frac{8}{3}$$

$$f(x) = \sum_{j=0}^{2} y_j \, \ell_j(x) = 1942\left(\frac{1}{6}x^2 - \frac{3}{2}x + \frac{10}{3}\right) + 3402\left(-\frac{1}{2}x^2 + \frac{7}{2}x - 5\right) + 4414\left(\frac{1}{3}x^2 - 2x + \frac{8}{3}\right)$$

$$f(x) = 1234 + 166x + 94x^2$$

Here the constant coefficient $a_0$ is taken as the secret key, which means that the secret key $S$ is 1234.

Encryption and decryption of transaction item sets: In this module each user collects all transaction item sets from the transaction database. After retrieving the item sets, the user converts the plain transaction item sets into cipher text using a cryptographic technique. In this paper we use the AES algorithm for encryption and decryption of transaction item sets. The user encrypts the transaction item sets using the shared secret key and the AES algorithm, and then sends the encrypted item sets to the analyst.

Apriori Algorithm: The name of the algorithm is based on the fact that the algorithm uses prior knowledge of frequent itemset properties. It is an iterative approach in which k-itemsets are used to find (k+1)-itemsets. To improve efficiency, an important property called the Apriori property is used to reduce the search space.

Algorithm:
1) L1 = {large 1-itemsets};
2) for ( k = 2; Lk-1 ≠ Ø; k++ ) do begin
3)     Ck = apriori-gen(Lk-1); // New candidates
4)     forall transactions t ∈ D do begin
5)         Ct = subset(Ck, t); // Candidates contained in t
6)         forall candidates c ∈ Ct do
7)             c.count++;
8)     end
9)     Lk = {c ∈ Ck | c.count ≥ minsup}
10) end
11) Answer = ∪k Lk;

In this module the analyst retrieves all cipher transaction item sets from the users and decrypts them using the AES algorithm. After decryption is complete, the analyst finds the frequent item sets using a pattern mining technique. In this paper we use the Apriori algorithm for finding frequent item sets.

IV. CONCLUSION

In this paper we proposed an approach for finding frequent item sets from transaction databases. Many pattern mining algorithms can be used for this purpose; in this paper we use the Apriori algorithm. Before frequent item sets are found, each data holder is verified as an authenticated user. After authentication, each user generates the shared secret key for encrypting and decrypting the transaction item sets. The secret key is generated using Shamir secret sharing and the Lagrange polynomial equation, and the transaction item sets are encrypted and decrypted using the AES algorithm. With these techniques we aim to provide more security while finding frequent patterns effectively.

REFERENCES

[1] R. Agrawal, C. Faloutsos, and A. Swami. Efficient similarity search in sequence databases. In Proc. of the Fourth International Conference on Foundations of Data Organization and Algorithms, Chicago, October 1993.
[2] R. Agrawal, S. Ghosh, T. Imielinski, B. Iyer, and A. Swami. An interval classifier for database mining applications. In Proc. of the VLDB Conference, pages 560–573, Vancouver, British Columbia, Canada, 1992.
[3] R. Agrawal, T. Imielinski, and A. Swami. Database mining: A performance perspective. IEEE Transactions on Knowledge and Data Engineering, 5(6):914–925, December 1993. Special Issue on Learning and Discovery in Knowledge-Based Databases.
[4] M. Bellare, R. Canetti, and H. Krawczyk. Keying hash functions for message authentication. In Crypto, pages 1–15, 1996.
[5] A. Ben-David, N. Nisan, and B. Pinkas. FairplayMP - A system for secure multi-party computation. In CCS, pages 257–266, 2008.
[6] J.C. Benaloh. Secret sharing homomorphisms: Keeping shares of a secret secret. In Crypto, pages 251–260, 1986.
[7] J. Brickell and V. Shmatikov. Privacy-preserving graph algorithms in the semi-honest model. In ASIACRYPT, pages 236–252, 2005.
[8] D.W.L. Cheung, J. Han, V.T.Y. Ng, A.W.C. Fu, and Y. Fu. A fast distributed algorithm for mining association rules. In PDIS, pages 31–42, 1996.
[9] D.W.L. Cheung, V.T.Y. Ng, A.W.C. Fu, and Y. Fu. Efficient mining of association rules in distributed databases. IEEE Trans. Knowl. Data Eng., 8(6):911–922, 1996.
[10] T. ElGamal. A public key cryptosystem and a signature scheme based on discrete logarithms. IEEE Transactions on Information Theory, 31:469–472, 1985.

BIOGRAPHIES

Tammineni Krushnam Raju is an M.Tech (CSE) student at Sarada Institute of Science, Technology And Management, Srikakulam. He received his B.Tech (IT) from Sri Venkateswara College of Engineering & Technology, Etcherla, Srikakulam. His areas of interest are data warehousing, Java, and Oracle Database.

Ramesh Kumar Behara is working as an Assistant Professor at Sarada Institute of Science, Technology And Management, Srikakulam, Andhra Pradesh. He received his M.Tech (CSE) from Sarada Institute of Science, Technology And Management, Srikakulam, Andhra Pradesh, JNTU Kakinada, Andhra Pradesh. His research areas include network security.