International Journal of Engineering Trends and Technology (IJETT), Volume 5 Number 4, Nov 2013

A Privacy Preserving Association Rule Mining Over Unrealized Datasets

Sunil Kumar Chintada, Jayanthi Rao Madina
Final M.Tech student, Assistant Professor
1 Department of Software Engineering, SISTAM College, Srikakulam, Andhra Pradesh
2 Dept. of CSE, SISTAM College, Srikakulam, Andhra Pradesh

Abstract: In this paper we propose an efficient empirical model for privacy-preserving association rule mining that combines a Boolean matrix approach to mining with the RSA algorithm for secure data transmission. The Boolean matrix reduces the time complexity of finding patterns, and communication is carried out with cipher datasets instead of plain datasets.

I. INTRODUCTION
Association rule mining aims at the discovery of itemsets that co-occur frequently in transactional data. Centralized mining has been well studied in the past. The problem has a large worst-case complexity, a fact that motivates businesses to outsource the mining process to service providers, who have developed efficient, specialized solutions. The data owner, apart from the mining cost relief, has additional motives for outsourcing. First, it requires minimal computational resources, since the owner is only required to produce and send the transactions to the miner [1]. This also makes the outsourcing model attractive to applications in which data owners produce transactions as streams and have limited resources to maintain them. Second, assume that the owner has multiple production sources of transactions, e.g., consider a chain of supermarkets which generate transactions at different locations. All transactions can be sent to a single provider for mining association rules. The provider could compute association rules that are local to the individual stores or global rules for the whole organization.
Therefore, the cost of transferring transactions among the sources and performing the global mining in a distributed manner is saved.

Generally when people talk of privacy, they say “keep information about me from being available to others”. However, their real concern is that their information not be misused. The fear is that once information is released, it will be impossible to prevent misuse. Utilizing this distinction (ensuring that a data mining project won’t enable misuse of personal information) opens opportunities that “complete privacy” would prevent. To do this, we need technical and social solutions that ensure data will not be released.

Another view is corporate privacy: the release of information about a collection of data rather than an individual data item. I may not be concerned about someone knowing my birthdate, mother’s maiden name, or social security number; but knowing all of them enables identity theft. This collected information problem scales to large, multi-individual collections as well. A technique that guarantees no individual data is revealed may still release information describing the collection as a whole. Such “corporate information” is generally the goal of data mining, but some results may still lead to concerns (often termed a secrecy, rather than privacy, issue) [4].

II. RELATED WORK
There are several fields where related work is occurring. We first describe other work in privacy-preserving data mining, then go into detail on specific background work on which this paper builds. Previous work in privacy-preserving data mining has addressed two issues. In one, the aim is preserving customer privacy by distorting the data values [4]. The idea is that the distorted data does not reveal private information, and thus is “safe” to use for mining.
The key result is that the distorted data, and information on the distribution of the random data used to distort the data, can be used to generate an approximation to the original data distribution, without revealing the original data values. The distribution is used to improve mining results over mining the distorted data directly, primarily through selection of split points to “bin” continuous data. Later refinement of this approach tightened the bounds on what private information is disclosed, by showing that the ability to reconstruct the distribution can be used to tighten estimates of original values based on the distorted data [5].

More recently, the data distortion approach has been applied to Boolean association rules [6], [7]. Again, the idea is to modify data values such that reconstruction of the values for any individual transaction is difficult, but the rules learned on the distorted data are still valid. One interesting feature of this work is a flexible definition of privacy; e.g., the ability to correctly guess a value of ‘1’ from the distorted data can be considered a greater threat to privacy than correctly learning a ‘0’.

The data distortion approach addresses a different problem from our work. The assumption with distortion is that the values must be kept private from whoever is doing the mining [9]. We instead assume that some parties are allowed to see some of the data, just that no one is allowed to see all the data. In return, we are able to get exact, rather than approximate, results.

III. PROPOSED APPROACH
We propose an efficient, empirical model of a privacy-preserving data mining technique for finding frequent patterns with a Boolean matrix approach. The proposed system provides privacy-preserving mining of association rules from an outsourced transaction database.
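As a rough illustration of the proposed flow (the owner encrypts, the analyst mines cipher transactions with a Boolean matrix, the owner decrypts the mined patterns), the following minimal sketch may help. It is not the authors' implementation. The toy primes, the integer item IDs, the sample transactions, the threshold of 2, and the helper names `encrypt`, `decrypt` and `frequent_itemsets` are all assumptions made for the example. It also assumes that each item ID is encrypted on its own with unpadded RSA, so that equal items yield equal ciphertexts and can still be counted.

```python
from itertools import combinations

# Toy RSA parameters, for illustration only (real keys use far larger primes).
m, n = 61, 53                  # two distinct primes (assumed values)
prod = m * n                   # modulus
phi = (m - 1) * (n - 1)        # Euler's totient of prod
a = 17                         # public exponent, coprime to phi
g = pow(a, -1, phi)            # private exponent, inverse of a mod phi (Python 3.8+)


def encrypt(x: int) -> int:
    """Encrypt an integer 0 <= x < prod with the public key (prod, a)."""
    return pow(x, a, prod)


def decrypt(c: int) -> int:
    """Decrypt a ciphertext with the private exponent g."""
    return pow(c, g, prod)


def frequent_itemsets(transactions, min_support):
    """Count itemset supports from a Boolean matrix built in a single pass."""
    items = sorted({i for t in transactions for i in t})
    # One Boolean row per item, one column per transaction.
    matrix = {i: [i in t for t in transactions] for i in items}
    result = {}
    for size in range(1, len(items) + 1):
        found = False
        for combo in combinations(items, size):
            # AND the chosen rows column by column and count the remaining 1s.
            support = sum(all(col) for col in zip(*(matrix[i] for i in combo)))
            if support >= min_support:
                result[combo] = support
                found = True
        if not found:
            break  # no frequent itemset of this size, so none of a larger size
    return result


# Data owner: hypothetical transactions over integer item IDs.
transactions = [{11, 13, 14}, {11, 12}, {12}, {11, 12}, {13, 14}]
cipher_transactions = [{encrypt(i) for i in t} for t in transactions]

# Analyst: mines the cipher transactions without seeing the plain item IDs.
cipher_patterns = frequent_itemsets(cipher_transactions, min_support=2)

# Data owner: decrypts the mined cipher patterns.
patterns = {
    tuple(sorted(decrypt(c) for c in combo)): support
    for combo, support in cipher_patterns.items()
}
print(patterns)
```

The sketch enumerates item combinations directly for brevity, which is a simplification; the point is only that every support count is read from the Boolean matrix rather than from repeated scans of the transaction database.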
By using cryptography together with association rule mining, we secure the transaction database. The proposed system mainly contains two modules: the data owner and the service provider.

[Figure: system architecture showing the Data Owner and the Analyst, with labels RSA and BM mining, and the flow: 1. Transactional dataset, 2. Cipher transactions, 3. Cipher transactions, 4. Mined cipher patterns, 5. Mined cipher patterns, 6. Mined results, 7. Patterns.]

The main task of the data owner is to convert the plain database to a cipher database, using the RSA algorithm. The RSA algorithm involves three steps: key generation, encryption and decryption. In this module, data is gathered for association rule mining; it contains different transactions with respect to itemsets. The administrator sends the data to the analyst for generating the association rules between the data items, and the analyst finds the interesting patterns between the items.

Key generation: We use an asymmetric approach to cryptography, which involves a public key and a private key. The public key is used to encrypt the plain text into cipher text, and the private key is used to decrypt the cipher text. RSA generates the keys as follows:
1. Select any two distinct prime numbers m and n. For security, choose the primes at random rather than fixing them; suitable primes can be found efficiently with a primality test.
2. Calculate prod = m*n. prod is used as the modulus for both the public and private keys, and its length in bits is the key length.
3. Compute φ(prod) = φ(m)φ(n) = (m − 1)(n − 1), where φ is Euler's totient function.
4. Select an integer a such that 1 < a < φ(prod) and gcd(a, φ(prod)) = 1, i.e., a and φ(prod) are coprime. a is the public key exponent. An a with a short bit length and small Hamming weight results in more efficient encryption, most commonly $2^{16} + 1 = 65{,}537$.
However, much smaller values of a (such as 3) have been shown to be less secure in some settings [5].
5. Calculate g as $g \equiv a^{-1} \pmod{\varphi(\mathit{prod})}$, i.e., g is the multiplicative inverse of a modulo φ(prod). Equivalently, solve for g given g⋅a ≡ 1 (mod φ(prod)). g is most commonly computed with the extended Euclidean algorithm and is kept as the private key exponent.

By construction, g⋅a ≡ 1 (mod φ(prod)). The public key consists of the modulus prod and the public (or encryption) exponent a. The private key consists of the modulus prod and the private (or decryption) exponent g, which must be kept secret. m, n, and φ(prod) must also be kept secret because they can be used to calculate g.

Encryption: Alice transmits her public key (prod, a) to Bob and keeps the private key secret. Bob then wishes to send message M to Alice. He first converts M into an integer x such that 0 ≤ x < prod, and then computes the ciphertext c as

$$c \equiv x^{a} \pmod{\mathit{prod}}$$

Bob then transmits c to Alice.

Decryption: Alice can recover x from c by using her private key exponent g:

$$x \equiv c^{g} \pmod{\mathit{prod}}$$

Given x, she can recover the original message M by reversing the padding scheme.

The server or service provider performs association rule mining on the cipher database to find the maximum frequent itemsets. The mining algorithm is based on the Boolean matrix of frequent length-1 itemsets. The main idea of the algorithm is to create a Boolean matrix with frequent length-1 itemsets as row headings and transaction records’ IDs as column headings (TABLE I). In the matrix there are only two types of values, ‘1’ and ‘0’, which indicate whether or not the transaction record contains the corresponding frequent length-1 itemset. Then it is necessary to calculate the number of 1 values in each column and the count of columns with the same number of 1 values.

TABLE I
Frequent 1-Itemset | Transaction Records ID
                   |  1  2  3  4  5  6  7  8  9  10
Item1              |  1  1  0  1  0  0  1  0  1  1
Item2              |  0  1  1  0  1  0  1  0  1  1
Item3              |  1  0  0  1  1  1  0  1  0  0
Item4              |  1  0  0  1  0  1  0  1  0  0

After the Boolean matrix is constructed, each transaction can be read from it as a sequence of items, with no need to construct the frequent 1-itemsets separately. Every candidate pattern can then be checked for the presence of its items using the ‘0’ and ‘1’ entries of the matrix, so there is no need for multiple database scans or for repeated candidate set generation.

IV. CONCLUSION
We conclude with an integrated approach that combines the Boolean matrix for association rule mining with RSA for secure transmission of data over the network: data can be transmitted over the network securely, and the patterns are obtained in an efficient manner.

REFERENCES
[1] R. Buyya, C. S. Yeo, and S. Venugopal, “Market-oriented cloud computing: Vision, hype, and reality for delivering IT services as computing utilities,” in Proc. IEEE Conf. High Performance Comput. Commun., Sep. 2008, pp. 5–13.
[2] W. K. Wong, D. W. Cheung, E. Hung, B. Kao, and N. Mamoulis, “Security in outsourcing of association rule mining,” in Proc. Int. Conf. Very Large Data Bases, 2007, pp. 111–122.
[3] L. Qiu, Y. Li, and X. Wu, “Protecting business intelligence and customer privacy while outsourcing data mining tasks,” Knowledge Inform. Syst., vol. 17, no. 1, pp. 99–120, 2008.
[4] C. Clifton, M. Kantarcioglu, and J. Vaidya, “Defining privacy for data mining,” in Proc. Nat. Sci. Found. Workshop Next Generation Data Mining, 2002, pp. 126–133.
[5] I. Molloy, N. Li, and T. Li, “On the (in)security and (im)practicality of outsourcing precise association rule mining,” in Proc. IEEE Int. Conf. Data Mining, Dec. 2009, pp. 872–877.
[6] F.
Giannotti, L. V. Lakshmanan, A. Monreale, D. Pedreschi, and H. Wang, “Privacy-preserving data mining from outsourced databases,” in Proc. SPCC2010 Conjunction with CPDP, 2010, pp. 411–426.
[7] R. Agrawal and R. Srikant, “Privacy-preserving data mining,” in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2000, pp. 439–450.
[8] S. J. Rizvi and J. R. Haritsa, “Maintaining data privacy in association rule mining,” in Proc. Int. Conf. Very Large Data Bases, 2002, pp. 682–693.
[9] M. Kantarcioglu and C. Clifton, “Privacy-preserving distributed mining of association rules on horizontally partitioned data,” IEEE Trans. Knowledge Data Eng., vol. 16, no. 9, pp. 1026–1037, Sep. 2004.
[10] B. Gilburd, A. Schuster, and R. Wolff, “k-ttp: A new privacy model for large scale distributed environments,” in Proc. Int. Conf. Very Large Data Bases, 2005, pp. 563–568.

BIBLIOGRAPHY
Sunil Kumar Chintada is working as a software developer at E-centric Solutions Pvt Ltd, Vizag. He received his B.Tech from Sarada Institute of Science, Technology and Management, Srikakulam. He is pursuing an M.Tech at Sarada Institute of Science, Technology and Management, Srikakulam, Andhra Pradesh. His areas of interest are Data Structures, Java and Oracle database.

Jayanthi Rao Madina is working as HOD at Sarada Institute of Science, Technology and Management, Srikakulam, Andhra Pradesh. He received his M.Tech (CSE) from Aditya Institute of Technology and Management, Tekkali, Andhra Pradesh. His research areas include Image Processing, Computer Networks, Data Mining and Distributed Systems.