International Journal of Engineering Trends and Technology (IJETT) – Volume 10 Number 4 - Apr 2014

A Novel Model of Secure Mining with Decision Matrix Technique

Prasanthi Kolluri*, Satyanarayana Mummana#
*Final M.Tech Student, #Assistant Professor
Department of CSE, Avanthi Institute of Engineering & Technology, Visakhapatnam, Andhra Pradesh

Abstract: Security in data mining is an important research issue nowadays. In this paper we propose a novel, efficient model of privacy-preserving association rule mining based on a decision matrix; for security, we use the RSA algorithm for secure data transmission. The decision matrix reduces the time complexity of finding patterns, and communication is carried out with cipher datasets instead of plain datasets.

I. INTRODUCTION
Pattern mining is the process of finding sequences of events in a large set of patterns. Various pattern mining algorithms find association rules between events or data items. Apriori is the basic algorithm for finding frequent patterns: it generates candidate sets in order to find the frequent itemsets. The main drawbacks of Apriori are its multiple database scans and its repeated candidate-set generation. FP-growth is another association rule mining algorithm: it first constructs an FP-tree and then finds the frequent patterns by building conditional trees for each suffix data item or event, working from the bottom of the tree to the top. The main drawbacks of the FP-tree approach are its memory use and its complexity when there are many data items or events. This paper deals with the outsourcing of data: the data owner places the data on the cloud so that an analyst can mine it securely without compromising data integrity. Association rule mining aims at the discovery of itemsets that co-occur frequently in transactional data.
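To make the Apriori drawbacks mentioned above concrete, the following is a minimal textbook sketch (not the method proposed in this paper). It counts one full pass over the transactions per itemset level and generates candidates by joining and pruning frequent itemsets. The function name `apriori` and the toy data are illustrative assumptions.

```python
from itertools import combinations


def apriori(transactions, min_support):
    """Textbook Apriori.

    Returns (frequent, scans): frequent maps each frequent itemset (frozenset)
    to its support count; scans is the number of full passes over the data.
    """
    transactions = [frozenset(t) for t in transactions]
    candidates = {frozenset([item]) for t in transactions for item in t}
    frequent, k, scans = {}, 1, 0
    while candidates:
        # One database scan per level to count every candidate's support.
        counts = dict.fromkeys(candidates, 0)
        for t in transactions:
            for c in candidates:
                if c <= t:
                    counts[c] += 1
        scans += 1
        level = {c: n for c, n in counts.items() if n >= min_support}
        frequent.update(level)
        # Candidate-set generation: join frequent k-itemsets into (k+1)-itemsets
        # and prune any candidate that has an infrequent k-subset.
        k += 1
        candidates = set()
        for x, y in combinations(level, 2):
            union = x | y
            if len(union) == k and all(
                frozenset(sub) in level for sub in combinations(union, k - 1)
            ):
                candidates.add(union)
    return frequent, scans


if __name__ == "__main__":
    data = [{"bread", "milk"}, {"bread", "butter"}, {"bread", "milk", "butter"}, {"milk"}]
    frequent, scans = apriori(data, min_support=2)
    print(len(frequent), scans)  # 5 2
```

Each additional itemset level costs another scan of the data, and the join/prune step is the candidate generation that the decision-matrix approach aims to avoid.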
Centralized mining has been well studied in the past. The problem has a large worst-case complexity, a fact that motivates businesses to outsource the mining process to service providers, who have developed efficient, specialized solutions. Apart from the relief in mining cost, the data owner has additional motives for outsourcing. First, outsourcing requires minimal computational resources, since the owner is only required to produce the transactions and send them to the miner [1]. global rules for the whole organization. Therefore, the cost of transferring transactions among the sources and performing the global mining in a distributed manner is saved. Another view is corporate privacy: the release of information about a collection of data rather than about an individual data item. I may not be concerned about someone knowing my birthdate, mother's maiden name, or social security number, but knowing all of them enables identity theft. This collected-information problem scales to large, multi-individual collections as well. A technique that guarantees that no individual data is revealed may still release information describing the collection as a whole. Such "corporate information" is generally the goal of data mining, but some results may still lead to concerns (often termed a secrecy, rather than privacy, issue) [4].

ISSN: 2231-5381

II. RELATED WORK
With recent advances in technology, privacy is the primary concern when data is mined over networks. There are two main types of privacy-preserving approaches: randomization (perturbation) and cryptography. In the first approach, fake values are injected into the real dataset, converting it into a distorted dataset without disturbing the integrity of the real dataset.
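As a minimal sketch of the randomization idea (not the cryptographic method this paper adopts), assume a simple bit-flipping scheme similar in spirit to the distortion studied in [8]: each 0/1 entry of a transaction matrix is kept with probability p and flipped otherwise. The miner can still estimate an item's true support from the distorted data, because the expected observed fraction is p·s + (1 − p)(1 − s). The function names and the parameter p are illustrative assumptions.

```python
import random


def perturb(matrix, p, rng):
    """Keep each 0/1 entry with probability p and flip it with probability 1 - p."""
    return [[bit if rng.random() < p else 1 - bit for bit in row] for row in matrix]


def estimate_support(perturbed, column, p):
    """Estimate the true fraction of rows with a 1 in `column` from perturbed data.

    E[observed] = p * s + (1 - p) * (1 - s); solving for s needs p != 0.5.
    """
    if p == 0.5:
        raise ValueError("p = 0.5 destroys all information")
    observed = sum(row[column] for row in perturbed) / len(perturbed)
    return (observed - (1 - p)) / (2 * p - 1)


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical single-item column whose true support is about 30 %.
    original = [[1 if rng.random() < 0.3 else 0] for _ in range(10_000)]
    true_support = sum(row[0] for row in original) / len(original)
    noisy = perturb(original, p=0.9, rng=rng)
    print(round(true_support, 3), round(estimate_support(noisy, 0, 0.9), 3))
```

The estimate is only approximate; the cryptographic approach that follows avoids this trade-off by transmitting exact data in encrypted form.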
In the second approach, symmetric or asymmetric encryption is applied to the real datasets. At the receiver end the datasets are decrypted and mined, and the mined results are converted into cipher form and forwarded back, so that the data owner can decrypt them. In this paper we propose the cryptographic approach for secure transmission of patterns to the analyst. The data owner first prepares the real dataset, encrypts it with a cryptographic algorithm, and forwards it to the analyst. The analyst decrypts the dataset, applies the decision-making approach to generate the patterns efficiently, converts them into cipher form, and forwards the cipher results to the data owner. The analyst does not need to know the semantics of the data items, and the data owner can decrypt the cipher mined results.

http://www.ijettjournal.org Page 209

The decision-making system first reads the patterns and builds the decision matrix row by row: if an event or item is present, the entry is '1'; otherwise it is '0'. After the complete matrix has been constructed, frequent patterns can be extracted from it efficiently. There are several fields where related work is occurring. We first describe other work in privacy-preserving data mining and then go into detail on the specific background work on which this paper builds. Previous work in privacy-preserving data mining has addressed two issues. In one, the aim is to preserve customer privacy by distorting the data values [4]. The idea is that the distorted data does not reveal private information, and thus is "safe" to use for mining. The key result is that the distorted data, together with information on the distribution of the random data used to distort it, can be used to generate an approximation to the original data distribution without revealing the original data values.
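The decision-matrix construction described earlier in this section (an entry is '1' when a transaction contains an item and '0' otherwise) can be sketched as follows. This is a minimal illustration under stated assumptions: transactions are Python sets, the matrix is stored with items as rows and transaction IDs as columns (the orientation given in Section III), and the support of a candidate pattern is counted by checking the relevant item rows column by column instead of rescanning the transactions. All function names and the toy data are hypothetical.

```python
from collections import Counter


def build_decision_matrix(transactions, items):
    """Return matrix[i][j] = 1 if transaction j contains items[i], else 0.

    Rows are items and columns are transaction IDs; the transactions are
    read in a single pass.
    """
    row_of = {item: i for i, item in enumerate(items)}
    matrix = [[0] * len(transactions) for _ in items]
    for j, transaction in enumerate(transactions):
        for item in transaction:
            if item in row_of:  # ignore items that are not frequent 1-itemsets
                matrix[row_of[item]][j] = 1
    return matrix


def column_counts(matrix):
    """Number of 1s per column, and how many columns share each count."""
    per_column = [sum(column) for column in zip(*matrix)]
    return per_column, Counter(per_column)


def support(matrix, items, pattern):
    """Count columns in which every item of `pattern` has a 1 (no rescan of the data)."""
    rows = [matrix[items.index(item)] for item in pattern]
    return sum(all(bits) for bits in zip(*rows))


if __name__ == "__main__":
    # Hypothetical toy transactions, not the data of TABLE I.
    transactions = [{"A", "B", "D"}, {"A", "B"}, {"B"}, {"C", "D"}]
    items = ["A", "B", "C", "D"]
    matrix = build_decision_matrix(transactions, items)
    per_column, histogram = column_counts(matrix)
    print(per_column)                          # [3, 2, 1, 2]
    print(histogram[2])                        # 2
    print(support(matrix, items, ["A", "B"]))  # 2
```

Once the matrix exists, every support query touches only the rows of the items in the pattern, which is the property the decision-matrix approach relies on to avoid repeated database scans.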
The reconstructed distribution is used to improve mining results over mining the distorted data directly, primarily through the selection of split points to "bin" continuous data. A later refinement of this approach tightened the bounds on what private information is disclosed, by showing that the ability to reconstruct the distribution can be used to tighten estimates of original values based on the distorted data [5]. We instead assume that some parties are allowed to see some of the data, just that no one is allowed to see all the data. In return, we are able to get exact, rather than approximate, results.

III. PROPOSED APPROACH
We propose an efficient and empirical model of privacy-preserving data mining for finding frequent patterns with a decision-matrix approach. The proposed system provides privacy-preserving mining of association rules from an outsourced transaction database, and uses cryptography to protect that transaction database. The proposed system contains two main modules: the data owner and the service provider.

[Figure: Proposed system architecture with the Data Owner, RSA, the Analyst and DM (decision-matrix) mining; numbered flows: 1. Transactional dataset, 2. Cipher transactions, 3. Cipher transactions, 4. Mined cipher patterns, 5. Mined cipher patterns, 6. Mined results, 7. Patterns.]

The main task of the data owner is to convert the plain database into a cipher database; here we use the RSA algorithm for this conversion. The RSA algorithm involves three steps: key generation, encryption and decryption. In this module, data is gathered for association rule mining; it contains different transactions with respect to itemsets. The administrator sends the data to the analyst for generating the association rules between the data items, and the analyst finds the interesting patterns between the items.
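The three RSA steps named above (key generation, encryption and decryption) can be sketched end to end as follows, using this paper's symbols: primes m and n, modulus prod, public exponent a and private exponent g; the step numbers in the comments follow the key-generation procedure of this section. This is textbook RSA with tiny fixed primes and no padding, chosen only so the arithmetic is readable; it is insecure and is not a substitute for a vetted cryptographic library. Encrypting integer item IDs of each transaction is an illustrative assumption about how the data owner might encode the dataset.

```python
from math import gcd


def egcd(x, y):
    """Extended Euclidean algorithm: returns (d, s, t) with s*x + t*y == d == gcd(x, y)."""
    if y == 0:
        return x, 1, 0
    d, s, t = egcd(y, x % y)
    return d, t, s - (x // y) * t


def generate_keys(m, n, a):
    """Textbook RSA key generation using the paper's symbols.

    m, n: distinct primes (steps 1-3); a: public exponent (step 4).
    Returns (public_key, private_key) = ((prod, a), (prod, g)).
    """
    if m == n:
        raise ValueError("m and n must be distinct primes")
    prod = m * n             # step 2: modulus shared by both keys
    phi = (m - 1) * (n - 1)  # step 3: Euler's totient of prod
    if not (1 < a < phi and gcd(a, phi) == 1):  # step 4
        raise ValueError("need 1 < a < phi(prod) and gcd(a, phi(prod)) == 1")
    _, s, _ = egcd(a, phi)
    g = s % phi              # step 5: g * a ≡ 1 (mod phi(prod))
    return (prod, a), (prod, g)


def encrypt(plain_int, public_key):
    """c = m^a mod prod; requires 0 <= m < prod (no padding in this sketch)."""
    prod, a = public_key
    if not 0 <= plain_int < prod:
        raise ValueError("message integer must satisfy 0 <= m < prod")
    return pow(plain_int, a, prod)


def decrypt(cipher_int, private_key):
    """m = c^g mod prod."""
    prod, g = private_key
    return pow(cipher_int, g, prod)


if __name__ == "__main__":
    # Tiny fixed primes keep the numbers readable; real keys use large random primes.
    public, private = generate_keys(61, 53, 17)
    print(public, private)               # (3233, 17) (3233, 2753)
    c = encrypt(65, public)
    print(c, decrypt(c, private))        # 2790 65
    # Hypothetical data-owner use: encrypt the integer item IDs of each transaction.
    transactions = [[1, 3, 4], [1, 2]]
    cipher_tx = [[encrypt(x, public) for x in t] for t in transactions]
    assert [[decrypt(x, private) for x in t] for t in cipher_tx] == transactions
```

With realistic key sizes, a = 65,537 is the usual choice. Because unpadded RSA is deterministic, equal item IDs map to equal ciphertexts; this preserves support counts on the cipher transactions but also leaks item frequencies, so a real deployment would need a more careful encoding.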
Key generation: We use an asymmetric (public-key) cryptosystem, which involves a public key and a private key: the public key is used to encrypt the plain text into cipher text, and the private key is used to decrypt the cipher text. RSA generates its keys as follows:
1. Select two distinct prime numbers m and n. For security, choose the primes at random rather than fixing them, and find them efficiently with a primality test.
2. Compute $\mathit{prod} = m \cdot n$. prod is the modulus for both the public and the private keys, and its length in bits is the key length.
3. Compute $\varphi(\mathit{prod}) = \varphi(m)\varphi(n) = (m-1)(n-1)$, where $\varphi$ is Euler's totient function.
4. Select an integer a such that $1 < a < \varphi(\mathit{prod})$ and $\gcd(a, \varphi(\mathit{prod})) = 1$, i.e., a and $\varphi(\mathit{prod})$ are coprime; a is the public key exponent. An a with a short bit length and a small Hamming weight results in more efficient encryption; the most common choice is $2^{16} + 1 = 65{,}537$. However, much smaller values of a (such as 3) have been shown to be less secure in some settings [5].
5. Compute $g \equiv a^{-1} \pmod{\varphi(\mathit{prod})}$, i.e., g is the multiplicative inverse of a modulo $\varphi(\mathit{prod})$, so that $g \cdot a \equiv 1 \pmod{\varphi(\mathit{prod})}$. g is usually computed with the extended Euclidean algorithm and is kept as the private key exponent.

By construction, $g \cdot a \equiv 1 \pmod{\varphi(\mathit{prod})}$. The public key consists of the modulus prod and the public (or encryption) exponent a. The private key consists of the modulus prod and the private (or decryption) exponent g, which must be kept secret. m, n and $\varphi(\mathit{prod})$ must also be kept secret, because they can be used to calculate g.

TABLE I
                        Frequent 1-Itemset
Transaction Records ID  Item1  Item2  Item3  Item4
1                       1      0      1      1
2                       1      1      0      0
3                       0      1      0      0
4                       1      0      0      1
5                       1      1      1      0
6                       0      0      1      1
7                       1      1      0      0
8                       0      0      1      1
9                       1      1      0      0
10                      1      1      0      0

Encryption: Suppose Alice transmits her public key (prod, a) to Bob and keeps her private key secret.
Bob then wishes to send a message M to Alice. He first turns M into an integer m, such that $0 \le m < \mathit{prod}$, by using an agreed reversible padding scheme, and then computes the ciphertext $c \equiv m^{a} \pmod{\mathit{prod}}$. This cipher text is then forwarded to Alice.

Decryption: Alice can recover m from c by using her private key exponent g and computing $m \equiv c^{g} \pmod{\mathit{prod}}$. Given m, she can recover the original message M by reversing the padding scheme.

The server or service provider performs association rule mining on the cipher database to find the maximal frequent itemsets. This work therefore presents a new algorithm that mines maximal frequent itemsets first, based on the decision matrix of frequent length-1 itemsets. The main idea of the algorithm is to create a decision matrix with the frequent length-1 itemsets as row headings and the transaction records' IDs as column headings (TABLE I). The matrix contains only two values, '1' and '0', which indicate whether or not the transaction record contains the corresponding frequent length-1 itemset. Next, the number of 1s in each column is calculated, together with the number of columns that have the same number of 1s. Once the decision matrix has been constructed, transactions can be read directly as sequences of items without building the frequent 1-itemset list again, and every candidate pattern can be checked for the presence of its items through the '0'/'1' entries of the matrix. Consequently, there is no need for multiple database scans or for repeated candidate-set generation.

IV. CONCLUSION
We conclude with an integrated approach that combines a decision matrix for association rule mining with RSA for secure transmission of data over the network: data is transmitted over the network securely, and patterns are obtained efficiently.

REFERENCES
[1] R. Buyya, C. S. Yeo, and S. Venugopal, "Market-oriented cloud computing: Vision, hype, and reality for delivering IT services as computing utilities," in Proc. IEEE Conf. High Performance Comput. Commun., Sep. 2008, pp. 5–13.
[2] W. K. Wong, D. W. Cheung, E. Hung, B. Kao, and N. Mamoulis, "Security in outsourcing of association rule mining," in Proc. Int. Conf. Very Large Data Bases, 2007, pp. 111–122.
[3] L. Qiu, Y. Li, and X. Wu, "Protecting business intelligence and customer privacy while outsourcing data mining tasks," Knowledge Inform. Syst., vol. 17, no. 1, pp. 99–120, 2008.
[4] C. Clifton, M. Kantarcioglu, and J. Vaidya, "Defining privacy for data mining," in Proc. Nat. Sci. Found. Workshop Next Generation Data Mining, 2002, pp. 126–133.
[5] I. Molloy, N. Li, and T. Li, "On the (in)security and (im)practicality of outsourcing precise association rule mining," in Proc. IEEE Int. Conf. Data Mining, Dec. 2009, pp. 872–877.
[6] F. Giannotti, L. V. Lakshmanan, A. Monreale, D. Pedreschi, and H. Wang, "Privacy-preserving data mining from outsourced databases," in Proc. SPCC2010 Conjunction with CPDP, 2010, pp. 411–426.
[7] R. Agrawal and R. Srikant, "Privacy-preserving data mining," in Proc. ACM SIGMOD Int. Conf. Manage. Data, 2000, pp. 439–450.
[8] S. J. Rizvi and J. R. Haritsa, "Maintaining data privacy in association rule mining," in Proc. Int. Conf. Very Large Data Bases, 2002, pp. 682–693.
[9] M. Kantarcioglu and C. Clifton, "Privacy-preserving distributed mining of association rules on horizontally partitioned data," IEEE Trans. Knowledge Data Eng., vol. 16, no. 9, pp. 1026–1037, Sep. 2004.
[10] B. Gilburd, A. Schuster, and R. Wolff, "k-TTP: A new privacy model for large scale distributed environments," in Proc. Int. Conf. Very Large Data Bases, 2005, pp. 563–568.

Biographies:
Satyanarayana Mummana is working as an Assistant Professor at Avanthi Institute of Engineering & Technology, Visakhapatnam, Andhra Pradesh. He received his Master's degree (MCA) from Gandhi Institute of Technology and Management (GITAM), Visakhapatnam, and his M.Tech (CSE) from Avanthi Institute of Engineering & Technology, Visakhapatnam, Andhra Pradesh. His research areas include Image Processing, Computer Networks, Data Mining, Distributed Systems and Cloud Computing.

I am Prasanthi Kolluri. I completed my B.Tech at Lakireddy Balireddy College of Engineering (LBRCE), Mylavaram, Krishna District, and am currently pursuing an M.Tech at Avanthi Institute of Engineering & Technology, Narsipatnam, Visakhapatnam.