An Efficient and Secure Horizontal Partitioning Technique between Centralized Data holders

International Journal of Engineering Trends and Technology (IJETT) – Volume 16 Number 8 – Oct 2014 An Efficient and Secure Horizontal Partitioning Technique between Centralized Data holders T.Padmavathy 1,P.Suresh Babu 2 Final M.Tech Student1,Associate professor2 1,2 Department of computer science and engineering at Kaushik College of engineering, JNTUK University. Abstract: Secure mining of association rule mining over horizontal databases is always an interesting research issue in the field of knowledge and data engineering. In horizontal partitioning or data bases, databases are integrated from various data holders or players for applying association rule mining over integrated database. In this paper we are proposing a privacy preserving mining approach with Improved Shamir secret key generation for secure key generation and Bit Wise Matrix approach. Index Terms: Association Rule mining , Bit Wise Matrix , LaGrange’s polynomial. I.INTRODUCTION Association rule mining is key method of the major needed process and accurate researched methods of data mining. Its main focus to find common patterns and associations that is unformatted structures over large sets of objects in the transaction databases or additional data repositories. Association rules are very widely used in a range of areas those are communication networks, marketing and hazard managing and inventory control etc.[1] There are many different association mining methods and algorithms are introduced and researched on compared further. Association rule mining algorithms is to find association rules that are apt to the pre-declared less support and confidence factors from a database [3]. The main disadvantage is divided into two types. First one is to find those item sets occurrences go redefined threshold in the database. Such types of item sets are known as very frequent or large item sets. The next one is ambiguity to resultant association rules from large item sets with the features of very less confidence [2]. There two main approaches for utilizing multiple processors and evolved and the decentralized storage within the each processor have a secret storage; [6] and distributed shared storage within all processors that have right to use common storage. Static storage design structure has many properties. Every processor has equal access to all storage in the methods.[4] In distributed storage structural implementation of each processor has its own local storage that can only be access directly by that processor. ISSN: 2231-5381 A Parallel purpose could be divided into number of subtasks and executed parallelism on disconnect processors in the system .though the presentation of a parallel application on a distributed system is mostly subject on the allocation of the tasks comprising the application onto the accessible processors in the scheme.[5] Association rule mining model amongst data mining numerous models, including Association rules, clustering and categorization models, is the mostly applied method. The Apriori algorithm is the mainly representative algorithm for association rule mining. It includes many of manipulated methods that are mainly concentrated concept on its efficiency and accuracy. Association Rule Learning is a basic method used to find the associations over many variables. It is used by retailers, and many of them with a many transactional database.[7]Association rules are conditional statements that help to find associations rule between unrelated data in a relational database or any information storehouse. Consider an example of an association rule would be "If a customer buys a dozen breads and he is 80% likely to also purchase butter/jam." The association rules are analysing the data for frequent are conditional patterns and using the situational support and confidence is to identify the most important associations[9]. The support is a notation of how frequently the items emerge in the database. In data mining, association rules are helpful for analysing and predicting customer nature [8][9]. They play an significant role in shopping basket data analysis, item clustering, catalog design. Programmers use association rules to construct programs of machine learning [5]. Machine learning is a sort of artificial intelligence that seeks to assemble programs with the capability to develop into more competent without being explicitly programmed. Algorithms for mining association rules from relational data have been developed. numerous query languages have been planned, to assist association rule mining such as the issue of mining XML data has acknowledged very little concentration, as the data mining society has paying attention on the progress of techniques for extracting common arrangement from varied XML data. II. RELATED WORK An association rule implies definite association interaction among a set of objects in a database. An http://www.ijettjournal.org Page 358 International Journal of Engineering Trends and Technology (IJETT) – Volume 16 Number 8 – Oct 2014 association rule is an expression of the form A,B, where A and B are items[10]. The observant logic of such a rule is that transactions of the database which contain A be inclined to contain B Association rule is one of the data mining technique used to huge data out concealed information starting datasets that can be use by an organization decision maker to get better on the complete earnings. Apriori is an algorithm for frequent item set mining and association rule learning over transactional databases. It proceed by identifying the recurring individual items in the database as well as extending them to bigger item sets as long as those item sets come out adequately often in the database. The frequent item sets examined by Apriori and can be used to create a conclusion association rules which depict interest to general trends in the database. Apriori is designed to work on databases containing transactions (for eg. collections of objects bought by customers,)[11][12]. Other algorithms are planned for ruling association rules in data having no transactions or having no timestamps. Each transaction is seen as a set of items. Data Holder1 Data Holder2 Apriori uses a "bottom up" method, where numerous subsets are extensive one item at an instance and groups of candidates are experienced alongside the data. Pseudocode below demonstrates the process of frequent itemset generation of the Apriori algorithm. Distributed/Parallel Algorithms Databases may accumulate an enormous quantity of data to be mined. Mining association rules into such databases might involve significant processing power [13]. III. PROPOSED WORK In this approach we are proposing a privacy preserving mining approach with Bit Wise Matrix, it reduces problem of multiple database scans and candidate set generations by constructing the Bit Wise Matrix. Data can be integrated from multiple data holders or players, for secure transmission or distributed partitioning we are implementing an improved lagranges’s polynomial approach for secure key generation for encryption of data from data holders with triple DES algorithm. Data Holder1 Cipher Pattern Cipher Pattern Cipher Pattern Encoder/Decoder Centralized Server Binary Matrix . Fig1 : Horizontal partitioning Architecture Every individual data holder or player maintains their transactions or patterns, in horizontal partitioning, every data holder forwards their patterns to centralized server after encryption of patterns which are at individual end,At centralized server received pattern can be decrypted with decoder and forwarded to Bit Wise Matrix to extract frequent pattern from the received patterns. For experimental purpose we establish connection between the nodes and Central location (Key generation center) ISSN: 2231-5381 through network or socket programming, Key can be generated by using improved LaGrange’s polynomial equation and key can be distributed to user Every individual node participates in key generation process and retrieves key by reconstruction. Encrypts the datasets by using triple DES and key which is generated by the LaGrange’s polynomial equation. All encrypted datasets can be forwarded to centralized location and http://www.ijettjournal.org Page 359 International Journal of Engineering Trends and Technology (IJETT) – Volume 16 Number 8 – Oct 2014 decrypted with same symmetric key and forwards to mining process. Group key manger receives the registration request from all the users, and generates a verification share and forwards to all the requested users for authentication purpose, generates the key using key generation process and forwards the points to extraction of the key from the equation generated by the verification points. For key generation protocol, it receives the verification shares and key as input to construct the lagranges polynomial equation f(x), which is passed through (0, key) and verification points ,after that group key manager forwards the points to data owners. Data points and check the authentication code which is sent by the group key manager. When a new user tries to download the file, new user need not to connect other data owner to decryption of the file, user connects to the group key manager he will update the group key and decrypts the files with previous key again encrypt with new key and updates the new key to all the data owners. Data owner initiate the request by sending the random challenge to the group key manager, as a response Group key manager sends a secret share, data owner authenticates and forwards the verification share, data owner receives the verification shares and generates the key using Lagrange’s polynomial equation and forwards 4. Points(Subset of P points) 1. Request ( Rch) Group Key manager 2.Response (Sshare) Node users 3.Vshare owners again reconstruct the key from the verification Fig2 : Authentication and Key Generation Rch ----Random challenge Sshare---Secret share Vshare----verification share P={p1,p2…pn}-------points for construction of Lagrange’s equation Bit Wise Matrix: Itemset 1 1 0 1 1 Item1 Item2 Item3 Item4 2 1 1 0 0 3 0 1 0 0 the points to data owners for regeneration the key The server or service provider performing association rule mining on cipher database for finding maximum frequent item sets. Thus the research presented a new algorithm of mining maximum frequent itemsets first based on the Bit Wise Matrix of frequent length-1 itemsets. The main idea of the algorithm isto create a Bit Wise Matrix with frequent length-1 itemsets as row headings and transaction records’ IDs as column headingsIn the matrix, there are only two type of values, ‘1’ and ‘0’, which means that the transaction record contains or not the corresponding frequent length-1 itemset. Then it is necessary to calculate the number of value 1 in each column and the count of the columns with the same number of value 1 Transaction Records ID 4 5 6 1 0 0 0 1 0 1 1 1 1 0 1 7 1 1 0 0 8 0 0 1 1 9 1 1 0 0 10 1 1 0 0 Fig 3 : Bit Wise Matrix ISSN: 2231-5381 http://www.ijettjournal.org Page 360 International Journal of Engineering Trends and Technology (IJETT) – Volume 16 Number 8 – Oct 2014 Now we can extract frequent patterns from the matrix, to extract frequent 1 itemset, initially count number of ones in vertical columns with respect to item, ifit matches minimum threshold values then treat it as frequent item else ignore, continue same process for 2 itemset,check whether two items have ‘1’ in their corresponding vertical columns then increment, continue until all transactions verified. If total count greater than threshold value then treat it as frequent item [9] International Journal of Computer Science and Information Technology, Volume 3, Number 3, April 2010 [10] M.J. Zaki et al., Parallel Data Mining for Association Rules on Shared- Storage. ,tech. report TR 618, Computer Science Dept., Univ. of Rochester, 1997. [11] D.W. Cheung et al."Efficient Mining of Association Rules in Distributed Databases,"IEEE Trans. Knowledge and Data Eng., vol. 8, no. 6, 1996,pp.911-922; IV. CONCLUSION We are concluding our research work with efficient frequent pattern mining approach in secure manner over horizontal databases ,a secure key can be generated through efficient and improved lagranges polynomial equation and cipher data can be received and decrypted by centralized server and finds the frequent patterns from its end in an accurate and efficient manner [12] A. S and R. Wolff , "Communication-Efficient Distributed Mining of Association Rules," Proc. ACM SIGMOD Int'l Conf. Management of Data, ACM Press, 2001,pp. 47-48. REFERENCES [1] Dr .Sujni Paul, Associate Professor, Department of Computer Science, Karunya University, Coimbatore 641 , Tamil Nadu, India [14] H.Kargupta, I.Hamzaoglu, and Brian Stafford. Scalable, distributed data mining-agent architecture. In Heckerman et al. [8], page 21. [2] R. Agrawal and R. Srikant , "Fast Algorithms for Mining Association Rules in Large Distributed Database," Conf. Very Large Databases (VLDB 94), [13] T.K Imielinski and A.M Virmani. MSQL: A query language for database mining. 1999. [15] R. Meo, G. Psaila, and S. Ceri. A new SQL like operator for mining association rules. In The VLDB Journal, pages 156–161. BIOGRAPHIES [3] R.J Agrawal and J.C. Shafer , "Parallel Mining of association Rules," Distributed Systems Online March 2005 T.Padmavathy completed B.Tech in Computer Science And Engineering from Godavari Institute of Engineering and Technology. She pursuing M.Tech in Department of computer science and engineering at Kaushik College of engineering, [4] D.W.K Cheung ,"Efficient Mining of Association Rules in Distributed Databases, "IEEE Trans. Knowledge and Data Eng., vol. 9, [5] D.W.K Cheung,"A Fast Dis Distributed Information Systems, IEEE CS Press, 1997, JNTUK University [6] Albert Y.N Zomaya, Tarek El-Ghazawi, Ophir Frieder, "Distributed Computing for Data Mining", IEEE Concurrency, 1999. International Journal of Computer Science and Information Technology, Volume 3, Number 3, April P.Suresh Babu completed his B.Tech and M.E. in Computer Science and Engineering. He is currently working as Associate professor of Department of computer science and engineering at Kaushik College of engineering, JNTUK University. He is having industrial experience of 4 years and teaching experience of 15 years. His areas of interest include Artificial intelligence, Neural Networks, Cryptography & Network security, Compiler Design and Advanced Data Structures. [7] A. Prodromidis, P. Chan, and S. Stolfo. Chapter Meta learning in Parallel distributed data mining systems: Issues and approaches. AAAI/MIT Press, 2001. [8] Morgan Kaufmann, 1996, pp. 432 Proc. ACM SIGMOD 1-12. 2010- 99 Proc. 20th Int'l 16 IEEE tributed Proc. Parallel and 432-444. ISSN: 2231-5381 http://www.ijettjournal.org Page 361

An Efficient and Secure Horizontal Partitioning Technique between Centralized Data holders

Related documents

Products

Support

An Efficient and Secure Horizontal Partitioning Technique between Centralized Data holders

Related documents

Add this document to collection(s)

Add this document to saved

Suggest us how to improve StudyLib