Secrecy Conserving of Association Rule Mining from Unrealized Transaction Data Base

International Journal of Engineering Trends and Technology (IJETT) – Volume 16 Number 7 – Oct 2014, ISSN: 2231-5381, http://www.ijettjournal.org

Secrecy Conserving of Association Rule Mining from Unrealized Transaction Data Base Using Bipartite Matrix

Gangu. Dharma Raju (1), Jayanthi Rao Madina (2)
(1) Final M.Tech Student, (2) Head of the Department; (1,2) Dept. of CSE, Sarada Institute of Science, Technology and Management (SISTAM), Srikakulam, Andhra Pradesh

Abstract: Nowadays, cloud computing is an inspiring development for storing data, and there is recent interest in offering data mining as a service to data owners. In this paper we propose an empirical model of privacy-preserving association rule mining that combines an efficient cryptographic algorithm with a bipartite matrix. The data owner converts plain transactions into cipher form and forwards them to the service provider. The service provider applies a bipartite matrix to the encrypted transactions to generate the frequent itemsets for association rules, and the data owner decrypts the returned itemsets into plain form. We use the S-DES algorithm to protect the privacy of the itemsets and a bipartite matrix to find the frequent itemsets. Together, these techniques improve the privacy and efficiency of mining frequent itemsets.

I. INTRODUCTION
Association rule mining identifies itemsets that occur frequently in data. Centralized data mining has been studied extensively. Its limitation is high computational complexity, which has motivated efficient and specialized solutions. Because mining needs substantial computing resources, a data owner may outsource it by sending the data to a data-mining service provider. This model is efficient and attractive, since data owners generate ever more data that is very difficult for them to maintain [1, 2]. Association rule mining over large amounts of data is a hard task, and outsourcing it is also difficult. On third-party servers the mined data is insecure, because it is increasingly used by other data owners and by a growing number of end users.
For flexible access and utility, the data to be mined is stored in a centralized location. To address these issues, association rule mining can be combined with encryption techniques. Researchers have therefore focused on combining data mining with cryptographic techniques for outsourcing data to cloud servers [4]. The service provider computes the association rules by itself and stores them on a global server as global rules. This reduces the cost of transferring the data and of performing centralized data mining. On the other hand, the service provider itself becomes a point of malicious attack. If the service provider is not trusted, its access to the data must be limited and the data must not be associated with private information [6, 7]. Both the original data and the rules produced by the service provider must be protected, and the outsourced data must be kept secure.

There are two types of methods that can protect such sensitive information. The first applies an encoding function that converts the original data into a new format. The second applies data perturbation, which modifies the original data randomly. The perturbation method is less attractive because it can only provide approximate results. Encryption, on the other hand, allows the exact rules to be recovered. In this paper we propose and evaluate appropriate encryption techniques for outsourcing association rule mining. For an encryption to be appropriate for the problem, the following conditions should be satisfied. First, there should be a correct, complete, and deterministic decryption method that transforms the association rules found in the encrypted database into the true association rules of the original database. Second, the encryption and decryption processes must be reasonably fast; otherwise, owners may choose to mine association rules locally (if cost is the only concern).
Third, the encryption method must be secure enough to prevent the service provider (or an attacker) from recovering the original transactions and the true association rules among the actual items by processing the encrypted data. Secure multiparty computation makes this possible without a trusted third party. The parties may need to communicate considerably to obtain the final result, but they learn nothing from this communication. The computation is secure if, given only one party's input and output, we can simulate what that party would see during the runs. Here, to simulate means that the distribution of what is actually seen and the distribution of the simulated view over many runs are computationally indistinguishable. We may not be able to simulate every run exactly, but over time we cannot tell the simulation from the real runs.

II. RELATED WORK
Association rule mining [5] is a well-known and extensively researched method for discovering interesting relations between variables in large databases. It aims to identify strong rules in databases using different measures of interestingness. Based on the concept of strong rules, association rules were introduced for discovering regularities between products in large-scale transaction data recorded by point-of-sale (POS) systems in supermarkets. For example, the rule {onions, potatoes} => {burger} found in a supermarket's sales data would indicate that a customer who buys onions and potatoes together is likely to also buy hamburger meat. Such information can be used as the basis for decisions about marketing activities such as promotional pricing or product placement [4]. In the example database, the itemset {milk, bread, butter} has a support of 1/5 = 0.2.
This is because it occurs in 20% of all transactions (1 out of 5 transactions). The confidence of a rule is defined as $\mathrm{conf}(X \Rightarrow Y) = \mathrm{sup}(X \cup Y) / \mathrm{sup}(X)$. For example, the rule {bread, butter} => {milk} has a confidence of 1.0 in the database. This means that the rule is correct for 100% of the transactions containing bread and butter: every time a customer buys bread and butter, milk is bought as well. Be careful when reading the expression. Here sup(X ∪ Y) means the support of transactions in which X and Y both appear. It does not mean the support of transactions in which either X or Y appears, an interpretation that arises because set union is equivalent to logical disjunction. The argument of sup() is a set of preconditions, and thus becomes more restrictive as it grows (instead of more inclusive). Confidence can be interpreted as an estimate of the probability P(Y|X): the probability of finding the RHS of the rule in transactions under the condition that these transactions also contain the LHS [4, 9].

Beyond the market-basket example above, association rules are now used in many application areas, including Web usage mining, intrusion detection, and bioinformatics. In contrast with sequential pattern mining, association rule mining typically does not consider the order of items within a transaction [8].

Let $I = \{i_1, i_2, \ldots, i_n\}$ be a set of n binary attributes called items, and let $D = \{t_1, t_2, \ldots, t_m\}$ be a set of transactions called the database. Each transaction in D has a unique transaction ID and consists of a subset of the items in I. An association rule is an implication of the form X => Y, where $X, Y \subseteq I$ and $X \cap Y = \emptyset$. The itemsets X and Y are called the left-hand side (LHS) and right-hand side (RHS) of the rule, respectively. As an example, let I = {milk, bread, butter, beer}, and consider a small database in which the presence of an item in a transaction is represented by 1 and its absence by 0. Consider the rule {bread, butter} => {milk}.
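The support and confidence definitions above can be checked on a small example. The paper's own example database is not listed in the text, so the minimal Python sketch below uses a hypothetical five-transaction database over I = {milk, bread, butter, beer}. It was chosen only so that it reproduces the quoted figures: sup({milk, bread, butter}) = 0.2 and conf({bread, butter} => {milk}) = 1.0. The names DB, support and confidence are illustrative only.

```python
from typing import FrozenSet, List

Transaction = FrozenSet[str]

# Hypothetical five-transaction database over I = {milk, bread, butter, beer},
# chosen only so that the figures quoted in the text are reproduced.
DB: List[Transaction] = [
    frozenset({"milk", "bread"}),
    frozenset({"butter"}),
    frozenset({"beer"}),
    frozenset({"milk", "bread", "butter"}),
    frozenset({"bread"}),
]


def support(itemset: Transaction, db: List[Transaction]) -> float:
    """sup(X): fraction of transactions that contain every item of X."""
    return sum(1 for t in db if itemset <= t) / len(db)


def confidence(lhs: Transaction, rhs: Transaction, db: List[Transaction]) -> float:
    """conf(X => Y) = sup(X ∪ Y) / sup(X), an estimate of P(Y | X).

    X ∪ Y is the union of the item sets, so sup(X ∪ Y) counts the
    transactions in which X and Y both appear.
    """
    base = support(lhs, db)
    if base == 0:
        raise ValueError("confidence is undefined when sup(X) = 0")
    return support(lhs | rhs, db) / base


if __name__ == "__main__":
    print(support(frozenset({"milk", "bread", "butter"}), DB))                  # 0.2
    print(confidence(frozenset({"bread", "butter"}), frozenset({"milk"}), DB))  # 1.0
```

Representing X and Y as frozensets makes X ∪ Y a plain set union. The subset test `itemset <= t` then implements the "X and Y both appear" reading of sup(X ∪ Y).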
The rule {bread, butter} => {milk} means that a customer who buys bread and butter is likely to also buy milk. Note: this example is extremely small. In practical applications, a rule needs a support of several hundred transactions before it can be considered statistically significant, and datasets often contain thousands or millions of transactions [5]. To select interesting rules from the set of all possible rules, constraints on various measures of significance and interest can be used. The best-known constraints are minimum thresholds on support and confidence. Finding all frequent itemsets in a database is difficult because it involves searching all possible itemsets (item combinations). The set of possible itemsets is the power set over I and has size $2^n - 1$ (excluding the empty set, which is not a valid itemset). The size of the power set grows exponentially in the number of items n in I. Even so, efficient search is possible using the downward-closure property of support (also called anti-monotonicity). This property guarantees that all subsets of a frequent itemset are also frequent, and thus that all supersets of an infrequent itemset must also be infrequent. Exploiting this property, efficient algorithms can find all frequent itemsets [10].

III. PROPOSED SYSTEM
The proposed system provides privacy-preserving mining of association rules from an outsourced transaction database. It uses cryptography together with association rule mining to secure the transaction database. Here, the support sup(X) of an itemset X is defined as the proportion of transactions in the data set that contain X.

[Figure: system architecture, showing the Data owner, the S-DES Algorithm, the Service Provider and the Bipartite Matrix]

After receiving the encrypted transactions, the service provider finds the frequent itemsets and sends them to the data owner. The data owner performs the decryption process and obtains the plain frequent itemsets.
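The data owner's S-DES step can be sketched as follows. The paper does not give the key, the encoding of items, or the mode of operation. This Python sketch therefore makes several assumptions:

- the standard published S-DES tables (10-bit key, 8-bit blocks, two Feistel rounds);
- a hypothetical key KEY;
- each item's UTF-8 bytes are enciphered one block at a time, with the result written as a hex token.

encrypt_item and decrypt_item are hypothetical helper names.

```python
from typing import List, Sequence, Tuple

# Standard S-DES permutation tables and S-boxes (1-based bit positions).
P10 = (3, 5, 2, 7, 4, 10, 1, 9, 8, 6)
P8 = (6, 3, 7, 4, 8, 5, 10, 9)
IP = (2, 6, 3, 1, 4, 8, 5, 7)
IP_INV = (4, 1, 3, 5, 7, 2, 8, 6)
EP = (4, 1, 2, 3, 2, 3, 4, 1)
P4 = (2, 4, 3, 1)
S0 = ((1, 0, 3, 2), (3, 2, 1, 0), (0, 2, 1, 3), (3, 1, 3, 2))
S1 = ((0, 1, 2, 3), (2, 0, 1, 3), (3, 0, 1, 0), (2, 1, 0, 3))

Bits = List[int]


def permute(bits: Sequence[int], table: Sequence[int]) -> Bits:
    return [bits[i - 1] for i in table]


def to_bits(value: int, width: int) -> Bits:
    return [(value >> (width - 1 - i)) & 1 for i in range(width)]


def from_bits(bits: Sequence[int]) -> int:
    value = 0
    for b in bits:
        value = (value << 1) | b
    return value


def rotate_left(half: Bits, n: int) -> Bits:
    return half[n:] + half[:n]


def subkeys(key10: int) -> Tuple[Bits, Bits]:
    """Derive the two 8-bit round keys K1 and K2 from a 10-bit key."""
    k = permute(to_bits(key10, 10), P10)
    left, right = rotate_left(k[:5], 1), rotate_left(k[5:], 1)
    k1 = permute(left + right, P8)
    left, right = rotate_left(left, 2), rotate_left(right, 2)
    k2 = permute(left + right, P8)
    return k1, k2


def sbox(box, bits: Bits) -> Bits:
    row = 2 * bits[0] + bits[3]  # outer bits select the row
    col = 2 * bits[1] + bits[2]  # inner bits select the column
    return to_bits(box[row][col], 2)


def f_k(bits: Bits, subkey: Bits) -> Bits:
    """One Feistel round: the left half is XORed with F(right half, subkey)."""
    left, right = bits[:4], bits[4:]
    mixed = [a ^ b for a, b in zip(permute(right, EP), subkey)]
    f = permute(sbox(S0, mixed[:4]) + sbox(S1, mixed[4:]), P4)
    return [a ^ b for a, b in zip(left, f)] + right


def crypt_byte(byte: int, first: Bits, second: Bits) -> int:
    bits = f_k(permute(to_bits(byte, 8), IP), first)
    bits = f_k(bits[4:] + bits[:4], second)  # SW swaps the halves between rounds
    return from_bits(permute(bits, IP_INV))


def encrypt_item(item: str, key10: int) -> str:
    """Hypothetical helper: encipher an item's UTF-8 bytes one by one, return a hex token."""
    k1, k2 = subkeys(key10)
    return bytes(crypt_byte(b, k1, k2) for b in item.encode("utf-8")).hex()


def decrypt_item(token: str, key10: int) -> str:
    """Inverse of encrypt_item: the same rounds with the subkeys in reverse order."""
    k1, k2 = subkeys(key10)
    return bytes(crypt_byte(b, k2, k1) for b in bytes.fromhex(token)).decode("utf-8")


KEY = 0b1010000010  # hypothetical 10-bit key held only by the data owner

if __name__ == "__main__":
    transactions = [["bread", "butter", "milk"], ["bread", "beer"]]
    cipher = [[encrypt_item(i, KEY) for i in t] for t in transactions]
    print(cipher)  # what the service provider sees
    print([[decrypt_item(c, KEY) for c in t] for t in cipher])
```

Every byte is enciphered independently under one key, so equal items always map to equal tokens. That is what lets the service provider count supports on ciphertext. The same property also exposes item frequencies to the provider. In addition, the 10-bit key space (1024 keys) is small, so the sketch illustrates the data flow rather than strong secrecy.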
Data Owner: The data owner collects all the transaction itemsets and forwards them to the service provider. Before sending them, the data owner converts the transaction items into an unreadable format to protect the privacy of the transaction itemsets, using a cryptographic technique. The data owner also converts the frequent itemsets, which the service provider returns in cipher form, back into plain format.

Privacy of transaction item sets: The data owner collects all the transaction itemsets and converts them into cipher format using the S-DES algorithm, which protects the privacy of the transaction items. The data owner performs all encryption and decryption of the transaction items, keeping the itemsets secure. After encryption, the data owner sends the itemsets to the service provider in cipher format.

Service provider: The service provider is a third party that provides the mining service to all company members. Each company member encrypts its transaction itemsets into cipher format before sending them to the service provider. The service provider retrieves all the cipher transaction itemsets and applies association rule mining to find the frequent itemsets. In this paper we propose a bipartite matrix technique for finding frequent itemsets. The process is as follows.

I) Creating the bipartite matrix according to frequent itemsets: The service provider generates the bipartite matrix based on the transaction itemsets.
The procedure for generating the bipartite matrix is as follows.

D = transaction database, S = itemsets, T = items
Input: transaction database D
Output: frequent length itemsets Si
Process:
Begin
    Find all frequent length itemsets Si from D
    If Si is not null
        For each Sx in Si
            For each transaction ti in D
                If ti contains Sx
                    Sx|ti| = 1
                Else
                    Sx|ti| = 0
                End if
            End for
        End for
        Return tw|N|
    End if
End

Frequent patterns generation: tw|N| = {t1|N|, t2|N|, ..., tn|N|} (frequent n-itemset)

II) Extracting maximum frequent itemsets from the bipartite matrix: After generating the bipartite matrix, the service provider finds the frequent itemsets from it as follows.

Input: the bipartite matrix; frequent length-1 itemsets Si; minimum support min_sup
Output: maximum frequent itemsets
Begin
    For each column in the bipartite matrix
        compute the number of 1 values in the current column
    End for
    Store these counts in max[n]
    sort(max[n])
    For each value in max[n]
        compute the number of columns with the same number of 1 values
        If number >= min_sup
            generate maximum-length candidate itemsets from S
            For each itemset in the candidate itemsets
                calculateSupport(itemset)
                If support(itemset) >= min_sup
                    itemset is frequent
                End if
            End for
        End if
        If maximum frequent itemsets is not null
            break
        End if
    End for
End

After finding the maximum frequent itemsets, the service provider sends them to the data owner. The data owner retrieves them and converts them into plain format.

Initially, the frequent 1-itemsets are generated by counting each individual item over all transactions. For example, for item 'a' we count the number of 1s opposite item 'a' across all transactions. The total count of a is 3 because a appears in transactions 1, 2 and 6. If the count is equal to or greater than the minimum support count (2 in our example), the item is treated as frequent.
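The two procedures above, together with the counting example, can be sketched in Python. This is a minimal interpretation, not the authors' code. It assumes the following:

- the matrix has one row per item and one column per transaction (Sx|ti|);
- min_sup is an absolute support count;
- the phrase "compute the number of columns with the same number of 1 values" means counting the transactions that hold at least k frequent items, which bounds the support of any k-itemset.

The six transactions are a hypothetical stand-in for the paper's table, which is not reproduced in the text. They were chosen so that item a occurs in transactions 1, 2 and 6, and {a, b} in transactions 1 and 6. build_bipartite_matrix, support_count and max_frequent_itemsets are hypothetical names. Because the search only compares item labels, it runs unchanged on deterministically encrypted item names.

```python
from itertools import combinations
from typing import Dict, List, Sequence, Set, Tuple

Matrix = Dict[str, List[int]]  # one row per item, one column per transaction


def build_bipartite_matrix(transactions: Sequence[Set[str]], items: Sequence[str]) -> Matrix:
    """Sx|ti| = 1 if transaction ti contains item Sx, else 0."""
    return {x: [1 if x in t else 0 for t in transactions] for x in items}


def support_count(matrix: Matrix, itemset: Sequence[str], n_transactions: int) -> int:
    """Number of columns (transactions) in which every row of the itemset holds a 1."""
    return sum(all(matrix[x][j] for x in itemset) for j in range(n_transactions))


def max_frequent_itemsets(transactions: Sequence[Set[str]], min_sup: int) -> List[Tuple[str, ...]]:
    """Return the frequent itemsets of greatest length (support count >= min_sup)."""
    n = len(transactions)
    items = sorted({x for t in transactions for x in t})
    full = build_bipartite_matrix(transactions, items)
    # Frequent length-1 itemsets Si. By downward closure no other item can
    # belong to a larger frequent itemset, so only these rows are kept.
    frequent1 = [x for x in items if sum(full[x]) >= min_sup]
    if not frequent1:
        return []
    matrix = {x: full[x] for x in frequent1}
    # Number of 1s in each column = frequent items per transaction (max[n]).
    ones = [sum(matrix[x][j] for x in frequent1) for j in range(n)]
    for k in range(max(ones), 0, -1):  # try the longest length first
        # Assumed reading: only transactions holding at least k frequent
        # items can support a k-itemset.
        if sum(1 for c in ones if c >= k) < min_sup:
            continue
        found = [c for c in combinations(frequent1, k)
                 if support_count(matrix, c, n) >= min_sup]
        if found:  # "maximum frequent itemsets is not null": stop
            return found
    return []


# Hypothetical stand-in for the paper's table (not reproduced in the text).
TRANSACTIONS = [{"a", "b", "c"}, {"a", "c"}, {"b", "d"},
                {"c", "d"}, {"b", "c", "d"}, {"a", "b", "d"}]

if __name__ == "__main__":
    m = build_bipartite_matrix(TRANSACTIONS, ["a", "b", "c", "d"])
    print(sum(m["a"]))                                      # 3: transactions 1, 2 and 6
    print(support_count(m, ("a", "b"), len(TRANSACTIONS)))  # 2: transactions 1 and 6
    print(max_frequent_itemsets(TRANSACTIONS, min_sup=2))
    # [('a', 'b'), ('a', 'c'), ('b', 'c'), ('b', 'd'), ('c', 'd')]
```

On this data the longest frequent itemsets have length 2. Every 3-itemset occurs in at most one transaction, so the search falls back to k = 2 and stops at the first length that yields a frequent itemset.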
To find the frequent 2-itemsets, 3-itemsets, or n-itemsets, we follow the same procedure until no more frequent itemsets are found. Consider the 2-itemset {a, b}. We check the entries opposite "a" and "b" in each transaction (both must be set to "1"), and each such transaction adds 1 to the count. In the above table, transactions 1 and 6 contain "1" in the positions of both a and b, so the count is 2. Since our minimum support count is 2, {a, b} is a frequent itemset. The remaining frequent patterns can be found by the same process.

IV. CONCLUSION
In cloud computing there has been recent interest in the paradigm of data mining as a service for companies. We use association rule mining to mine frequent itemsets from the data set. In this paper we proposed two main concepts: protecting the privacy of the transaction database and finding the frequent itemsets. First, we use the S-DES algorithm to protect the privacy of the transaction database. Second, we use a bipartite matrix to find the frequent itemsets. This approach provides more efficient and secure handling of the transaction itemsets and also secures the frequent itemsets.

REFERENCES
[1] D. Agrawal and C. C. Aggarwal. On the design and quantification of privacy preserving data mining algorithms. In Proceedings of the Twentieth ACM SIGACT-SIGMOD-SIGART Symposium on Principles of Database Systems, pages 247–255, Santa Barbara, California, USA, May 21–23, 2001. ACM.
[2] R. Agrawal and R. Srikant. Privacy-preserving data mining. In Proceedings of the 2000 ACM SIGMOD Conference on Management of Data, pages 439–450, Dallas, TX, May 14–19, 2000. ACM.
[3] M. Atallah, E. Bertino, A. Elmagarmid, M. Ibrahim, and V. Verykios. Disclosure limitation of sensitive rules. In Knowledge and Data Engineering Exchange Workshop (KDEX'99), pages 25–32, Chicago, Illinois, Nov. 8, 1999.
[4] Special issue on constraints in data mining. SIGKDD Explorations, 4(1), June 2002.
[5] C. Clifton. Using sample size to limit exposure to data mining. Journal of Computer Security, 8(4):281–307, Nov. 2000.
[6] H. S. Delugach and T. H. Hinke. Wizard: A database inference analysis and detection system. IEEE Transactions on Knowledge and Data Engineering, 8(1), Feb. 1996.
[7] W. Du and M. J. Atallah. Privacy-preserving cooperative scientific computations. In 14th IEEE Computer Security Foundations Workshop, pages 273–282, Nova Scotia, Canada, June 11–13, 2001.
[8] W. Du and M. J. Atallah. Privacy-preserving statistical analysis. In Proceedings of the 17th Annual Computer Security Applications Conference, New Orleans, Louisiana, USA, December 10–14, 2001.
[9] Directive 95/46/EC of the European Parliament and of the Council of 24 October 1995 on the protection of individuals with regard to the processing of personal data and on the free movement of such data. Official Journal of the European Communities, No I.(281):31–50, Oct. 24, 1995.
[10] A. Eisenberg. With false numbers, data crunchers try to mine the truth. New York Times, July 18, 2002.

BIOGRAPHIES
Gangu. Dharma Raju is an M.Tech (CSE) student at Sarada Institute of Science, Technology and Management, Srikakulam. He received his B.Tech (IT) from Sri Sivani College of Engineering (SSCE), Srikakulam. His areas of interest are Data Mining and Networking.

Jayanthi Rao Madina is working as HOD at Sarada Institute of Science, Technology and Management (SISTAM), Srikakulam, Andhra Pradesh. He received his M.Tech (CSE) from Aditya Institute of Technology and Management (AITAM), Tekkali, Andhra Pradesh. His research areas include Data Mining, Image Processing, Computer Networks, and Distributed Systems. He has published six papers in international journals and has attended three conferences.
