Nested Clustering Based Rotation using Radial Basis Function for PPDM

International Journal of Engineering Trends and Technology (IJETT) – Volume 9 Number 9 - Mar 2014, ISSN: 2231-5381

Hariharan.R, Durairaj.K
Student, Department of Information Technology, Sathyabama University, Chennai, India.

ABSTRACT- Privacy preservation in data mining is one of the most needed techniques today. People do not want to share their sensitive information with unauthorized users, so they want that information hidden from them. Many approaches are used to preserve data in data mining, such as cryptography, k-anonymity, and perturbation, but the existing methods have drawbacks such as loss of data quality or limited preservation. In this paper we examine how a neural network can be used to preserve data; specifically, a method called the "radial basis function" is used. We also discuss the performance metrics of our method, its limitations, and future work.

KEYWORDS- privacy preservation in data mining, Radial basis function, Randomization

I. INTRODUCTION

Data mining is the extraction of interesting (non-trivial, implicit, previously unknown and potentially useful) patterns or knowledge from huge amounts of data. It is also called knowledge discovery in databases, knowledge extraction, pattern analysis, data archeology, information harvesting, and business intelligence. Simple search and query processing are also considered data mining. Data mining is mainly applied in market analysis and risk analysis. Data mining software is one of a number of analytical tools for analyzing data. It allows users to analyze data from many different dimensions or angles, categorize it, and summarize the relationships identified. Although data mining is a relatively new term, the technology is not. Companies have used powerful computers to sift through volumes of supermarket scanner data and analyze market research reports for years. However, continuous innovations in computer processing power, disk storage, and statistical software are dramatically increasing the accuracy of analysis while driving down the cost. Text mining, web mining, and stream data mining are some other applications of data mining.

Huge amounts of data are stored in databases, and the data grow rapidly day by day. These data are given to data mining techniques, which extract knowledge from the large data sets. Some attributes in a database are very sensitive. For example, shopping card and medical records are sensitive attributes of a particular person, so they should be hidden from others; otherwise they may cause problems or a loss of prestige for that person. This leads to the concept of preserving sensitive data, called privacy preservation in data mining (PPDM), under which an individual's data cannot be identified by others.

Consider hospital data given out for research purposes. The patient database is considered very confidential in a hospital management system. Even though it is sensitive, the data need to be given to researchers for analysis. In that case the hospital management wants to hide each individual's disease, so it removes the unique attributes for the security of the patient. Name, address, and phone number are all considered unique identifiers. Even so, some of the remaining attributes can reveal a particular patient in the database: a particular area code combined with a particular age and a particular salary can identify the person. Sometimes this may be misused by criminals. So the hospital management changes the data to a nearby value, which may be higher or lower, but without a large difference. Only then does the analysis give reasonably accurate results while keeping the patient secure.
A health insurer claims money directly from the company using some common or unique identification. In this case the company needs to know the amount to be claimed, but not the disease or other sensitive information. The mediclaim provider also communicates with the company database, but the salary and some other main attributes need to be hidden from it. So the data are preserved using a privacy preservation technique. An online transaction shares two databases, those of the bank and the selling company. The bank knows only the purchase amount, not the product purchased; the selling company knows only that the amount was paid by card, not the balance of the card holder. This sensitive information is hidden using PPDM.

PPDM is one of the techniques used today to protect data from unauthorized persons [7]. When a database is given for mining, there is some possibility that the data will be pirated, which may affect the user. So the unique attributes need to be removed in pre-processing, and the common attributes are changed to a nearby value. Of the many available technologies, we use the perturbation technique in this paper. The perturbation technique modifies the data: the number of outputs equals the number of inputs, and each output value is nearly equal to, but not the same as, the original value. The data quality, the preservation rate, and the other metrics depend on the technique applied.

http://www.ijettjournal.org

II. RELATED WORK

In [3], the database is clustered and each cluster is again divided into two sub-clusters. Each sub-cluster is first checked for whether it holds enough data to anonymize. If it does not, the sub-cluster is merged with its adjacent sub-cluster. After this adjustment all sub-clusters have enough data to modify. To choose the larger cluster, the amount of data in each sub-cluster is calculated. A range value is determined for each sub-cluster; if a sub-cluster exceeds it, it is again divided into sub-clusters. After this checking process, the attribute values are replaced by the centroid value.

Several other approaches to privacy preservation in data mining have been proposed. In [8], data are partitioned horizontally and vertically. The two main existing approaches to protecting the data are generalization and bucketization. The technique also handles high-dimensional data. Three phases are discussed: attribute partition, column generalization, and tuple partition. Four techniques are implemented: Generalization, Bucketization, Multiset-based Generalization, and One-Attribute-per-Column Slicing.

K-anonymity is proposed in [1] for privacy preservation, and the KACTUS algorithm [5] is proposed for PPDM. A decision tree is built from the original data set, and each node of the tree has a weight. A k-anonymity value [10] is fixed for the preservation. The weight of each node is compared with it to check whether they are equal or not. If they are equal, the node is attached to its parent, and the process continues in this way. Finally, the preserved data are obtained.

A tree-based data perturbation method for privacy preservation, based on a KD-tree, is proposed in [9]. The data set is partitioned into subsets, which are further divided into smaller, more homogeneous subsets. Each final partition has three values, which are replaced by the average value of that partition. The method is mainly applicable to numerical attributes. The output data are perturbed from the input data, and a divide-and-conquer method is used to obtain the database structure.

Rotation-based transformation (RBT) for privacy preservation in data mining is proposed in [6]. That paper mainly focuses on numerical attributes and uses a method called isometric transformation. The numerical data are passed through the isometric formula, after which each value is changed but remains nearly equal to the original value. All of this is done after clustering the database. The metric values are then checked, and the check shows that the data remain inside their cluster after perturbation. So this is also one of the better methods for privacy preservation.

How a neural network can secure the data in data mining is proposed in [4]. Two main methods are discussed in that paper: the back-propagation algorithm and the extreme learning machine (ELM) algorithm. The back-propagation algorithm preserves the data through two approaches, secure multi-party addition and secure multi-party multiplication. The ELM algorithm is efficient for single-hidden-layer learning systems.

III. TYPES OF ATTRIBUTE

A database contains many attributes of a person, but they are mainly divided into two types: numerical attributes and categorical attributes. Attributes that hold only integer values are called numerical attributes, and attributes with non-numerical values are categorical attributes.

IV. TYPES OF PPDM

Many technologies are used for privacy preservation in data mining. Some of the techniques are discussed here.

a. Additive Perturbation

Additive perturbation adds random noise (b) to the original dataset (a). After the additive perturbation, the probability distribution of the original numeric data values can still be estimated. The perturbed value of an attribute can be estimated with a confidence (c), and the privacy is then estimated by (b−a) with confidence 'c'.

b. Multiplication Perturbation

This is similar to additive perturbation, but here the noise (b) is multiplied with the original dataset (a). This gives an accurate dataset, and a floating-point noise value in the range of zero to one gives good output. The perturbed data (c) is near the original data, i.e. c = a*b.
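The two perturbation schemes above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure used in the paper: the Gaussian noise model, the `noise_scale` of 500, the multiplier range 0.9 to 1.0, and the sample salary values are all hypothetical choices made so that the perturbed values stay close to the originals.

```python
import random

def additive_perturbation(values, noise_scale, rng):
    """Return a copy of values with zero-mean Gaussian noise added (assumed noise model)."""
    return [a + rng.gauss(0.0, noise_scale) for a in values]

def multiplicative_perturbation(values, low, high, rng):
    """Return a copy of values, each multiplied by noise drawn uniformly from [low, high]."""
    return [a * rng.uniform(low, high) for a in values]

# Hypothetical sensitive attribute values (not taken from the paper's data set).
rng = random.Random(0)
salaries = [32000.0, 45000.0, 51000.0, 78000.0]

added = additive_perturbation(salaries, noise_scale=500.0, rng=rng)
scaled = multiplicative_perturbation(salaries, low=0.9, high=1.0, rng=rng)

for a, c1, c2 in zip(salaries, added, scaled):
    print(f"original={a:.1f} additive={c1:.1f} multiplicative={c2:.1f}")
```

In both cases the number of outputs equals the number of inputs. Keeping the multiplier close to one is what keeps c = a*b near the original value; a multiplier near zero would destroy most of the data quality.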
c. K-anonymity

The concept of k-anonymization [7] was introduced by Samarati and Sweeney. K-anonymity preserves privacy through repetition of data in the quasi-identifiers: each record is indistinguishable from at least k−1 other records on those attributes.

d. Cryptography-Based Techniques

Cryptography is also used for privacy preservation in data mining. The sensitive attributes are selected from the database and a cryptographic technique is applied to them, such as generating a key and adding the key value, to obtain the new altered database.

V. PROBLEM DEFINITION

Many technologies are available for PPDM, such as anonymization, randomization, and slicing, but the existing methodologies have many drawbacks. No existing methodology preserves privacy effectively in terms of utility, with low execution time and control over anonymization. We therefore designed a concept based on artificial intelligence, specifically neural networks, that preserves the data using a perturbation technique. It is named Nested Clustering Based Rotation using Radial Basis Function (NCBRBF) for PPDM.

VI. ARTIFICIAL INTELLIGENCE

Artificial intelligence is a leading technology today. In almost every field, scientists try to apply artificial intelligence because it gives effective solutions to the problems they research. It gives results close to expectation in many fields because the system design is very effective; in medicine, science, and other technologies, for example, scientists have tried it and obtained efficient results. Artificial intelligence here is based on neural network technology. The work of the neurons is based on the experience or training given to them.

Fig 1: Block Diagram for NCBRBF

VII. NEURAL NETWORK

Neural networks are widely used for learning systems and knowledge representation. They are applied in different fields, including medical diagnosis, pattern recognition, security, fraud detection, and other knowledge discovery systems. A neural network learning system processes information in the same way as a biological nervous system, using an enormous number of highly interconnected processing elements.

The term "neural network" is derived from the neurons of the human brain, because a human acts according to the neuron system of the brain. If something happens in front of a person, it is captured by the eyes, which send a message to the brain. The brain contains a large number of neurons and acts depending on their function, and the human reaction also depends on the commands of the neurons. An artificial system constructed on the same principle gives, more or less, the efficient output that is expected of it. Because such a system is constructed from a number of neurons, it is called a neural network.

i. RADIAL BASIS FUNCTION (RBF)

The radial basis function is one of the methods used in neural networks. The term "radial basis" is derived from the shape of the function. The function has a bell shape, and a horizontal cut at any height gives a round shape whose points are all at the same radius from the mid-point. So this function is called a radial basis function.

In this paper we try a new concept in which PPDM is combined with the RBF. First, a network with a set of neurons is designed with the help of the newrb function. The function needs values for P and T, where P is the input sample data and T is the target value. The goal, spread, MN, and DF values are left at their defaults for convenience, but they may be changed. Each neuron handles multiple tasks.
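The bell shape and the "same radius" property described above can be checked with a small Python sketch of a Gaussian radial basis function. The Gaussian form and the way `spread` enters the formula are assumptions made for illustration; they are not claimed to match the exact parameterization used inside newrb.

```python
import math

def gaussian_rbf(x, center, spread):
    """Gaussian radial basis: the value depends only on the distance from x to center."""
    dist_sq = sum((xi - ci) ** 2 for xi, ci in zip(x, center))
    return math.exp(-dist_sq / (2.0 * spread ** 2))

center = (0.0, 0.0)

# The response is highest at the center of the bell.
peak = gaussian_rbf((0.0, 0.0), center, spread=1.0)

# Two different points at the same radius give the same response.
east = gaussian_rbf((1.0, 0.0), center, spread=1.0)
north = gaussian_rbf((0.0, 1.0), center, spread=1.0)

# The response decays as the input moves away from the center.
far = gaussian_rbf((3.0, 0.0), center, spread=1.0)

print(peak, east, north, far)
```

Because the output depends only on the distance to the center, every horizontal cut through the bell is a circle, which is the round shape the text refers to. A larger `spread` widens the bell, so each neuron responds to a broader range of inputs.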
First, some sample data are taken and given as input to the network. While passing the data, the developer trains the network to give the corresponding output for each input. If the output for P equals T, the network is built well. If the values are not equal, different values for the newrb parameters are tried until the output matches the input. After this process, giving the data set as input returns the same data as output, which serves for demonstration purposes. Our main aim, however, is to obtain perturbed data, so some noise is added in the network to get perturbed values. How much the data are modified is important: the variation must be small, not large, so the error has to be added with the same care. Here the data from the database are split into intervals under a special condition [2]. Random numbers are generated based on the intervals: for each interval a set of random numbers is generated and multiplied with the original data to get the perturbed output. The values are modified only slightly, so the data quality is good. The main use of the network is that the output generated is based on the input.

VIII. EXPERIMENTAL SETUP

The method is implemented on the Adult dataset. The Adult dataset is a survey taken in a particular area of the US that covers the day-to-day activity of the adults in that area. It contains 32561 records, of which 30722 are complete. There are 14 attributes in the database, of which we have taken {Age, Work-class, Education, Hours/week, Sex, Race, Marital-status, Salary}. {Salary} is considered the sensitive attribute.

IX. ALGORITHM

INPUT: FILTERED DATABASE
OUTPUT: ANONYMIZED DATABASE
METHOD:
STEP1: TAKE 'N' NUMBER OF SAMPLE DATA.
STEP2: BUILD THE NETWORK
    >> net = newrb(P, T, goal, spread, K, Ki);
    P   INPUT
    T   TARGET OUTPUT
    K   MAX NO OF NEURONS
    Ki  NO OF NEURONS TO ADD BETWEEN DISPLAYS
STEP3: GIVE THE FILTERED DATABASE AS INPUT.
STEP4: ADD THE ERROR SIGNAL USING RANDOMIZATION TECHNIQUE.

Fig 2: Neural Network Building
Fig 3.1: Original Data Flow Using RBF
Fig 3.2: Flow of Sampling Data
Fig 3.3: Flow of Perturbed Data
Fig 4: Sample RBF Networks with Different Values

The graph in Fig 4 shows the build of a network. A test sample was given to it and the expected output was obtained, so the further process proceeds with this balanced network. The total database is then given to the network to get the modified output with good data quality.

TABLE 1: RBF COMPARISON WITH EXISTING METHODOLOGY

Parameters                    Units  Original data  AP      MP      IR      RBF
Loss of Data                  Cont.  20.399         50.525  16.481  17.857  15.481
Bias in Mean                  Cont.  39.68          39.68   41.76   39.51   25.81
Bias in Standard Deviation    Cont.  15.408         18.947  33.555  15.332  10.345
Rate of Classification Error  %      189.47         13.2    15.7    0       9.2
Computational Time            Sec    Nil            O(2n)   O(2n)   O(n)    O(n)+1
Regression                    Cont.  Nil            0       0.009   0.001   0.001
Privacy Preservation Rate     %      Nil            0.041   0.038   0       0.442
Rand Index                    %      Nil            11.3    16.4    9.1     12.4
Measure of Privacy            Cont.  Nil            35.37   15.194  2.43    40.52

X. PERFORMANCE ANALYSIS

Table 1 gives the performance of RBF. Additive Perturbation (AP), Multiplicative Perturbation (MP), and Isometric Transformation (IR) are existing, previously used techniques against which RBF is compared. Compared with these, RBF gives the better overall result. Its privacy preservation rate (0.442) and measure of privacy (40.52) are the highest in the table, and its loss of data (15.481) is the lowest. Its rate of classification error (9.2%) is lower than that of AP and MP, and its computational time, O(n)+1, is lower than the O(2n) of AP and MP. Overall, the performance of the RBF technique is very good.

XI. CONCLUSION

In this project, the proposed technique is the Radial Basis Function for privacy preservation in data mining. The project uses a neural network to build a well-trained network that generates accurate, high-performance output. The RBF metrics were compared with those of various existing methods and show good quality. We conclude that the Radial Basis Function (RBF) is the best of the compared techniques for PPDM in a secure manner. In future, the method can be enhanced by using a rotation parameter that depends on the distance between the sub-clusters, to improve the quality of the data as well as the preservation rate.

ACKNOWLEDGMENT

We would like to thank Sathyabama University for giving us a platform to enhance our knowledge. We would like to express our special gratitude to our guide, Mrs. V. Rajalakshmi, M.Tech, Assistant Professor at Sathyabama University, who motivated us to prepare this paper and guided us through it. We would also like to express our deepest thanks to all those who made it possible for us to complete this work.

REFERENCES

[1] Chuang-Cheng Chiu and Chieh-Yuan Tsai, "A k-Anonymity Clustering Method for Effective Data Privacy Preservation", ADMA 2007, LNAI 4632, pp. 89–99, Springer-Verlag Berlin Heidelberg, 2007.
[2] Li Liu, Murat Kantarcioglu, Bhavani Thuraisingham, "The applicability of the perturbation based privacy preserving data mining for real-world data", Data & Knowledge Engineering 65 (2008) 5–21.
[3] V. Rajalakshmi, G. S. Anandha Mala, "Anonymization Based on Nested Clustering for Privacy Preservation in Data Mining", Indian Journal of Computer Science and Engineering (IJCSE), Vol. 4 No. 3, Jun-Jul 2013.
[4] Saeed Samet, Ali Miri, "Privacy-preserving back-propagation and extreme learning machine algorithms", Data & Knowledge Engineering 79–80 (2012) 40–61.
[5] Slava Kisilevich, Lior Rokach, Yuval Elovici, and Bracha Shapira, "Efficient Multidimensional Suppression for K-Anonymity", IEEE Transactions on Knowledge and Data Engineering, Vol. 22, No. 3, March 2010.
[6] Stanley R. M. Oliveira, Osmar R. Zaiane, "Data Perturbation by Rotation for Privacy-Preserving Clustering", Technical Report TR 04-17, August 2004.
[7] L. Sweeney, "Achieving k-anonymity privacy protection using generalization and suppression", 2002.
[8] Tiancheng Li, Ninghui Li, Jian Zhang, and Ian Molloy, "Slicing: A New Approach for Privacy Preserving Data Publishing", IEEE Transactions on Knowledge and Data Engineering, Vol. 24, No. 3, March 2012.
[9] Xiao-Bai Li and Sumit Sarkar, "A Tree-Based Data Perturbation Approach for Privacy-Preserving Data Mining", IEEE Transactions on Knowledge and Data Engineering, Vol. 18, No. 9, September 2006.
[10] Yingjie Wu, Zhihui Sun, Xiaodong Wang, "Privacy Preserving k-Anonymity for Re-publication of Incremental Datasets", 2009 World Congress on Computer Science and Information Engineering.
