www.ijecs.in International Journal Of Engineering And Computer Science ISSN: 2319-7242 Volume 4 Issue 2 February 2015, Page No. 10380-10383

Survey on A Methodology for Sensitive Attribute Discrimination Prevention in Data Mining

Naziya Tabassum Qazi, Prof. Ms. V. M. Deshmukh

Research Scholar, Department of Information Technology, Prof. Ram Meghe Institute of Technology & Research, Badnera, Amravati (M.S.), naziya.qazi@gmail.com
Professor & Head, Department of Information Technology, Prof. Ram Meghe Institute of Technology & Research, Badnera, Amravati (M.S.), msvmdeshmukh@rediffmail.com

ABSTRACT – Data mining is an increasingly important technology: a process of extracting useful knowledge from large collections of data. There are, however, negative views of data mining, among them potential privacy invasion and potential discrimination. Discrimination is the unequal or unfair treatment of people on the basis of the specific group they belong to. If data sets are divided on the basis of sensitive attributes like gender, race, religion, etc., discriminatory decisions may ensue. For this reason, antidiscrimination techniques for discrimination prevention have been introduced into data mining. Discrimination can be either direct or indirect. Direct discrimination occurs when decisions are made based on sensitive attributes; it consists of rules or procedures that explicitly mention minority or disadvantaged groups based on sensitive discriminatory attributes related to group membership. Indirect discrimination occurs when decisions are made based on non-sensitive attributes which are strongly correlated with biased sensitive ones; it consists of rules or procedures that, while not explicitly mentioning discriminatory attributes, could intentionally or unintentionally generate discriminatory decisions.

Index Terms – Antidiscrimination, data mining, direct and indirect discrimination prevention, rule protection, rule generalization, privacy

I.
INTRODUCTION

In sociology, discrimination is the prejudicial treatment of an individual based on their membership in a certain group or category. It involves denying to members of one group opportunities that are available to other groups. There is a list of antidiscrimination acts, which are laws designed to prevent discrimination on the basis of a number of attributes (e.g., race, religion, gender, nationality, disability, marital status, and age) in various settings (e.g., employment and training, access to public services, credit and insurance). For example, the European Union implements the principle of equal treatment between men and women in the access to and supply of goods and services and in matters of employment and occupation. Although there are laws against discrimination, all of them are reactive, not proactive. Technology can add proactively to legislation by contributing discrimination discovery and prevention techniques.

Services in the information society allow for the automatic and routine collection of large amounts of data. Those data are often used to train association/classification rules in view of making automated decisions, such as loan granting/denial, insurance premium computation, and personnel selection. Discrimination is the process of unfairly treating people on the basis of their belonging to a specific group; for instance, individuals may be discriminated against because of their race, gender, etc. [10]. Various laws are in force to prevent discrimination on the basis of attributes such as race, religion, nationality, disability, and age. There are two types of discrimination, i.e., direct discrimination and indirect discrimination. Direct discrimination consists of procedures or rules that explicitly mention a minority or disadvantaged group based on the sensitive attributes related to group membership. Indirect discrimination consists of rules and procedures that do not mention the attributes which cause discrimination but nevertheless generate discriminatory decisions, intentionally or unintentionally.
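The distinction between direct and indirect discrimination drawn above is made quantitative in the discrimination-aware data mining literature through rule measures such as the extended lift (elift): the factor by which adding the sensitive itemset to a rule's premise raises the rule's confidence. A rule is flagged as potentially discriminatory when its elift exceeds a chosen threshold. The sketch below is only an illustration; the attribute names, the toy records, and the threshold value 1.2 are invented for this example:

```python
def confidence(data, antecedent, consequent):
    """conf(antecedent -> consequent) = support(antecedent and consequent) / support(antecedent)."""
    matches = [row for row in data if all(row.get(k) == v for k, v in antecedent.items())]
    if not matches:
        return 0.0
    hits = [row for row in matches if all(row.get(k) == v for k, v in consequent.items())]
    return len(hits) / len(matches)

def elift(data, sensitive, context, consequent):
    """elift(A, B -> C) = conf(A, B -> C) / conf(B -> C), where A is the sensitive itemset."""
    base = confidence(data, context, consequent)
    if base == 0.0:
        return float("inf")
    return confidence(data, {**sensitive, **context}, consequent) / base

# Toy loan-decision records (hypothetical attribute names and values).
data = [
    {"gender": "female", "city": "NYC", "loan": "deny"},
    {"gender": "female", "city": "NYC", "loan": "deny"},
    {"gender": "female", "city": "NYC", "loan": "grant"},
    {"gender": "male",   "city": "NYC", "loan": "deny"},
    {"gender": "male",   "city": "NYC", "loan": "grant"},
    {"gender": "male",   "city": "NYC", "loan": "grant"},
]

# conf(female, NYC -> deny) = 2/3; conf(NYC -> deny) = 1/2; elift = (2/3)/(1/2) ~ 1.33.
e = elift(data, {"gender": "female"}, {"city": "NYC"}, {"loan": "deny"})
ALPHA = 1.2  # discrimination threshold, assumed for this sketch
print("elift =", round(e, 2), "->", "potentially discriminatory" if e >= ALPHA else "protective")
```

Rule protection, discussed later in this survey, amounts to transforming the data until such scores fall below the threshold.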
II. WHAT IS DISCRIMINATION?

Discrimination can be defined as one person, or a group of persons, being treated less favorably than another on the grounds of racial or ethnic origin, religion or belief, disability, age, or sexual orientation (direct discrimination), or a situation where an apparently neutral provision is liable to disadvantage a group of persons on the same grounds, unless objectively justified (indirect discrimination). In other words, discrimination means treating people differently, negatively, or adversely without a good reason. As used in human rights laws, discrimination means making a distinction between certain individuals or groups based on a prohibited ground. The idea behind these laws is that people should not be placed at a disadvantage simply because of their racial or ethnic origin, religion or belief, disability, age, or sexual orientation; treating them so is called discrimination and is against the law.

III. LITERATURE SURVEY

A methodology for sensitive attribute discrimination prevention in data mining addresses discrimination with respect to sensitive attributes such as race, religion, gender, nationality, disability, marital status, and age. Discrimination can be either direct or indirect.

In the year 2005, Chen et al. proposed a rotation-based perturbation method [04]. The proposed method maintains zero loss of accuracy for many classifiers. In the year 2006, Wang et al. used non-negative matrix factorization (NNMF) for privacy-preserving data mining [05]; in this work, they combined non-negative matrix decomposition with distortion processing.
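The rotation-based perturbation idea above can be sketched in two dimensions: rotating every record through the same secret angle hides the original values while preserving all pairwise Euclidean distances, which is why distance-based classifiers lose no accuracy on the perturbed data. This is only a toy illustration of the general idea (the points and the angle are invented; the cited work operates on full data matrices):

```python
import math
import random

def rotate_2d(points, theta):
    """Perturb a 2-D data set by rotating every point through the same
    (secret) angle theta; pairwise Euclidean distances are unchanged."""
    c, s = math.cos(theta), math.sin(theta)
    return [(c * x - s * y, s * x + c * y) for (x, y) in points]

def dist(p, q):
    """Euclidean distance between two 2-D points."""
    return math.hypot(p[0] - q[0], p[1] - q[1])

# Toy records and a random rotation angle (the data owner keeps theta secret).
points = [(1.0, 0.0), (0.0, 2.0), (3.0, 4.0)]
theta = random.uniform(0.0, 2.0 * math.pi)
perturbed = rotate_2d(points, theta)
# Individual coordinates change, but the geometry a distance-based classifier
# relies on (pairwise distances) is preserved up to floating-point error.
```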
Direct discrimination occurs when decisions are made based on sensitive attributes, while indirect discrimination occurs when decisions are made based on non-sensitive attributes which are strongly correlated with biased sensitive ones. This system tackles discrimination prevention in data mining and proposes new techniques applicable to direct or indirect discrimination prevention, individually or both at the same time.

In the year 2000, Agrawal R. et al. [01] proposed an additive data perturbation method for building decision tree classifiers. Every data element is randomized by adding some noise, chosen independently from a known distribution such as a Gaussian distribution. The data miner rebuilds the distribution of the original data from its distorted version.

In the year 2002, Sweeney L. et al. [02] proposed the k-anonymity model, which considers the problem of a data owner who wants to share a collection of person-specific data without revealing the identity of any individual.

In the year 2008, D. Pedreschi et al. [06] published "Discrimination-Aware Data Mining". This work proposes a visualization approach that can, on the one hand, be applied to any (classification or association) rules, but that is, on the other, well suited to bringing out characteristics of mined patterns that are especially important in discrimination-aware and privacy-aware data mining. The system defines new interestingness measures for items and rules and shows various ways in which these can help in highlighting information in such settings.

In the year 2009, F. Kamiran et al. [07] tackled the problem of impartial classification by introducing a new classification scheme for learning unbiased models on biased training data. Their method is based on massaging the dataset by making the least intrusive modifications which lead to an unbiased dataset. Numerical attributes and groups of attributes are not considered as sensitive attributes.
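The k-anonymity requirement described above is mechanical to check: every combination of quasi-identifier values must occur in at least k records, so that no individual can be singled out by those attributes. A hedged sketch (the records, attribute names, and generalized values are invented for illustration):

```python
from collections import Counter

def is_k_anonymous(rows, quasi_identifiers, k):
    """A table is k-anonymous w.r.t. the quasi-identifiers if every
    combination of their values appears in at least k rows."""
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return all(count >= k for count in groups.values())

# Hypothetical released records (ZIP code and age generalized to ranges).
rows = [
    {"zip": "440*", "age": "20-30", "disease": "flu"},
    {"zip": "440*", "age": "20-30", "disease": "cold"},
    {"zip": "441*", "age": "30-40", "disease": "flu"},
    {"zip": "441*", "age": "30-40", "disease": "asthma"},
]

print(is_k_anonymous(rows, ["zip", "age"], k=2))  # True: each QI combination covers 2 rows
print(is_k_anonymous(rows, ["zip", "age"], k=3))  # False
```

In practice, achieving k-anonymity is the hard part: values are generalized or suppressed until every quasi-identifier group reaches size k.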
In the year 2010, S. Ruggieri et al. [08] presented discrimination discovery in databases, in which unfair practices against minorities are hidden in a dataset of historical decisions. The DCUBE system, based on classification rule extraction and analysis, implements this approach, centering the analysis phase on an Oracle database. The proposed demonstration guides the audience through the legal issues about discrimination hidden in data and through several legally grounded analyses to unveil discriminatory situations.

Earlier, in the year 2003, C. Clifton et al. had recognized discrimination prevention as an issue in a tutorial [03] where the danger of building classifiers capable of racial discrimination in home loans was put forward: data mining and machine learning models extracted from historical data may discover traditional prejudices.

In the year 2011, B. Luong et al. [09] modeled the discrimination discovery and prevention problems by a variant of k-NN classification that implements the legal methodology of situation testing.

In the year 2012, F. Kamiran et al. [10] presented algorithmic solutions that preprocess the data to remove discrimination before a classifier is learned. They proposed three preprocessing techniques, i.e., Massaging, Reweighing, and Sampling, which are applied on the training dataset.

In the year 2013, Sara Hajian et al. [11] handled discrimination protection in data mining and proposed new techniques applicable to direct or indirect discrimination prevention, individually or both at the same time. They discussed how to clean training data sets and outsourced data sets in such a way that direct and/or indirect discriminatory decision rules are converted to legitimate (nondiscriminatory) classification rules. They also proposed new metrics to evaluate the utility of the proposed approaches and compared these approaches. The experimental evaluations demonstrate that the proposed techniques are effective at removing direct and/or indirect discrimination biases in the original data set while preserving data quality.

EXISTING SYSTEM: In the existing system, the initial idea of using rule protection and rule generalization was applied to direct discrimination prevention, individually or both at the same time, but no experimental results were discussed. The use of rule protection was then introduced in a different way for indirect discrimination prevention, with some preliminary experimental results. Since those earlier methods could deal only with either direct or indirect discrimination, the methods described in this paper are new.

PROPOSED SYSTEM: We present a unified approach to direct and indirect discrimination prevention, with finalized algorithms and all possible data transformation methods based on rule protection and/or rule generalization that could be applied for direct or indirect discrimination prevention, and we specify the different features of each method. We propose new utility measures to evaluate the different proposed discrimination prevention methods in terms of data quality and discrimination removal for both direct and indirect discrimination. Based on the proposed measures, we present extensive experimental results for two well-known data sets and compare the different possible methods for direct or indirect discrimination prevention, to find out which methods could be more successful in terms of low information loss and high discrimination removal. The approach is based on mining classification rules (the inductive part) and reasoning on them (the deductive part) on the basis of quantitative measures of discrimination that formalize legal definitions of discrimination.

ACKNOWLEDGEMENT: This is to acknowledge and thank all individuals who played a defining role in shaping this paper.
Without their constant support, guidance, and assistance, this paper would not have been completed. In addition, the authors would like to thank their guide and Head, Prof. Ms. V. M. Deshmukh, for her guidance, support, encouragement, and advice.

Figure: Proposed System Architecture Design. (The original dataset (DB), a discrimination threshold, and a list of discriminatory attributes feed an anti-discrimination data analysis step, which measures discrimination; a data transformation step then produces the transformed dataset (DB), from which non-discriminatory decision rules are derived.)

CONCLUSION: Along with privacy, discrimination is a central concern when considering the legal and ethical aspects of data mining. It is more than evident that most people do not want to be discriminated against because of their sex, religion, nationality, age, and so on, especially when those attributes are used for making decisions about them, such as granting them a job, a loan, or an insurance policy. The aim of this paper was to survey a new preprocessing discrimination prevention methodology, including different data transformation methods that can prevent direct discrimination, indirect discrimination, or both at the same time. To attain this objective, the first step is to measure discrimination and identify the categories and groups of individuals that have been directly or indirectly discriminated against in the decision-making process. The second step is to transform the data in a proper way to remove all those discrimination biases. Finally, discrimination-free data models can be produced from the transformed data set without seriously damaging data quality.

REFERENCES

[01]. L. Sweeney, "k-Anonymity: A Model for Protecting Privacy," International Journal on Uncertainty, Fuzziness, and Knowledge-Based Systems, vol. 10, no. 5, pp. 557-570, 2002.
[02]. F. Kamiran, T. Calders, and M. Pechenizkiy, "Discrimination Aware Decision Tree Learning," Proc. IEEE Int'l Conf. on Data Mining (ICDM '10), 2010.
[03]. C. Clifton, "Privacy Preserving Data Mining: How Do We Mine Data When We Aren't Allowed to See It?" Proc. ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining (KDD 2003), Tutorial, Washington, DC (USA), 2003.
[04]. S. Xu, J. Zhang, D. Han, and J. Wang, "Data Distortion for Privacy Protection in a Terrorist Analysis System," Proc. IEEE Int'l Conf. on Intelligence and Security Informatics, pp. 459-464, 2005.
[05]. Jie Wang, Weijun Zhong, and Jun Zhang, "NNMF-Based Factorization Techniques for High-Accuracy Privacy Protection on Non-negative-valued Datasets," Proc. IEEE Int'l Conf. on Data Mining, Int'l Workshop on Privacy Aspects of Data Mining (PADM 2006), pp. 513-517, 2006.
[06]. D. Pedreschi, S. Ruggieri, and F. Turini, "Discrimination-Aware Data Mining," Proc. 14th ACM SIGKDD Conf. on Knowledge Discovery and Data Mining (KDD 2008), pp. 560-568, ACM, 2008.
[07]. F. Kamiran and T. Calders, "Classifying Without Discriminating," Proc. IEEE IC4 Int'l Conf. on Computer, Control and Communication, IEEE Press, 2009.
[08]. S. Ruggieri, D. Pedreschi, and F. Turini, "DCUBE: Discrimination Discovery in Databases," Proc. ACM SIGMOD Int'l Conf. on Management of Data (SIGMOD 2010), pp. 1127-1130, ACM, 2010.
[09]. S. Hajian, J. Domingo-Ferrer, and A. Martínez-Ballesté, "Discrimination Prevention in Data Mining for Intrusion and Crime Detection," Proc. IEEE Symp. on Computational Intelligence in Cyber Security (CICS '11), pp. 47-54, 2011.
[10]. S. Hajian and J. Domingo-Ferrer, "A Methodology for Direct and Indirect Discrimination Prevention in Data Mining," IEEE Trans. on Knowledge and Data Engineering, to appear, 2012.
[11]. Sara Hajian and Josep Domingo-Ferrer, "A Methodology for Direct and Indirect Discrimination Prevention in Data Mining," IEEE Trans. on Knowledge and Data Engineering, vol. 25, no. 7, July 2013.