www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242

Volume 4 Issue 2 February 2015, Page No. 10380-10383
Survey on A Methodology for Sensitive Attribute Discrimination
Prevention In Data Mining
Naziya Tabassum Qazi, Prof. Ms. V. M. Deshmukh
Research Scholar, Department of Information Technology, Prof. Ram Meghe Institute of Technology & Research, Badnera, Amravati (M.S.)
naziya.qazi@gmail.com
Professor & Head, Department of Information Technology, Prof. Ram Meghe Institute of Technology & Research, Badnera, Amravati (M.S.)
msvmdeshmukh@rediffmail.com
ABSTRACT – Today, data mining is an increasingly important technology: a process of extracting useful knowledge from large collections of data. There are, however, negative perceptions of data mining, among them potential privacy invasion and potential discrimination. Discrimination means treating people unequally or unfairly on the basis of the specific group they belong to. If data sets are partitioned on the basis of sensitive attributes like gender, race, religion, etc., discriminatory decisions may ensue. For this reason, antidiscrimination laws have been brought to bear on data mining. Discrimination can be either direct or indirect. Direct discrimination occurs when decisions are made based on sensitive attributes; it consists of rules or procedures that explicitly mention minority or disadvantaged groups based on sensitive discriminatory attributes related to group membership. Indirect discrimination occurs when decisions are made based on nonsensitive attributes that are strongly correlated with biased sensitive ones; it consists of rules or procedures that, while not explicitly mentioning discriminatory attributes, could intentionally or unintentionally generate discriminatory decisions.
Index Terms - Antidiscrimination, data mining, direct and indirect discrimination prevention, rule protection, rule generalization, privacy
I. INTRODUCTION
In sociology, discrimination is the prejudicial treatment of
an individual based on their membership in a certain group
or category. It involves denying to members of one group
opportunities that are available to other groups. There is a
list of antidiscrimination acts, which are laws designed to
prevent discrimination on the basis of a number of attributes
(e.g., race, religion, gender, nationality, disability, marital
status, and age) in various settings (e.g., employment and
training, access to public services, credit and insurance,
etc.). For example, the European Union implements the
principle of equal treatment between men and women in the
access to and supply of goods and services and in matters
of employment and occupation. Although there are
laws against discrimination, all of them are reactive, not
proactive. Technology can add proactively to legislation by
contributing discrimination discovery and prevention
techniques. Services in the information society allow for the
automatic and routine collection of large amounts of data.
Those data are often used to train association/classification
rules with a view to making automated decisions, such as loan
granting/denial, insurance premium computation, and personnel
selection. Discrimination is the process of unfairly treating
people on the basis of their belonging to a specific
group; for instance, individuals may be discriminated
against because of their race, gender, etc. [10]. Put differently,
it is the treatment of an individual based on their membership
in a particular category or group. There are various laws
designed to prevent discrimination on the basis of attributes
such as race, religion, nationality, disability, and age. There are
two types of discrimination, i.e., direct discrimination and
indirect discrimination. Direct discrimination consists of
procedures or rules that explicitly mention a minority or
disadvantaged group based on sensitive attributes related to
group membership. Indirect discrimination consists of rules
and procedures that do not mention discriminatory attributes
but nevertheless generate discriminatory decisions,
intentionally or unintentionally.
II. WHAT IS DISCRIMINATION?
Discrimination can be defined as one person, or a group of persons,
being treated less favorably than another on the grounds of
racial or ethnic origin, religion or belief, disability, age, or
sexual orientation (direct discrimination), or where an
apparently neutral provision is liable to disadvantage a
group of persons on the same grounds,
unless objectively justified (indirect discrimination). In other
words, discrimination means treating people differently,
negatively, or adversely without a good reason. As used in
human rights laws, discrimination means making a
distinction between certain individuals or groups based on a
prohibited ground. The idea behind these laws is that people should
not be placed at a disadvantage simply because of their
racial or ethnic origin, religion or belief, disability, age, or
sexual orientation; treating them so is discrimination and is
against the law.
III. LITERATURE SURVEY:-
A methodology for sensitive attribute discrimination prevention in data mining addresses discrimination on sensitive attributes such as race, religion, gender, nationality, disability, marital status, and age. Discrimination can be either direct or indirect. Direct discrimination occurs when decisions are made based on sensitive attributes. Indirect discrimination occurs when decisions are made based on nonsensitive attributes that are strongly correlated with biased sensitive ones. Such a system tackles discrimination prevention in data mining and proposes new techniques applicable to direct or indirect discrimination prevention, individually or both at the same time.

In the year 2000, Agrawal R. et al. [01] proposed an additive data perturbation method for building decision tree classifiers. Every data element is randomized by adding some noise chosen independently from a known distribution, such as a Gaussian distribution. The data miner then rebuilds the distribution of the original data from its distorted version.

In the year 2005, Chen et al. proposed a rotation-based perturbation method [04]. The proposed method maintains zero loss of accuracy for many classifiers.

In the year 2006, Wang et al. used non-negative matrix factorization (NNMF) for data mining [05]. In this work, they combined non-negative matrix decomposition with distortion processing.

In the year 2008, D. Pedreschi et al. [06], in "Discrimination-Aware Data Mining," proposed a visualization approach that can, on the one hand, be applied to any (classification or association) rules, but that is especially appropriate for bringing out characteristics of mined patterns that matter in discrimination-aware and privacy-aware data mining. The system defines new interestingness measures for items and rules and shows various ways in which these can help highlight discriminatory patterns.
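The additive-perturbation idea can be sketched as follows (a minimal illustration, not the authors' implementation; the sample values and the noise scale `sigma` are assumptions for the example):

```python
import random

def perturb(values, sigma=10.0, seed=42):
    """Distort each data element by adding independent Gaussian noise."""
    rng = random.Random(seed)
    return [v + rng.gauss(0.0, sigma) for v in values]

# Hypothetical attribute values (e.g., ages of loan applicants).
ages = [23, 31, 45, 52, 38, 29, 61, 47]
distorted = perturb(ages)

# Individual values are masked, but because the noise has zero mean the
# miner can still recover aggregate statistics approximately.
original_mean = sum(ages) / len(ages)
distorted_mean = sum(distorted) / len(distorted)
```

The miner sees only `distorted`, yet distribution-level information survives well enough to train a decision tree on the reconstructed distribution.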
In the year 2009, F. Kamiran et al. [07] tackled the problem of impartial classification by introducing a new classification scheme for learning unbiased models on biased training data. Their method is based on massaging the dataset, making the least intrusive modifications that lead to an unbiased dataset. Numerical attributes and groups of attributes are not considered as sensitive attributes.

In the year 2002, Sweeney L. et al. [02] proposed the k-anonymity model, which considers the problem that a data owner wants to share a collection of person-specific data without revealing the identity of any individual.
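A minimal sketch of the property that k-anonymity checks (the records and quasi-identifiers below are hypothetical; real anonymization also generalizes and suppresses values, which is not shown):

```python
from collections import Counter

def is_k_anonymous(records, quasi_identifiers, k):
    """A table is k-anonymous if every combination of quasi-identifier
    values is shared by at least k records."""
    groups = Counter(tuple(r[a] for a in quasi_identifiers) for r in records)
    return all(count >= k for count in groups.values())

# Hypothetical person-specific records; zipcode and age act as quasi-identifiers.
records = [
    {"zipcode": "444*", "age": "20-30", "disease": "flu"},
    {"zipcode": "444*", "age": "20-30", "disease": "cold"},
    {"zipcode": "445*", "age": "30-40", "disease": "flu"},
    {"zipcode": "445*", "age": "30-40", "disease": "asthma"},
]

print(is_k_anonymous(records, ["zipcode", "age"], 2))  # True: every group has 2 rows
print(is_k_anonymous(records, ["zipcode", "age"], 3))  # False
```

Because each quasi-identifier combination covers at least k individuals, no record can be re-identified by linking on those attributes alone.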
In the year 2003, C. Clifton et al. recognized discrimination prevention as an issue in a tutorial [03], where the danger of building classifiers capable of racial discrimination in home loans was put forward: data mining and machine learning models extracted from historical data may discover and perpetuate traditional prejudices.

In the year 2010, S. Ruggieri et al. [08] presented discrimination discovery in databases, in which unfair practices against minorities are hidden in a dataset of historical decisions. The DCUBE system, based on classification rule extraction and analysis, implements this approach, centering the analysis phase on an Oracle database. The proposed demonstration guides the audience through the legal issues about discrimination hidden in data, and through several legally grounded analyses to unveil discriminatory situations.
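Rule-based discrimination discovery of this kind can be illustrated with a small sketch (the data set, the sensitive itemset, and the rule are invented for the example; DCUBE itself analyzes rules extracted into an Oracle database):

```python
# Hypothetical records of past credit decisions; each row is a set of items.
data = [
    {"gender=female", "job=unemployed", "credit=deny"},
    {"gender=female", "job=employed", "credit=deny"},
    {"gender=male", "job=unemployed", "credit=grant"},
    {"gender=male", "job=employed", "credit=grant"},
    {"gender=female", "job=employed", "credit=grant"},
]

SENSITIVE = {"gender=female"}  # assumed list of discriminatory items

def confidence(premise, conclusion):
    """conf(A -> C) = support(A and C) / support(A)."""
    matches = [row for row in data if premise <= row]
    if not matches:
        return 0.0
    return sum(1 for row in matches if conclusion in row) / len(matches)

rule_premise, rule_conclusion = {"gender=female"}, "credit=deny"
conf = confidence(rule_premise, rule_conclusion)  # 2 of 3 female rows are denied
flagged = bool(SENSITIVE & rule_premise)          # premise mentions a sensitive item
```

A rule whose premise contains a sensitive item and whose confidence is high is a candidate direct-discrimination rule for further legal analysis.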
In the year 2011, B. Luong et al. [09] modeled the discrimination discovery and prevention problems by a variant of k-NN classification that implements the legal methodology of situation testing.

In the year 2012, F. Kamiran et al. [10] presented algorithmic solutions that preprocess the data to remove discrimination before a classifier is learned. They proposed three preprocessing techniques, i.e., Massaging, Reweighing, and Sampling, which are applied to the training dataset.
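The reweighing idea can be sketched as follows (a simplified illustration on assumed toy data, not Kamiran et al.'s implementation): each (group, label) combination receives the frequency expected under independence divided by its observed frequency, so combinations that the bias makes rare are up-weighted.

```python
from collections import Counter

def reweigh(rows):
    """Assign weight P(group)*P(label) / P(group, label) to each
    (sensitive group, class label) combination, so that in the weighted
    data the sensitive attribute and the label are independent."""
    n = len(rows)
    group = Counter(s for s, c in rows)
    label = Counter(c for s, c in rows)
    joint = Counter(rows)
    return {
        (s, c): (group[s] / n) * (label[c] / n) / (joint[(s, c)] / n)
        for (s, c) in joint
    }

# Hypothetical biased training data: (sensitive group, class label).
rows = [("f", "-"), ("f", "-"), ("f", "+"), ("m", "+"), ("m", "+"), ("m", "-")]
weights = reweigh(rows)
# ("f", "+") and ("m", "-") are under-represented, so they get weight 1.5;
# the over-represented pairs ("f", "-") and ("m", "+") get weight 0.75.
```

A classifier trained with these instance weights sees, in effect, a dataset where group membership no longer predicts the label.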
In the year 2013, Sara Hajian et al. [11] handled discrimination protection in data mining and proposed new techniques applicable to direct or indirect discrimination prevention, individually or both at the same time. They discuss how to clean training data sets and outsourced data sets in such a way that direct and/or indirect discriminatory decision rules are converted to legitimate (nondiscriminatory) classification rules. They also propose new metrics to evaluate the utility of the proposed approaches and compare these approaches. The experimental evaluations demonstrate that the proposed techniques are effective at removing direct and/or indirect discrimination biases in the original data set while preserving data quality.

EXISTING SYSTEM:- In the existing system, the initial idea of using rule protection and rule generalization for direct discrimination prevention was introduced, but no experimental results were given; the use of rule protection in a different way for indirect discrimination prevention was introduced with only some preliminary experimental results. Since the methods in those earlier papers could deal with either direct or indirect discrimination alone, the methods described in this paper are new.

PROPOSED SYSTEM:- We present a unified approach to direct and indirect discrimination prevention, with finalized algorithms and all possible data transformation methods based on rule protection and/or rule generalization that could be applied for direct or indirect discrimination prevention, and we specify the different features of each method. We propose new utility measures to evaluate the different discrimination prevention methods in terms of data quality and discrimination removal, for both direct and indirect discrimination. Based on the proposed measures, we present extensive experimental results for two well-known data sets and compare the different possible methods to find out which are more successful in terms of low information loss and high discrimination removal. The approach is based on mining classification rules (the inductive part) and reasoning on them (the deductive part) on the basis of quantitative measures of discrimination that formalize legal definitions of discrimination.
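One common quantitative measure in this line of work is the extended lift (elift) of a classification rule: a rule is considered α-discriminatory when its elift reaches a chosen threshold α. A minimal sketch (the decision records and the threshold are assumptions for illustration):

```python
def conf(data, premise, conclusion):
    """conf(A -> C): fraction of rows matching A that also contain C."""
    rows = [r for r in data if premise <= r]
    return sum(1 for r in rows if conclusion in r) / len(rows) if rows else 0.0

def elift(data, sensitive, context, conclusion):
    """elift(A,B -> C) = conf(A,B -> C) / conf(B -> C): how much more often
    the negative decision C occurs once the sensitive itemset A is added
    to the context B."""
    base = conf(data, context, conclusion)
    return conf(data, sensitive | context, conclusion) / base if base else 0.0

# Hypothetical decision records; each row is a set of items.
data = [
    {"race=minority", "city=NYC", "deny"},
    {"race=minority", "city=NYC", "deny"},
    {"race=other", "city=NYC", "deny"},
    {"race=other", "city=NYC", "grant"},
    {"race=other", "city=NYC", "grant"},
    {"race=other", "city=NYC", "grant"},
]

alpha = 1.25  # assumed discrimination threshold
e = elift(data, {"race=minority"}, {"city=NYC"}, "deny")
# conf({minority, NYC} -> deny) = 2/2 = 1.0; conf({NYC} -> deny) = 3/6 = 0.5,
# so elift = 2.0 and the rule is alpha-discriminatory for alpha = 1.25.
```

Rule protection then transforms the data until such rules fall below α, while rule generalization replaces them with rules over legitimate attributes.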
Figure: Proposed System Architecture Design. (The original dataset (DB) passes through data analysis to yield discriminatory decision rules; given a discrimination threshold and a list of discriminatory attributes, the system measures discrimination and applies data transformation to produce a transformed dataset (DB), from which anti-discrimination data analysis yields nondiscriminatory decision rules.)

ACKNOWLEDGEMENT:- This is to acknowledge and thank all individuals who played a defining role in shaping this paper. Without their constant support, guidance, and assistance, this paper would not have been completed. In addition, the authors would like to thank our guide and Head, Prof. Ms. V. M. Deshmukh, for her guidance, support, encouragement, and advice.

CONCLUSION:- Along with privacy, discrimination is a very important issue when considering the legal and ethical aspects of data mining. It is more than obvious that most people do not want to be discriminated against because of their sex, religion, nationality, age, and so on, especially when those attributes are used for making decisions about them, such as giving them a job, a loan, insurance,
etc. The aim of this paper was to survey a new preprocessing discrimination prevention methodology, including different data transformation methods that can prevent direct discrimination, indirect discrimination, or both of them at the same time. To attain this objective, the first step is to measure discrimination and identify the categories and groups of individuals that have been directly or indirectly discriminated against in the decision-making process. The second step is to transform the data in the proper way to remove all those discrimination biases. Finally, discrimination-free data models can be produced from the transformed data set without seriously damaging data quality.

REFERENCES
[01]. L. Sweeney, "k-Anonymity: A Model for Protecting Privacy," International Journal on Uncertainty, Fuzziness and Knowledge-Based Systems, vol. 10, no. 5, pp. 557-570, 2002.
[02]. F. Kamiran, T. Calders, and M. Pechenizkiy, "Discrimination Aware Decision Tree Learning," Proc. IEEE Int'l Conf. on Data Mining (ICDM '10), 2010.
[03]. C. Clifton, "Privacy Preserving Data Mining: How Do We Mine Data When We Aren't Allowed to See It?," Proc. ACM SIGKDD Int'l Conf. on Knowledge Discovery and Data Mining (KDD 2003), Tutorial, Washington, DC (USA), 2003.
[04]. S. Xu, J. Zhang, D. Han, and J. Wang, "Data Distortion for Privacy Protection in a Terrorist Analysis System," Proc. IEEE Int'l Conf. on Intelligence and Security Informatics, pp. 459-464, 2005.
[05]. Jie Wang, Weijun Zhong, and Jun Zhang, "NNMF-Based Factorization Techniques for High-Accuracy Privacy Protection on Non-negative-valued Datasets," Proc. IEEE Conf. on Data Mining, Int'l Workshop on Privacy Aspects of Data Mining (PADM 2006), pp. 513-517, 2006.
[06]. D. Pedreschi, S. Ruggieri, and F. Turini, "Discrimination-aware Data Mining," Proc. 14th ACM SIGKDD Conf. (KDD 2008), pp. 560-568, ACM, 2008.
[07]. F. Kamiran and T. Calders, "Classifying without Discriminating," Proc. IEEE IC4 Int'l Conf. on Computer, Control and Communication, IEEE Press, 2009.
[08]. S. Ruggieri, D. Pedreschi, and F. Turini, "DCUBE: Discrimination Discovery in Databases," in A. K. Elmagarmid and D. Agrawal, eds., Proc. ACM SIGMOD Int'l Conf. on Management of Data (SIGMOD 2010), pp. 1127-1130, ACM, 2010.
[09]. S. Hajian, J. Domingo-Ferrer, and A. Martínez-Ballesté, "Discrimination Prevention in Data Mining for Intrusion and Crime Detection," Proc. IEEE Symp. on Computational Intelligence in Cyber Security (CICS '11), pp. 47-54, 2011.
[10]. S. Hajian and J. Domingo-Ferrer, "A Methodology for Direct and Indirect Discrimination Prevention in Data Mining," IEEE Transactions on Knowledge and Data Engineering, to appear, 2012.
[11]. Sara Hajian and Joseph Domingo-Ferrer, "A Methodology for Direct and Indirect Discrimination Prevention in Data Mining," IEEE Transactions on Knowledge and Data Engineering, vol. 25, no. 7, July 2013.