
IMPLEMENTATION OF DOUBLE LAYER PRIVACY ON ID3
DECISION TREE ALGORITHM
Akshaya.S, Information Technology, SVCE, vinaakshay@gmail.com
Jayasre Manchari.V L, Information Technology, SVCE, jayasremanchari@gmail.com
MohamedThoufeeq.A, Information Technology, SVCE, thoufeeq1132@gmail.com
Kiruthikadevi.K, Professor, IT, SVCE, kiruthika@svce.ac.in
ABSTRACT
Data mining presents many opportunities for enhanced services and products in
diverse areas such as healthcare, banking, traffic planning, and online search.
However, its promise is hindered by concerns regarding the privacy of the
individuals whose data are being mined. Though existing data mining techniques
such as classification, clustering, association and prediction reveal useful
patterns in a dataset, there is always a threat to an individual's information.
Adaptations such as randomized response, k-anonymity and differential privacy
do not always adequately protect sensitive information, since their main
concern is accuracy. The problem, then, is to perform data mining with formal
privacy guarantees. Our implementation is a classification model based on the
ID3 decision tree to which we add different layers of privacy, preserving an
individual's identity while also achieving a balance between privacy and
utility. In this paper we propose a privacy framework for the ID3 decision
tree algorithm by (1) adding noise to the existing algorithm and (2) perturbing
the input dataset. Since neither achieved an optimum balance between utility
and privacy on its own, a third level of privacy was developed as a hybrid
framework, in which the input data is normalized and given as input to the
noisy algorithm; with this we achieved a better level of accuracy along with
privacy guarantees.
1. INTRODUCTION
Nowadays, organizations are accumulating voluminous, growing amounts of data in
various formats, and it requires a great deal of time and cost to analyze
terabytes or even exabytes of data and retrieve business patterns from them.
This data keeps multiplying day by day. Such large scale collections of
personal information are widely distributed across medical, banking, marketing
and financial records. While processing this massive amount of data in order
to discover useful patterns, there is always the risk of compromising an
individual's privacy, leading to a trade-off between privacy and utility. The
adversary should not learn anything about the individuals who contribute to
the dataset, even in the presence of certain auxiliary information. The data
gathered by the organization is subjected to computational processing by the
administrator with the intent of obtaining useful information or patterns.
Such analysis may reveal relationships or associations in the data that bring
out a unique pattern, which plays a major role in decision making or future
use. Thus a straightforward adaptation of a data mining algorithm, the ID3
decision tree classification algorithm, to work with a privacy preserving
layer will lead to suboptimal performance.
1.1 PURPOSE
Each data mining application can have its own privacy requirements, which
include protection of personal information, statistical disclosure control and
so on. An individual should be certain that his or her data will remain
private. The increasing use of computers and networks has led to a
proliferation of sensitive data; without proper precautions, this data could
be misused.
For the data miner, however, all these individual records are very valuable.
While applying an existing data mining algorithm with the intent of obtaining
a specific pattern, mainly for decision making or future use, there is a
chance for an individual's record to be leaked. Hence there is great value in
providing a data mining solution that offers reliable privacy guarantees
without compromising accuracy.
1.2 SCOPE
The problem with existing data mining models is that, with the availability of
non-sensitive information, one is able to infer sensitive information that is
not to be disclosed. The privacy of the individuals whose data is being mined
is breached through collusion between adversarial parties or through repeated
access by the same party. This calls for the need for privacy in data mining.
Privacy-preserving methods of data handling seek to provide sufficient privacy
as well as sufficient utility, protecting individual identities as well as
proprietary and sensitive information. The results of data mining will be
profitable only if a sufficient amount of privacy is enforced, since most
publicly available data contains useful information that needs to be
preserved.
2. RELATED WORK
[1] Data Mining with Differential Privacy, Israel Institute of Technology,
Haifa 32000, Israel, KDD 2010. In this paper, data perturbation is performed
before giving the input to the algorithm: a new approach implements the ID3
decision tree model with differential privacy based on the SuLQ framework,
where the data miner need not consider privacy requirements because a
programmable privacy-preserving layer is added to the query interface. The
accuracy of the classifier increased as the privacy budget (ɛ) increased for
relatively small dataset sizes. However, large variance is found in certain
experimental results, and privacy is not well established.
[5] A Framework for Privacy Preserving Classification in Data Mining,
The University of Newcastle, Callaghan, NSW 2308, Australia, ACSW 2004.
In this paper, a significant amount of noise is added to both confidential and
non-confidential attributes, treating all attributes as sensitive, whereas the
same level of privacy could be achieved by adding less noise to only the
confidential attributes. The privacy framework was also extended by perturbing
the leaf-innocent and leaf-influential attributes. The experimental results
show that although the perturbed decision tree is different from the original
tree, its logical rules are maintained, with a minimum level of privacy.
[6] A Noise Addition Scheme in Decision Tree for Privacy Preserving Data
Mining, Journal of Computing, Volume 2, Issue 1, January 2010, ISSN 2151-9617.
The methodology in this paper adds noise to sensitive attributes;
specifically, noise is added to the numeric attributes after exploring the
decision tree of the original data. The obfuscated data is then presented to
the second party for decision tree analysis. The decision trees obtained on
the original and the obfuscated data are similar: the perturbed classifier is
as good as the original classifier, but the level of privacy is low.
[7] Privacy Preserving Decision Tree Learning Using Unrealized Data Sets,
IJREAS Volume 3, Issue 3 (March 2013), ISSN: 2249-3905. In most previous
works, the input dataset was anonymized and the noisy data given as input to
the classification algorithm. This paper instead uses a privacy technique
called dataset complementation, in which samples from the perturbed dataset
are removed and then modified dynamically. The perturbed datasets are stored
to enable a modified decision tree data mining method. The experimental
results show that the privacy measure is medium, with a time complexity of
O(Ts), where Ts is the training sample.
3. EXISTING PRIVACY TECHNIQUES
Research on adding a privacy layer to decision tree data mining models was
done in [9], [10] using cryptographic techniques and randomization. [11]
addresses a privacy layer in the C4.5 classification algorithm without SMC
over vertically partitioned data. Various methodologies involve adding noise
to sensitive attributes: in [6], specific noise is added to the numeric
attributes after exploring the decision tree of the original data. A
significant amount of noise can also be added to both confidential and
non-confidential attributes, treating all attributes as sensitive, as in [5],
where the same level of privacy could be achieved by adding less noise to the
confidential attributes. That privacy framework was also extended by
perturbing the leaf-innocent and leaf-influential attributes.
In all the previous privacy methods applied to the ID3 data mining model,
achieving the maximum level of accuracy was the main concern: the noisy
decision tree was kept almost identical to the original tree, so that the
logical rules of the original decision tree were preserved to a great extent.
Although a layer of privacy was added, either to the data by means of
perturbation or by adding noise to the ID3 algorithm, the amount of noise
added was so small that there is always a possibility for an individual's
identity to be leaked.
[Figure: Two placements of the privacy layer. (a) Input dataset -> ID3
decision tree data mining algorithm with privacy layer -> perturbed output
dataset. (b) Input dataset -> privacy layer -> ID3 decision tree data mining
algorithm -> perturbed output dataset.]
4. PROPOSED SYSTEM
The following are the new ideas for implementing a privacy layer on our
existing ID3 decision tree algorithm.
First, the existing algorithm is modified by adding Laplacian noise at the
ROOT LEVEL and the CLASS LEVEL, so that a level of privacy is obtained within
the algorithm itself. This newer algorithm modifies the output of the ID3
technique by adding Laplacian noise that recursively affects the information
gain of the descendants. As a result the original decision tree is modified
and a noisy classifier is output. Any adversary who queries the database based
on this perturbed decision tree will not get the original sample and thus
cannot correctly identify an individual.
As an extension, to evaluate the suboptimal performance levels of accuracy and
privacy, anonymization of the input dataset is performed rather than adding
noise to the existing data mining algorithm. This dataset anonymization is
done as SENSITIVE ATTRIBUTES based anonymization.
A further proposal adds a dual layer of privacy in such a way that the utility
of the data is also maintained. Initially the original ID3 decision tree is
converted to a binary tree by clubbing the attribute values of the input data
based on their sensitivity, such that there is a minimum difference; hence a
noisy decision tree with at most two branches per node is obtained. To this
perturbed tree another layer of privacy is added at the root level only, which
is the most sensitive attribute.
5. DATASET DESCRIPTIONS
The data set is divided into a training set, used to build the model, and a
test set, used to determine the accuracy of the model. Given the training set
as input, the decision tree is constructed based on the ID3 algorithm. Two
different datasets are considered: realistic data (bank dataset) and synthetic
data (adult dataset).
5.1 REAL DATASET
Real datasets are not anonymized and reflect reality as recorded. We consider
the bank dataset with 600 instances, which includes training data of 400
instances and testing data of 200 instances. The classifier is pep (yes/no),
and the other attributes are age, sex, region, income, married, children, car,
save_act, current_act and mortgage.
Since the ID3 algorithm does not take continuous attributes, they are
classified as:
• age: teen, midage, oldies
• income: low, medium, high
5.2 SYNTHETIC DATASET
Synthetic data are generated to meet specific needs or conditions that may not
be found in the original, real data. This can be useful when designing any
type of system, because the synthetic data serve as a simulation or as a
theoretical value, situation, etc. The creation of synthetic data is an
involved process of data anonymization. We consider the adult dataset with
48842 instances, from which 45222 instances are retained after removing those
with missing values. The training data consists of 30162 instances and the
testing data of 15060 instances. The classifier is stategovt (<=50K, >50K),
and the other attributes are age, workclass, fnlwgt, education, education_num,
marital_status, occupation, relationship, race, sex, capital-gain,
capital-loss, hours-per-week and native-country.
Since the ID3 algorithm does not take continuous attributes, they are
classified as follows (a discretization sketch follows the list):
• age: young, mid-age, oldies
• fnlwgt: low, medium, high
• education_num: first, second, third
• cap_gain: 1, 2, 3
• cap_loss: 1st, 2nd, 3rd
• hours_per_week: min, medium, max
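For concreteness, the binning step for the bank dataset might look like the
following minimal pandas sketch. The cut-points and the tertile split are
illustrative assumptions, since the paper does not state the exact boundaries
used; the adult dataset's continuous columns (fnlwgt, cap_gain, cap_loss,
hours_per_week) can be handled the same way.

```python
import pandas as pd

def discretize_bank(df: pd.DataFrame) -> pd.DataFrame:
    """Bin the continuous bank attributes into the categorical classes above."""
    out = df.copy()
    # age -> teen / midage / oldies (boundaries are illustrative assumptions)
    out["age"] = pd.cut(df["age"], bins=[0, 19, 59, 120],
                        labels=["teen", "midage", "oldies"])
    # income -> low / medium / high (tertiles, also an assumption)
    out["income"] = pd.qcut(df["income"], q=3,
                            labels=["low", "medium", "high"])
    return out
```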
6. SINGLE LAYER PRIVACY
6.1. ID3 ALGORITHM ANONYMIZATION
The basic idea of the ID3 algorithm is to construct the decision tree using
the concept of information gain, testing each attribute at every tree node
over the given sets. In the process of implementing ID3, entropy is computed
to determine how informative a particular input attribute is about the output
attribute for a subset of the training data. In order to minimize the decision
tree depth, the optimal attribute for splitting a tree node must be selected
when traversing the tree path. The attribute with the highest gain is selected
as the root node; information gain is calculated as the expected reduction in
entropy for the specified attribute when splitting a decision tree node.
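As a reference point, the entropy and information gain computations can be
sketched as below. These are the standard ID3 formulas; the helper names and
the list-of-dicts row representation are our own choices.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(labels).values())

def information_gain(rows, attr, target):
    """Expected reduction in entropy when splitting rows on attr.
    rows is a list of dicts keyed by attribute name."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder
```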
6.1.1 ROOT LEVEL ANONYMIZATION
The original decision tree, without noise, takes the sensitive attribute with
the highest information gain as the root node and constructs the decision tree
from it.
In root level anonymization, the root node of the decision tree is modified to
give an inaccurate answer to the adversary: the actual root node is changed to
the attribute with the second highest information gain. This is done by adding
Laplacian noise at the root level. The privacy parameter epsilon is chosen as
any value in the range 0.75 to ln 3.
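A minimal sketch of this root level noise addition, reusing information_gain
from the sketch above. The Laplace scale 1/epsilon follows the usual
calibration; treating the gain's sensitivity as 1 is our simplifying
assumption.

```python
import numpy as np

def noisy_best_attribute(rows, attributes, target, epsilon):
    """Perturb each attribute's information gain with Laplace(0, 1/epsilon)
    noise and return the noisy maximizer. With epsilon in the paper's range,
    the winner can flip to the second highest gain attribute, hiding the
    true root from an adversary."""
    noisy = {a: information_gain(rows, a, target)
                + np.random.laplace(0.0, 1.0 / epsilon)
             for a in attributes}
    return max(noisy, key=noisy.get)
```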
6.1.2 CLASS LEVEL ANONYMIZATION
Here the decision tree is modified by adding noise at the sub-node level; that
is, noise is added to the input sensitive attributes.
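One reading of this is that the same noisy selection is applied at every
sub-node, so the noise recursively affects the descendants' splits, as
described in Section 4. A sketch reusing noisy_best_attribute from above:

```python
from collections import Counter

def build_noisy_id3(rows, attributes, target, epsilon):
    """Recursively build an ID3 tree with noisy attribute selection at every
    node. Trees are nested dicts: {attr: {value: subtree-or-class-label}}."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1 or not attributes:
        return Counter(labels).most_common(1)[0][0]   # leaf: majority class
    best = noisy_best_attribute(rows, attributes, target, epsilon)
    branches = {}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        rest = [a for a in attributes if a != best]
        branches[value] = build_noisy_id3(subset, rest, target, epsilon)
    return {best: branches}
```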
6.2 DATA PERTURBATION
In order to evaluate the suboptimal performance levels of accuracy and
privacy, anonymization of the input dataset is performed rather than adding
noise to the existing data mining algorithm. The anonymization of the input
data is done as sensitive attributes based anonymization: the instances that
satisfy the chosen sensitive attributes are identified and the corresponding
classifier value is modified, thus providing an inaccurate answer to the
adversary. The ID3 decision tree algorithm is then run with the ANONYMIZED
synthetic dataset as input training data for the various possible cases, and
the testing data is predicted to output a modified classifier.
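The paper does not fix the exact modification rule, but one simple reading is
to flip the class label of every instance matching the chosen sensitive
attribute values, so that patterns built over those instances give an
inaccurate answer. A hypothetical pandas sketch:

```python
import pandas as pd

def anonymize_sensitive(df: pd.DataFrame, sensitive: dict,
                        target: str, flip: dict) -> pd.DataFrame:
    """Modify the classifier value of instances matching all sensitive values.
    sensitive: e.g. {"age": "teen", "income": "medium"}.
    flip: label replacement map, e.g. {"YES": "NO", "NO": "YES"}."""
    out = df.copy()
    mask = pd.Series(True, index=df.index)
    for attr, value in sensitive.items():
        mask &= (df[attr] == value)
    out.loc[mask, target] = out.loc[mask, target].map(flip)
    return out
```

The anonymized frame is then used as the ID3 training data in place of the
original.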
7. DOUBLE LAYER PRIVACY
A dual layer of privacy is achieved in this implementation. The first layer
anonymizes the input dataset by converting all the attributes into binary.
This anonymization technique depends entirely on the input dataset: only when
non-binary variables are present is the dataset perturbed and converted into
binary, after which it is subjected to a second layer of privacy, the root
node anonymization of the decision tree. This root node perturbation technique
modifies the decision tree by changing its root node, thus providing an
incorrect classifier for the testing dataset. The advantage of our
implementation is that even when there are no non-binary values in the input
dataset, at least the second layer of privacy is applied. Hence privacy is
achieved in either case: either by root node modification alone, or by a
combination of data perturbation and root node modification.
[Figure 7.1: Double layer privacy pipeline - input dataset -> normalised
input (first level) -> privacy at root -> ID3 decision tree data mining
algorithm -> perturbed output dataset.]
The non-binary attributes are split so that their values fall into just two
categories, and the decision tree has at most two branches per node (a sketch
follows the steps).
Step 1: Non-binary attributes are identified.
Step 2: The local sensitivity of the attributes is found.
Step 3: Attribute values are clubbed such that the sensitivity of the split
binary attributes is maintained.
Step 4: Laplacian noise is added at the root level of the tree built from the
normalized dataset, and the decision tree is modified.
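Steps 1 to 3 might look like the following sketch, where the greedy balancing
of record counts stands in for the sensitivity based clubbing described above
(the paper does not give the exact clubbing rule). Step 4 then feeds the
normalized data to the noisy root level ID3 of Section 6.

```python
import pandas as pd

def club_to_binary(df: pd.DataFrame, attr: str) -> pd.DataFrame:
    """Club the values of one non-binary attribute into two groups with a
    minimum difference in record counts (greedy, most frequent value first)."""
    counts = df[attr].value_counts()
    group0, n0, n1 = set(), 0, 0
    for value, n in counts.items():
        if n0 <= n1:
            group0.add(value); n0 += n
        else:
            n1 += n
    out = df.copy()
    out[attr] = df[attr].map(lambda v: "grp0" if v in group0 else "grp1")
    return out

def normalize_dataset(df: pd.DataFrame, target: str) -> pd.DataFrame:
    """Step 1: identify non-binary attributes; Steps 2-3: club each of them."""
    out = df.copy()
    for col in df.columns:
        if col != target and out[col].nunique() > 2:
            out = club_to_binary(out, col)
    return out
```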
8. EXPERIMENTAL RESULTS
8.1 ROOT LEVEL ANONYMIZATION RESULTS
BANK DATASET - Original tree vs Root Anonymized tree

[Tree listings not recoverable from extraction. The original tree is rooted at
children->; surviving node labels from the two listings include income->,
region->, age->, mortgage->, sex->, current_act->, FEMALE=NO, YES=YES, YES=NO,
NO=NO, INNER_CITY=NO and TOWN=NO.]

                                  ORIGINAL DECISION TREE   ROOT MODIFIED DECISION TREE
Correctly Classified Instances    145 (72.5 %)             149 (74.5 %)
Incorrectly Classified Instances   55 (27.5 %)              51 (25.5 %)

=== Confusion Matrix (original tree) ===
          predicted
original  YES   NO
YES        64   24
NO         31   81

=== Confusion Matrix (root modified tree) ===
          predicted
original  YES   NO
YES        68   24
NO         27   81
ADULT DATASET - Original tree vs Root Anonymized tree

[Tree listings not recoverable from extraction. The original tree is rooted at
relationship->; surviving node labels include marital_status->, education->,
occupation->, cap_gain->, hours_per_week-> and leaves such as
Transport-moving= <=50K, Protective-serv= <=50K, Handlers-cleaners= <=50K,
Other-service= <=50K, Exec-managerial= >50K, Farming-fishing= >50K, 3= >50K,
min= <=50K and medium= >50K.]

                                  ORIGINAL DECISION TREE   ROOT MODIFIED DECISION TREE
Correctly classified instances    12680 (84.20%)           12659 (84.05%)
Incorrectly classified instances   2380 (15.80%)            2401 (15.94%)

=== Confusion Matrix (original tree) ===
          predicted
original   >50k   <=50k
>50k       2292     972
<=50k      1408   10388

=== Confusion Matrix (root modified tree) ===
          predicted
original   >50k   <=50k
>50k       2280     981
<=50k      1420   10379
8.2 CLASS LEVEL ANONYMIZATION RESULTS
BANK DATASET - Original tree vs Class Anonymized tree

[Tree listings not recoverable from extraction. Surviving node labels include
children->, age->, region->, mortgage->, current_act->, sex->, save_act->,
car->, married->, income->, FEMALE=NO, YES=YES, NO=NO, oldies=NO and teen=NO.]

                                  ORIGINAL DECISION TREE   CLASS LEVEL ANONYMIZED DECISION TREE
Correctly Classified Instances    145 (72.5 %)             153 (76.5 %)
Incorrectly Classified Instances   55 (27.5 %)              47 (23.5 %)

=== Confusion Matrix (original tree) ===
          predicted
original  YES   NO
YES        64   24
NO         31   81

=== Confusion Matrix (class level anonymized tree) ===
          predicted
original  YES   NO
YES        66   18
NO         29   87
ADULT DATASET - Original tree vs Class Anonymized tree

[Tree listings not recoverable from extraction. Surviving node labels include
relationship->, education->, cap_gain->, occupation->, native_country->,
hours_per_week->, cap_loss->, marital_status->, sex->, race-> and leaves such
as 3= >50K, Haiti= <=50K, min= <=50K, max= >50K, 2nd= <=50K, 2= >50K,
Widowed= >50K, Divorced= <=50K, Never-married= >50K, Transport-moving= <=50K,
Handlers-cleaners= <=50K, Other-service= <=50K and Exec-managerial= >50K.]

                                  ORIGINAL DECISION TREE   CLASS LEVEL ANONYMIZED DECISION TREE
Correctly classified instances    12680 (84.20%)           12686 (84.23%)
Incorrectly classified instances   2380 (15.80%)            2374 (15.76%)

=== Confusion Matrix (original tree) ===
          predicted
original   >50k   <=50k
>50k       2292     972
<=50k      1408   10388

=== Confusion Matrix (class level anonymized tree) ===
          predicted
original   >50k   <=50k
>50k       2282     956
<=50k      1418   10404
8.3 SENSITIVE ATTRIBUTES BASED ANONYMIZATION RESULTS
BANK DATASET - Original tree vs Anonymized tree

[Tree listings not recoverable from extraction. Surviving node labels include
children->, region->, age->, sex->, income->, car->, married->, mortgage->,
current_act->, 3=YES, FEMALE=NO, YES=YES and NO=NO.]

                                  ORIGINAL DECISION TREE   SENSITIVE ATTRIBUTE BASED ANONYMIZED ID3
Correctly Classified Instances    145 (72.5 %)             114 (57 %)
Incorrectly Classified Instances   55 (27.5 %)              86 (43 %)

=== Confusion Matrix (original tree) ===
          predicted
original  YES   NO
YES        64   24
NO         31   81

=== Confusion Matrix (anonymized tree) ===
          predicted
original  YES   NO
YES        53   44
NO         42   61
ADULT DATASET - Original tree vs Anonymized tree

[Tree listings not recoverable from extraction. Both listings show the same
node labels: relationship->, education->, cap_gain->, occupation->,
marital_status->, with leaves 3= >50K, Transport-moving= <=50K,
Handlers-cleaners= <=50K, Other-service= <=50K, Exec-managerial= >50K and
Widowed= >50K.]

                                  ORIGINAL DECISION TREE   SENSITIVE ATTRIBUTE BASED ANONYMIZED ID3
Correctly classified instances    12680 (84.20%)           9425 (62.58%)
Incorrectly classified instances   2380 (15.80%)           5635 (37.41%)

=== Confusion Matrix (original tree) ===
          predicted
original   >50k   <=50k
>50k       2292     972
<=50k      1408   10388

=== Confusion Matrix (anonymized tree) ===
          predicted
original   >50k   <=50k
>50k       2603    4538
<=50k      1097    6822
8.4 DOUBLE LAYERED PRIVACY RESULTS
Here we consider only the real dataset for this experiment, with a total of
600 records, because of the significant overhead in time complexity when
running the large synthetic dataset of 48842 records.
BANK DATASET - Original tree vs Double layer Privacy tree

[Tree listings not recoverable from extraction. Surviving node labels include
children->, married->, region->, age->, sex->, income->, mortgage->,
current_act->, FEMALE=NO, YES=YES, YES=NO, NO=NO, oldies=NO, teen=NO, first=NO
and second=YES.]

                                  ORIGINAL DECISION TREE   DOUBLE LAYER PRIVACY IMPLEMENTED DECISION TREE
Correctly Classified Instances    145 (72.5 %)             128 (64 %)
Incorrectly Classified Instances   55 (27.5 %)              72 (36 %)

=== Confusion Matrix (original tree) ===
          predicted
original  YES   NO
YES        64   24
NO         31   81

=== Confusion Matrix (double layer privacy tree) ===
          predicted
original  YES   NO
YES        60   37
NO         35   68
9. RESULTS EVALUATION
9.1 ACCURACY EVALUATION
The experiments are performed with the adult and bank datasets from the UCI
repository. From these experiments, better privacy is achieved in our model
compared to the previously existing privacy-preserving techniques on decision
trees. One of the best evaluation techniques followed in machine learning is
the confusion matrix; the table below, derived from the confusion matrices,
shows the accuracy values for both datasets. In the first module, the root
node and class node modification techniques achieved a better level of
accuracy, around 75% for the bank dataset and 84% for the adult dataset. In
the second module, the data anonymization technique achieves better privacy
but reduced accuracy compared to the first method: around 57% for the bank
dataset and 62% for the adult dataset. As our final implementation of
privacy-preserving ID3, we perform double layer privacy anonymization, in
which we maintain a balance between the privacy and accuracy parameters; the
accuracy here is 64% for the bank dataset.
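The table values can be reproduced directly from the confusion matrices. As a
check, for the original bank tree (64 YES predicted YES, 24 YES predicted NO,
31 NO predicted YES, 81 NO predicted NO):

```python
def metrics(tp, fn, fp, tn):
    """Accuracy, precision and recall/sensitivity from a 2x2 confusion
    matrix with rows = original class and columns = predicted class."""
    accuracy = (tp + tn) / (tp + fn + fp + tn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return accuracy, precision, recall

print(metrics(64, 24, 31, 81))  # (0.725, 0.673..., 0.727...), the first bank row below
```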
Accuracy evaluation for both the datasets

DATASET  IMPLEMENTATION                       ACCURACY  PRECISION  RECALL/SENSITIVITY
BANK     ORIGINAL TREE                        0.725     0.673      0.727
BANK     ROOT MODIFIED ID3                    0.745     0.715      0.739
BANK     CLASS MODIFIED ID3                   0.765     0.695      0.786
BANK     SENSITIVE ATTRIBUTES ANONYMIZED ID3  0.570     0.558      0.546
BANK     DOUBLE LAYER ANONYMIZED ID3          0.640     0.631      0.619
ADULT    ORIGINAL TREE                        0.842     0.619      0.702
ADULT    ROOT MODIFIED ID3                    0.840     0.616      0.699
ADULT    CLASS MODIFIED ID3                   0.842     0.616      0.704
ADULT    SENSITIVE ATTRIBUTES ANONYMIZED ID3  0.625     0.703      0.365
9.2 PRIVACY EVALUATION
Privacy means that anything that can be learnt about a respondent from a
statistical database should be learnable without access to the database, and
the risk to a respondent's privacy should not substantially increase as a
result of participating in the database.
In the above privacy preserving techniques, calibrated noise is added
carefully: the magnitude of the noise is chosen so as to mask the influence of
any particular record on the outcome. To evaluate the privacy of the above
methods, a simple aggregate query is chosen. Based on the query, we compare
the deviation of the output pattern predicted by the perturbed decision tree
from the pattern predicted by the original decision tree, for each of the
above privacy techniques.
For the bank dataset, we have chosen the query: age=teen, income=medium.
Similarly, for the adult dataset, privacy is measured using the query:
relationship=Husband, education=HS-Grad, occupation=Exec-Managerial,
race=Black.
Since privacy is a measure of confidentiality, query based privacy evaluation
is done here.
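A minimal sketch of this query based evaluation, under our own reading:
privacy is taken as the fraction of instances matching the query on which the
perturbed tree disagrees with the original tree (the paper does not spell out
its exact deviation formula). predict walks the nested-dict trees of the
earlier sketches.

```python
def predict(tree, row):
    """Walk a nested-dict tree ({attr: {value: subtree-or-label}}) to a label."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr][row[attr]]
    return tree

def query_privacy(original_tree, perturbed_tree, rows, query):
    """Deviation of the perturbed tree's predictions from the original tree's
    predictions over the instances matching an aggregate query, e.g.
    {"age": "teen", "income": "medium"} for the bank dataset."""
    hits = [r for r in rows if all(r[a] == v for a, v in query.items())]
    if not hits:
        return 0.0
    differ = sum(predict(original_tree, r) != predict(perturbed_tree, r)
                 for r in hits)
    return differ / len(hits)
```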
Privacy Evaluation for both the datasets

DATASET  IMPLEMENTATION                       PRIVACY
BANK     ROOT NODE MODIFIED ID3               0.285
BANK     CLASS NODE MODIFIED ID3              0.285
BANK     SENSITIVE ATTRIBUTES ANONYMIZED ID3  0.285
BANK     DOUBLE LAYER ANONYMIZED ID3          0.571
ADULT    ROOT NODE MODIFIED ID3               0.5
ADULT    CLASS NODE MODIFIED ID3              0.25
ADULT    SENSITIVE ATTRIBUTES ANONYMIZED ID3  0.25
10. CONCLUSION
We add a privacy layer to the existing classification algorithm (the ID3
decision tree), either by adding noise to the attributes or by perturbing the
input data, such that the original tree and the modified tree are almost
equally accurate in releasing the required pattern while preventing the
leakage of an individual's record. From the above experiments, it is inferred
that adding noise at the algorithm level (root and class) achieves better
accuracy but fails to preserve the individual's identity, while adding noise
at the data level achieves better privacy than the previous implementation,
with a fall in accuracy. Finally, with the implementation of double layered
privacy in ID3, with the magnitude of the noise bounded by the sensitivity, an
optimum balance of privacy and accuracy is maintained. Thus the privacy of any
individual contributing to a statistical database is preserved.
11. FUTURE ENHANCEMENTS
Nowadays, organizations are accumulating voluminous, growing amounts of data
in various formats, and this data keeps multiplying day by day. Big Data plays
a major role in any organization, where the goal is to maintain the privacy of
every individual's information present in the dataset. Hence we are working to
extend the proposed single and dual layer privacy techniques to big data on
Hadoop using the MapReduce framework, for the existing ID3 decision tree
algorithm and also for other members of the decision tree family, such as
C4.5, an extension of ID3, and random forests. The MapReduce framework is used
to parallelize the processing of the input dataset, reducing the time
complexity that usually arises when millions of records are processed on a
single node. Thus better efficiency, privacy and reduced time complexity are
achieved with our work.
REFERENCES
[1] Arik Friedman and Assaf Schuster (2010) 'Data Mining with Differential
Privacy', Israel Institute of Technology, Haifa 32000, Israel, KDD 2010.
[2] Cynthia Dwork (2008) 'Differential Privacy: A Survey of Results', In TAMC,
pages 1-19, 2008.
[3] Cynthia Dwork, F. McSherry, K. Nissim, and A. Smith (2006) 'Calibrating
Noise to Sensitivity in Private Data Analysis', In TCC, pages 265-284, 2006.
[4] J. R. Quinlan (1986) 'Induction of Decision Trees', Machine Learning,
1(1):81-106, 1986.
[5] Md. Zahidul Islam and Ljiljana Brankovic (2004) 'A Framework for Privacy
Preserving Classification in Data Mining', The University of Newcastle,
Callaghan, NSW 2308, Australia, ACSW 2004.
[6] Mohammad Ali Kadampur and Somayajulu D.V.L.N. (2010) 'A Noise Addition
Scheme in Decision Tree for Privacy Preserving Data Mining', Journal of
Computing, Volume 2, Issue 1, January 2010, ISSN 2151-9617.
[7] M. R. Pawar and Mampi Bhowmik (2013) 'Privacy Preserving Decision Tree
Learning Using Unrealized Data Sets', IJREAS, Volume 3, Issue 3, March 2013,
ISSN 2249-3905.
[8] Wei Peng, Juhua Chen and Haiping Zhou (2010) 'An Implementation of ID3
Decision Tree Learning Algorithm', Project of Comp 9417: Machine Learning,
University of New South Wales, Sydney, NSW 2032, Australia.
[9] Rakesh Agrawal and Ramakrishnan Srikant, 'Privacy Preserving Data Mining',
IBM Almaden Research Center.
[10] Yehuda Lindell and Benny Pinkas, 'Privacy Preserving Data Mining'.