ISSN: 0975 – 6728 | NOV 12 TO OCT 13 | VOLUME – 02, ISSUE – 02
JOURNAL OF INFORMATION, KNOWLEDGE AND RESEARCH IN COMPUTER
SCIENCE AND APPLICATIONS
CLASSIFICATION OF RELEVANT AND
REDUNDANT INTRUSION DETECTION DATA USING
MACHINE LEARNING APPROACHES
Ms. J. R. Patel
Asst. Professor, Department of Computer Science, Veer Narmad South Gujarat University, Surat
jayshri.r@gmail.com
ABSTRACT:
The development of data-mining applications such as classification and clustering has shown the need for
machine learning algorithms to be applied to intrusion detection data. In this paper we present different
classification techniques for classifying intrusion detection data. The aim of this paper is to investigate the
performance of different classification methods on relevant and redundant intrusion detection data. The
correlation-based feature selection method is used to select relevant and redundant features from the intrusion
detection data. The classification algorithms tested are Decision Tree, Naïve Bayes, OneR, Partial Decision Tree,
and K-Nearest Neighbors.
Keywords— Intrusion detection, Correlation Based Feature Selection, Naïve bayes, Decision Tree, Nearest
Neighbor, OneR, Partial Decision Tree
I: INTRODUCTION
The Internet, along with its core benefits, also provides
numerous opportunities to violate the stability and security of
the systems connected to it. Although static defense
mechanisms such as firewalls and software updates
can provide a reasonable level of security, more
dynamic mechanisms such as intrusion detection systems (IDS) are also suggested
for better security. Intrusion detection is defined as a
set of activities that attempt to distinguish between intrusive
and normal activities.
Intrusion detection systems are classified as host based or
network based. A host-based IDS monitors
resources such as system logs, file systems, and disk
resources, whereas a network-based IDS monitors the
data passing through the network.
Network intrusion detection works on raw network
traffic, which must be summarized into higher-level
objects such as connection records or audit records. An
audit record captures various features of a network
connection, such as its duration, protocol type, and the source and
destination bytes of a TCP connection. Not all the
features of an intrusion detection dataset are useful for
classification. Effective feature selection for intrusion
detection identifies the features that matter most for
detecting anomalous network connections. Feature
selection reduces the memory requirement and increases
the speed of execution, thereby improving overall
performance. We have used the correlation-based
feature selection method, which selects 10 features (plus the class
label) from among the 42 features, for classification. We then
apply the Decision Tree, Naïve Bayes, OneR, Partial
Decision Tree (PART), and K-Nearest Neighbors
classification algorithms.
II: RELATED WORK
In [1], Stein et al. use a decision tree classifier for
intrusion detection with GA-based feature selection to
improve the classification abilities of the decision tree
classifier. They use a genetic algorithm to select a
subset of input features for decision tree classifiers to
increase the detection rate and decrease the false alarm
rate in network intrusion detection. In [2], Mukkamala
et al. use decision trees and Support Vector Machines
(SVM) to model an IDS. They compare the performance
of SVM and decision trees and find that decision trees
give better overall performance than SVM. In [3],
Lee et al. provide a data mining framework for
constructing intrusion detection models. They compute
activity patterns from system audit data and extract
predictive features from the patterns. They then apply
machine learning algorithms to the audit records that
are processed according to the feature definitions to
generate intrusion detection rules. They extend the
basic association rules and frequent episodes
algorithms for analyzing audit data.
III: CORRELATION BASED FEATURE
SELECTION TO INTRUSION
DETECTION DATA
As discussed by Hall in [4], Correlation Based Feature
Selection (CFS) evaluates the worth of a subset of
attributes by considering the individual predictive
ability of each feature along with the degree of
redundancy between them. It gives high scores to
subsets that include features that are highly correlated
to the class attribute but have low correlation to each
other. The correlation coefficient is used to estimate the
correlation between a subset of attributes and the class, as
well as the inter-correlations between the features.
Relevance of a group of features grows with the
correlation between features and classes, and decreases
with growing inter-correlation. CFS is used to
determine the best feature subset. The equation for CFS is
given in Equation 1.
rzc  k rzi
k  k (k  1)rii
(1)
where r_zc is the correlation between the summed
feature subset and the class variable, k is the number
of features in the subset, r_zi is the average of the correlations
between the subset features and the class variable, and
r_ii is the average inter-correlation between the subset
features. The CFS method takes the subset evaluation
approach, which handles feature redundancy together with
feature relevance.
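For illustration, the merit score of Equation 1 can be computed directly. The sketch below is a minimal example using made-up correlation values, not figures from the experiments:

```python
import math

def cfs_merit(k, r_zi, r_ii):
    """Equation 1: merit of a k-feature subset, where r_zi is the average
    feature-class correlation and r_ii the average feature-feature
    inter-correlation."""
    return (k * r_zi) / math.sqrt(k + k * (k - 1) * r_ii)

# Ten features that each correlate with the class (r_zi = 0.6): the subset
# with low redundancy (r_ii = 0.1) scores higher than the redundant one.
print(cfs_merit(10, 0.6, 0.1))
print(cfs_merit(10, 0.6, 0.9))
```

Note that a single perfectly relevant feature (k = 1, r_zi = 1) gives a merit of 1; redundancy only enters for k > 1.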
IV: MACHINE LEARNING APPROACHES FOR
THE CLASSIFICATION OF INTRUSION
DETECTION
Intrusion detection can be considered a classification
problem in which each connection record is identified as
normal or intrusive based on some existing data.
Classification for intrusion detection is an important
challenge because it is very difficult to detect
new attacks, as attackers continuously change
their attack patterns. Several machine learning
approaches are used for the classification of intrusion
detection data. These classification algorithms are
discussed in the following sections.
1) DECISION TREE ALGORITHM
A decision tree is a tree structure comprising a set of
conditions organized hierarchically. It
consists of internal and external nodes connected by
branches. An internal node is a decision-making unit
that evaluates a decision function to determine which
child node to visit next. An external node, or leaf node,
has no child nodes and is associated with a class label.
A decision tree can easily be converted to a set of
classification rules. Many decision tree construction
algorithms involve a two-step process: first a decision
tree is constructed, and then the tree is pruned.
The pruned decision tree that is used for classification
purposes is called the classification tree [5].
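The conversion from tree to rules amounts to reading off every root-to-leaf path. The sketch below uses a small hand-built tree (attribute names such as protocol_type are illustrative, not J48 output):

```python
# A hypothetical tree: internal nodes are (attribute, {value: child}) pairs,
# leaves are class labels.
tree = ("protocol_type", {
    "icmp": "dos",
    "tcp": ("service", {"http": "normal", "private": "probe"}),
})

def tree_to_rules(node, conditions=()):
    """Walk the tree; every root-to-leaf path becomes one classification rule."""
    if isinstance(node, str):                      # leaf: emit accumulated path
        return [(conditions, node)]
    attribute, children = node
    rules = []
    for value, child in children.items():
        rules += tree_to_rules(child, conditions + ((attribute, value),))
    return rules

for conds, label in tree_to_rules(tree):
    print(" AND ".join(f"{a}={v}" for a, v in conds), "=>", label)
```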
2) NAÏVE BAYES ALGORITHM
Bayesian classifiers are statistical classifiers
based on Bayes’ theorem. Naïve Bayesian classifiers
assume that the effect of an attribute value on a
given class is independent of the values of the other
attributes [5]. Using Naïve Bayes for intrusion
detection, we can calculate the probability that an
attack is occurring based on some data by first
calculating the probability that some previous data was
part of that type of attack and then multiplying by the
probability of that type of attack occurring.
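This calculation can be sketched with a counting-based Naïve Bayes for categorical features. The records below are made up, and the add-one smoothing denominator assumes, for simplicity, two possible values per feature:

```python
from collections import Counter, defaultdict

# Toy connection records: (protocol, service) -> label. Data is invented.
records = [(("tcp", "http"), "normal"), (("tcp", "http"), "normal"),
           (("icmp", "ecr_i"), "dos"), (("icmp", "ecr_i"), "dos"),
           (("tcp", "private"), "dos")]

class_counts = Counter(label for _, label in records)
value_counts = defaultdict(Counter)   # (feature index, label) -> value counts
for features, label in records:
    for i, v in enumerate(features):
        value_counts[(i, label)][v] += 1

def predict(features):
    """Pick the class maximizing P(class) * prod_i P(feature_i | class),
    with add-one smoothing (denominator n + 2 assumes binary features)."""
    best, best_score = None, -1.0
    for label, n in class_counts.items():
        score = n / len(records)
        for i, v in enumerate(features):
            score *= (value_counts[(i, label)][v] + 1) / (n + 2)
        if score > best_score:
            best, best_score = label, score
    return best

print(predict(("icmp", "ecr_i")))
```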
3) OneR ALGORITHM
OneR generates a one-level decision tree expressed in
the form of a set of rules that all test one particular
attribute. OneR is a simple, cheap method that often
comes up with quite good rules for characterizing the
structure in data; these simple rules frequently achieve
surprisingly high accuracy. The pseudocode for OneR is
as follows [6]:
For each attribute,
    For each value of that attribute, make a rule as follows:
        count how often each class appears
        find the most frequent class
        make the rule assign that class to this attribute-value
    Calculate the error rate of the rules.
Choose the rules with the smallest error rate.
Fig. 1 Pseudocode for 1R.
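The pseudocode of Fig. 1 translates almost directly into code. The sketch below uses a made-up five-record dataset:

```python
from collections import Counter

# Toy dataset: each row is a dict of attribute values plus a class label.
data = [({"protocol": "tcp", "flag": "SF"}, "normal"),
        ({"protocol": "tcp", "flag": "S0"}, "dos"),
        ({"protocol": "icmp", "flag": "SF"}, "dos"),
        ({"protocol": "icmp", "flag": "SF"}, "dos"),
        ({"protocol": "tcp", "flag": "SF"}, "normal")]

def one_r(data):
    """For each attribute build one rule per value (most frequent class),
    then keep the attribute whose rule set has the lowest error."""
    best_attr, best_rules, best_errors = None, None, len(data) + 1
    for attr in data[0][0]:
        counts = {}                                   # value -> class counter
        for row, label in data:
            counts.setdefault(row[attr], Counter())[label] += 1
        rules = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        errors = sum(label != rules[row[attr]] for row, label in data)
        if errors < best_errors:
            best_attr, best_rules, best_errors = attr, rules, errors
    return best_attr, best_rules

print(one_r(data))
```

On this toy data the protocol attribute wins, since its rules misclassify only one record.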
4) PARTIAL DECISION TREE ALGORITHM
PART combines the divide-and-conquer strategy for
decision tree learning with the separate-and-conquer
one for rule learning. It adopts the separate-and-conquer
strategy in that it builds a rule, removes the
instances it covers, and continues creating rules
recursively for the remaining instances until none are
left. To generate such a tree, the construction and
pruning operations are integrated in order to find a
“stable” subtree that can be simplified no further. Once
this subtree has been found, tree building ceases and a
single rule is read off. The tree-building algorithm [6]
is summarized in Fig. 2:
Expand-subset(S):
    Choose a test T and use it to split the set of examples into subsets
    Sort subsets into increasing order of average entropy
    while (there is a subset X that has not yet been expanded
           AND all subsets expanded so far are leaves)
        expand-subset(X)
    if (all the subsets expanded are leaves
        AND estimated error for subtree >= estimated error for node)
        undo expansion into subsets and make node a leaf
Fig. 2 Algorithm for expanding examples into a partial
tree.
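PART's full partial-tree construction is involved, but its outer separate-and-conquer loop can be sketched with a deliberately simplified rule learner: a single attribute=value test stands in for the rule read off the partial tree, and the data is invented:

```python
from collections import Counter

def best_single_rule(data):
    """Pick the (attribute, value) -> majority-class rule with the fewest
    errors on the instances it covers. A toy stand-in for PART's rule."""
    best = None
    for attr in data[0][0]:
        for value in sorted({row[attr] for row, _ in data}):
            covered = [label for row, label in data if row[attr] == value]
            cls, hits = Counter(covered).most_common(1)[0]
            errors = len(covered) - hits
            if best is None or errors < best[3]:
                best = (attr, value, cls, errors)
    return best

def separate_and_conquer(data):
    """Build a rule, remove the instances it covers, repeat until none remain."""
    rules = []
    while data:
        attr, value, cls, _ = best_single_rule(data)
        rules.append((attr, value, cls))
        data = [(row, label) for row, label in data if row[attr] != value]
    return rules

data = [({"protocol": "icmp", "flag": "SF"}, "dos"),
        ({"protocol": "tcp", "flag": "S0"}, "dos"),
        ({"protocol": "tcp", "flag": "SF"}, "normal"),
        ({"protocol": "tcp", "flag": "SF"}, "normal")]
print(separate_and_conquer(data))
```

Each iteration covers at least one instance, so the loop always terminates; the resulting rule list is applied in order, like a decision list.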
5) K-NEAREST NEIGHBOR ALGORITHM
Nearest neighbor classifiers are based on learning by
analogy, that is, by comparing a given test tuple with
training tuples (which are described by n attributes)
that are similar to it. Each tuple represents a point in an
n-dimensional space. When given an unknown tuple, a
K-Nearest Neighbor classifier searches the pattern
space for the k training tuples that are closest to the
unknown tuple. These k training tuples are the k
“nearest neighbors” of the unknown tuple. The
unknown tuple is assigned the most common class
among its k nearest neighbors. “Closeness” is defined
in terms of a distance metric, such as the Euclidean
distance [5].
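A minimal K-Nearest Neighbor sketch with Euclidean distance follows; the two-feature training points are made up (imagined as scaled duration and byte counts), not KDD records:

```python
import math
from collections import Counter

def knn_predict(train, query, k=3):
    """Classify `query` by majority vote of the k Euclidean-nearest
    training tuples. `train` is a list of (feature_vector, label) pairs."""
    nearest = sorted(train, key=lambda t: math.dist(t[0], query))[:k]
    votes = Counter(label for _, label in nearest)
    return votes.most_common(1)[0][0]

# Made-up 2-feature connection records, scaled to [0, 1].
train = [((0.1, 0.2), "normal"), ((0.2, 0.1), "normal"),
         ((0.9, 0.8), "dos"), ((0.8, 0.9), "dos"), ((0.85, 0.85), "dos")]
print(knn_predict(train, (0.9, 0.9), k=3))
```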
V: EXPERIMENTS AND RESULTS
The data for the experiments were prepared for the
KDD Cup 1999 from the DARPA intrusion detection evaluation
program by MIT Lincoln Laboratory. As given in [7],
the data set contains 4 main attack categories, namely
Denial of Service (DoS), Remote to User (R2L), User
to Root (U2R), and Probing. It includes a total of 24
different attack types among the 4 main categories.
The original data set has 41 attributes for each
connection record plus one class label. Examples of
features are the protocol type and the duration of each
connection.
For performing the experiments I have used the open
source package WEKA, taken from [8]. I first
preprocessed the original dataset by mapping the 24
different attack types to the 4 main attack categories.
Then the CFS feature selection method with BestFirst
search was applied to the preprocessed dataset,
producing a dataset with the 10 selected features. Using
this preprocessed data set, the Decision Tree (J48),
Naïve Bayes, OneR, PART, and K-Nearest Neighbor
classification algorithms each construct a model using
10-fold cross validation. The results of the
classification are given in Figures 3, 4, and 5. The
figures show that the PART and J48 classifiers give the
highest accuracies of 99.96% and 99.95%, respectively.
The lowest accuracy, 93.93%, is given by Naïve Bayes.
OneR and KNN provide 99.05% and 99.87% accuracy,
respectively. The Kappa statistic values are also very
close to one. For intrusion detection classification,
providing the relevant and redundant features to the
various classifiers yields considerably efficient results.
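The 10-fold cross validation used above can be sketched in plain Python. The majority-class baseline and synthetic data below are stand-ins for illustration, not the WEKA classifiers or the KDD data:

```python
from collections import Counter

def k_fold_accuracy(data, train_fn, k=10):
    """Average accuracy over k folds: each fold is held out once for testing
    while the model is trained on the remaining k-1 folds."""
    folds = [data[i::k] for i in range(k)]          # simple interleaved split
    accs = []
    for i, test in enumerate(folds):
        train = [row for j, fold in enumerate(folds) if j != i for row in fold]
        model = train_fn(train)
        correct = sum(model(x) == y for x, y in test)
        accs.append(correct / len(test))
    return sum(accs) / len(accs)

def majority_baseline(train):
    """Trivial classifier: always predict the most frequent training class."""
    cls = Counter(y for _, y in train).most_common(1)[0][0]
    return lambda x: cls

# Synthetic 100-record dataset: 30 "dos" and 70 "normal" records.
data = [((i,), "dos" if i < 30 else "normal") for i in range(100)]
print(k_fold_accuracy(data, majority_baseline, k=10))
```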
Classifier     Correctly classified    Incorrectly classified    Kappa
               instances (%)           instances (%)             statistic
J48            99.95                   0.0420                    0.969
Naïve Bayes    93.93                   6.0684                    0.8805
OneR           99.05                   0.9472                    0.9805
PART           99.96                   0.0378                    0.9992
KNN            99.87                   0.1277                    0.9974
Fig. 3 Results of various classifiers
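The kappa statistic in Fig. 3 measures agreement beyond chance; it can be computed from a confusion matrix as below. The 2x2 matrix is hypothetical, not taken from the experiments:

```python
def cohen_kappa(confusion):
    """Cohen's kappa from a square confusion matrix, where confusion[i][j]
    counts instances of true class i predicted as class j."""
    k = len(confusion)
    n = sum(sum(row) for row in confusion)
    p_o = sum(confusion[i][i] for i in range(k)) / n            # observed agreement
    row_tot = [sum(row) for row in confusion]
    col_tot = [sum(confusion[i][j] for i in range(k)) for j in range(k)]
    p_e = sum(row_tot[i] * col_tot[i] for i in range(k)) / (n * n)  # chance agreement
    return (p_o - p_e) / (1 - p_e)

# Hypothetical normal-vs-attack confusion matrix (counts are made up).
print(cohen_kappa([[50, 0], [1, 49]]))
```

A kappa near one means the classifier's agreement with the true labels is far above what class frequencies alone would produce.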
[Fig. 4 Comparison of various classifiers: correctly classified instances (%)]
[Fig. 5 Comparison of various classifiers: incorrectly classified instances (%)]
VI: CONCLUSION
The experiment and result analysis above evaluate and
investigate five selected classification algorithms on
relevant and redundant intrusion detection data. The
best algorithm on the pre-processed intrusion detection
data is PART, with an accuracy of 99.96%. These
results suggest that, among the machine learning
algorithms tested, the PART and Decision Tree
classifiers have the potential to significantly improve
classification results for intrusion detection.
REFERENCES
[1] Gary Stein, Bing Chen, Annie S. Wu, Kien A. Hua, “Decision tree classifier for network intrusion detection with GA-based feature selection”, ACM-SE 43: Proceedings of the 43rd Annual Southeast Regional Conference, Vol. 2, 136–141, March 2005.
[2] Mukkamala S., Sung A.H. and Abraham A., “Intrusion Detection Using Ensemble of Soft Computing Paradigms”, Third International Conference on Intelligent Systems Design and Applications, Springer-Verlag Germany, 239–248, 2003.
[3] Wenke Lee and Salvatore J. Stolfo, “A framework for constructing features and models for intrusion detection systems”, ACM Transactions on Information and System Security (TISSEC), Vol. 3, 227–261, November 2000.
[4] M. A. Hall, “Correlation-based feature selection for discrete and numeric class machine learning”, in Proceedings of the Seventeenth International Conference on Machine Learning, 359–366, 2000.
[5] Jiawei Han, Micheline Kamber, “Data Mining: Concepts and Techniques”, 2nd Edition, Morgan Kaufmann, 2006.
[6] Ian H. Witten and Eibe Frank, “Data Mining: Practical Machine Learning Tools and Techniques”, 2nd Edition, Morgan Kaufmann, 2005.
[7] KDD Cup 99, http://kdd.ics.uci.edu/database/kddcup99/kddcup.data 10 percent.gz.
[8] http://www.cs.waikato.ac.nz/~ml/weka.