JOURNAL OF INFORMATION, KNOWLEDGE AND RESEARCH IN COMPUTER SCIENCE AND APPLICATIONS

CLASSIFICATION OF RELEVANT AND REDUNDANT INTRUSION DETECTION DATA USING MACHINE LEARNING APPROACHES

Ms. J. R. Patel, Assistant Professor, Department of Computer Science, Veer Narmad South Gujarat University, Surat. jayshri.r@gmail.com

ABSTRACT: The development of data-mining applications such as classification and clustering has shown the need for machine learning algorithms to be applied to intrusion detection data. In this paper we present different classification techniques for classifying intrusion detection data. The aim of this paper is to investigate the performance of different classification methods on relevant and redundant intrusion detection data. The correlation-based feature selection method is used to select relevant and redundant features from the intrusion detection data. The classification algorithms tested are Decision Tree, Naïve Bayes, OneR, Partial Decision Tree and the K-Nearest Neighbors algorithm.

Keywords— Intrusion Detection, Correlation-Based Feature Selection, Naïve Bayes, Decision Tree, Nearest Neighbor, OneR, Partial Decision Tree

I: INTRODUCTION

The Internet, along with its core benefits, also provides numerous opportunities to violate the stability and security of the systems connected to it. Although static defense mechanisms such as firewalls and software updates can provide a reasonable level of security, more dynamic mechanisms such as intrusion detection systems (IDS) are also suggested for better security. Intrusion detection is defined as a set of activities that attempt to distinguish intrusive activities from normal ones. Intrusion detection systems are classified as host based or network based. A host-based IDS monitors resources such as system logs, file systems and disk resources, whereas a network-based IDS monitors the data passing through the network.
Network intrusion detection starts from raw network traffic, which is summarized into higher-level objects such as connection records or audit records. An audit record captures various features of a network connection, such as the duration, protocol type, and source and destination bytes of a TCP connection. Not all features of an intrusion detection dataset are useful for classification. Effective feature selection for intrusion detection identifies the important features for detecting anomalous network connections. Feature selection reduces the memory requirement and increases the speed of execution, thereby increasing overall performance. We have used the correlation-based feature selection method, which selects 10 features (plus the class label) from among the 42 features for classification. We then apply the Decision Tree, Naïve Bayes, OneR, Partial Decision Tree (PART) and K-Nearest Neighbors classification algorithms.

II: RELATED WORK

In [1], Stein et al. use a decision tree classifier for intrusion detection with GA-based feature selection to improve the classification abilities of the decision tree classifier. They use a genetic algorithm to select a subset of input features for decision tree classifiers in order to increase the detection rate and decrease the false alarm rate in network intrusion detection. In [2], Mukkamala et al. use decision trees and Support Vector Machines (SVM) to model an IDS. They compare the performance of SVM and decision trees and find that the decision tree gives better overall performance than the SVM. In [3], Lee et al. provide a data mining framework for constructing intrusion detection models. They compute activity patterns from system audit data and extract predictive features from the patterns. They then apply machine learning algorithms to the audit records, processed according to the feature definitions, to generate intrusion detection rules. They extend the basic association rules and frequent episodes algorithms for analyzing audit data.
III: CORRELATION BASED FEATURE SELECTION TO INTRUSION DETECTION DATA

As discussed by Hall in [4], Correlation Based Feature Selection (CFS) evaluates the worth of a subset of attributes by considering the individual predictive ability of each feature along with the degree of redundancy between them. It gives high scores to subsets that include features highly correlated with the class attribute but having low correlation with each other. A correlation coefficient is used to estimate the correlation between a subset of attributes and the class, as well as the inter-correlations between the features. The relevance of a group of features grows with the correlation between features and classes, and decreases with growing inter-correlation. CFS is used to determine the best feature subset. The equation for CFS is given in equation (1):

    r_zc = k * r_zi / sqrt(k + k(k-1) * r_ii)        (1)

where r_zc is the correlation between the summed feature subset and the class variable, k is the number of features in the subset, r_zi is the average correlation between the subset features and the class variable, and r_ii is the average inter-correlation between subset features. The CFS method takes the subset evaluation approach, which handles feature redundancy together with feature relevance.

ISSN: 0975 – 6728 | NOV 12 TO OCT 13 | VOLUME – 02, ISSUE – 02 | Page 103

IV: MACHINE LEARNING APPROACHES FOR THE CLASSIFICATION OF INTRUSION DETECTION

Intrusion detection can be considered as a classification problem where each connection record is identified as normal or intrusive based on some existing data. Classification for intrusion detection is an important challenge because it is very difficult to detect new attacks, as attackers continuously change their attack patterns. Several machine learning approaches are used for the classification of intrusion detection data. These classification algorithms are discussed in the following section.
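To make the CFS merit of equation (1) in Section III concrete, here is a minimal stdlib-only Python sketch. The correlation values used below are invented for illustration, not taken from the paper's data.

```python
# Illustrative sketch of the CFS merit score of equation (1); the
# correlation values passed in are made-up toy numbers.
import math

def cfs_merit(feature_class_corrs, avg_feature_intercorr):
    """Merit of a feature subset: k*r_zi / sqrt(k + k(k-1)*r_ii)."""
    k = len(feature_class_corrs)
    r_zi = sum(feature_class_corrs) / k   # mean feature-class correlation
    r_ii = avg_feature_intercorr          # mean feature-feature correlation
    return (k * r_zi) / math.sqrt(k + k * (k - 1) * r_ii)

# A subset whose features correlate strongly with the class but weakly
# with each other scores higher than an otherwise identical redundant subset.
good = cfs_merit([0.8, 0.7, 0.75], 0.1)
redundant = cfs_merit([0.8, 0.7, 0.75], 0.9)
assert good > redundant
```

This matches the behavior described in Section III: the merit grows with the feature-class correlations in the numerator and shrinks as the inter-correlation term in the denominator grows.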
1) DECISION TREE ALGORITHM

A decision tree is a tree structure comprising a set of conditions organized hierarchically. It consists of internal and external nodes connected by branches. An internal node is a decision-making unit that evaluates a decision function to determine which child node to visit next. An external node, or leaf node, has no child nodes and is associated with a class label. A decision tree can easily be converted to a set of classification rules. Many decision tree construction algorithms involve a two-step process: first a decision tree is constructed, and then the tree is pruned. The pruned decision tree used for classification purposes is called the classification tree [5].

2) NAÏVE BAYES ALGORITHM

Bayesian classifiers are statistical classifiers based on Bayes' theorem. Naïve Bayesian classifiers assume that the effect of an attribute value on a given class is independent of the values of the other attributes [5]. Using Naïve Bayes for intrusion detection, we can calculate the probability that an attack is occurring based on some data by first calculating the probability that some previous data was part of that type of attack and then multiplying by the probability of that type of attack occurring.

3) OneR ALGORITHM

OneR generates a one-level decision tree expressed as a set of rules that all test one particular attribute. OneR is a simple, cheap method that often comes up with quite good rules for characterizing the structure in data; these simple rules frequently achieve surprisingly high accuracy. The pseudocode for OneR is as follows [6]:

For each attribute,
    For each value of that attribute, make a rule as follows:
        count how often each class appears
        find the most frequent class
        make the rule assign that class to this attribute-value
    Calculate the error rate of the rules.
Choose the rules with the smallest error rate.

Fig. 1 Pseudocode for 1R.
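The OneR pseudocode in Fig. 1 can be sketched directly in Python. The tiny dataset below is invented for illustration (two made-up attributes loosely resembling a connection record's protocol and status flag), not taken from the KDD data.

```python
# Minimal sketch of OneR (Fig. 1): for each attribute, build one rule per
# attribute value (predict the most frequent class for that value) and keep
# the attribute whose rule set makes the fewest training errors.
from collections import Counter

def one_r(rows, labels):
    """rows: list of attribute tuples; labels: class label per row."""
    best_attr, best_rules, best_errors = None, None, len(rows) + 1
    for a in range(len(rows[0])):
        # Count class frequencies per value of attribute a.
        counts = {}
        for row, y in zip(rows, labels):
            counts.setdefault(row[a], Counter())[y] += 1
        # Rule set: each attribute value -> its most frequent class.
        rules = {v: c.most_common(1)[0][0] for v, c in counts.items()}
        # Errors: instances not matching the majority class of their value.
        errors = sum(sum(c.values()) - c.most_common(1)[0][1]
                     for c in counts.values())
        if errors < best_errors:
            best_attr, best_rules, best_errors = a, rules, errors
    return best_attr, best_rules

# Toy records: (protocol, flag) -> class. The flag perfectly separates
# the classes here, so OneR picks attribute 1.
rows = [("tcp", "SF"), ("tcp", "S0"), ("udp", "SF"), ("udp", "S0")]
labels = ["normal", "attack", "normal", "attack"]
attr, rules = one_r(rows, labels)
assert attr == 1 and rules == {"SF": "normal", "S0": "attack"}
```

On this toy data the protocol attribute makes two errors while the flag attribute makes none, so the single-attribute rule set on the flag is chosen, exactly as the pseudocode prescribes.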
4) PARTIAL DECISION TREE ALGORITHM

PART combines the divide-and-conquer strategy of decision tree learning with the separate-and-conquer strategy of rule learning. It adopts separate-and-conquer in that it builds a rule, removes the instances the rule covers, and continues creating rules recursively for the remaining instances until none are left. To generate each rule, a partial tree is built: the construction and pruning operations are integrated in order to find a "stable" subtree that can be simplified no further. Once this subtree has been found, tree building ceases and a single rule is read off. The tree-building algorithm [6] is summarized in Fig. 2:

Expand-subset (S):
    Choose a test T and use it to split the set of examples into subsets
    Sort subsets into increasing order of average entropy
    while (there is a subset X that has not yet been expanded
           AND all subsets expanded so far are leaves)
        expand-subset(X)
    if (all the subsets expanded are leaves
        AND estimated error for subtree >= estimated error for node)
        undo expansion into subsets and make node a leaf

Fig. 2 Algorithm for expanding examples into a partial tree.

5) K-NEAREST NEIGHBOR ALGORITHM

Nearest neighbor classifiers are based on learning by analogy, that is, by comparing a given test tuple with training tuples (described by n attributes) that are similar to it. Each tuple represents a point in an n-dimensional space. When given an unknown tuple, a K-Nearest Neighbor classifier searches the pattern space for the k training tuples that are closest to the unknown tuple. These k training tuples are the k "nearest neighbors" of the unknown tuple. The unknown tuple is assigned the most common class among its k nearest neighbors. "Closeness" is defined in terms of a distance metric, such as the Euclidean distance [5].

V: EXPERIMENTS AND RESULTS

The data for the experiments were prepared by the KDD Cup 1999 DARPA intrusion detection evaluation program at MIT Lincoln Laboratory.
As given in [7], the data set contains 4 main attack categories, namely Denial of Service (DoS), Remote to User (R2L), User to Root (U2R) and Probing. It includes a total of 24 different attack types among the 4 main categories. The original data set has 41 attributes for each connection record plus one class label. Examples of features are the protocol type, the duration of each connection, etc. For performing the experiments we have used the open source package WEKA, taken from [8]. We first preprocessed the original dataset by mapping the 24 different attack types onto the 4 main attack categories. Then the CFS feature selection method with BestFirst search was applied to the preprocessed dataset, producing a dataset with the 10 selected features. Using this preprocessed data set, the Decision Tree (J48), Naïve Bayes, OneR, PART and K-Nearest Neighbors classification algorithms each construct a model using 10-fold cross-validation. The results of the classification are given in figures 3, 4 and 5. From these figures it is clear that the PART and J48 classifiers give the highest accuracies, of 99.96% and 99.95% respectively. The lowest accuracy, 93.93%, is provided by Naïve Bayes. OneR and KNN provide 99.05% and 99.87% accuracy respectively. The Kappa statistic is also very close to one for most classifiers. For intrusion detection classification, providing the relevant and redundant features to the various classifiers yields considerably efficient results.

CLASSIFIER     CORRECTLY        INCORRECTLY       KAPPA
               CLASSIFIED (%)   CLASSIFIED (%)    STATISTIC
J48            99.95            0.0420            0.969
NAÏVE BAYES    93.93            6.0684            0.8805
ONER           99.05            0.9472            0.9805
PART           99.96            0.0378            0.9992
KNN            99.87            0.1277            0.9974

Fig. 3 Results of various classifiers
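The Kappa statistic reported in Fig. 3 corrects raw accuracy for chance agreement, which is why it can be much lower than accuracy on class-imbalanced data such as intrusion records. A minimal stdlib-only Python sketch of its computation, on invented toy labels rather than the paper's data:

```python
# Cohen's Kappa from true and predicted labels: (observed agreement -
# chance agreement) / (1 - chance agreement). Toy labels only.
from collections import Counter

def kappa(y_true, y_pred):
    n = len(y_true)
    observed = sum(t == p for t, p in zip(y_true, y_pred)) / n
    true_counts, pred_counts = Counter(y_true), Counter(y_pred)
    # Chance agreement: probability both label sources pick the same
    # class if each drew classes at random with its marginal frequencies.
    expected = sum((true_counts[c] / n) * (pred_counts[c] / n)
                   for c in true_counts)
    return (observed - expected) / (1 - expected)

y_true = ["normal"] * 8 + ["dos"] * 2
y_pred = ["normal"] * 8 + ["dos", "normal"]
# Accuracy is 90%, but Kappa discounts the easy agreement on the
# majority "normal" class.
print(round(kappa(y_true, y_pred), 3))  # prints 0.615
```

This is why the paper notes that Kappa values "very close to one" (e.g. 0.9992 for PART) indicate agreement well beyond what the dominant class alone would produce.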
Fig. 5 Comparison of various classifiers (incorrectly classified instances, %)

VI: CONCLUSION

From the above experiments and result analysis, we evaluated and investigated five selected classification algorithms on relevant and redundant intrusion detection data. The best algorithm on the pre-processed intrusion detection data is PART, with an accuracy of 99.96%. These results suggest that, among the machine learning algorithms tested, the PART and decision tree classifiers have the potential to significantly improve classification results for intrusion detection.

REFERENCES

[1] Gary Stein, Bing Chen, Annie S. Wu and Kien A. Hua, "Decision tree classifier for network intrusion detection with GA-based feature selection", ACM-SE 43: Proceedings of the 43rd Annual Southeast Regional Conference, Vol. 2, 136-141, March 2005.
[2] Mukkamala S., Sung A. H. and Abraham A., "Intrusion Detection Using Ensemble of Soft Computing Paradigms", Third International Conference on Intelligent Systems Design and Applications, Springer-Verlag, Germany, 239-248, 2003.
[3] Wenke Lee and Salvatore J. Stolfo, "A framework for constructing features and models for intrusion detection systems", ACM Transactions on Information and System Security (TISSEC), Vol. 3, 227-261, November 2000.
[4] M. A. Hall, "Correlation-based feature selection for discrete and numeric class machine learning", Proceedings of the Seventeenth International Conference on Machine Learning, 359-366, 2000.
[5] Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", 2nd Edition, Morgan Kaufmann, 2006.
[6] Ian H. Witten and Eibe Frank, "Data Mining: Practical Machine Learning Tools and Techniques", 2nd Edition, Morgan Kaufmann, 2005.
[7] KDD Cup 99, http://kdd.ics.uci.edu/database/kddcup99/kddcup.data 10 percent.gz.
[8] WEKA, http://www.cs.waikato.ac.nz/~ml/weka.
Fig. 4 Comparison of various classifiers (correctly classified instances, %)