KDD Cup 99 Dataset

ANFIS Classifier for Network Intrusion Detection System
Dr. Mohsen Kahani
http://www.um.ac.ir/~kahani/
Dr. Kahani - Expert Systems and Knowledge Engineering

Network Intrusion Detection
- Widespread use of computer networks
- A growing number of attacks, new hacking tools, and intrusive methods
- An Intrusion Detection System (IDS) is one way of dealing with suspicious activities within a network.
- An IDS:
  - monitors the activities of a given environment
  - decides whether these activities are malicious (intrusive) or legitimate (normal)

Soft Computing and IDS
- Many soft computing approaches have been applied to the intrusion detection field.
- Our novel network IDS includes:
  - neuro-fuzzy methods
  - fuzzy methods
  - genetic algorithms
- Key contribution:
  - using the outputs of the neuro-fuzzy networks as linguistic variables that express how reliable the current output is

KDD Cup 99 Dataset
- Comparing different works in the IDS area requires a standard dataset for evaluating computer network IDSes.
- For the Fifth ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, TCP dump data of a simulated network was collected and generated in the form of train and test sets of features defined for the connection records.
- This standard dataset is known as the KDD Cup 99 dataset, and we use it for our experiments.
- 41 features are derived for each connection.
- A label specifies the status of each connection record as either normal or a specific attack type.
- The features fall into four categories:
  - Intrinsic features, e.g. duration of the connection, type of protocol (tcp, udp, etc.), network service (http, telnet, etc.)
  - Content features, e.g. number of failed login attempts
  - Same host features examine established connections in the past two seconds that have the same destination host as the current connection, and calculate statistics related to protocol behavior, service, etc.
  - Same service features examine the connections in the past two seconds that have the same service as the current connection.

Basic features of individual TCP connections
feature name     description                                                   type
duration         length (number of seconds) of the connection                  continuous
protocol_type    type of the protocol, e.g. tcp, udp, etc.                     discrete
service          network service on the destination, e.g., http, telnet, etc.  discrete
src_bytes        number of data bytes from source to destination               continuous
dst_bytes        number of data bytes from destination to source               continuous
flag             normal or error status of the connection                      discrete
land             1 if connection is from/to the same host/port; 0 otherwise    discrete
wrong_fragment   number of "wrong" fragments                                   continuous
urgent           number of urgent packets                                      continuous

Content features within a connection suggested by domain knowledge
feature name         description                                             type
hot                  number of "hot" indicators                              continuous
num_failed_logins    number of failed login attempts                         continuous
logged_in            1 if successfully logged in; 0 otherwise                discrete
num_compromised      number of "compromised" conditions                      continuous
root_shell           1 if root shell is obtained; 0 otherwise                discrete
su_attempted         1 if "su root" command attempted; 0 otherwise           discrete
num_root             number of "root" accesses                               continuous
num_file_creations   number of file creation operations                      continuous
num_shells           number of shell prompts                                 continuous
num_access_files     number of operations on access control files            continuous
num_outbound_cmds    number of outbound commands in an ftp session           continuous
is_hot_login         1 if the login belongs to the "hot" list; 0 otherwise   discrete
is_guest_login       1 if the login is a "guest" login; 0 otherwise          discrete

Traffic features computed using a two-second time window
feature name         description                                                                                  type
count                number of connections to the same host as the current connection in the past two seconds     continuous
Note: the following features refer to these same-host connections.
serror_rate          % of connections that have "SYN" errors                                                      continuous
rerror_rate          % of connections that have "REJ" errors                                                      continuous
same_srv_rate        % of connections to the same service                                                         continuous
diff_srv_rate        % of connections to different services                                                       continuous
srv_count            number of connections to the same service as the current connection in the past two seconds  continuous
Note: the following features refer to these same-service connections.
srv_serror_rate      % of connections that have "SYN" errors                                                      continuous
srv_rerror_rate      % of connections that have "REJ" errors                                                      continuous
srv_diff_host_rate   % of connections to different hosts                                                          continuous

KDD Cup 99 Sample Data
0,tcp,http,SF,200,4213,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,15,15,0.00,0.00,0.00,0.00,1.00,0.00,0.00,31,255,1.00,0.00,0.03,0.02,0.00,0.00,0.00,0.00,normal.
0,tcp,http,SF,293,4203,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,4,255,1.00,0.00,0.25,0.02,0.00,0.00,0.00,0.00,normal.
0,tcp,http,SF,296,6903,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,2,0.00,0.00,0.00,0.00,1.00,0.00,1.00,2,255,1.00,0.00,0.50,0.03,0.00,0.00,0.00,0.00,normal.
0,udp,domain_u,SF,104,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2,0.00,0.00,0.00,0.00,1.00,0.00,1.00,56,56,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,domain_u,SF,103,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2,0.00,0.00,0.00,0.00,1.00,0.00,1.00,66,66,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,domain_u,SF,89,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,2,0.00,0.00,0.00,0.00,1.00,0.00,1.00,76,76,1.00,0.00,1.00,0.00,0.00,0.00,0.00,0.00,normal.
0,udp,domain_u,SF,79,32,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,86,85,0.99,0.02,0.99,0.00,0.00,0.00,0.00,0.00,normal.
0,tcp,smtp,SF,1367,335,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,21,72,0.90,0.10,0.05,0.04,0.00,0.00,0.00,0.00,normal.
184,tcp,telnet,SF,1511,2957,0,0,0,3,0,1,2,1,0,0,1,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,1,3,1.00,0.00,1.00,0.67,0.00,0.00,0.00,0.00,buffer_overflow.
305,tcp,telnet,SF,1735,2766,0,0,0,3,0,1,2,1,0,0,1,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,2,4,1.00,0.00,0.50,0.50,0.00,0.00,0.00,0.00,buffer_overflow.
0,tcp,smtp,SF,1518,405,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,4,0.00,0.00,0.00,0.00,1.00,0.00,1.00,42,108,0.74,0.07,0.02,0.04,0.05,0.00,0.00,0.00,normal.
0,tcp,smtp,SF,1173,403,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,52,116,0.75,0.06,0.02,0.03,0.04,0.00,0.00,0.00,normal.
257,tcp,telnet,SF,181,1222,0,0,0,0,0,1,0,0,0,0,2,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,62,15,0.21,0.05,0.02,0.13,0.03,0.13,0.00,0.00,normal.
0,tcp,smtp,SF,2302,410,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,72,117,0.76,0.04,0.01,0.03,0.03,0.00,0.00,0.00,normal.
1,tcp,smtp,SF,1587,332,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,3,120,1.00,0.00,0.33,0.04,0.00,0.00,0.00,0.00,normal.
0,tcp,smtp,SF,1552,333,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,2,0.00,0.00,0.00,0.00,1.00,0.00,1.00,13,121,0.85,0.15,0.08,0.04,0.00,0.00,0.00,0.00,normal.
0,tcp,finger,SF,10,223,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,23,14,0.22,0.13,0.04,0.29,0.00,0.00,0.00,0.00,normal.
0,tcp,smtp,SF,971,335,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,16,120,0.94,0.12,0.06,0.03,0.00,0.00,0.00,0.00,normal.
1,tcp,smtp,SF,2007,335,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,3,0.00,0.00,0.00,0.00,1.00,0.00,1.00,26,129,0.92,0.12,0.04,0.03,0.00,0.00,0.00,0.00,normal.
0,tcp,finger,SF,8,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,3,16,0.67,0.67,0.33,0.31,0.00,0.00,0.00,0.00,normal.
0,tcp,smtp,SF,880,327,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,18,195,0.89,0.11,0.06,0.03,0.00,0.00,0.00,0.00,normal.
0,tcp,smtp,SF,4031,322,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,28,205,0.93,0.07,0.04,0.03,0.00,0.00,0.00,0.00,normal.
27,tcp,ftp,SF,916,2720,0,0,0,19,0,1,0,0,0,0,0,0,0,0,0,1,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,5,5,1.00,0.00,0.20,0.00,0.00,0.00,0.00,0.00,normal.
0,tcp,smtp,SF,2012,325,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,15,207,0.27,0.13,0.07,0.03,0.00,0.00,0.00,0.00,normal.
20,tcp,ftp,SF,239,774,0,0,0,4,0,1,0,0,0,0,0,0,0,0,0,1,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,55,34,0.62,0.04,0.02,0.00,0.00,0.00,0.00,0.00,normal.
23,tcp,ftp,SF,342,1072,0,0,0,6,0,1,0,0,0,0,0,0,0,0,0,1,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,65,40,0.62,0.03,0.02,0.00,0.00,0.00,0.00,0.00,normal.
1,tcp,smtp,SF,1609,364,0,0,0,0,0,1,0,0,0,0,0,0,0,0,0,0,1,4,0.00,0.00,0.00,0.00,1.00,0.00,1.00,75,187,0.37,0.03,0.01,0.03,0.00,0.00,0.00,0.00,normal.
21,tcp,ftp,SF,227,766,0,0,0,4,0,1,0,0,0,0,0,0,0,0,0,1,1,1,0.00,0.00,0.00,0.00,1.00,0.00,0.00,85,50,0.59,0.02,0.01,0.00,0.00,0.00,0.00,0.00,normal.
0,tcp,http,SF,54540,8314,0,0,0,2,0,1,1,0,0,0,0,0,0,0,0,0,2,2,0.00,0.00,0.00,0.00,1.00,0.00,0.00,111,111,1.00,0.00,0.01,0.00,0.00,0.00,0.01,0.01,back.
0,tcp,http,RSTR,53452,2920,0,0,0,1,0,1,0,0,0,0,0,0,0,0,0,0,3,3,0.00,0.00,0.33,0.33,1.00,0.00,0.00,112,112,1.00,0.00,0.01,0.00,0.00,0.00,0.02,0.02,back.
0,tcp,http,SF,54540,8314,0,0,0,2,0,1,1,0,0,0,0,0,0,0,0,0,3,3,0.00,0.00,0.33,0.33,1.00,0.00,0.00,113,113,1.00,0.00,0.01,0.00,0.00,0.00,0.02,0.02,back.
0,icmp,ecr_i,SF,1480,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,19,19,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,19,0.07,0.02,0.07,0.00,0.00,0.00,0.00,0.00,pod.
0,icmp,ecr_i,SF,1480,0,0,1,0,0,0,0,0,0,0,0,0,0,0,0,0,0,20,20,0.00,0.00,0.00,0.00,1.00,0.00,0.00,255,20,0.08,0.02,0.08,0.00,0.00,0.00,0.00,0.00,pod.
0,tcp,private,RSTR,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,0,1,1,0.00,0.00,1.00,1.00,1.00,0.00,0.00,255,1,0.00,0.02,0.00,0.00,0.00,0.00,0.00,1.00,portsweep.

KDD Cup 99 Dataset
- Attacks fall into four main categories:
  - DoS (Denial of Service): making some computing or memory resources too busy, so that they deny legitimate users access to these resources.
  - R2L (Remote to Local): unauthorized access from a remote machine by exploiting the target machine's vulnerabilities.
  - U2R (User to Root): unauthorized access to local superuser (root) privileges by exploiting the system's vulnerabilities.
  - PROBE: host and port scans as precursors to other attacks. An attacker scans a network to gather information or find known vulnerabilities.

KDD Cup 99 Dataset, cont.
- The KDD dataset is divided into the following record sets:
  - Training
  - Testing
- The original training dataset was too large for our purposes, so the 10% training dataset was used for the training phase.
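A minimal Python sketch (not the authors' code) of reading records like the sample rows above: it splits one line into the 41 named features plus the label, and maps the label to Normal or one of the four attack categories. Assumptions: the feature order is the standard KDD Cup 99 record layout (flag comes fourth, before src_bytes, as in the sample rows); the category table lists only the 22 attack types found in the 10% training subset, so labels that occur only in the test data raise a KeyError; symbolic fields are left as strings because the numeric mapping used in preprocessing is not specified; `parse_record`, `CATEGORY` and `SAMPLE_POD` are hypothetical names.

```python
# Minimal sketch: parse one KDD Cup 99 connection record and map its label
# to Normal or one of the four attack categories (DoS, Probe, R2L, U2R).
# Assumption: CATEGORY covers only the 22 attack types of the 10% training
# subset; labels that appear only in the test data raise a KeyError.

FEATURE_NAMES = [
    "duration", "protocol_type", "service", "flag", "src_bytes", "dst_bytes",
    "land", "wrong_fragment", "urgent", "hot", "num_failed_logins",
    "logged_in", "num_compromised", "root_shell", "su_attempted", "num_root",
    "num_file_creations", "num_shells", "num_access_files",
    "num_outbound_cmds", "is_hot_login", "is_guest_login", "count",
    "srv_count", "serror_rate", "srv_serror_rate", "rerror_rate",
    "srv_rerror_rate", "same_srv_rate", "diff_srv_rate", "srv_diff_host_rate",
    "dst_host_count", "dst_host_srv_count", "dst_host_same_srv_rate",
    "dst_host_diff_srv_rate", "dst_host_same_src_port_rate",
    "dst_host_srv_diff_host_rate", "dst_host_serror_rate",
    "dst_host_srv_serror_rate", "dst_host_rerror_rate",
    "dst_host_srv_rerror_rate",
]
SYMBOLIC = {"protocol_type", "service", "flag"}  # kept as strings here

CATEGORY = {"normal": "Normal"}
for _name in ("back", "land", "neptune", "pod", "smurf", "teardrop"):
    CATEGORY[_name] = "DoS"
for _name in ("ipsweep", "nmap", "portsweep", "satan"):
    CATEGORY[_name] = "Probe"
for _name in ("ftp_write", "guess_passwd", "imap", "multihop", "phf", "spy",
              "warezclient", "warezmaster"):
    CATEGORY[_name] = "R2L"
for _name in ("buffer_overflow", "loadmodule", "perl", "rootkit"):
    CATEGORY[_name] = "U2R"

# Hypothetical constant: the first pod record from the sample data.
SAMPLE_POD = (
    "0,icmp,ecr_i,SF,1480,0,0,1,"
    + "0," * 14  # urgent .. is_guest_login are all zero
    + "19,19,0.00,0.00,0.00,0.00,1.00,0.00,0.00,"
    + "255,19,0.07,0.02,0.07,0.00,0.00,0.00,0.00,0.00,pod."
)


def parse_record(line):
    """Return (features, category) for one comma-separated record."""
    fields = line.strip().split(",")
    if len(fields) != len(FEATURE_NAMES) + 1:
        raise ValueError(f"expected 42 fields, got {len(fields)}")
    label = fields[-1].rstrip(".")  # labels end with a trailing dot
    features = {
        name: raw if name in SYMBOLIC else float(raw)
        for name, raw in zip(FEATURE_NAMES, fields[:-1])
    }
    if label not in CATEGORY:
        raise KeyError(f"no category known for label {label!r}")
    return features, CATEGORY[label]


if __name__ == "__main__":
    feats, cat = parse_record(SAMPLE_POD)
    print(cat, feats["service"], feats["wrong_fragment"])  # DoS ecr_i 1.0
```

Keeping the label-to-category table separate from the parser makes it easy to extend with the extra attack names of the corrected test set if they are needed.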
KDD Cup 99 Sample Distribution
THE SAMPLE DISTRIBUTIONS ON THE SUBSET OF 10% DATA OF KDD CUP 99 DATASET
Class    Number of Samples   Samples Percent
Normal   97278               19.69%
Probe    4107                0.83%
DoS      391458              79.24%
U2R      52                  0.01%
R2L      1126                0.23%
Total    494021              100%

THE SAMPLE DISTRIBUTIONS ON THE TEST DATA WITH THE CORRECTED LABELS OF KDD CUP 99 DATASET
Class    Number of Samples   Samples Percent
Normal   60593               19.48%
Probe    4166                1.34%
DoS      229853              73.90%
U2R      228                 0.07%
R2L      16189               5.20%
Total    311029              100%

ANFIS
- ANFIS: Adaptive Neuro-Fuzzy Inference System
  - Can construct models based solely on samples of the target system (learning)
  - Adapts itself through repeated training (adaptation)
- These abilities, among others, qualify ANFIS as a fuzzy classifier for IDS.
- Here we use ANFIS as a neuro-fuzzy classifier to detect intrusions in computer networks, based on the KDD Cup 99 datasets.

Generating the Target Fuzzy Inference System
- Grid partitioning
  - All possible rules are generated from the number of MFs for each input.
  - For example, in a two-dimensional input space with three MFs per input, grid partitioning results in 3 × 3 = 9 rules.
- Subtractive clustering
  - Subtractive clustering is a fast, one-pass algorithm for estimating the number of clusters and the cluster centers in a set of data.
  - The cluster information obtained by this method determines the initial number of rules and the antecedent membership functions, which are used to identify the FIS.

Initial System Architecture
- The KDD features come in all forms: continuous, discrete, and symbolic.
- Preprocessing: symbolic-valued attributes are mapped to numeric ones.
- 150000 randomly selected records of the 10% subset are used as training data.
- 40000 randomly selected records are used as checking data (for validating the model).
- Five trials of 40000 connections, sampled from the source training dataset so that they overlap neither with the training set nor with each other, are used as testing data.
- Subtractive clustering with ra = 0.5 (neighborhood radius) partitions the training data and generates an FIS structure.
- For further fine-tuning and adaptation of the membership functions, the training dataset is used to train ANFIS, while the checking dataset is used to validate the identified model.
- The final ANFIS contains 212 nodes and a total of 248 fitting parameters: 164 premise parameters and 84 consequent parameters.
- Training ANFIS further fine-tunes and adapts the initial membership functions.

[Figure: initial and final membership functions of some input features]

- The ANFIS structure basically has a single output.
- An approximate class number is obtained by rounding off the ANFIS output; Γ is the rounding parameter that yields the integer value.

Standard Metrics for Evaluating Network IDSes
- Some definitions:
  - Detection rate: the ratio between the number of correctly detected attacks and the total number of attacks.
  - False alarm (false positive) rate: the ratio between the number of normal connections incorrectly classified as attacks and the total number of normal connections.
  - Classification rate: the ratio between the number of correctly classified test instances and the total number of classified test instances.
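The rounding step and the three metrics can be sketched in Python as follows. This is a hedged illustration, not the deck's implementation. It assumes a hypothetical class encoding 0..4 with 0 = Normal, reads Γ as the fractional threshold at which the ANFIS output is rounded up (ordinary rounding when Γ = 0.5), and counts an attack as detected when it is assigned to any attack class rather than to its exact class. `round_with_threshold` and `ids_metrics` are hypothetical names.

```python
import math

# Minimal sketch under stated assumptions (not the deck's specification):
#  - classes are numbered 0..n_classes-1 with 0 = Normal (hypothetical);
#  - the output is rounded up when its fractional part reaches gamma (Γ),
#    which is ordinary rounding for gamma = 0.5;
#  - an attack counts as "detected" if it is assigned to any attack class.

NORMAL = 0


def round_with_threshold(y, gamma=0.5, n_classes=5):
    """Map a real-valued ANFIS output y to an integer class index."""
    base = math.floor(y)
    cls = base + 1 if (y - base) >= gamma else base
    return min(max(cls, 0), n_classes - 1)  # clip to the valid class range


def ids_metrics(actual, predicted):
    """Return (detection rate, false alarm rate, classification rate) in %."""
    pairs = list(zip(actual, predicted))
    attacks = [(a, p) for a, p in pairs if a != NORMAL]
    normals = [(a, p) for a, p in pairs if a == NORMAL]
    detected = sum(1 for _, p in attacks if p != NORMAL)
    false_alarms = sum(1 for _, p in normals if p != NORMAL)
    correct = sum(1 for a, p in pairs if a == p)
    nan = float("nan")
    dr = 100.0 * detected / len(attacks) if attacks else nan
    far = 100.0 * false_alarms / len(normals) if normals else nan
    cr = 100.0 * correct / len(pairs) if pairs else nan
    return dr, far, cr


if __name__ == "__main__":
    outputs = [0.2, 0.6, 1.4, 2.7, 0.1, 3.9]  # hypothetical ANFIS outputs
    actual = [0, 0, 1, 3, 2, 4]               # hypothetical true classes
    predicted = [round_with_threshold(y) for y in outputs]
    print(predicted)  # [0, 1, 1, 3, 0, 4]
    print(ids_metrics(actual, predicted))
```

In this toy run, three of the four attacks are flagged as attacks (DR 75%), one of the two normal records raises an alarm (FAR 50%), and four of six records receive their exact class. If "correctly detected" is meant to require the exact attack class, the `detected` count would compare `p == a` instead.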
Results
- False alarm, detection and classification rates for the training and checking data (Γ = 0.5):

Data       False Alarm Rate %   Detection Rate %   Classification Rate %
Training   0.61                 99.75              99.68
Checking   1.6                  91.00              92.44

[Figure: error measures vs. epoch number for the training dataset]

- Experiment 1
  - All records of the labeled test dataset (corrected labels) are used as testing data to evaluate our classifier.
  - False alarm, detection and classification rates on the test data of the first experiment (Γ = 0.5):

Data       False Alarm Rate %   Detection Rate %   Classification Rate %
Test       1.6                  91.07              92.48

- Experiment 2
  - Five trials, each on 40000 randomly selected samples; the reported values are averaged over the trials.
  - We compare our classifier with different fuzzy algorithms in terms of false alarm rate, detection rate, and complexity.

Algorithm                False Alarm Rate %   Detection Rate %   Complexity
Neuro-Fuzzy Classifier   0.59                 99.54              O(n)
SRPP [1]                 3.58                 99.08              O(n)
EFRID [7]                7                    98.96              O(n)
RIPPER [5]               2.02                 94.26              O(n × log2 n)

Final System Architecture

Proposed System (Data Sources)
- The distributions of the samples in the two subsets used for training:

SAMPLE DISTRIBUTIONS ON THE FIRST TRAINING AND CHECKING DATA RANDOMLY SELECTED FROM THE 10% DATA OF KDD CUP 99 DATASET
          ANFIS-N             ANFIS-P             ANFIS-D             ANFIS-U             ANFIS-R
          Training Checking   Training Checking   Training Checking   Training Checking   Training Checking
Normal    20000    2500       10000    1000       25000    6000       200      100        4000     2000
Probe     4000     107        4000     107        4000     107        50       25         1000     500
DoS       15000    2000       5000     500        20000    5000       50       25         2000     1000
U2R       40       12         40       12         40       12         46       6          40       12
R2L       1000     126        1000     126        1000     126        50       25         1000     126

Proposed System (Data Sources), cont.
SAMPLE DISTRIBUTIONS ON THE SECOND TRAINING AND CHECKING DATA RANDOMLY SELECTED FROM THE 10% DATA OF KDD CUP 99 DATASET
          ANFIS-N             ANFIS-P             ANFIS-D             ANFIS-U             ANFIS-R
          Training Checking   Training Checking   Training Checking   Training Checking   Training Checking
Normal    1500     1500       1500     1500       1500     1500       1500     1500       1500     1500
Probe     500      500        500      500        500      500        500      500        500      500
DoS       500      500        500      500        500      500        500      500        500      500
U2R       52       0          52       0          52       0          46       6          52       0
R2L       500      500        500      500        500      500        500      500        500      500

Proposed System (ANFIS Classifiers)
- Subtractive clustering with ra = 0.5 (neighborhood radius) is used to partition the training sets and generate an FIS structure for each ANFIS.
- For further fine-tuning and adaptation of the membership functions, the training sets are used to train each ANFIS.
- Each ANFIS is trained for 50 epochs, and the final FIS associated with the minimum checking error is chosen.
- All MFs of the input fuzzy sets are Gaussian functions with two parameters.

Proposed System (The Fuzzy Decision Module)
- A five-input, single-output Mamdani fuzzy inference system
- Centroid-of-area defuzzification
- Each input and output fuzzy set includes two MFs.
- All MFs are Gaussian functions specified by four parameters.
- The output of the fuzzy inference engine varies between -1 and 1 and specifies how intrusive the current record is: 1 means completely intrusive and -1 completely normal.

FUZZY ASSOCIATIVE MEMORY FOR THE PROPOSED FUZZY INFERENCE RULES
Normal: High, Low, -
PROBE: ¬High, High, Low
DoS: ¬High, High, Low
U2R: ¬High, High, Low
R2L: ¬High, High, Low
Output: Normal, Normal, Attack, Attack, Attack, Attack, Attack, Normal

Proposed System (Genetic Algorithm Module)
- A chromosome consists of 320 bits of binary data.
- 8 bits of a chromosome determine one of the four parameters of an MF.

Proposed System (Some Metrics)
- Cost per example:

$$ CPE = \frac{1}{N} \sum_{i=1}^{m} \sum_{j=1}^{m} CM(i,j) \cdot C(i,j) $$

  - CM is the confusion matrix. Each row corresponds to an actual class and each column to a predicted class. The entry CM(i,j) at row i and column j is the number of instances that actually belong to class i but were identified as members of class j; the entries on the main diagonal, CM(i,i), are the numbers of correctly detected instances.
  - C is the cost matrix. As with CM, the entry C(i,j) is the cost penalty for misclassifying an instance of class i as class j.
  - N is the total number of test instances.
  - m is the number of classes.

Proposed System (Fitness Function for GA)
- Two different fitness functions:
  - Cost per example with equal misclassification costs
  - Cost per example with the costs used to evaluate the results of the KDD'99 competition

Equal misclassification costs (rows: actual class, columns: predicted class)
          Normal  PROBE  DoS  U2R  R2L
Normal    0       1      1    1    1
PROBE     1       0      1    1    1
DoS       1       1      0    1    1
U2R       1       1      1    0    1
R2L       1       1      1    1    0

KDD'99 competition costs (rows: actual class, columns: predicted class)
          Normal  PROBE  DoS  U2R  R2L
Normal    0       1      2    2    2
PROBE     1       0      2    2    2
DoS       2       1      0    2    2
U2R       3       2      2    0    2
R2L       4       2      2    2    0

Proposed System (Data Sources for GA)
THE SAMPLE DISTRIBUTIONS ON THE SELECTED SUBSET OF 10% DATA OF KDD CUP 99 DATASET FOR THE OPTIMIZATION PROCESS USED BY THE GA
Class               Normal  Probe  DoS  U2R  R2L
Number of Samples   200     104    200  52   104

Results
- For each of the two series, 10 subsets of training data were used to train the classifiers.
- The genetic algorithm was run three times for each of the five series of selected subsets.
- In total, 150 different structures were used, and the reported result is the average over these 150 structures.
- Two different training datasets were used to train the classifiers, and two different fitness functions were used to optimize the fuzzy decision-making module.

ABBREVIATIONS USED FOR OUR APPROACHES
Abbreviation   Approach
ESC-KDD-1      First training set with the KDD fitness function
ESC-EQU-1      First training set with the equal-misclassification-cost fitness function
ESC-KDD-2      Second training set with the KDD fitness function
ESC-EQU-2      Second training set with the equal-misclassification-cost fitness function

Results, cont.
CLASSIFICATION RATE, DETECTION RATE (DTR), FALSE ALARM RATE (FA) AND COST PER EXAMPLE OF KDD (CPE) FOR THE DIFFERENT APPROACHES OF ESC-IDS ON THE TEST DATASET WITH CORRECTED LABELS OF KDD CUP 99 DATASET
Model    ESC-KDD-1  ESC-EQU-1  ESC-KDD-2  ESC-EQU-2
Normal   98.2       98.4       96.5       96.9
Probe    84.1       89.2       79.2       79.1
DoS      99.5       99.5       96.8       96.3
U2R      14.1       12.8       8.3        8.2
R2L      31.5       27.3       13.4       13.1
DTR      95.3       95.3       91.6       88.1
FA       1.9        1.6        3.4        3.2
CPE      0.1579     0.1687     0.2423     0.2493

CLASSIFICATION RATE, DETECTION RATE (DTR), FALSE ALARM RATE (FA) AND COST PER EXAMPLE OF KDD (CPE) FOR THE DIFFERENT ALGORITHMS ON THE TEST DATASET WITH CORRECTED LABELS OF KDD CUP 99 DATASET (N/R STANDS FOR NOT REPORTED)
Model    ESC-IDS  RSS-DSS  Parzen-Window  Multi-Classifier  Winner of KDD  Runner Up of KDD  PNrule
Normal   98.2     96.5     97.4           n/r               99.5           99.4              99.5
Probe    84.1     86.8     99.2           88.7              83.3           84.5              73.2
DoS      99.5     99.7     96.7           97.3              97.1           97.5              96.9
U2R      14.1     76.3     93.6           29.8              13.2           11.8              6.6
R2L      31.5     12.4     31.2           9.6               8.4            7.3               10.7
DTR      95.3     94.4     n/r            n/r               91.8           91.5              91.1
FA       1.9      3.5      2.6            n/r               0.6            0.6               0.4
CPE      0.1579   n/r      0.2024         0.2285            0.2331         0.2356            0.2371
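Cost per example, which serves as the GA fitness, can be sketched in Python as below. This is a minimal illustration, not the authors' code. It assumes the class order Normal, PROBE, DoS, U2R, R2L, with rows as actual and columns as predicted classes in both matrices (matching the CPE definition above), and that the GA minimizes CPE. The confusion matrix in the demo is made up for illustration and is not a result from this work; `cost_per_example` is a hypothetical name.

```python
# Minimal sketch of cost per example (CPE):
#   CPE = (1/N) * sum_i sum_j CM[i][j] * C[i][j]
# Assumed class order: Normal, PROBE, DoS, U2R, R2L.
# Rows = actual class, columns = predicted class (for both CM and C).

KDD99_COST = [
    [0, 1, 2, 2, 2],
    [1, 0, 2, 2, 2],
    [2, 1, 0, 2, 2],
    [3, 2, 2, 0, 2],
    [4, 2, 2, 2, 0],
]
# Equal misclassification costs: 0 on the diagonal, 1 elsewhere.
EQUAL_COST = [[0 if i == j else 1 for j in range(5)] for i in range(5)]


def cost_per_example(cm, cost):
    """Average misclassification cost over the N instances counted in cm."""
    m = len(cm)
    if len(cost) != m or any(len(row) != m for row in cm + cost):
        raise ValueError("cm and cost must both be m x m matrices")
    n = sum(sum(row) for row in cm)  # N: total number of test instances
    if n == 0:
        raise ValueError("confusion matrix is empty")
    total = sum(cm[i][j] * cost[i][j] for i in range(m) for j in range(m))
    return total / n


if __name__ == "__main__":
    # Hypothetical confusion matrix, for illustration only.
    cm = [
        [90, 5, 3, 0, 2],
        [2, 8, 0, 0, 0],
        [1, 0, 49, 0, 0],
        [1, 0, 0, 1, 0],
        [4, 0, 0, 0, 4],
    ]
    print(round(cost_per_example(cm, KDD99_COST), 4))  # 0.2235
    print(round(cost_per_example(cm, EQUAL_COST), 4))  # 0.1059
```

With the equal-cost matrix, CPE reduces to the plain misclassification rate; the KDD'99 matrix instead penalizes, for example, an R2L record labeled Normal four times as heavily as a Probe record labeled Normal.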
