www.ijecs.in
International Journal Of Engineering And Computer Science ISSN: 2319-7242
Volume 4 Issue 2 February 2015, Page No. 10567-10569

Detecting Attack Packets by Using Darpa Dataset on Intrusion Detection System

Ms. Sarika Rameshwar Rathi
Dept. of Computer Engineering, MGM Polytechnic College, Aurangabad, Maharashtra, India.
email: rathisarika11@gmail.com

Abstract- Today a large amount of valuable data is generated by computer-based applications and stored in company databases. Unfortunately, the threats to this data are also increasing rapidly, so developing an Intrusion Detection System that raises the right alarms is an active topic. Signature-based Network Intrusion Detection Systems (NIDS) use a set of rules to detect hostile traffic in network segments or packets. These rules are so important for detecting malicious and anomalous behavior on the network, such as known attacks, that hackers look for new techniques to go unseen. The problem of network intrusion detection is not just to identify attack connections, but also to know what type of attack a connection belongs to. This paper aims to build an innovative functional framework for NIDS. The framework can be used to audit NIDS, and serves as a proof of concept showing how to categorize attacks.

Keywords- Intrusion detection, Network intrusion detection system, Network security.

1. INTRODUCTION

Many sites install an Intrusion Detection System (IDS) to monitor their hosts and networks for suspicious events. Many IDSs use a database of known events for comparison and send alerts when a match is detected. Nowadays, many organizations and companies use Internet services as their communication channel and marketplace, as the eBay and Amazon.com websites do. Together with the growth of computer network activity, the rate of network attacks has been rising, affecting the availability, confidentiality, and integrity of critical information.
Therefore a network system must use one or more security tools such as a firewall, antivirus, IDS and honeypot to protect important data from criminal enterprises. A network system that relies on a firewall alone is not protected against all attack types: the firewall cannot defend the network against intrusion attempts made through open ports. Hence a Real-Time Intrusion Detection System (RT-IDS) is a prevention tool that gives an alarm signal to the computer user or network administrator for antagonistic activity on an open session, by inspecting hazardous network activities.

Intrusion detection is a set of techniques and methods that are used to detect suspicious activity at both the network and the host level. A network-based IDS (NIDS) processes any clear-text traffic that crosses the monitored network without degrading performance on the host computers; since a single NIDS can monitor many hosts, less maintenance and monitoring effort is required. A network-based IDS cannot precisely know the state of the target machine; it must instead deduce the effects of traffic on the target system. In contrast, a host-based IDS (HIDS) is installed on individual hosts, which grants knowledge of the target machine's state and the ability to detect attacks from any point of entry. Network-based intrusion detection systems continue to be more prevalent and mature than their host-based counterparts, although personal firewalls such as ZoneAlarm on Windows computers have host-based intrusion detection capability and are frequently in use. Frequently, a NIDS will only report whether a known attack was launched, without being able to determine whether the attack succeeded, or indeed whether the attack even applied to the target's operating system.

2. OVERVIEW

The earlier method used the C4.5 algorithm with the KDD dataset. In this paper, training and testing are performed directly as the first step, with the DARPA dataset as input. To avoid the complexity of the C4.5 algorithm, we go directly to the training phase: because the training data is already labelled, C4.5 is not needed.

Intrusion detection to identify attacks on computer systems has been a challenging problem in the domain of network security for quite some time. Software to detect network intrusions protects a computer network from unauthorized uses, thereby preventing malicious activities. The intrusion detector learning task is to build a classifier (i.e. a predictive model) capable of distinguishing between attacks or intrusions ("bad" connections) and normal or "good" connections. Considering the growing problems in network security and the need to develop sophisticated and robust solutions, the KDD Cup was organized in 1999, inviting researchers across the world to design innovative methods to construct an IDS on a training and testing data set, popularly referred to as the KDD Cup 99 data set [3]. Since then, different machine-learning techniques such as Bayesian Classifiers and Decision Trees have been trained on the KDD Cup 99 data set to learn normal and inconsistent patterns from the data and thus generate classifiers that are able to detect an intrusion attack [1].

The problem of network intrusion detection is not just to identify attack connections, but also to know what type of attack a connection belongs to [3]. The limitation of these classifiers is that they generate a unified rule for all the attack types. As a result, although the algorithms perform well in separating attacks from normal connections, their performance is less impressive when it comes to identifying what type of attack a connection is. This can be intuitively explained by the fact that a single rule cannot accurately classify all the attack types.
This essentially forms the problem statement and the motivation behind this project. We would like an intrusion detection system that can also predict the type of incoming attacks, in addition to identifying attack connections. Among the different machine learning techniques, we selected the Apriori algorithm as a good way to find an efficient solution to the problem. The Apriori algorithm is a data mining approach based on frequent item-set mining [4]. It generates a set of association rules that can be applied to a testing set to classify intrusions. Researchers have explored the use of Apriori in intrusion detection and reported very high success rates, but on data sets other than the KDD Cup 99, such as the DARPA data set [2][4][5].

3. PROPOSED WORK

The main aim is to develop a network-based intrusion detection system based on a modified Apriori approach for attack detection, and to test the output produced by the Apriori algorithm with the well-known Snort intrusion detection system. Once the candidate sets for detecting different attacks are generated, they are passed as inputs to Snort for detecting those attacks.

Figure 1 gives the proposed system flow, where the input to the IDS is the DARPA dataset. Training and testing follow. The training phase contains the initialization of parameters. The testing phase contains the actual identification of attack packets and the classification of each detected attack under its category (such as DoS attack, Probe attack, U2R attack, R2L attack). After that, the detection result and the false alarm rate are displayed.

Figure 1 Proposed System Structure

After this step, the modified Apriori algorithm is used to create the rules for detecting attacks. Once created, the rules are passed to Snort, an open source IDS. This method then detects the packets in the network.
It evades the packets by changing the rules. The detection output is stored in text files. The workflow is depicted in the block diagram.

4. WORKFLOW

4.1. DARPA Dataset

Most earlier researchers have evaluated their genetic algorithms on offline data such as the DARPA 1998 data or the KDD CUP 99 data. MIT Lincoln Laboratory, under Defense Advanced Research Projects Agency (DARPA) and Air Force Research Laboratory (AFRL) sponsorship, collected and distributed the first standard data for the evaluation of computer network intrusion detection systems. This is the DARPA 1998 data. It consists of tcpdump and BSM list files. Each line in a list file corresponds to a separate session, and each session corresponds to an individual TCP/IP connection between two computers. The first nine columns in the list files provide the information that identifies the TCP/IP connection.

The data set as distributed is not in transaction format and does not carry precise time information. We therefore create the data records in transaction format. Each transaction contains several records, and transactions in the data set are separated by '###'. The data set in which association rules are to be found is viewed as a set of tuples, where each tuple consists of a set of items. For example, the tuple {DoS, R2L, U2R} comprises three items: a DoS attack (such as Smurf), an R2L attack and a U2R attack. In terms of the attack record register, each item represents an attack that happened in a particular time period.

4.2. Training and Testing Data Set

We have used the KDD CUP 99 data set to train and test the system classifier. The data set has been provided by MIT Lincoln Labs. It contains a wide variety of intrusions simulated in a military network environment, set up to acquire nine weeks of raw TCP/IP dump data for a local-area network (LAN) simulating a typical U.S. Air Force LAN. The LAN was operated as if it were a true Air Force environment, peppered with multiple attacks. Hence, this is a high-confidence, high-quality data set.
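The transaction view described in the DARPA Dataset subsection (records grouped into transactions separated by '###', each transaction treated as a set of items) can be illustrated with a short Python sketch. It is only a minimal illustration under stated assumptions, not the implementation used in this work: the one-item-per-line layout, the function names `parse_transactions` and `frequent_itemsets`, the sample records and the support threshold are all hypothetical. The mining step is a plain level-wise, Apriori-style search.

```python
from itertools import combinations


def parse_transactions(text, separator="###"):
    """Split raw text into transactions (assumed layout: one item per line)."""
    transactions = []
    for chunk in text.split(separator):
        items = {line.strip() for line in chunk.splitlines() if line.strip()}
        if items:
            transactions.append(frozenset(items))
    return transactions


def frequent_itemsets(transactions, min_support):
    """Level-wise search; returns {itemset: number of supporting transactions}."""
    # Level 1: count every single item.
    counts = {}
    for t in transactions:
        for item in t:
            key = frozenset([item])
            counts[key] = counts.get(key, 0) + 1
    current = {s: c for s, c in counts.items() if c >= min_support}
    result = dict(current)
    k = 1
    while current:
        k += 1
        # Join: unions of two frequent (k-1)-item-sets that have exactly k items.
        candidates = set()
        for a, b in combinations(list(current), 2):
            union = a | b
            if len(union) != k:
                continue
            # Prune: every (k-1)-subset of a candidate must itself be frequent.
            if all(frozenset(sub) in current
                   for sub in combinations(union, k - 1)):
                candidates.add(union)
        # Scan the transactions once per level to count candidate support.
        counts = {c: sum(1 for t in transactions if c <= t) for c in candidates}
        current = {s: c for s, c in counts.items() if c >= min_support}
        result.update(current)
    return result


# Hypothetical sample: four transactions of attack labels.
sample = """smurf
ipsweep
###
smurf
ipsweep
portsweep
###
smurf
neptune
###
ipsweep
portsweep
"""

transactions = parse_transactions(sample)
frequent = frequent_itemsets(transactions, min_support=2)
for itemset, support in sorted(frequent.items(),
                               key=lambda kv: (len(kv[0]), sorted(kv[0]))):
    print(sorted(itemset), support)
```

Using `frozenset` for item-sets makes them hashable, so they can serve directly as dictionary keys when support is counted.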
They set up an environment to collect raw TCP/IP dumps from a host located on a simulated military network. Each TCP/IP connection is described by 41 discrete and continuous features (e.g. duration, protocol type, flag, etc.) and labeled either as normal or as an attack, with exactly one specific attack type (e.g. Smurf, Perl, etc.). Attacks fall into four main categories:
(i) Denial of Service Attacks (DOS), in which an attacker overwhelms the victim host with a huge number of requests.
(ii) User to Root Attacks (U2R), in which an attacker tries to obtain access rights from a normal host in order, for instance, to gain root access to the system.
(iii) Remote to User Attacks (R2L), in which the intruder tries to exploit system vulnerabilities in order to control the remote machine through the network as a local user.
(iv) Probing, in which an attacker attempts to gather useful information about the machines and services available on the network in order to look for exploits.

For our system, we have used the 10% training set as published by Lincoln Labs, which contains 494,021 connections. Our testing set is the entire set of labeled connections, consisting of around 4.9 million connections. Thus, using the entire data set, we have been able to test our system on unseen connections.

At the current stage of implementation, we have been able to generate a rule set comprising six rules, each of which correctly classifies one of six different attack labels. We selected the three most frequent attack labels from each of two classes of attacks, DOS and Probe, in the 10% training data set. These six attack labels are smurf, neptune and land (DOS attacks) and satan, ipsweep and portsweep (Probe attacks).
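As a small illustration of the categorization just described, the following Python sketch maps the six attack labels named above to their categories and computes a detection rate and a false alarm rate from actual and predicted labels. It is a sketch under assumptions rather than the system's implementation: the names `ATTACK_CATEGORY`, `categorize` and `detection_and_false_alarm_rate` are hypothetical, the sample labels are made up, and the two rates use the common definitions TP / (TP + FN) and FP / (FP + TN), which the text does not spell out.

```python
# Category of each of the six attack labels named in the text.
ATTACK_CATEGORY = {
    "smurf": "DoS", "neptune": "DoS", "land": "DoS",
    "satan": "Probe", "ipsweep": "Probe", "portsweep": "Probe",
}


def categorize(label):
    """Map a connection label to 'Normal', an attack category, or 'Unknown'."""
    # A trailing period is stripped in case the label file uses one.
    name = label.strip().rstrip(".").lower()
    if name == "normal":
        return "Normal"
    return ATTACK_CATEGORY.get(name, "Unknown")


def detection_and_false_alarm_rate(actual, predicted):
    """Return (detection rate, false alarm rate), attack vs. normal."""
    tp = fn = fp = tn = 0
    for a, p in zip(actual, predicted):
        is_attack = categorize(a) != "Normal"
        flagged = categorize(p) != "Normal"
        if is_attack and flagged:
            tp += 1          # attack correctly flagged
        elif is_attack:
            fn += 1          # attack missed
        elif flagged:
            fp += 1          # normal connection raised an alarm
        else:
            tn += 1          # normal connection passed
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_alarm_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_alarm_rate


# Hypothetical labels for five connections.
actual = ["smurf.", "normal.", "ipsweep.", "normal.", "neptune."]
predicted = ["smurf.", "smurf.", "normal.", "normal.", "neptune."]

dr, far = detection_and_false_alarm_rate(actual, predicted)
print(f"detection rate = {dr:.2f}, false alarm rate = {far:.2f}")
```

Labels outside the six listed ones are reported as 'Unknown' instead of being guessed, so the mapping stays limited to what the rule set currently covers.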
Figure 2 Categorization of attacks

4.3. Apriori Algorithm

To detect attack packets on the network we use the Apriori algorithm. The categorization of attack packets comes from the training and testing phase; the Apriori algorithm is used for generating the rules. One of the most popular data mining approaches is to find frequent item-sets in a transaction dataset and derive association rules from them. Finding frequent item-sets (item-sets with frequency larger than or equal to a user-specified minimum support) is not trivial because of its combinatorial explosion [14]. Once frequent item-sets are obtained, it is straightforward to generate association rules with confidence larger than or equal to a user-specified minimum confidence.

Apriori is a seminal algorithm for finding frequent item-sets using candidate generation [1]. It is characterized as a level-wise complete search algorithm that uses the anti-monotonicity of item-sets: "if an item-set is not frequent, any of its supersets is never frequent". By convention, Apriori assumes that items within a transaction or item-set are sorted in lexicographic order. Let the set of frequent item-sets of size k be Fk and their candidates be Ck. Apriori first scans the database and searches for frequent item-sets of size 1 by accumulating the count for each item and collecting those that satisfy the minimum support requirement. It then iterates on the following three steps and extracts all the frequent item-sets: (i) generate the candidates Ck+1 of size k+1 from the frequent item-sets Fk; (ii) scan the database and count the support of each candidate; (iii) add the candidates that satisfy the minimum support requirement to Fk+1.

Our approach is to implement the Apriori algorithm with minimum resources, including hardware, and low computational overhead. The Apriori algorithm suffers from data complexity problems: at every step of candidate generation the algorithm has to scan the entire database, and the larger the database, the more difficult it is to scan completely. Candidate generation and candidate pruning are therefore tedious, as they involve bringing in new data after random, unexpected intervals of time. We also have to take care that little memory is used during the scanning process.

5. CONCLUSION AND FUTURE SCOPE

Currently, network-based intrusion detection detects intrusions based on signatures. In this paper we present a new framework for IDS. The aim of the framework is to reduce complexity and obtain a high detection rate. For this purpose we use the DARPA dataset and apply the signature Apriori algorithm, which is well known and widely used for intrusion detection. The framework is intended to detect unknown attacks with a high accuracy rate and high efficiency. This type of NIDS has wide scope for future work: one direction is to create our own dataset; another is to analyze whether these techniques can be applied directly to model a commercial NIDS.

REFERENCES

[1] S. Pastrana, A. Orfila, and A. Ribagorda, "A functional framework to evade NIDS," Hawaii International Conference on System Sciences, 2011.
[2] S. Pastrana, A. Orfila, and A. Ribagorda, "Modeling NIDS evasion with Genetic Programming," in Proceedings of the 2010 International Conference on Security and Management, SAM 2010, Las Vegas, Nevada, USA, July 11-15, 2010.
[3] Martuza Ahmed, Rima Pal, "NIDS: A network based approach to intrusion detection and prevention," IEEE, 2009.
[4] R. Bace and P. Mell, "NIST Special Publication on Intrusion Detection Systems," 800-31, 2001.
[5] M. Roesch, "Snort - Lightweight Intrusion Detection for Networks," in LISA '99: Proceedings of the 13th USENIX Conference on System Administration, Seattle, Washington, 1999, pp. 229-238.
[6] Weiming Hu, Wei Hu, and Steve Maybank, "AdaBoost-Based Algorithm for Network Intrusion Detection," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 38, no. 2, April 2008.
[7] D. Watson, M. Smart, R. G. Malan, and F. Jahanian, "Protocol scrubbing: network security through transparent flow modification," IEEE/ACM Transactions on Networking, vol. 12, pp. 261-273, 2004.
[8] M. Handley, C. Kreibich, and V. Paxson, "Network intrusion detection: Evasion, traffic normalization and end-to-end protocol semantics," in Proceedings of the 10th Conference on USENIX Security Symposium, Volume 10, 2001, p. 9.
[9] M. Roesch, "Snort - Lightweight Intrusion Detection for Networks," in LISA '99: Proceedings of the 13th USENIX Conference on System Administration, Seattle, Washington, 1999, pp. 229-238.
[10] R. Kohavi, "A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection," in IJCAI: International Joint Conference on Artificial Intelligence, Volume 2, Issue 1, pp. 1137-1143, 1995.
[11] J. R. Quinlan, "C4.5: Programs for Machine Learning" (Morgan Kaufmann Series in Machine Learning), Morgan Kaufmann, 1993.
[12] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, I. H. Witten, "The WEKA Data Mining Software: An Update," in SIGKDD Explorations, Volume 11, Issue 1, 2009.
[13] N. Friedman, D. Geiger, M. Goldszmidt, "Bayesian Network Classifiers," Machine Learning, vol. 29, issue 2, pp. 131-163, 1997.
[14] Václav Novák, Magda Razímová, "Unsupervised Detection of Annotation Inconsistencies Using Apriori Algorithm," IEEE conf., 2007.