www.ijecs.in International Journal Of Engineering And Computer Science ISSN:2319-7242

Volume 4 Issue 2 February 2015, Page No. 10567-10569
Detecting Attack Packets by Using Darpa Dataset
on Intrusion Detection System
Ms. Sarika Rameshwar Rathi
Dept. of Computer Engineering,
MGM Polytechnic College, Aurangabad, Maharashtra, India.
email: rathisarika11@gmail.com
Abstract: Today a great deal of valuable data is generated by computer-based applications and stored in company databases. Unfortunately, threats to this data are also increasing rapidly, so developing an Intrusion Detection System that raises the right alarms is an active research topic. Signature-based Network Intrusion Detection Systems (NIDS) use a set of rules to detect hostile traffic in network segments or packets. They are important for detecting malicious and anomalous behavior over the network, such as known attacks, while attackers keep looking for new techniques to go unseen. The problem of network intrusion detection is not just to identify attack connections, but also to determine which type of attack each connection belongs to. This paper aims to build an innovative functional framework for NIDS that can be used to audit a NIDS, and presents a proof of concept showing how to categorize attacks.
Keywords: Intrusion detection, Network intrusion detection system, Network security.
1. INTRODUCTION
Many sites install an Intrusion Detection System (IDS) to monitor their hosts and networks for suspicious events. Many IDSs compare events against a database of known events and send alerts when a match is detected. Nowadays, many organizations and companies use Internet services for communication and as a marketplace for doing business, as eBay and Amazon.com do. Together with the growth of computer network activity, the rate of network attacks has been increasing, affecting the availability, confidentiality, and integrity of critical information. Therefore, a network system must use one or more security tools, such as a firewall, antivirus software, an IDS and a honeypot, to protect important data from criminals.
A network system that relies only on a firewall is not protected against all attack types, because a firewall cannot defend the network against intrusion attempts that arrive through open ports. Hence, a Real-Time Intrusion Detection System (RT-IDS) serves as a prevention tool: it inspects hazardous network activity and gives an alarm signal to the computer user or network administrator when antagonistic activity occurs on an open session.
Intrusion detection is a set of techniques and methods used to detect suspicious activity at both the network and the host level. A network-based IDS (NIDS) processes any clear-text traffic that crosses the monitored network without degrading performance on the host computers; since a single NIDS can monitor many hosts, less maintenance and monitoring effort is required. A network-based IDS cannot know the target machine's state precisely; it must instead deduce the effects of the traffic on the target system. In contrast, a host-based IDS (HIDS) is installed on individual hosts, which gives it knowledge of the target machine's state and the ability to detect attacks from any point of entry. Network-based intrusion detection systems continue to be more prevalent and mature than their host-based counterparts, although personal firewalls such as ZoneAlarm on Windows computers have host-based intrusion detection capability and are frequently used. Often, a NIDS will only report whether a known attack was launched, without being able to determine whether the attack succeeded, or indeed whether the attack even applied to the target's operating system.
2. OVERVIEW
In the earlier method, the C4.5 algorithm was used, with the KDD dataset as its input. In this paper, training and testing are performed directly as the first step, with the DARPA dataset as input. To avoid the complexity of the C4.5 algorithm, we use the training phase directly: since the training data are already labelled, C4.5 is not needed.
Intrusion detection to identify attacks on computer systems has been a challenging problem in network security for quite some time. Software to detect network intrusions protects a computer network from unauthorized users, thereby preventing malicious activities. The intrusion detector learning task is to build a classifier (i.e. a predictive model) capable of distinguishing between attacks/intrusions ("bad" connections) and normal ("good") connections. Considering the growing problems in network security and the need for sophisticated and robust solutions, the KDD Cup was organized in 1999, inviting researchers across the world to design innovative methods to construct an IDS on a training and testing data set, popularly referred to as the KDD Cup 99 data set [3]. Since then, different machine learning techniques such as Bayesian
Ms. Sarika Rathi, IJECS Volume 4 Issue 2 February, 2015 Page No.10567-10569
Page 10567
Classifiers and Decision Trees have been trained on the KDD Cup 99 data set to learn normal and anomalous patterns from the training data, and thus to generate classifiers that are able to detect an intrusion attack [1]. The problem of network intrusion detection is not just to identify attack connections, but also to know which type of attack each connection belongs to [3]. The limitation of these classifiers is that they generate a unified rule for all attack types. As a result, although the algorithms perform well in separating attacks from normal connections, their performance is much weaker when it comes to identifying what type of attack a connection is or was. This can be intuitively explained by the fact that a single rule cannot accurately classify all attack types. This forms the problem statement and the motivation behind this project: we would like an intrusion detection system that can also predict the type of incoming attacks, in addition to identifying attack connections.
Amongst the different machine learning techniques, we selected the Apriori algorithm as a good way to find an efficient solution to the problem. Apriori is a data mining approach that finds frequent item-sets in transaction data and derives association rules from them [4]; these rules can then be applied to a testing set to classify intrusions. Researchers have explored the use of Apriori in intrusion detection and reported very high success rates, but on data sets other than the KDD Cup 99, such as the DARPA data set [2][4][5].
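For reference, the standard support and confidence measures that Apriori works with can be written as below. The notation is assumed here and is not taken from the paper: D is the set of transactions, T a single transaction, and X, Y item-sets.

```latex
\operatorname{supp}(X) = \frac{\lvert \{\, T \in D : X \subseteq T \,\} \rvert}{\lvert D \rvert},
\qquad
\operatorname{conf}(X \Rightarrow Y) = \frac{\operatorname{supp}(X \cup Y)}{\operatorname{supp}(X)},
\qquad
X \subseteq Y \;\Longrightarrow\; \operatorname{supp}(Y) \le \operatorname{supp}(X).
```

The last property is the anti-monotonicity that lets Apriori discard any candidate item-set that has an infrequent subset.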
3. PROPOSED WORK
The main aim is to develop a network based intrusion
detection system based on modified Apriori approach for
attack detection and test the input thus produced by the
Apriori algorithm with the well- known snort intrusion
detection system, once a candidate sets for detecting
different attacks are generated. These candidates in turn will
be passed as inputs to the snort intrusion detection system
for detecting different attacks.
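How the Apriori candidates are converted into Snort input is not specified. As a hedged illustration, the following Python sketch formats hypothetical candidates as text in Snort's rule syntax (a header `action protocol src_ip src_port -> dst_ip dst_port` followed by `keyword:value;` options), which could then be written to a local rules file. The candidate records and the `itype`/`flags` conditions are assumptions chosen for illustration, not rules produced in this work; `to_snort_rule` and `candidates_to_rules` are hypothetical helper names.

```python
# Hypothetical candidates: label, protocol and options are illustrative
# assumptions, not rules mined in this work.
CANDIDATES = [
    {"label": "smurf", "proto": "icmp", "options": ["itype:0"]},
    {"label": "neptune", "proto": "tcp", "options": ["flags:S"]},
]


def to_snort_rule(proto, msg, sid, options=(), action="alert", rev=1):
    """Format one rule in Snort's text syntax:
    action proto src_ip src_port -> dst_ip dst_port (options)"""
    header = f"{action} {proto} any any -> any any"
    # Options are 'keyword:value' items, each terminated by ';'.
    opts = [f'msg:"{msg}"', *options, f"sid:{sid}", f"rev:{rev}"]
    return f"{header} ({'; '.join(opts)};)"


def candidates_to_rules(candidates, first_sid=1000001):
    # Number the rules consecutively from a local-rule SID.
    return [
        to_snort_rule(c["proto"], f"candidate {c['label']}", first_sid + i,
                      c["options"])
        for i, c in enumerate(candidates)
    ]


if __name__ == "__main__":
    for rule in candidates_to_rules(CANDIDATES):
        print(rule)
```

Snort reserves SIDs of 1,000,000 and above for local rules, which is why the numbering starts there.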
Figure 1 shows the proposed system flow. The input to the IDS is the DARPA dataset, which then goes through training and testing. The training phase contains the initialization of parameters. The testing phase performs the actual identification of attack packets and classifies each detected attack under its category (such as DoS, Probe, U2R and R2L attacks). Finally, the detection result and the false alarm rate are displayed.
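The text does not define how the detection result and false alarm rate are computed. A common convention, assumed here, is detection rate = TP / (TP + FN) over attack connections and false alarm rate = FP / (FP + TN) over normal connections. The Python sketch below also reports how often a detected attack is assigned its correct category; the function names and the category strings are hypothetical.

```python
def detection_metrics(y_true, y_pred, normal="normal"):
    """Return (detection_rate, false_alarm_rate), treating every label
    other than `normal` as an attack (category is ignored here)."""
    tp = fn = fp = tn = 0
    for truth, pred in zip(y_true, y_pred):
        is_attack = truth != normal
        flagged = pred != normal
        if is_attack and flagged:
            tp += 1
        elif is_attack:
            fn += 1
        elif flagged:
            fp += 1
        else:
            tn += 1
    detection_rate = tp / (tp + fn) if tp + fn else 0.0
    false_alarm_rate = fp / (fp + tn) if fp + tn else 0.0
    return detection_rate, false_alarm_rate


def category_accuracy(y_true, y_pred, normal="normal"):
    """Fraction of true attacks whose predicted category is exactly right."""
    hits = [t == p for t, p in zip(y_true, y_pred) if t != normal]
    return sum(hits) / len(hits) if hits else 0.0


if __name__ == "__main__":
    truth = ["dos", "normal", "probe", "normal", "dos"]
    pred = ["dos", "dos", "probe", "normal", "normal"]
    print(detection_metrics(truth, pred), category_accuracy(truth, pred))
```

Separating the two measures reflects the paper's point that flagging a connection as an attack and naming its attack type are different problems.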
Figure 1: Proposed System Structure
After this step, the modified Apriori algorithm is used to create rules for detecting attacks. The created rules are passed to Snort, an open-source IDS, which then detects attack packets in the network. By changing the rules, the method can also show which packets evade detection. The detection output is stored in text files. The workflow is depicted in the block diagram.
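The rule-creation step relies on Apriori (Section 4.3). Because the modification used in this work is not described in detail, the following is a minimal Python sketch of the standard, unmodified Apriori plus confidence-based rule generation. The transactions are hypothetical sets of attack labels, and `apriori`, `association_rules` and the thresholds are illustrative names and values, not the author's code. The join step takes unions of any two frequent k-item-sets instead of the lexicographic prefix join; after the prune step both produce the same candidates.

```python
from itertools import combinations

# Hypothetical transactions: each is the set of attack labels seen in
# one time window (illustrative data, not taken from DARPA).
TRANSACTIONS = [
    {"smurf", "neptune"},
    {"smurf", "neptune", "satan"},
    {"smurf", "ipsweep"},
    {"neptune", "satan"},
]


def apriori(transactions, min_support):
    """Standard Apriori. Returns {frozenset(itemset): support}, where
    support is the fraction of transactions containing the itemset."""
    db = [frozenset(t) for t in transactions]
    n = len(db)

    def keep_frequent(candidates):
        # One full database scan per level: count candidate occurrences.
        counts = {c: sum(1 for t in db if c <= t) for c in candidates}
        return {c: cnt / n for c, cnt in counts.items() if cnt / n >= min_support}

    # F_1: frequent single items.
    level = keep_frequent({frozenset([i]) for t in db for i in t})
    frequent = dict(level)
    k = 1
    while level:
        # Join: unions of two frequent k-itemsets that have size k + 1.
        joined = {a | b for a, b in combinations(level, 2) if len(a | b) == k + 1}
        # Prune (anti-monotonicity): every k-subset must be frequent.
        candidates = {c for c in joined
                      if all(frozenset(s) in level for s in combinations(c, k))}
        level = keep_frequent(candidates)
        frequent.update(level)
        k += 1
    return frequent


def association_rules(frequent, min_confidence):
    """Return (antecedent, consequent, confidence) triples with
    confidence = supp(X | Y) / supp(X) >= min_confidence."""
    rules = []
    for itemset, support in frequent.items():
        for r in range(1, len(itemset)):
            for lhs in map(frozenset, combinations(itemset, r)):
                # Every subset of a frequent itemset is itself frequent,
                # so its support is already stored in `frequent`.
                confidence = support / frequent[lhs]
                if confidence >= min_confidence:
                    rules.append((lhs, itemset - lhs, confidence))
    return rules


if __name__ == "__main__":
    freq = apriori(TRANSACTIONS, min_support=0.5)
    for lhs, rhs, conf in association_rules(freq, min_confidence=0.6):
        print(sorted(lhs), "=>", sorted(rhs), round(conf, 2))
```

Each level costs one full pass over the data, which is the scanning cost discussed in Section 4.3.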
4. WORKFLOW
4.1. DARPA Dataset
Researchers have typically implemented their algorithms, such as genetic algorithms, on offline data such as the DARPA 1998 data or the KDD Cup 99 data. MIT Lincoln Laboratory, under Defense Advanced Research Projects Agency (DARPA) and Air Force Research Laboratory (AFRL) sponsorship, collected and distributed the first standard data for the evaluation of computer network intrusion detection systems. This is the DARPA 1998 data set. It consists of tcpdump and BSM list files. Each line in a list file corresponds to a separate session, and each session corresponds to an individual TCP/IP connection between two computers. The first nine columns in the list files provide information that identifies the TCP/IP connection. The data set is not in transaction format and does not contain precise time information, so we create the data records in transaction format. Each transaction contains some records, and transactions in the data set are separated by '###'. The dataset in which association rules are to be found is viewed as a set of tuples, where each tuple consists of a set of items. For example, the tuple {DoS, R2L, U2R} comprises three items: DoS, R2L and U2R. Keeping the attack record register in mind, each item represents an attack that happened in a particular time period.
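The exact file layout of these transaction records is not given, so the following Python sketch assumes a hypothetical layout with one item (an attack category or label) per line and a line containing only '###' between transactions. `read_transactions` is an illustrative name; its output is the list of item-sets that Apriori consumes.

```python
def read_transactions(text, separator="###"):
    """Split text into transactions at separator lines. Each non-empty
    line inside a transaction is treated as one item (assumed layout);
    repeated items within one transaction collapse into a set."""
    transactions, current = [], set()
    for raw in text.splitlines():
        line = raw.strip()
        if line == separator:
            if current:  # skip empty transactions
                transactions.append(current)
            current = set()
        elif line:
            current.add(line)
    if current:  # last transaction may lack a trailing separator
        transactions.append(current)
    return transactions


SAMPLE = """DoS
R2L
U2R
###
DoS
Probe
###
"""

if __name__ == "__main__":
    print(read_transactions(SAMPLE))
```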
4.2. Training and Testing Data Set
We used the KDD Cup 99 data set to train and test the system classifier. The data set was provided by MIT Lincoln Labs. It contains a wide variety of intrusions simulated in a military network environment, set up to acquire nine weeks of raw TCP/IP dump data for a local-area network (LAN) simulating a typical U.S. Air Force LAN. The LAN was operated as if it were a true Air Force environment, peppered with multiple attacks; hence, this is a high-confidence, high-quality data set.
The raw TCP/IP dump data were collected from a host located on the simulated military network. Each TCP/IP connection is described by 41 discrete and continuous features (e.g. duration, protocol type, flag) and labeled either as normal or as an attack with exactly one specific attack type (e.g. smurf, perl). Attacks fall into four main categories: (i) Denial of Service (DoS) attacks, in which an attacker overwhelms the victim host with a huge number of requests; (ii) User to Root (U2R) attacks, in which an attacker who has access to a normal user account tries, for instance, to gain root access to the system; (iii) Remote to Local (R2L) attacks, in which the intruder tries to exploit system vulnerabilities in order to control the remote machine through the network as a local user; and (iv) Probing, in which an attacker attempts to gather useful information about the machines and services available on the network in order to
look for exploits. For our system, we used the 10% subset of the training set published by Lincoln Labs, which contains 494,021 connections. Our testing set is the entire set of labeled connections, around 4.9 million connections. Thus, by using the entire data set, we test our system on connections that were largely unseen during training.
So far in the implementation, we have generated a rule set of six rules, each of which correctly classifies one of six different attack labels. We selected the top three attack labels by distribution from each of the two attack classes, DoS and Probe, in the 10% training data set. These six attack labels are smurf, neptune and land (DoS attacks) and satan, ipsweep and portsweep (Probe attacks).
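The label selection above can be sketched in Python as counting each attack label in the training records and keeping the most frequent labels per attack class. The mapping below covers only the six labels named in this section; `top_labels` and `CATEGORY` are hypothetical names, and the sample counts in the usage are made up for illustration.

```python
from collections import Counter

# Attack class of each KDD Cup 99 label named in this section.
CATEGORY = {
    "smurf": "dos", "neptune": "dos", "land": "dos",
    "satan": "probe", "ipsweep": "probe", "portsweep": "probe",
}


def top_labels(labels, category_of, per_category=3):
    """Return {category: [(label, count), ...]} with at most
    `per_category` labels per category, most frequent first."""
    # Strip a trailing '.' if present and ignore unmapped labels
    # such as 'normal'.
    cleaned = (label.rstrip(".") for label in labels)
    counts = Counter(label for label in cleaned if label in category_of)
    top = {}
    for label, n in counts.most_common():
        picked = top.setdefault(category_of[label], [])
        if len(picked) < per_category:
            picked.append((label, n))
    return top


if __name__ == "__main__":
    sample = (["smurf."] * 5 + ["neptune."] * 4 + ["land."]
              + ["satan."] * 3 + ["ipsweep."] * 2 + ["normal."] * 6)
    print(top_labels(sample, CATEGORY))
```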
Figure 2: Categorization of attacks
4.3. Apriori Algorithm
To detect attack packets on the network, we use the Apriori algorithm. The categorization of attack packets is obtained from the training and testing phase, and the Apriori algorithm is used to generate rules.
One of the most popular data mining approaches is to find frequent item-sets in a transaction dataset and derive association rules from them. Finding frequent item-sets (item-sets whose frequency is larger than or equal to a user-specified minimum support) is not trivial because of combinatorial explosion [14]. Once frequent item-sets are obtained, it is straightforward to generate association rules with confidence larger than or equal to a user-specified minimum confidence. Apriori is a seminal algorithm for finding frequent item-sets using candidate generation [1]. It is characterized as a level-wise complete search algorithm that uses the anti-monotonicity of item-sets: "if an item-set is not frequent, none of its supersets is frequent". By convention, Apriori assumes that the items within a transaction or item-set are sorted in lexicographic order. Let the set of frequent item-sets of size k be F_k and their candidates be C_k. Apriori first scans the database and searches for frequent item-sets of size 1 by accumulating the count for each item and collecting those that satisfy the minimum support requirement. It then iterates on the following three steps, until no new frequent item-sets are found, and extracts all the frequent item-sets: (1) generate C_{k+1}, the candidates of frequent item-sets of size k+1, from the frequent item-sets of size k; (2) scan the database and calculate the support of each candidate; (3) add the candidates that satisfy the minimum support requirement to F_{k+1}.
Our approach is to implement the Apriori algorithm with minimal resources, including hardware, and low computational overhead. The Apriori algorithm suffers from data complexity problems: for every step of candidate generation the algorithm has to scan the entire database, and the larger the database, the more difficult it is to scan completely. Candidate generation and candidate pruning are therefore considered tedious, as they involve bringing in new data at random, unexpected intervals of time. We also have to ensure that little memory is used during the scanning process.
5. CONCLUSION AND FUTURE SCOPE
Currently, network-based intrusion detection detects intrusions based on signatures. In this paper we present a new framework for IDS. The aim of our framework is to reduce complexity and achieve a high detection rate. For this purpose we use the DARPA dataset and apply the Signature Apriori algorithm, which is well known and widely used for intrusion detection. The framework is intended to detect unknown attacks with high accuracy and high efficiency. This type of NIDS has a wide scope for future work: one direction is to create our own dataset; another is to analyze whether these techniques can be applied directly to model a commercial NIDS.
REFERENCES
[1] S. Pastrana, A. Orfila, and A. Ribagorda, "A functional framework to evade NIDS," in Hawaii International Conference on System Sciences, 2011.
[2] S. Pastrana, A. Orfila, and A. Ribagorda, "Modeling NIDS evasion with Genetic Programming," in Proceedings of the 2010 International Conference on Security and Management (SAM 2010), Las Vegas, Nevada, USA, July 11-15, 2010.
[3] M. Ahmed and R. Pal, "NIDS: A network based approach to intrusion detection and prevention," IEEE, 2009.
[4] R. Bace and P. Mell, "NIST Special Publication on Intrusion Detection Systems," SP 800-31, 2001.
[5] M. Roesch, "Snort: Lightweight Intrusion Detection for Networks," in LISA '99: Proceedings of the 13th USENIX Conference on System Administration, Seattle, Washington, 1999, pp. 229-238.
[6] W. Hu, W. Hu, and S. Maybank, "AdaBoost-Based Algorithm for Network Intrusion Detection," IEEE Transactions on Systems, Man, and Cybernetics, Part B: Cybernetics, vol. 38, no. 2, April 2008.
[7] D. Watson, M. Smart, R. G. Malan, and F. Jahanian, "Protocol scrubbing: network security through transparent flow modification," IEEE/ACM Transactions on Networking, vol. 12, pp. 261-273, 2004.
[8] M. Handley, C. Kreibich, and V. Paxson, "Network intrusion detection: Evasion, traffic normalization and end-to-end protocol semantics," in Proceedings of the 10th Conference on USENIX Security Symposium, vol. 10, 2001, p. 9.
[9] M. Roesch, "Snort: Lightweight Intrusion Detection for Networks," in LISA '99: Proceedings of the 13th USENIX Conference on System Administration, Seattle, Washington, 1999, pp. 229-238.
[10] R. Kohavi, "A Study of Cross-Validation and Bootstrap for Accuracy Estimation and Model Selection," in IJCAI: International Joint Conference on Artificial Intelligence, vol. 2, issue 1, pp. 1137-1143, 1995.
[11] J. R. Quinlan, C4.5: Programs for Machine Learning (Morgan Kaufmann Series in Machine Learning), Morgan Kaufmann, 1993.
[12] M. Hall, E. Frank, G. Holmes, B. Pfahringer, P. Reutemann, and I. H. Witten, "The WEKA Data Mining Software: An Update," SIGKDD Explorations, vol. 11, issue 1, 2009.
[13] N. Friedman, D. Geiger, and M. Goldszmidt, "Bayesian Network Classifiers," Machine Learning, vol. 29, issue 2, pp. 131-163, 1997.
[14] V. Novák and M. Razímová, "Unsupervised Detection of Annotation Inconsistencies Using Apriori Algorithm," IEEE conference, 2007.