International Journal of Engineering Trends and Technology (IJETT) – Volume 16 Number 7 – Oct 2014
An Empirical Model of Intrusion Detection System
with ID3 Classification
Pavitra Gunna1, K. Prasada Rao2
1Final M.Tech student, 2Senior Assistant Professor
1,2Computer Science and Engineering, Aditya Institute of Technology And Management (AITAM), Tekkali, A.P
Abstract: Identification and prevention of unauthorized access (intrusion detection) is still an important research issue in the field of information security. In mobile ad hoc networks, every node communicates independently with other nodes, either directly in a single hop or through intermediate nodes over multiple hops, without intermediate servers. Because of this dynamic nature, security is the primary issue when transmitting data between nodes. We propose a hybrid approach that identifies unauthorized access with ID3 classification and authenticates data packets with a digital signature (the Empirical Signature algorithm).
Index Terms: ID3 Classification, Digital Signature, Intrusion Detection System.
I. INTRODUCTION
Some researchers have proposed a parallel signature-based technique that analyzes network traffic with parallel processing: “In this method, complete rule groups are spread across nodes. It is possible to use a packet duplicator to send every packet to every node for processing, or a traffic splitter to route each packet to the appropriate node. In this case, rules are clustered into rule groups based on source and destination ports. So, a traffic splitter could route packets based on port number” [1][2].
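The traffic-splitter idea quoted above can be sketched as follows. This is a minimal illustration, not the cited authors' implementation: the node names, the rule groups, and the choice to route on the destination port only are all assumptions made for the example.

```python
from collections import defaultdict

# Hypothetical rule groups: each detection node owns the rules for a set of ports.
RULE_GROUPS = {
    "node-a": {25, 110},   # mail-related rules
    "node-b": {80},        # web-related rules
}
DEFAULT_NODE = "node-c"    # hypothetical catch-all for ports no rule group claims


def build_port_index(rule_groups):
    """Invert the rule groups into a port -> node lookup table."""
    index = {}
    for node, ports in rule_groups.items():
        for port in ports:
            index[port] = node
    return index


def split_traffic(packets, port_index, default_node=DEFAULT_NODE):
    """Route each packet to exactly one node, keyed on its destination port."""
    queues = defaultdict(list)
    for packet in packets:
        node = port_index.get(packet["dst_port"], default_node)
        queues[node].append(packet)
    return dict(queues)


packets = [
    {"src_port": 40001, "dst_port": 80},
    {"src_port": 40002, "dst_port": 25},
    {"src_port": 40003, "dst_port": 8081},
]
routed = split_traffic(packets, build_port_index(RULE_GROUPS))
```

A packet duplicator would instead append every packet to every node's queue, which trades extra processing for not needing the port index.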
Port Based classification:
The simplest way to classify Internet traffic is by UDP or TCP port number, because many applications use well-known port numbers, which are listed by the Internet Assigned Numbers Authority (IANA). For example, HTTP uses port 80, POP3 uses port 110, and SMTP uses port 25. We can set up rules that classify applications by their assigned port numbers [3].
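A port-based rule set of this kind might look like the sketch below. The three port numbers come from the text; the function name, the "unknown" label, and the decision to check the destination port before the source port are assumptions for illustration.

```python
# Well-known ports named in the text; a fuller table would come from the IANA registry.
WELL_KNOWN_PORTS = {80: "HTTP", 110: "POP3", 25: "SMTP"}


def classify_by_port(src_port, dst_port, table=WELL_KNOWN_PORTS):
    """Return the application label for a flow, or "unknown" if neither port is listed."""
    # Check the destination port first, then fall back to the source port
    # (covers replies travelling from the server back to the client).
    for port in (dst_port, src_port):
        if port in table:
            return table[port]
    return "unknown"


print(classify_by_port(51515, 80))    # client -> web server
print(classify_by_port(25, 40000))    # mail server -> client
print(classify_by_port(5000, 8081))   # neither port is in the table
```

The last call shows the main limitation of this method: traffic on unregistered ports cannot be labeled at all.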
Various anomaly detection and prevention mechanisms are available among traditional approaches, such as signature-based, trust-based, statistical, and probability-based approaches [4]. Traditional approaches are not optimal because they compare static attributes and retrieve computed trust values from third parties or data ratings calculated by intermediate nodes. Even though classification-based techniques analyze the behavior of the incoming node, they suffer from mismatched feature set selection, and a major issue is that semantic comparison is not possible.
ISSN: 2231-5381
Various approaches identify unauthorized behavior of incoming nodes through trust measures such as direct trust, indirect trust, and reputation metrics. These metrics are always maintained globally, and the network cannot directly depend on a third party for them. The main drawback of signature-based IDS mechanisms is that they are pattern based: the patterns must be continuously updated, and new patterns are difficult to identify. Direct classification techniques add time complexity when classifying the incoming and outgoing data flows of the network traffic [5][6].
II. RELATED WORK
Researchers have developed various IDS (intrusion detection system) approaches, such as Watchdog, Static Measure, and others, but every approach has its advantages and disadvantages. Watchdog counts the failures of the intermediate nodes; if the count exceeds the minimum threshold value, the node is treated as a misbehaving node [7]. Static Measure simply compares the source node parameters while the source communicates with the destination node. TWOACK checks the acknowledgements from two consecutive nodes to determine the success or failure of the nodes.
- A node may be falsely identified as misbehaving during network failures.
- Static Measure may not give optimal results, because a node cannot be assessed with a single parameter.
- TWOACK works efficiently, but it adds overhead when the number of packets and nodes is large.
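The threshold rule described for Watchdog, and the false-identification drawback listed above, can be illustrated with a short sketch. The class name, the method names, and the threshold value are hypothetical; the source does not specify how failures are observed.

```python
from collections import Counter


class Watchdog:
    """Counts failures per neighbouring node and flags those above a threshold."""

    def __init__(self, threshold):
        self.threshold = threshold      # hypothetical tuning parameter
        self.failures = Counter()

    def report(self, node, forwarded):
        """Record one observation: did `node` forward the packet it was given?"""
        if not forwarded:
            self.failures[node] += 1

    def misbehaving(self):
        """Return the nodes whose failure count exceeds the threshold."""
        return {n for n, count in self.failures.items() if count > self.threshold}


wd = Watchdog(threshold=2)
for forwarded in (False, False, False):
    wd.report("n1", forwarded)
wd.report("n2", False)   # a single failure, e.g. a transient link error
```

With this rule, a healthy node that drops more than `threshold` packets during a network outage is flagged just like a malicious one, since the counter cannot tell the two causes apart.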
One traditional approach uses model checking techniques to compute attack graphs. We encountered significant scalability problems in applying this tool. One reason for the blow-up is that there are many duplicate attack paths in the graph that differ only in the order in which independent attack steps are attempted. Partial-order reduction can remove such duplicate paths, but it has not been shown that the technique can significantly improve the scalability for attack graphs. Even after removing such duplicate paths, the resulting graphs could still be exponential. We also find it hard to decode the meaning of the Boolean values in a node, and the logical correlation among nodes is not always obvious.
http://www.ijettjournal.org
Another traditional approach, from 1998, developed a tool for generating attack graphs [7][8]. Like the model checking approach, the nodes in its attack graphs represent the state of the network as a collection of variables, and the edges represent an attacker’s actions that change the state. Instead of a model checker, a customized search engine conducts the analysis. This state-based attack graph representation has inherent exponential problems, and such an explosion was indeed reported by the authors. They therefore used a technique similar to partial-order reduction to eliminate the duplicate attack paths that contributed to the explosion, but it is not clear from the paper how effective this method has been, and no performance data was given.
Other researchers also noticed the scalability problem in the model checking-based attack graph tool and proposed a graph search-based algorithm, which was then used in the Topological Vulnerability Analysis tool. They pointed out that for most computer attacks one can assume the monotonicity property: an attacker does not decrease his ability by launching attacks, and hence does not need to relinquish privileges he has already gained. Under this assumption, an attacker’s privileges always increase during the analysis. Since there are only a polynomial number of privileges an attacker can gain, the analysis algorithm will terminate in polynomial time. Our logical attack graph gives another perspective on this monotonicity property [9].
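The termination argument can be made concrete with a small forward-chaining sketch. It is not the Topological Vulnerability Analysis algorithm itself: the representation of an exploit as a (preconditions, postconditions) pair and the three-host example are assumptions chosen only to show why monotonicity bounds the work.

```python
def reachable_privileges(initial, exploits):
    """Forward-chain over exploits until no new privilege is gained.

    `exploits` is a list of (preconditions, postconditions) pairs of sets.
    Privileges are only ever added, never removed (the monotonicity assumption).
    """
    gained = set(initial)
    changed = True
    while changed:
        changed = False
        for pre, post in exploits:
            # Fire an exploit only if it is enabled and adds something new.
            if pre <= gained and not post <= gained:
                gained |= post
                changed = True
    return gained


# Hypothetical example: the attacker starts with a user account on a web host.
exploits = [
    ({"user@web"}, {"root@web"}),
    ({"root@web"}, {"user@db"}),
    ({"user@db", "root@web"}, {"root@db"}),
    ({"user@mail"}, {"root@mail"}),   # never fires: its precondition is unreachable
]
result = reachable_privileges({"user@web"}, exploits)
```

Every pass of the outer loop either adds at least one privilege or stops, so the number of passes is bounded by the number of distinct privileges plus one, and each pass scans the exploit list once.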
We observe that most attacks, whether monotonic or non-monotonic, have root causes in configuration information. Thus, at an appropriate level, the preconditions of all those attacks can be specified using propositional formulas on configuration information. In some sense, non-monotonic attacks can be treated as monotonic if one ignores the low-level details of how the attack can happen. For this reason, simple catalog rules can capture almost all kinds of attack conditions in a network. The graph search-based approach gave a theoretical upper bound of O(|A|^2 · |E|) for its algorithms, where |A| is the number of “attributes” (describing attack pre- and post-conditions) and |E| is the number of “exploits”. The paper stated that an exploit typically involves two hosts, yielding a quadratic number of concrete exploits [12].
III. PROPOSED WORK
In this paper we propose an integrated model of an intrusion detection system with authentication of data packets: the ID3 algorithm detects intrusions, and a digital signature verifies whether a data packet was received from an authorized user. The ID3 algorithm classifies the source node information against training data that holds previously visited node information, and analyzes the node after the tree is constructed. Our experimental results are better than those of the previous approaches because the system combines an IDS (here, ID3) with a digital signature. An efficient IDS with an ID3 classifier identifies the malicious nodes, authentication is provided by the hash codes of the data packets, and the integrated approach performs well in a dynamic environment.
The intrusion detection system is built on the ID3 classification algorithm: it classifies the testing sample against the training samples by calculating the initial and conditional probabilities with respect to the individual attributes and the status, and finally classifies the node as an anonymous or un-anonymous node. The algorithm proceeds as follows:
1) Establish Classification Attribute
2) Compute Classification Entropy.
3) For every attribute in R set, compute Information Gain
using classification attribute.
4) Choose Attribute with the highest information gain to be
the next Node in the tree (starting from the main root
node).
5) Eliminate or remove Node Attribute, creating reduced
table RS set.
6) Repeat steps 3 to 5 until all attributes have been used or
the same classification value remains for all rows in the
reduced table.
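Steps 1 to 6 above can be sketched in Python as shown below. This is an illustrative reading of the steps, not the Java implementation used in the experiments; the toy rows, the attribute names, and the "unknown" fallback label are hypothetical.

```python
import math
from collections import Counter


def entropy(rows, target):
    """Step 2: entropy of the classification attribute over `rows`."""
    counts = Counter(row[target] for row in rows)
    total = len(rows)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(rows, attribute, target):
    """Step 3: entropy before the split minus the weighted entropy after it."""
    total = len(rows)
    remainder = 0.0
    for value in {row[attribute] for row in rows}:
        subset = [row for row in rows if row[attribute] == value]
        remainder += (len(subset) / total) * entropy(subset, target)
    return entropy(rows, target) - remainder


def id3(rows, attributes, target):
    """Steps 4 to 6: pick the best attribute, split, and recurse."""
    labels = [row[target] for row in rows]
    if len(set(labels)) == 1:          # same classification for all rows
        return labels[0]
    if not attributes:                 # all attributes used: majority vote
        return Counter(labels).most_common(1)[0][0]
    best = max(attributes, key=lambda a: information_gain(rows, a, target))
    remaining = [a for a in attributes if a != best]   # step 5: reduced table
    tree = {best: {}}
    for value in sorted({row[best] for row in rows}):
        subset = [row for row in rows if row[best] == value]
        tree[best][value] = id3(subset, remaining, target)
    return tree


def classify(tree, sample, default="unknown"):
    """Walk the tree from the root until a leaf (a class name) is reached."""
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute].get(sample.get(attribute), default)
    return tree


# Hypothetical training rows, for illustration only.
rows = [
    {"protocol": "TCP", "port": 8081, "status": "Malicious"},
    {"protocol": "TCP", "port": 8082, "status": "Malicious"},
    {"protocol": "HTTP", "port": 8081, "status": "Not Malicious"},
    {"protocol": "HTTP", "port": 8082, "status": "Not Malicious"},
    {"protocol": "SMTP", "port": 8081, "status": "Malicious"},
    {"protocol": "SMTP", "port": 8082, "status": "Not Malicious"},
]
tree = id3(rows, ["protocol", "port"], "status")
```

On these rows the protocol attribute has the higher gain and becomes the root; only the SMTP branch needs a second test on the port. A sample whose attribute value never appeared in training falls through to the `default` label.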
ID3 builds a decision tree from a fixed set of examples, and the resulting tree is used to classify future samples. Each example has several attributes and belongs to a class (such as yes or no). The leaf nodes of the decision tree contain the class name, whereas a non-leaf node is a decision node. A decision node is an attribute test, with each branch (to another decision tree) being a possible value of the attribute. ID3 uses information gain to decide which attribute goes into a decision node. The advantage of learning a decision tree is that a program, rather than a knowledge engineer, elicits knowledge from an expert.
Gain measures how well a given attribute separates training examples into the targeted classes. The attribute with the highest information (information being the most useful for classification) is selected. To define gain, we first borrow an idea from information theory called entropy. Entropy measures the amount of information in an attribute.
The authenticity of the data packets is verified with the signature algorithm. In this module, the sender applies the digital signature algorithm to the data packets being transmitted, and the receiver verifies each data packet by applying the same signature algorithm and comparing the signatures generated over the data packets.
Empirical Signature algorithm
Algorithm: Generate file with Signatures
Input: User File in ASCII (F0)
Output: User File with Signature appended at end (Fn)
Method: To apply the hash function on each n-byte block of the file, we first perform the following steps so that (m mod n) = 0 for F0.
1. m ← Calculate_Length of (F0)
2. n ← Length of Block (any one of 128 / 256 / 512 / 1024 / 2048 / 4096 / 8192) bytes
   res ← reserved 16 bytes
   P ← m mod n
   Q ← n - (P + res)
3. if (Q > 0)
      F1 ← Append Q zeros at the end of F0
   else if (Q < 0)
      R ← n + Q
      F1 ← Append R zeros at the end of F0
   F1 ← Append res at the end of F1
4. In order to generate Signatures of F1, perform the following steps
   l ← Calculate_Length of (F1)
   count ← l / n
   For j ← 1 to count
      S ← 0
      S ← reverse[∑_{A=1}^{n} ((A XOR B) ∨ (A ∩ B))]
      where B ← to_Integer(to_Char(A))
5. Sig ← Sig + to_Binary(S)
   Fn ← F1 + Sig

Source ip    | Destination ip | Port no | Type of protocol
-------------|----------------|---------|-----------------
192.168.1.10 | 192.168.1.20   | 8081    | TCP
192.168.1.12 | 192.168.1.21   | 8082    | TCP/IP
192.168.1.11 | 192.168.1.20   | 8081    | smtp
192.168.1.19 | 192.168.1.28   | 8083    | http
192.168.1.16 | 192.168.1.25   | 8084    | TCP
Fig1: Sample Dataset
For implementation purposes we use a synthetic dataset that contains the details of previous nodes, each labeled anonymous or un-anonymous. The fields in the training dataset are the node name or IP address, the type of protocol, and the number of packets transmitted. The input sample is retrieved from the node that connects.
Every node in the network acts as an independent node: it can receive, transmit, and classify nodes. Every individual node maintains its own training dataset to classify the anonymous behavior of the node that connects to it.
Experimental analysis
For experimental purposes we implemented this authentication-based IDS mechanism in Java, using a synthetic dataset (the training dataset) that contains the source IP address, destination IP address, port number, type of protocol used, and number of packets transmitted. The testing sample is forwarded to the training dataset, and information gain is computed in terms of entropy to construct the decision tree of the training dataset. The tree is constructed by placing attributes in order from highest to lowest information gain. Entropy is calculated as follows:
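$$\mathrm{Entropy}(S) = -p_{+}\log_2(p_{+}) - p_{-}\log_2(p_{-})$$
where $p_{+}$ and $p_{-}$ are the proportions of samples in S that belong to each of the two classes.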
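Information gain is the difference between the entropy of the current set and the weighted sum of the entropies of its child sets:
$$\mathrm{Gain}(S, A) = \mathrm{Entropy}(S) - \sum_{v \in \mathrm{Values}(A)} \frac{|S_v|}{|S|}\,\mathrm{Entropy}(S_v)$$
where $S_v$ is the subset of S for which attribute A has value v.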
Our experimental results are more accurate than those of the traditional intrusion detection approaches in terms of detecting anonymous behavior and securing the transmission of data after classification. A sample training dataset is shown below:
Number of packets (in bytes) | Status
-----------------------------|--------------
56                           | Malicious
120                          | Not Malicious
35                           | Malicious
56                           | Malicious
56                           | Not Malicious
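As a worked example, the entropy of the Status column and the information gain of the number-of-packets attribute can be computed for the five sample rows above. The sketch assumes the standard ID3 definition of gain, in which the entropy of each child set is weighted by its share of the rows; the function names are illustrative.

```python
import math
from collections import Counter

# (number of packets, status) pairs taken from the sample training dataset.
SAMPLE = [
    (56, "Malicious"),
    (120, "Not Malicious"),
    (35, "Malicious"),
    (56, "Malicious"),
    (56, "Not Malicious"),
]


def entropy(labels):
    """Entropy of a list of class labels, in bits."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total) for c in Counter(labels).values())


def gain(pairs):
    """Information gain of the attribute stored in the first element of each pair."""
    labels = [status for _, status in pairs]
    groups = {}
    for value, status in pairs:
        groups.setdefault(value, []).append(status)   # one child set per value
    weighted = sum(len(g) / len(pairs) * entropy(g) for g in groups.values())
    return entropy(labels) - weighted


base = entropy([status for _, status in SAMPLE])
packets_gain = gain(SAMPLE)
print(round(base, 3), round(packets_gain, 3))   # prints 0.971 0.42
```

With three Malicious and two Not Malicious rows, the entropy is -(3/5)log2(3/5) - (2/5)log2(2/5) ≈ 0.971. Splitting on the packet count leaves only the three rows with 56 packets mixed (entropy ≈ 0.918, weight 3/5), so the gain is about 0.971 - 0.551 = 0.420.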
[Figure: bar chart comparing the Hybrid Classification, Signature Based, and Trust Metric Based approaches on Time Complexity, Accuracy, False Positives, and False Negatives; vertical axis scaled from 0 to 20]
Fig2: Comparative Analysis
IV. CONCLUSION
We conclude our research work with an efficient classification mechanism that constructs a decision tree from the training samples and analyzes the behavior of the testing sample. The authenticity of data packets can be verified by the intermediate nodes while the data is transmitted from the source node to the destination node. Our experimental results are better than those of the traditional approaches.
REFERENCES
[1] Internet Assigned Numbers Authority (IANA), http://www.iana.org/assignments/port-number (last accessed October 2009).
[2] A. Madhukar and C. Williamson, “A longitudinal study of P2P traffic classification,” in MASCOTS ’06: Proceedings of the 14th IEEE International Symposium on Modeling, Analysis, and Simulation, IEEE Computer Society, Washington, DC, USA, 2006, pp. 179–188. doi: http://dx.doi.org/10.1109/MASCOTS.2006.6.
[3] J. Klensin, “Simple Mail Transfer Protocol,” IETF RFC 2821, April 2001; http://www.ietf.org/rfc/rfc2821.txt
[4] Bro intrusion detection system - Bro overview, http://bro-ids.org, as of August 14, 2007.
[5] V. Paxson, “Bro: A system for detecting network intruders in real-time,” Computer Networks, vol. 31, no. 23-24, pp. 2435–2463, 1999.
[6] N. Ben Azzouna and F. Guillemin, “Analysis of ADSL Traffic on an IP Backbone Link,” IEEE Global Telecommunications Conference 2003, San Francisco, USA, December 2003.
[7] K. Cho, K. Fukuda, H. Esaki, and A. Kato, “The Impact and Implications of the Growth in Residential User-to-User Traffic,” ACM SIGCOMM 2006, Pisa, Italy, September 2006.
[8] J. Joshi et al., “Access Control Language for Multidomain Environments,” IEEE Internet Computing, vol. 8, no. 6, 2004, pp. 40–50.
[9] M. Blaze et al., “Dynamic Trust Management,” Computer, vol. 42, no. 2, 2009, pp. 44–52.
[10] Y. Zhang and J. Joshi, “Access Control and Trust Management for Emerging Multi-domain Environments,” Annals of Emerging Research in Information Assurance, Security and Privacy Services, S. Upadhyaya and R.O. Rao, eds., Emerald Group Publishing, 2009, pp. 421–452.
BIOGRAPHIES
Pavitra Gunna is pursuing her M.Tech in Computer Science at Aditya Institute of Technology And Management (AITAM). Her areas of interest are cloud computing, Hadoop, and information security.
K. Prasada Rao completed his M.Tech and is pursuing a Ph.D at Acharya Nagarjuna University. At present he is working as a Senior Assistant Professor in the Department of Computer Science and Engineering, Aditya Institute of Technology And Management (AITAM), Tekkali, A.P. His areas of interest are Data Mining and Computer Networks.