International Journal of Engineering Trends and Technology (IJETT) – Volume 16 Number 7 – Oct 2014
ISSN: 2231-5381, http://www.ijettjournal.org

An Empirical Model of Intrusion Detection System with ID3 Classification

Pavitra Gunna¹, K. Prasada Rao²
¹Final M.Tech student, ²Senior Assistant Professor
Computer Science and Engineering, Aditya Institute of Technology And Management (AITAM), Tekkali, A.P.

Abstract: Identifying and preventing unauthorized access (intrusion detection) remains an important research issue in information security. In mobile ad hoc networks, every node communicates with other nodes independently, either directly in a single hop or through intermediate nodes over multiple hops, without intermediate servers. Because of this dynamic nature, security is the primary concern when data is transmitted across the nodes. We propose a hybrid approach that identifies unauthorized access with ID3 classification and authenticates data packets with a digital signature (the Empirical Signature algorithm).

Index Terms: ID3 Classification, Digital Signature, Intrusion Detection System.

I. INTRODUCTION

Some researchers have proposed a parallel signature-based technique that analyzes network traffic with parallel processing: "In this method, complete rule groups are spread across nodes. It is possible to use a packet duplicator to send every packet to every node for processing, or a traffic splitter to route each packet to the appropriate node. In this case, rules are clustered into rule groups based on source and destination ports. So, a traffic splitter could route packets based on port number" [1][2].

Port-based classification: The simplest way to classify Internet traffic is by UDP or TCP port numbers, because some traffic uses well-known port numbers, which are listed by the Internet Assigned Numbers Authority (IANA).
For example, HTTP uses port 80, POP3 uses port 110, and SMTP uses port 25. Rules can therefore be set up to classify applications by their assigned port numbers [3].

Traditional approaches offer various anomaly detection and prevention mechanisms, including signature-based, trust-based, statistical, and probability-based approaches [4]. These approaches are not optimal because they compare against static attributes and depend on trust values computed by third parties or on data ratings calculated by intermediate nodes. Classification-based techniques do analyze the behavior of an incoming node, but they suffer from mismatched feature-set selection and, more importantly, cannot perform semantic comparison. Other approaches identify unauthorized behavior of incoming nodes through trust measures such as direct trust, indirect trust, and reputation metrics. These metrics are always maintained globally, so using them would make the network depend on a third party, which it cannot do directly. The main drawback of signature-based IDS mechanisms is that they are pattern based: the patterns must be updated continuously, and new patterns are difficult to identify. Direct classification techniques incur high time complexity when classifying incoming and outgoing network traffic flows [5][6].

II. RELATED WORK

Researchers have developed various intrusion detection system (IDS) approaches, such as Watchdog, static measures, and TWOACK, and each approach has its own advantages and disadvantages. Watchdog counts the failures of intermediate nodes; if the count for a node exceeds a minimum threshold, the node is treated as misbehaving [7]. The static measure simply compares the source node's parameters while it communicates with the destination node. TWOACK checks the acknowledgements from two consecutive nodes to determine the success or failure of the nodes.
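As a rough illustration of the Watchdog rule described above, the following Python sketch counts failures per node and flags a node once its count exceeds a threshold. The class name `Watchdog`, its method names, and the threshold value are hypothetical; the text does not specify how failures are observed, so the sketch simply assumes each observation reports whether a node forwarded a packet.

```python
from collections import defaultdict


class Watchdog:
    """Hypothetical failure counter with a misbehavior threshold."""

    def __init__(self, threshold):
        # Assumed: a single fixed threshold shared by all monitored nodes.
        self.threshold = threshold
        self.failures = defaultdict(int)

    def report(self, node_id, forwarded):
        """Record one observation; only failed forwards are counted."""
        if not forwarded:
            self.failures[node_id] += 1

    def is_misbehaving(self, node_id):
        """A node is flagged once its failure count exceeds the threshold."""
        return self.failures.get(node_id, 0) > self.threshold


watchdog = Watchdog(threshold=2)
for _ in range(3):
    watchdog.report("node-A", forwarded=False)  # three failed forwards
watchdog.report("node-B", forwarded=True)       # a successful forward
```

A counter of this kind cannot tell a misbehaving node from one that failed because of a network fault.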
Each of these approaches has drawbacks. Watchdog may falsely identify a node as misbehaving during network failures. The static measure may not give optimal results, because a node cannot be assessed from a single parameter. TWOACK works efficiently but adds overhead when the number of packets and nodes is large.

A traditional approach uses model-checking techniques to compute attack graphs. We encountered significant scalability problems in applying this tool. One reason for the blow-up is that the graph contains many duplicate attack paths that differ only in the order in which independent attack steps are attempted. Partial-order reduction can remove such duplicate paths, but it has not been shown that the technique significantly improves scalability for attack graphs. Even after removing such duplicate paths, the resulting graphs could still be exponential. We also find it hard to decode the meaning of the Boolean values in a node, and the logical correlation among nodes is not always obvious.

Another traditional approach developed a tool for generating attack graphs [7][8] in 1998. As in the model-checking approach, the nodes in these attack graphs represent the state of the network as a collection of variables, and the edges represent attacker actions that change the state. Instead of a model checker, a customized search engine conducts the analysis. This state-based attack graph representation has inherent exponential problems, and such explosion was indeed reported by the authors. They therefore used a technique similar to partial-order reduction to eliminate the duplicate attack paths that contributed to the explosion, but it is not clear from the paper how effective this method was, and no performance data was given.
Other researchers also noticed the scalability problem in the model-checking-based attack graph tool and proposed a graph-search-based algorithm, which was then used in the Topological Vulnerability Analysis tool. They pointed out that for most computer attacks one can assume the monotonicity property: an attacker does not decrease his ability by launching attacks and hence does not need to relinquish privileges he has already gained. Under this assumption, an attacker's privileges always increase during the analysis. Since an attacker can gain only a polynomial number of privileges, the analysis algorithm terminates in polynomial time. Our logical attack graph gives another perspective on this monotonicity property [9]. We observe that most attacks, whether monotonic or non-monotonic, have root causes in configuration information. Thus, at an appropriate level, the preconditions of all those attacks can be specified using propositional formulas on configuration information. In some sense, non-monotonic attacks can be treated as monotonic if one ignores the low-level details of how the attack happens. For this reason, simple catalog rules can capture almost all kinds of attack conditions in a network. The authors gave a theoretical upper bound for their algorithm of $O(|A|^2 \cdot |E|)$, where |A| is the number of "attributes" (describing attack pre- and post-conditions) and |E| is the number of "exploits". The paper stated that an exploit typically involves two hosts, yielding a quadratic number of concrete exploits [12].

III. PROPOSED WORK

In this paper we propose an integrated model of an intrusion detection system that combines the ID3 algorithm with a digital signature for authenticating data packets, that is, for determining whether a received data packet comes from an authorized user.
The ID3 algorithm classifies source node information using training data that records previously visited nodes, and analyzes the node once the tree has been constructed. Our experimental results improve on the previous approach because the intrusion detection system combines an IDS (here based on ID3) with a digital signature. An IDS with an ID3 classifier identifies the malicious nodes, authentication is provided by hash codes of the data packets, and the integrated approach performs well in a dynamic environment.

The intrusion detection system uses the ID3 classification algorithm to classify a testing sample against the training samples: it calculates the initial and conditional probabilities with respect to the individual attributes and the status, and finally classifies the node as anonymous or un-anonymous. The steps are:

1) Establish the classification attribute.
2) Compute the classification entropy.
3) For every attribute in the set R, compute the information gain with respect to the classification attribute.
4) Choose the attribute with the highest information gain as the next node in the tree (starting from the root node).
5) Remove the node attribute, creating the reduced table RS.
6) Repeat steps 3 to 5 until all attributes have been used or the same classification value remains for all rows in the reduced table.
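The steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' Java implementation: the function names, the dictionary-based tree, the majority-vote fallback when attributes run out, and the four-row `TRAINING` table are all hypothetical and chosen only to make the example self-contained.

```python
import math
from collections import Counter


def entropy(labels):
    """Step 2: Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((c / total) * math.log2(c / total)
                for c in Counter(labels).values())


def information_gain(rows, attr, target):
    """Step 3: entropy of the set minus the size-weighted entropy of its children."""
    base = entropy([r[target] for r in rows])
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r[target] for r in rows if r[attr] == value]
        remainder += len(subset) / len(rows) * entropy(subset)
    return base - remainder


def id3(rows, attrs, target):
    """Steps 3 to 6: build the tree as nested dicts {attribute: {value: subtree}}."""
    labels = [r[target] for r in rows]
    if len(set(labels)) == 1:        # same class for all rows: leaf
        return labels[0]
    if not attrs:                    # attributes exhausted: majority class (assumed)
        return Counter(labels).most_common(1)[0][0]
    best = max(attrs, key=lambda a: information_gain(rows, a, target))  # step 4
    remaining = [a for a in attrs if a != best]                         # step 5
    return {best: {value: id3([r for r in rows if r[best] == value],
                              remaining, target)
                   for value in {r[best] for r in rows}}}


def classify(tree, sample, default=None):
    """Walk the tree; unseen attribute values fall back to `default`."""
    while isinstance(tree, dict):
        attr = next(iter(tree))
        tree = tree[attr].get(sample.get(attr), default)
    return tree


# Hypothetical training rows, for illustration only.
TRAINING = [
    {"protocol": "TCP", "port": "8081", "status": "Malicious"},
    {"protocol": "TCP", "port": "8082", "status": "Not Malicious"},
    {"protocol": "SMTP", "port": "8081", "status": "Malicious"},
    {"protocol": "HTTP", "port": "8083", "status": "Not Malicious"},
]

tree = id3(TRAINING, ["protocol", "port"], "status")      # step 1: target is "status"
verdict = classify(tree, {"protocol": "HTTP", "port": "8081"})
```

On this toy table the `port` attribute has the higher information gain, so it becomes the root of the tree, and the test sample is classified through the `8081` branch.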
ID3 builds a decision tree from a fixed set of examples, and the resulting tree is used to classify future samples. Each example has several attributes and belongs to a class (such as yes or no). The leaf nodes of the decision tree contain the class name, whereas a non-leaf node is a decision node: an attribute test in which each branch (leading to another decision tree) is a possible value of the attribute. ID3 uses information gain to decide which attribute goes into a decision node. The advantage of learning a decision tree is that a program, rather than a knowledge engineer, elicits the knowledge from an expert.

Gain measures how well a given attribute separates the training examples into the targeted classes. The attribute with the highest information gain (information being the most useful for classification) is selected. To define gain, we first borrow an idea from information theory called entropy, which measures the amount of information in an attribute.

Authentication of the data packets is verified with a signature algorithm. In this module, the sender applies the digital signature algorithm to the data packets being transmitted. At the receiving end, the receiver verifies the authenticity of each data packet by running the same signature algorithm and comparing the signatures generated over the data packets.

Empirical Signature algorithm

Algorithm: Generate file with signatures
Input: User file in ASCII (F0)
Output: User file with signature appended at the end (Fn)
Method: The hash function is applied to each n-byte block of the file so that a corrupted block can be detected. To do so, F0 is first padded so that its length m satisfies (m mod n) = 0:

1. m ← Calculate_Length of (F0)
2. n ← Length of block (any one of 128 / 256 / 512 / 1024 / 2048 / 4096 / 8192 bytes)
   res ← reserved 16 bytes
   P ← m mod n
   Q ← n − (P + res)
3. if (Q > 0)
       F1 ← Append Q zeros at the end of F0
   else if (Q < 0)
       R ← n + Q
       F1 ← Append R zeros at the end of F0
   F1 ← Append res at the end of F1
4. To generate the signatures of F1, perform the following steps:
   l ← Calculate_Length of (F1)
   count ← l / n
   For j ← 1 to count
       S ← 0
       $S \leftarrow \mathrm{reverse}\left[\sum_{A=1}^{n} \big((A \;\mathrm{XOR}\; B) \lor (A \cap B)\big)\right]$
       where B ← to_Integer(to_Char(A))
5.     Sig ← Sig + to_Binary(S)
   Fn ← F1 + Sig

Source IP       Destination IP   Port no   Protocol type
192.168.1.10    192.168.1.20     8081      TCP
192.168.1.12    192.168.1.21     8082      TCP/IP
192.168.1.11    192.168.1.20     8081      smtp
192.168.1.19    192.168.1.28     8083      http
192.168.1.16    192.168.1.25     8084      TCP

Fig1: Sample Dataset

For implementation purposes we use a synthetic dataset containing details of previously seen nodes, each labeled anonymous or un-anonymous. The fields in the training dataset are the node name or IP address, the type of protocol, and the number of packets transmitted; the input sample is retrieved from the connecting node. Every node in the network acts independently: it can receive, transmit, and classify nodes. Each node maintains its own training dataset to classify the anonymous behavior of a node that connects to it.

Experimental analysis

For experimental purposes we implemented this authentication-based IDS mechanism in Java, using a synthetic training dataset that contains the source IP address, destination IP address, port number, type of protocol used, and number of packets transmitted.
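Returning to the Empirical Signature algorithm, its padding stage (steps 1 to 3) can be sketched in Python as below. The sketch is illustrative only and is not the Java implementation mentioned above. The function names are hypothetical, the reserved 16 bytes are assumed to be zero bytes because the text does not say what they hold, and the case Q = 0 (which the pseudocode leaves open) is assumed to need no zero padding. The signature computation in steps 4 and 5 is deliberately left out.

```python
RESERVED_LEN = 16  # "res": reserved 16 bytes
BLOCK_SIZES = (128, 256, 512, 1024, 2048, 4096, 8192)


def pad_for_signature(data: bytes, n: int, res: bytes = bytes(RESERVED_LEN)) -> bytes:
    """Pad `data` so that its length, including `res`, is a multiple of n."""
    if n not in BLOCK_SIZES:
        raise ValueError("unsupported block size")
    if len(res) != RESERVED_LEN:
        raise ValueError("reserved field must be 16 bytes")
    m = len(data)                  # step 1: m <- length of F0
    p = m % n                      # step 2: P <- m mod n
    q = n - (p + RESERVED_LEN)     #         Q <- n - (P + res)
    if q > 0:                      # step 3: room left in the last block
        padded = data + bytes(q)
    elif q < 0:                    # res does not fit: spill into a new block
        padded = data + bytes(n + q)
    else:                          # assumed: Q == 0 needs no zero padding
        padded = data
    return padded + res            # append res after the zero padding


def split_blocks(padded: bytes, n: int) -> list:
    """Cut the padded file into the n-byte blocks that step 4 iterates over."""
    return [padded[i:i + n] for i in range(0, len(padded), n)]


padded = pad_for_signature(b"x" * 300, 128)
blocks = split_blocks(padded, 128)
```

Appending the reserved bytes to the already padded file, rather than to the original, is what makes the final length an exact multiple of the block size in every branch.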
A testing sample is evaluated against the training dataset: information gain is computed in terms of entropy to construct the decision tree of the training dataset, and the tree is built by choosing attributes from the highest to the lowest information gain. In its standard two-class form, entropy is calculated as

$$\mathrm{Entropy}(S) = -p_{+}\log_2(p_{+}) - p_{-}\log_2(p_{-})$$

where $p_{+}$ and $p_{-}$ are the proportions of samples in S that belong to each of the two classes (here Malicious and Not Malicious). Information gain is the difference between the entropy of the current set and the size-weighted sum of the entropies of its child sets:

$$\mathrm{Gain}(A) = E(\text{current set}) - \sum_{v \in \mathrm{values}(A)} \frac{|S_v|}{|S|}\, E(S_v)$$

where $S_v$ is the child set of samples for which attribute A takes the value v.

Our experimental results are more accurate than those of traditional intrusion detection approaches in terms of detecting anonymous behavior and securing the transmission of data after classification. The sample training dataset also records the number of packets and the status of each node:

Number of packets (in bytes)   Status
56                             Malicious
120                            Not Malicious
35                             Malicious
56                             Malicious
56                             Not Malicious

[Fig2: Comparative Analysis. Bar chart comparing the Hybrid Classification, Signature Based, and Trust Metric Based approaches on Time Complexity, Accuracy, False Positives, and False Negatives; vertical axis from 0 to 20.]

IV. CONCLUSION

We conclude that the proposed mechanism classifies efficiently by constructing a decision tree from the training samples and analyzing the behavior of the testing sample, while the authentication of data packets can be verified by intermediate nodes as data is transmitted from the source to the destination node. Our experimental results show better performance than the traditional approaches.

REFERENCES

[1] Internet Assigned Numbers Authority (IANA), http://www.iana.org/assignments/port-number (last accessed October 2009).
[2] A. Madhukar and C. Williamson, "A longitudinal study of P2P traffic classification," in MASCOTS '06: Proceedings of the 14th IEEE International Symposium on Modeling, Analysis, and Simulation, IEEE Computer Society, Washington, DC, USA, 2006, pp. 179–188. doi: http://dx.doi.org/10.1109/MASCOTS.2006.6.
[3] J. Klensin, "Simple Mail Transfer Protocol," IETF RFC 2821, April 2001; http://www.ietf.org/rfc/rfc2821.txt
[4] Bro intrusion detection system, Bro overview, http://bro-ids.org, as of August 14, 2007.
[5] V. Paxson, "Bro: A system for detecting network intruders in real-time," Computer Networks, vol. 31, no. 23-24, pp. 2435–2463, 1999.
[6] N. Ben Azzouna and F. Guillemin, "Analysis of ADSL Traffic on an IP Backbone Link," IEEE Global Telecommunications Conference 2003, San Francisco, USA, December 2003.
[7] K. Cho, K. Fukuda, H. Esaki, and A. Kato, "The Impact and Implications of the Growth in Residential User-to-User Traffic," ACM SIGCOMM 2006, Pisa, Italy, September 2006.
[8] J. Joshi et al., "Access Control Language for Multidomain Environments," IEEE Internet Computing, vol. 8, no. 6, 2004, pp. 40–50.
[9] M. Blaze et al., "Dynamic Trust Management," Computer, vol. 42, no. 2, 2009, pp. 44–52.
[10] Y. Zhang and J. Joshi, "Access Control and Trust Management for Emerging Multi-domain Environments," Annals of Emerging Research in Information Assurance, Security and Privacy Services, S. Upadhyaya and R.O. Rao, eds., Emerald Group Publishing, 2009, pp. 421–452.

BIOGRAPHIES

Pavitra Gunna is pursuing an M.Tech in Computer Science at Aditya Institute of Technology And Management (AITAM). Her areas of interest are cloud computing, Hadoop, and information security.

K. Prasada Rao completed his M.Tech and is pursuing a Ph.D. at Acharya Nagarjuna University. He currently works as Senior Assistant Professor in the Department of Computer Science and Engineering at Aditya Institute of Technology And Management (AITAM), Tekkali, A.P. His areas of interest are data mining and computer networks.