INTRUSION ATTACKS DETECTION USING OPPOSITION BASED LEARNING

Khawaja MoyeezUllah Ghori, Faisal Bashir, Chaudhary M. Imran
National University of Modern Languages, Ministry of Petroleum & NR, Entech Pvt (Ltd.)
moiz.ghauri@hotmail.com, rfaisalbashir@hotmail.com, imran_chuadry@hotmail.com

Abstract
An Intrusion Detection System (IDS) plays an important role in a security framework. Security tools such as antivirus software, firewalls and packet sniffers help prevent attackers from gaining access to confidential data, but they are by no means foolproof, and strengthening the intrusion detection mechanism remains a challenging task. Many machine learning approaches have recently been applied to the KDD99Cup dataset to detect different kinds of intrusions. In this research, opposition based algorithms are used to implement rules for detecting various types of attacks. In the proposed study, an optimal set of features is extracted using feature selection techniques such as Random Projection (RP) and Principal Component Analysis (PCA), and opposition based evolutionary algorithms are then used to improve the accuracy and performance of the intrusion detection framework. The proposed methodology is tested and compared with state-of-the-art methods using well-known quantitative measures.

Keywords
Intrusion Detection, Machine Learning Algorithms, Opposition Based Learning, Chromosomes, Anti-chromosomes, Random Projection, Principal Component Analysis, KDD99Cup dataset, Feature Selection.

INTRODUCTION
One of the two most publicized threats to security is the intruder (the other is viruses), generally referred to as a hacker or cracker. In an important early study of intrusion, Anderson identified three classes of intruders: the masquerader, the misfeasor and the clandestine user. The objective of an intruder is to gain access to a system or to increase the range of privileges accessible on a system.
Two principal countermeasures against intruders are intrusion detection and intrusion prevention. Intrusion detection is the process of detecting the presence or occurrence of an intrusion. An Intrusion Detection System (IDS) is an essential tool that complements other security mechanisms such as firewalls and antivirus software, and most of the surveillance and security monitoring of network infrastructures today is done using IDSs. The IDS literature discusses three common problems: accuracy, speed and adaptability. This research focuses on improving accuracy and speed. To this end, features for different intrusion attacks are extracted from the KDD99Cup dataset using several feature extraction techniques, and opposition based evolutionary algorithms are then employed to improve the speed and accuracy of the intrusion detection framework.

Opposition based learning is proposed to improve speed because, in most cases, learning starts from a random point, and in the worst case this random guess can be far from the solution, possibly even in the opposite direction. Opposition based learning lets the search look simultaneously in the opposite direction, which can lead to a good solution in less time.

INTRUSION ATTACKS
Network-based environments were first adopted at the corporate level; since then, Internet usage has grown exponentially and now covers areas across the globe, including business to business (B2B), business to consumer (B2C), Internet-based e-commerce, hosting portals and Internet banking. All of these areas face serious threats from intrusion attacks, despite deploying the best available firewalls, and both security experts and attackers invest considerable effort in staying ahead of each other. Here we describe the four types of intrusion included in the KDD99Cup dataset, on which this work is based.
Brief descriptions of these intrusion attacks are given below.

User to Root (U2R): an attack in which a normal user gains super user privileges.
Remote to Local (R2L): an attack in which there is unauthorized access from a remote machine to the protected network.
Denial of Service (DoS): an attack in which network resources are made unavailable to their legitimate users.
Probing: an attack that gathers relevant information about a target, such as a web application and its web services. This is considered the most popular and effective intrusion attack.

INTRUSION DETECTION
Several intrusion detection techniques have been proposed and implemented, but security experts are still working to achieve the best results in terms of speed and accuracy. Intrusion detection is a security management approach for computers and networks. It gathers and analyses information from various areas of a computer or network to identify intrusion attacks, both external (an unknown attacker enters the system and uses network resources) and internal (an insider or authenticated user tries to access unauthorized network resources within the same environment). Intrusion detection normally includes the following functions:

Recognizing patterns typical of attacks
Analysing abnormal activity patterns
Analysing system configurations and vulnerabilities
Monitoring and analysing both user and system activities
Assessing system and file integrity

There is a wide range of intrusion detection types; this work covers the two major types below and excludes active, passive, knowledge-based and behaviour-based IDSs.

Network Intrusion Detection System (NIDS): based on network activity, including examining network traffic obtained from hubs, switches, port mirroring and network taps.
Host Intrusion Detection System (HIDS): based on host activity and state, including modifications to binaries, application logs and passwords.
PROBLEM STATEMENT
Researchers are working hard to introduce a stable, single technique for handling intrusion attacks, but because of the diverse nature of the problem, further work based on machine intelligence techniques is needed. Three major issues normally affect the reliability of any intrusion detection technique:

Speed
Accuracy
Adaptability

In this paper, we address the speed and accuracy issues using opposition based learning together with random projection and principal component analysis, informed by a detailed review of the machine learning techniques previously applied to intrusion detection.

OPPOSITION BASED LEARNING
Opposition based learning is based on the concept of an opposite point, circumstance, location or state. For numbers, it can be described simply as follows. If $x$ is a real number in the range $[a, b]$, i.e. $x \in [a, b]$, then the opposite number $x'$ of $x$ is defined as $x' = a + b - x$. For example, if $a = -7$ and $b = 7$, the opposite of $x = -3$ is $x' = 3$. For $n$-dimensional vectors, the definition of opposite numbers extends to opposite points: if $X = (x_1, x_2, \ldots, x_n)$ is a point in $n$ dimensions with $x_i \in [a_i, b_i]$ for $i = 1, 2, \ldots, n$, then the opposite point of $X$ is $X' = (x_1', x_2', \ldots, x_n')$, where $x_i' = a_i + b_i - x_i$.

FEATURE EXTRACTION
For feature extraction and construction, we normally rely on approximation algorithms and projection modelling. Here, we consider Random Projection (RP) and Principal Component Analysis (PCA) for extracting intrusion detection features.

Random Projection: RP substantially reduces the dimensionality of a problem while retaining a significant degree of its structure. Algorithmically, given $n$ points in a high-dimensional space $\mathbb{R}^D$, RP projects these points onto a random $d$-dimensional subspace with $d \ll D$, with the following outcomes:
If $d = \Omega\left(\frac{1}{\gamma^2} \log n\right)$, the relative distances and angles between all pairs of points are approximately preserved, up to a factor of $1 \pm \gamma$. This is useful for fast approximation.
If $d = 1$, the points are projected onto a random line, which still yields a meaningful result; this is useful for randomized rounding.

Principal Component Analysis: PCA finds a linear projection of high-dimensional data into a lower-dimensional space that minimizes the least-squares reconstruction error and maximizes the retained variance. It identifies patterns in the data and represents the data so as to highlight their similarities and differences, which makes it suitable, to a certain extent, for detecting repeated intrusion attacks. PCA is a powerful tool for analysing data. Algorithmically, it finds a $d$-dimensional subspace of the original feature space $\mathbb{R}^D$ that captures as much of the variation in the data set as possible: given data $K = \{x_1, x_2, \ldots, x_n\}$, it finds the linear projection to $\mathbb{R}^d$ for which

$$\sum_{i=1}^{n} \lVert x_i^* - \mu^* \rVert^2$$

is maximized, where $x_i^*$ is the projection of $x_i$ and $\mu^*$ is the mean of the projected data.

DATASET DESCRIPTION
The KDD99Cup dataset has been used by several machine learning techniques to obtain experimental results. It includes a wide range of intrusions of different types. The dataset consists of approximately 5,000,000 data instances, produced by simulating intrusions in a network-based environment, and between 15 and 18 intrusion attack types were labelled during the simulation. The records have roughly 30 to 35 features, most of which take continuous values; these features were projected and analysed across the instances.
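Returning to the feature extraction step, the random projection and PCA procedures described in the FEATURE EXTRACTION section can be illustrated with a short NumPy sketch. It is not the authors' implementation: the Gaussian projection matrix, the constant c = 4 in the target dimension (the text only states $d = \Omega((1/\gamma^2) \log n)$), and the helper names `jl_target_dim`, `random_projection` and `pca` are illustrative assumptions. PCA is computed here via an SVD of the centred data, which maximizes the projected scatter $\sum_i \lVert x_i^* - \mu^* \rVert^2$.

```python
from typing import Tuple

import numpy as np


def jl_target_dim(n_points: int, gamma: float, c: float = 4.0) -> int:
    """Target dimension d = ceil(c * ln(n) / gamma**2), one instance of
    d = Omega((1/gamma^2) log n). The constant c = 4 is an assumption."""
    return int(np.ceil(c * np.log(n_points) / gamma ** 2))


def random_projection(X: np.ndarray, d: int, seed: int = 0) -> np.ndarray:
    """Project the rows of X (n x D) onto a random d-dimensional subspace.

    Entries of R are drawn from N(0, 1/d), so squared distances are
    preserved in expectation."""
    rng = np.random.default_rng(seed)
    R = rng.normal(0.0, 1.0 / np.sqrt(d), size=(X.shape[1], d))
    return X @ R


def pca(X: np.ndarray, d: int) -> Tuple[np.ndarray, float]:
    """Project X onto its top-d principal components.

    Returns the projected data and the fraction of total variance retained,
    i.e. sum_i ||x_i* - mu*||^2 divided by the total scatter of X."""
    Xc = X - X.mean(axis=0)  # centre the data so the projected mean mu* is 0
    # Rows of Vt are the principal directions, sorted by decreasing singular value.
    _, s, Vt = np.linalg.svd(Xc, full_matrices=False)
    Z = Xc @ Vt[:d].T
    retained = float(np.sum(s[:d] ** 2) / np.sum(s ** 2))
    return Z, retained
```

For instance, with n = 50 points and γ = 0.5, `jl_target_dim(50, 0.5)` returns 63; projecting 500-dimensional points to that size keeps every pairwise distance well within a factor of 1 ± 0.5 in practice. The constant c trades a smaller d against a weaker distortion guarantee.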
The dataset covers all four main attack types: User to Root (a normal user gains super user privileges), Remote to Local (unauthorized access from a remote machine to a protected network), Denial of Service (network resources are made unavailable to their legitimate users) and Probing (gathering relevant information about a target such as a web application). All attacks were produced against a background of normal activity rather than complex activity, which helps to normalize the training set and keep it proportional to a real-time scenario. The class proportions of the KDD99Cup training dataset were kept close to those of normal operation in practice, and instances were filtered at very low percentages to balance them.

PROPOSED SOLUTION: DETECTION WITH OBL USING RP AND PCA
In this work, opposition based learning, mainly an opposition based genetic algorithm, is applied to intrusion detection for the first time. The work contributes to both the network security and machine learning domains. The resulting solution can serve as the detection module of an IDS framework; other aspects of an IDS, such as the reporting mechanism and alert generation, are beyond the scope of this work. In the proposed technique, Random Projection (RP) is used for feature extraction, and a Principal Component Analysis (PCA) model is used to analyse the data for the Opposition Based Learning (OBL) genetic algorithm, in order to achieve good speed and accuracy for intrusion detection. As shown in the complete flow diagram of the proposed technique (Figure 1), features are extracted from the selected KDD99Cup data after applying the feature fusion system; training and testing are then carried out with opposition based learning, which generates chromosomes together with their anti-chromosomes. OBL is a technique that concentrates on opposite states or circumstances.
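To make the chromosome / anti-chromosome idea concrete, here is a minimal sketch of an opposition-based genetic algorithm in plain Python, assuming real-valued chromosomes whose genes lie in fixed intervals $[a_i, b_i]$. Opposites are used in two places, following the general OBL scheme: at initialisation, and as an occasional "generation jump" controlled by a hypothetical `jump_rate` parameter. The selection, crossover and mutation operators and the toy fitness function are assumptions for illustration only; in the paper's setting the fitness would be the detection accuracy of the rule a chromosome encodes, which is not specified here.

```python
import random
from typing import Callable, List, Sequence

Chromosome = List[float]
Fitness = Callable[[Sequence[float]], float]


def opposite(x: Sequence[float], lo: Sequence[float], hi: Sequence[float]) -> Chromosome:
    """Anti-chromosome of x: each gene x_i in [a_i, b_i] maps to a_i + b_i - x_i."""
    return [a + b - xi for xi, a, b in zip(x, lo, hi)]


def opposition_select(pop: List[Chromosome], lo: Sequence[float], hi: Sequence[float],
                      fitness: Fitness, n: int) -> List[Chromosome]:
    """Evaluate every chromosome together with its anti-chromosome and keep the n fittest."""
    union = pop + [opposite(x, lo, hi) for x in pop]
    return sorted(union, key=fitness, reverse=True)[:n]


def obl_ga(fitness: Fitness, lo: Sequence[float], hi: Sequence[float], pop_size: int = 20,
           generations: int = 50, jump_rate: float = 0.3, seed: int = 0) -> Chromosome:
    """Maximise `fitness` over the box [lo, hi] with a simple opposition-based GA (sketch)."""
    rng = random.Random(seed)
    dim = len(lo)
    pop = [[rng.uniform(a, b) for a, b in zip(lo, hi)] for _ in range(pop_size)]
    pop = opposition_select(pop, lo, hi, fitness, pop_size)  # opposition-based initialisation
    for _ in range(generations):
        children = []
        while len(children) < pop_size:
            # Binary tournament selection of two parents.
            p1 = max(rng.sample(pop, 2), key=fitness)
            p2 = max(rng.sample(pop, 2), key=fitness)
            # Arithmetic crossover: a random convex combination of the parents.
            w = rng.random()
            child = [w * u + (1.0 - w) * v for u, v in zip(p1, p2)]
            # Gaussian mutation of one gene, clipped back into [a_i, b_i].
            i = rng.randrange(dim)
            step = rng.gauss(0.0, 0.1 * (hi[i] - lo[i]))
            child[i] = min(hi[i], max(lo[i], child[i] + step))
            children.append(child)
        # Elitist replacement: keep the best pop_size of parents and children.
        pop = sorted(pop + children, key=fitness, reverse=True)[:pop_size]
        # Generation jumping: occasionally compete the population against its opposites.
        if rng.random() < jump_rate:
            pop = opposition_select(pop, lo, hi, fitness, pop_size)
    return pop[0]  # pop is sorted by fitness, so the first entry is the best chromosome
```

For example, `opposite([-3.0], [-7.0], [7.0])` gives `[3.0]`, matching the worked example in the OBL section, and `obl_ga(lambda x: -sum((xi - 1.0) ** 2 for xi in x), [-7.0, -7.0], [7.0, 7.0])` searches for the point (1, 1). Using the static bounds $[a_i, b_i]$ for anti-chromosomes is a simplification; opposition-based differential evolution, for instance, computes opposites within the current population's per-gene minimum and maximum instead.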
The best chromosome is selected and its accuracy is calculated, and this accuracy is compared with existing machine learning techniques in the experimental results. We also compared the speed of the proposed technique with that of other techniques; the approach also reduces the complexity of the problem.

Figure 1: Model Description - Intrusion Detection using OBL (stages shown: Dataset; Feature Selection; Feature Fusion System; Training; Testing; Opposition Based Learning; OGA with chromosome and anti-chromosome; Best Chromosome / Solution; Calculate the accuracy)

Compared with other machine learning techniques, OBL converges faster and improves accuracy because of its opposition factor. The concept of opposition plays a key role in OBL: the interplay between opposites, such as up/down, right/left, day/night, male/female, multiplication/division and hot/cold, normally provides a state of balance. By exploiting this, an opposition based algorithm can efficiently improve the performance of a given solution.

EXPERIMENTAL RESULTS
The results focus on the accuracy and speed of intrusion detection. Most machine learning techniques use the detection ratio and the false positive ratio to evaluate the performance of a proposed system. The experimental results are given in Table 1.

Table 1: Experimental Results
Function | Attack type | Detection ratio | False positive rate
F1       | U2R         | 11.2%           | 39.2%
F2       | R2L         | 10.8%           | 24.5%
F3       | DoS         | 88.6%           | 6.1%
F4       | Probe       | 70.8%           | 3.3%

The detection rate exceeds 10% for every attack type, but the false positive rate is above 24% for U2R and R2L attacks. The results for DoS and Probing are considerably more reliable, with false positive rates below 7%, although intrusions can still be missed: the false negative rate was observed to drop only slightly. Researchers typically develop a separate machine learning technique for each attack type.
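The detection ratio and false positive rate reported in Table 1 are not defined explicitly in the text. A common convention, assumed here, is DR = TP / (TP + FN) and FPR = FP / (FP + TN), computed one-vs-rest for each attack type. The sketch below (all function names are hypothetical) computes both from true and predicted labels.

```python
from typing import Dict, Sequence


def one_vs_rest_counts(y_true: Sequence[str], y_pred: Sequence[str], positive: str) -> Dict[str, int]:
    """Confusion counts treating `positive` as the attack class and every other label as negative."""
    counts = {"tp": 0, "fn": 0, "fp": 0, "tn": 0}
    for t, p in zip(y_true, y_pred):
        if t == positive:
            counts["tp" if p == positive else "fn"] += 1
        else:
            counts["fp" if p == positive else "tn"] += 1
    return counts


def detection_rate(c: Dict[str, int]) -> float:
    """Fraction of actual attacks that were flagged: TP / (TP + FN)."""
    return c["tp"] / (c["tp"] + c["fn"])


def false_positive_rate(c: Dict[str, int]) -> float:
    """Fraction of non-attack records wrongly flagged: FP / (FP + TN)."""
    return c["fp"] / (c["fp"] + c["tn"])


def false_negative_rate(c: Dict[str, int]) -> float:
    """Fraction of attacks missed: FN / (TP + FN), i.e. 1 - detection rate."""
    return c["fn"] / (c["tp"] + c["fn"])
```

Under these definitions the false negative rate is 1 − DR, so, for example, the 88.6% DoS detection ratio in Table 1 corresponds to a false negative rate of 11.4%.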
Here, by contrast, we handled all four major attack types under OBL and tried to keep the differences in results across attack types small, by prioritizing the distinctive dimensions of each attack.

Figure 2: Graph of Experimental Results (detection rate, DR (%), and false positive rate, FPR (%), for the U2R, R2L, DoS and Probe attack types)

We also used masquerading users to detect and compare several intrusion attacks. The dataset consists of 50 files, each corresponding to one user, and each file contains 15,000 commands. The first 5,000 commands contain no masquerade data, while the next 10,000 commands are divided into 100 blocks of 100 commands each. After the first 5,000 commands, any given block is a masquerade block with probability 1%, but once a masquerade block appears, the next block is a masquerade block with probability 80%. Accordingly, the first 5,000 commands were used for training.

Table 2: Masquerade Experimental Results
Masquerade dataset specifications and results
Total users                 | 50
Training data               | 5,000 commands
Testing data                | 10,000 commands
Block size                  | 100 commands
Average detection rate      | 98.6%
Average false positive rate | 20.8%

We then randomly selected 20 users with masquerade command blocks and applied the proposed technique with the implemented features. In this experiment the false positive rate increased, but the masquerade detection rate remained 100% for all of the randomly selected users.

Figure 3: Graph of Masquerade Experimental Results (detection rate (%) and false positive rate (%))

Several techniques have been developed and implemented to maximize the detection rate and reduce the false positive rate, but because of the statistical nature of the masquerade dataset, their results have not been satisfactory. Owing to its faster convergence and accuracy, the proposed technique produced valuable experimental results. In short, masquerade detection collects user-specific information and builds a relevant profile for each user.
These profiles are based on general user information such as session location, session time and duration, and the commands issued during a login session for an assigned or performed task. Once a profile is complete, the user's command log is compared with it; if they do not match, the activity is normally treated as a masquerade attack and a detection technique, such as the proposed one, is applied.

CONCLUSION AND FUTURE WORK
In the proposed work, opposition based learning is applied to intrusion detection. Since, in real-world situations, intrusion types change rapidly and become more complex, the proposed framework is designed to be fast and accurate. Researchers have previously applied several machine intelligence techniques to improve intrusion detection; here we consider the opposition based learning technique and present practical results suggesting that it is promising. OBL considers the opposite guess during detection, which suits intrusion detection because intrusion attacks are often based on random techniques with entirely opposite scenarios. In short, OBL starts from a random point and also considers its opposite. Random projection and principal component analysis further enhance the proposed opposition based approach to intrusion detection: RP provides randomly projected data, and PCA helps analyse that data. The proposed technique gave good results in terms of the accuracy and speed of intrusion detection, which was the main goal. According to our literature review, in the absence of prior knowledge it is common practice to choose the initial guess or point randomly; given the nature of intrusion attacks, it is more appropriate to evaluate the opposite of that random point as well.
In future work, we plan to extend this technique to the detection of repeated intrusion attacks by combining additional machine intelligence techniques with opposition based learning.