INTRUSION ATTACKS DETECTION USING OPPOSITION BASED
LEARNING
Khawaja MoyeezUllah Ghori, Faisal Bashir, Chaudhary M.Imran
National University of Modern Languages, Ministry of Petroleum & NR, Entech Pvt(Ltd.)
moiz.ghauri@hotmail.com, rfaisalbashir@hotmail.com, imran_chuadry@hotmail.com
Abstract
Intrusion Detection Systems play an important role in a security framework. Security tools such as antivirus
software, firewalls and packet sniffers help to prevent attackers from gaining access to confidential data, but
they are in no way foolproof. Strengthening the intrusion detection mechanism is a challenging task. Recently,
many machine learning approaches have been applied to the KDD99Cup dataset to detect different kinds of
intrusions. In this research, opposition based algorithms are used to implement rules for detecting various types
of attacks. In the proposed study, an optimal number of features is extracted using feature selection techniques
such as Random Projection (RP) and Principal Component Analysis (PCA), and then opposition based
evolutionary algorithms are used to improve the accuracy and performance of the intrusion detection framework.
The proposed methodology is tested and compared with state of the art methods using well known quantitative
measures.
Keywords
Intrusion Detection, Machine Learning Algorithms, Opposition Based Learning, Chromosomes, Anti
chromosomes, Random Projection, Principal Component Analysis, KDD99Cup dataset, Feature Selection.
INTRODUCTION
One of the most publicized threats to security is the intruder (the other is viruses), generally referred to as a
hacker or cracker. In an important early study of intrusion, Anderson identified three classes of intruders:
Masquerader, Misfeasor and Clandestine user. The objective of the intruder is to gain access to a system or to
increase the range of privileges accessible on a system. The two principal countermeasures against intruders are
Intrusion Detection and Intrusion Prevention. Intrusion detection is the process used to detect the presence or
occurrence of an intrusion. An Intrusion Detection System (IDS) is an essential tool that complements other
security mechanisms such as a firewall or an antivirus. Today most of the surveillance and security monitoring of
network infrastructures is done using IDS. Three common problems of IDS are discussed in the literature:
accuracy, speed and adaptability. This research focuses on improving the accuracy and speed of IDS. For this
purpose, features for the different intrusion attacks are extracted from the KDD99Cup dataset using different
feature extraction techniques, and then opposition based evolutionary algorithms are employed to achieve speed
and accuracy in the intrusion detection framework. Opposition based learning is proposed to improve speed
because in most cases learning starts at a random point, and in the worst case the random guess can be far from
the solution or, more specifically, in the opposite direction. Opposition based learning gives us the freedom to
search for a solution in the opposite direction at the same time, which helps to reach the best possible result in
a shorter time.
INTRUSION ATTACKS
After the adoption of network based environments at the corporate level, Internet usage has increased
exponentially and now covers areas across the globe including business to business (B2B), business to consumer
(B2C), internet based e-commerce, hosting portals and internet banking. All of these areas face serious threats of
intrusion attacks despite deploying the best possible firewalls. Both security experts and hackers work hard at
their respective tasks.
Here we describe the four types of intrusion included in the KDD99Cup dataset, which are the types addressed in
this work. Brief descriptions of these intrusion attacks are given below.
User to Root (U2R): An attack in which a normal user gains access to super user privileges.
Remote to Local (R2L): An attack in which there is unauthorized access from a remote machine to the protected
network.
Denial of Service (DoS): An attack in which network resources are made unavailable to their legitimate users.
Probing: An attack in which relevant information is gathered about any web application, including web services.
This is considered the most popular and effective intrusion attack.
INTRUSIONS DETECTION
Several techniques have been proposed and implemented for intrusion detection, but security experts are still
struggling to achieve the best results in terms of speed and accuracy.
Intrusion detection is a security management approach for computers and networks. It gathers and analyses
information from different areas of a computer or network to identify intrusion attacks, both External (an
unknown attacker enters the system and uses network resources) and Internal (an internal attacker or authentic
user tries to access unauthorized network resources within the same environment).
Normally, intrusion detection includes the following functions:
• Recognizing patterns typical of attacks
• Analysing abnormal activity patterns
• Analysing system configurations and vulnerabilities
• Monitoring and analysing both user and system activities
• Assessing system and file integrity
There is a wide range of intrusion detection types. This work covers the two major types below and excludes
Active, Passive, Knowledge based and Behaviour based IDS.
Network Intrusion Detection System (NIDS): Based on network activities; it examines network traffic by gaining
access to a hub, switch, port mirroring or network tap.
Host Intrusion Detection System (HIDS): Based on host activities and state, including modifications to binaries,
application logs and passwords.
PROBLEM STATEMENT
Researchers are working hard to introduce a single, stable technique to handle intrusion attacks, but due to the
diverse nature of the problem, more work based on machine intelligence techniques is needed. Normally, three
major issues arise when judging the reliability of any intrusion detection technique:
• Speed
• Accuracy
• Adaptability
In this paper, we address the speed and accuracy issues using opposition based learning with random projection
and principal component analysis, keeping in view a detailed review of the machine learning techniques
previously applied to intrusion detection.
OPPOSITION BASED LEARNING
Opposition based learning is based on the concept of an opposite point, circumstance, location or state. For
numbers, the concept can be described simply as follows.
If x is a real number in the range [a, b], i.e. $x \in [a, b]$, then the opposite number $x'$ of $x$ is defined as
$x' = a + b - x$. For example, if a = -7 and b = 7, then the opposite of x = -3 is x' = 3.
When working with n dimensional vectors, the definition of opposite numbers can be extended to opposite points
in n dimensions. If $X(x_1, x_2, \ldots, x_n)$ is an n dimensional point, where $x_i \in [a_i, b_i]$ and
$i = 1, 2, \ldots, n$, then the opposite point of $X$ is $X'(x_1', x_2', \ldots, x_n')$ where $x_i' = a_i + b_i - x_i$.
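The definitions above translate directly into code. The following is a minimal sketch; the function names `opposite_number` and `opposite_point` are illustrative and do not come from the paper.

```python
from typing import List, Sequence, Tuple


def opposite_number(x: float, a: float, b: float) -> float:
    """Return the opposite of x within [a, b], i.e. x' = a + b - x."""
    if not a <= x <= b:
        raise ValueError("x must lie in the interval [a, b]")
    return a + b - x


def opposite_point(point: Sequence[float],
                   bounds: Sequence[Tuple[float, float]]) -> List[float]:
    """Apply x_i' = a_i + b_i - x_i to every coordinate of an n dimensional point."""
    if len(point) != len(bounds):
        raise ValueError("point and bounds must have the same length")
    return [opposite_number(x, a, b) for x, (a, b) in zip(point, bounds)]


# The worked example from the text: a = -7, b = 7, x = -3 gives x' = 3.
print(opposite_number(-3, -7, 7))
```

Taking the opposite twice returns the original value, which is a quick sanity check for any implementation.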
FEATURES EXTRACTION
For feature extraction and construction, we normally refer to approximation algorithms and projection
modelling. Here we consider Random Projection (RP) and Principal Component Analysis (PCA) for extracting
intrusion detection features.
Random Projection: RP substantially reduces the dimensionality of a problem while still retaining a significant
degree of the problem structure. Algorithmically, it can be defined as follows:
Given n points in a space of any dimension $R^n$, project these points down to a random d dimensional
subspace for $d \ll n$. This produces the following outcomes.
If $d = \omega\left(\frac{1}{\gamma^2} \log n\right)$, then relative distances and angles between all pairs of
points are approximately preserved, up to $1 \pm \gamma$. This is useful for fast approximation.
If d = 1, the points are projected onto a random line, which can still produce something presentable. This helps
in rounding approximations.
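As an illustration of the distance preservation property, the sketch below projects a few random points with a Gaussian random matrix and compares one pairwise distance before and after projection. The Gaussian construction, the dimensions (500 and 100) and all names are assumptions made for this example; the paper does not state which random projection variant was used.

```python
import math
import random


def random_projection_matrix(n_dims, d, rng):
    """Build a d x n_dims matrix with independent N(0, 1/d) entries."""
    scale = 1.0 / math.sqrt(d)
    return [[rng.gauss(0.0, 1.0) * scale for _ in range(n_dims)] for _ in range(d)]


def project(matrix, x):
    """Multiply the projection matrix by the vector x."""
    return [sum(r * v for r, v in zip(row, x)) for row in matrix]


def distance(u, v):
    """Euclidean distance between two vectors of equal length."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


rng = random.Random(0)
n_dims, d = 500, 100  # illustrative sizes only
points = [[rng.random() for _ in range(n_dims)] for _ in range(3)]
R = random_projection_matrix(n_dims, d, rng)
projected = [project(R, p) for p in points]

# The ratio should be close to 1 when d is large enough.
ratio = distance(projected[0], projected[1]) / distance(points[0], points[1])
print(f"distance ratio after projection: {ratio:.2f}")
```

The 1/d variance scaling keeps squared distances unchanged in expectation, which is why the ratio stays near 1.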
Principal Component Analysis: PCA finds a linear projection of high dimensional data into a lower dimensional
space that minimizes the least squares reconstruction error and maximizes the variance retained. It is a way of
identifying patterns in the data and representing the data so as to highlight their similarities and differences.
This technique is suitable for detecting repeated intrusion attacks to a certain extent. PCA is a powerful tool for
analysing data. Algorithmically, it can be defined as follows:
PCA finds a d-dimensional subspace of $R^n$ which captures as much of the variation in the data set as possible.
Given data $K = \{x_1, x_2, \ldots, x_m\}$, it finds the linear projection to $R^d$ for which
$\sum_{i=1}^{m} \lVert x_i^* - \mu^* \rVert^2$ is maximized, where $x_i^*$ is the projection of $x_i$ and $\mu^*$
is the mean of the projected data.
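For the special case d = 1, the maximization above can be sketched with power iteration on the covariance matrix. This is a minimal illustration using only the standard library, not the implementation used in the experiments; the helper names and the toy data are assumptions.

```python
import math


def mean_vector(data):
    """Column-wise mean of a list of equal-length rows."""
    return [sum(col) / len(data) for col in zip(*data)]


def covariance_matrix(data):
    """Covariance matrix of the rows of data (normalised by the number of rows)."""
    mu = mean_vector(data)
    centered = [[x - m for x, m in zip(row, mu)] for row in data]
    dims = len(mu)
    return [[sum(r[i] * r[j] for r in centered) / len(data) for j in range(dims)]
            for i in range(dims)]


def first_principal_component(data, iterations=200):
    """Approximate the top eigenvector of the covariance matrix by power iteration.

    The fixed start vector is an assumption; it fails only if it happens to be
    orthogonal to the top eigenvector.
    """
    cov = covariance_matrix(data)
    dims = len(cov)
    v = [1.0 + i for i in range(dims)]
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(dims)) for i in range(dims)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0.0:
            break
        v = [x / norm for x in w]
    return v


def projected_coordinates(data, direction):
    """Coordinates of the centered data along the given unit direction."""
    mu = mean_vector(data)
    return [sum((x - m) * c for x, m, c in zip(row, mu, direction)) for row in data]


toy_data = [[1.0, 2.0], [2.0, 4.0], [3.0, 6.0], [4.0, 8.0]]  # points on the line y = 2x
pc1 = first_principal_component(toy_data)
coords = projected_coordinates(toy_data, pc1)
retained = sum(c * c for c in coords)  # the quantity maximized in the definition above
print(pc1, retained)
```

Because the toy points lie on a single line, the first component captures all of the variation, so `retained` equals the total squared deviation from the mean.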
DATASET DESCRIPTION
The KDD99Cup dataset has been used by several machine learning techniques for experimental results. It includes
a wide range of intrusions of different types. In this case, the dataset consists of approximately 5,000,000 data
instances produced during intrusion simulation in a network based environment. Between 15 and 18 intrusion
attacks of different types were labelled during the simulation. There are roughly 30 to 35 features, most of them
taking continuous values, which were projected and analysed at different instances.
The dataset covers all four main attack types with the required features: User to Root (a normal user gains
access to super user privileges), Remote to Local (there is unauthorized access from a remote machine to the
well protected network), Denial of Service (network resources are made unavailable to their legitimate users) and
Probing (relevant information is gathered about any web application).
All attack types were produced with normal activity in the background rather than complex activity measures.
This helps to normalize the training set and keeps it proportional to a real time scenario when producing
results. The proportions of the KDD99Cup training dataset were observed to be close to normal operations in
practice, and the data was filtered with very low percentages to balance the instances.
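Grouping the raw connection labels into the four attack types is usually the first preprocessing step. The sketch below uses the commonly cited grouping of the labels found in the public KDD Cup 1999 training file; which subset of these labels was used in this study is not stated, so the table should be read as an assumption.

```python
# Commonly cited grouping of KDD Cup 1999 training labels into the four
# attack categories (an assumption; the paper does not list its labels).
ATTACK_CATEGORY = {
    "back": "DoS", "land": "DoS", "neptune": "DoS",
    "pod": "DoS", "smurf": "DoS", "teardrop": "DoS",
    "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe", "satan": "Probe",
    "ftp_write": "R2L", "guess_passwd": "R2L", "imap": "R2L", "multihop": "R2L",
    "phf": "R2L", "spy": "R2L", "warezclient": "R2L", "warezmaster": "R2L",
    "buffer_overflow": "U2R", "loadmodule": "U2R", "perl": "U2R", "rootkit": "U2R",
}


def attack_category(label: str) -> str:
    """Map a raw label such as 'smurf.' to 'DoS', 'Probe', 'R2L', 'U2R' or 'Normal'."""
    name = label.strip().rstrip(".")  # labels in the raw file end with a dot
    if name == "normal":
        return "Normal"
    return ATTACK_CATEGORY.get(name, "Unknown")


print(attack_category("smurf."))
```

Labels outside the table map to "Unknown", so unfamiliar attack names are surfaced rather than silently assigned to a category.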
PROPOSED SOLUTION
DETECTION WITH OBL USING RP AND PCA
In this work, opposition based learning, mainly an opposition based genetic algorithm, is applied to an intrusion
detection system for the first time. This work is a contribution to the network security and machine learning
domains. The solution obtained can be part of an IDS framework as the detection module; other aspects of IDS,
including the reporting mechanism and alert generation, are beyond the scope of this work.
In the proposed technique, Random Projection (RP) is used for feature extraction and a Principal Component
Analysis (PCA) model is used to analyse the data for the Opposition Based Learning (OBL) genetic algorithm, in
order to achieve good speed and accuracy for intrusion detection.
As described in the flow diagram of the proposed technique in Figure 1, features are extracted from the selected
KDD99Cup dataset and passed through the feature fusion system; the OBL technique is then applied to the
training set, generating chromosomes with their anti-chromosomes, and the result is tested. OBL is a technique
that concentrates on the opposite state or circumstance. The best chromosome is produced, its accuracy is
calculated, and it is compared in the experimental results with previously available machine learning techniques.
The speed of the proposed technique was also compared with the others, and the complexity of the problem was
reduced.
Figure1: Model Description - Intrusions Detection using OBL (flow diagram with the blocks: Dataset; Feature
Selection; Feature Fusion System; Training; Testing; Opposition Based Learning; OGA with chromosome and anti
chromosome; Best Chromosome / Solution; Calculate the accuracy)
Owing to its opposition factor, OBL converges faster and improves accuracy compared to other machine learning
techniques.
The concept of opposition plays a key role in OBL. The interplay between opposites normally provides a state of
balance for any solution, e.g. up/down, right/left, day/night, male/female, multiplication/division, hot/cold.
By considering both possibilities, an opposition based algorithm can improve the performance of a solution
efficiently.
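The interplay of chromosomes and anti-chromosomes can be sketched as a small real-valued genetic algorithm. This is a simplified illustration under stated assumptions, not the authors' OGA: the fitness function is a toy stand-in for detection accuracy, the operators (uniform crossover, Gaussian mutation) and the parameters `pop_size`, `generations` and `jump_rate` are hypothetical, and opposites are taken with respect to the fixed search bounds.

```python
import random


def anti_chromosome(chromosome, bounds):
    """Opposite of a chromosome: x_i' = a_i + b_i - x_i for each gene."""
    return [a + b - x for x, (a, b) in zip(chromosome, bounds)]


def keep_fittest(population, bounds, fitness):
    """Pool each chromosome with its anti-chromosome and keep the best half."""
    pool = population + [anti_chromosome(c, bounds) for c in population]
    return sorted(pool, key=fitness, reverse=True)[:len(population)]


def crossover(p1, p2, rng):
    """Uniform crossover: each gene is taken from one of the two parents."""
    return [rng.choice(pair) for pair in zip(p1, p2)]


def mutate(chromosome, bounds, rng, sigma=0.05):
    """Gaussian mutation, clipped to the search bounds."""
    return [min(b, max(a, x + rng.gauss(0.0, sigma * (b - a))))
            for x, (a, b) in zip(chromosome, bounds)]


def opposition_based_ga(fitness, bounds, pop_size=20, generations=60,
                        jump_rate=0.3, seed=0):
    rng = random.Random(seed)
    population = [[rng.uniform(a, b) for a, b in bounds] for _ in range(pop_size)]
    # Opposition based initialization: also evaluate the opposite of every guess.
    population = keep_fittest(population, bounds, fitness)
    for _ in range(generations):
        children = []
        for _ in range(pop_size):
            p1, p2 = rng.sample(population, 2)
            children.append(mutate(crossover(p1, p2, rng), bounds, rng))
        population = sorted(population + children, key=fitness, reverse=True)[:pop_size]
        # Occasionally re-check the opposites of the current population.
        if rng.random() < jump_rate:
            population = keep_fittest(population, bounds, fitness)
    return population[0]


def toy_fitness(chromosome):
    """Toy objective with its maximum at (2, 2, 2); stands in for accuracy."""
    return -sum((x - 2.0) ** 2 for x in chromosome)


bounds = [(-5.0, 5.0)] * 3
best = opposition_based_ga(toy_fitness, bounds)
print(best, toy_fitness(best))
```

Selection is elitist, so the best fitness never decreases from one generation to the next; the opposition step costs one extra fitness evaluation per anti-chromosome.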
EXPERIMENTAL RESULTS
The results focus on the accuracy and speed of intrusion detection. Most machine learning techniques use the
Detection and False Positive ratios to evaluate the performance of a proposed system. The experimental results
are given in Table 1:
FUNCTION   ATTACK TYPE   DETECTION RATIO   FALSE POSITIVE
F1         U2R           11.2%             39.2%
F2         R2L           10.8%             24.5%
F3         DOS           88.6%             6.1%
F4         PROBE         70.8%             3.3%
Table1: Experimental Results
The detection rate exceeds 10% for every type of intrusion attack, but the false positive rate is above 24% for
the U2R and R2L attacks. On the other hand, the results are reliable for DoS and Probing, with false positive
rates below 7%, and the probability of missing intrusions, observed through the false negative rate, dropped
slightly. Researchers normally develop separate machine learning techniques for each attack type. Here, we tried
to handle all four major attack types under OBL, and to keep the differences in results between attacks low by
prioritising the unique dimensions of each attack.
Figure2: Graph of Experimental Results (chart of DR (%) and FPR (%) for U2R, R2L, DoS and PROBE; vertical
axis from 0 to 100)
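The detection ratio and false positive ratio used above can be computed from labelled predictions as sketched below. The definitions used here (detected attacks over all attacks, and false alarms over all normal records) are the usual ones and are assumed to match those behind Table 1; the function name is illustrative.

```python
def detection_and_false_positive_rates(actual, predicted):
    """Return (detection rate, false positive rate) for boolean labels.

    True means "attack", False means "normal".
    """
    tp = sum(1 for a, p in zip(actual, predicted) if a and p)
    fn = sum(1 for a, p in zip(actual, predicted) if a and not p)
    fp = sum(1 for a, p in zip(actual, predicted) if not a and p)
    tn = sum(1 for a, p in zip(actual, predicted) if not a and not p)
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_positive_rate


# Four attacks (three detected) and four normal records (one false alarm).
actual = [True, True, True, True, False, False, False, False]
predicted = [True, True, True, False, True, False, False, False]
print(detection_and_false_positive_rates(actual, predicted))  # (0.75, 0.25)
```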
We also used a masquerading user dataset to detect and compare several intrusion attacks. The dataset consists
of 50 files, one file per user. Each file contains about 15000 commands. The first 5000 commands of each user
contain no masquerade data; the next 10000 commands are divided into 100 blocks of 100 commands each, which may
contain masquerade data. After the first 5000 commands, the probability that a block is a masquerade is 1%, but
once a masquerade block appears, the next block is a masquerade with 80% probability. Accordingly, the first
5000 commands were used for training.
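The layout described above can be expressed as a small helper that splits one user's command list into the training part and the test blocks, together with a simulation of how masquerade blocks are distributed. Both functions are illustrative sketches; the names and the synthetic command list are assumptions, and the simulation only mirrors the 1% and 80% probabilities stated in the text.

```python
import random

TRAIN_SIZE = 5000   # clean commands used for training
BLOCK_SIZE = 100    # commands per test block


def split_user_commands(commands):
    """Split one user's commands into the training part and the test blocks."""
    if len(commands) <= TRAIN_SIZE:
        raise ValueError("expected more than TRAIN_SIZE commands")
    training = commands[:TRAIN_SIZE]
    rest = commands[TRAIN_SIZE:]
    blocks = [rest[i:i + BLOCK_SIZE] for i in range(0, len(rest), BLOCK_SIZE)]
    return training, blocks


def simulate_block_labels(n_blocks, rng, p_start=0.01, p_continue=0.80):
    """Simulate which blocks are masquerades (True) under the stated probabilities."""
    labels = []
    previous = False
    for _ in range(n_blocks):
        p = p_continue if previous else p_start
        previous = rng.random() < p
        labels.append(previous)
    return labels


# Synthetic stand-in for one user's file of 15000 commands.
commands = [f"cmd{i % 7}" for i in range(15000)]
training, blocks = split_user_commands(commands)
labels = simulate_block_labels(len(blocks), random.Random(0))
print(len(training), len(blocks), sum(labels))
```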
Masquerade Dataset Specifications and Results
Total Users                   50
Training Data                 5000 commands
Testing Data                  10000 commands
Block size                    100
Average Detection Rate        98.6%
Average False Positive Rate   20.8%
Table2: Masquerade Experimental Results
We then randomly selected 20 users whose data contain masquerade command blocks and applied the proposed
technique with the implemented features. In this experiment the observed false positive rate increased, but the
masquerade detection rate remained 100% for all of the randomly selected users.
Figure3: Graph of Masquerade Experimental Results (chart of Detection Rate (%) and False Positive Rate (%);
vertical axis from 0 to 120)
Several techniques have been developed and implemented to maximize the detection rate and decrease the false
positive rate, but due to the statistical nature of the masquerade dataset, their results are not up to the mark.
The proposed technique, owing to its faster convergence and accuracy, gave valuable experimental results. In
short, masquerade detection collects user specific information and creates a corresponding profile. These
profiles are based on general user information such as session location, session time and duration, and the
commands issued during a login session for any assigned or performed task. Once the profile is complete, the
user's command log is compared with it. If the log does not match the profile, we normally consider it a
masquerade attack and apply any proposed technique for the detection.
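The profile comparison step can be illustrated with a deliberately simple command-frequency profile. This sketch is not the OBL based detector evaluated above; the novelty score and the threshold of 0.3 are hypothetical choices made only to show the profile-then-compare idea.

```python
from collections import Counter


def build_profile(training_commands):
    """Profile of a user: how often each command appears in the training data."""
    return Counter(training_commands)


def novelty_score(block, profile):
    """Fraction of commands in the block that never appear in the profile."""
    if not block:
        return 0.0
    unseen = sum(1 for command in block if command not in profile)
    return unseen / len(block)


def is_masquerade(block, profile, threshold=0.3):
    """Flag a block when too many of its commands are new for this user."""
    return novelty_score(block, profile) > threshold


profile = build_profile(["ls", "cd", "ls", "vi"])
print(novelty_score(["ls", "cd", "nmap", "nc"], profile))  # 0.5
print(is_masquerade(["ls", "cd", "vi", "ls"], profile))    # False
```

A real profile would also use the session information mentioned above (location, time and duration), which this sketch omits.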
CONCLUSION AND FUTURE WORK
In the proposed work, opposition based learning is applied to intrusion detection. Keeping in mind the real
world situation, where the types of intrusions are changing rapidly and becoming more complex, the proposed
framework is intended to be fast and accurate.
Researchers have previously worked on several machine intelligence techniques to improve intrusion detection.
Here, we consider the opposition based learning technique for intrusion detection and use practical results to
show that the technique is promising. OBL considers the opposite guess during detection, which is a reasonable
approach because intrusion attacks are normally based on random techniques with totally opposite scenarios. In
short, OBL starts from a random point and also examines the opposite location. Random projection and principal
component analysis are the modelling techniques that enhance the opposition based learning proposed for
intrusion detection: RP provides randomly projected data and PCA helps to analyse that data. The proposed
technique gave good results in terms of the accuracy and speed of intrusion detection, which was the main goal.
According to our literature review, in the absence of prior knowledge it is common practice to obtain the initial
guess or point by random number generation. Given the nature of intrusion attacks, it is more appropriate to also
consider the opposite of the random guess.
In future, we plan to extend this technique to the detection of repeated intrusion attacks by combining
additional machine intelligence techniques with opposition based learning.
REFERENCES
[1] A. Safdar, A. Farrukh, S. Waseem, “Remote-to-Local Intrusion Attacks Detection Using Incremental
Genetic Algorithm”, Department of Computer Science, FAST, Islamabad, Pakistan.
[2] KDD Cup 1999 dataset: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html, (1999)
[3] S. Maheshkumar, S. Gursel, “Application of Machine Learning Algorithms to KDD Intrusion Detection
Dataset within Misuse Detection Context”, EECS Dept, University of Toledo, Ohio 43606 USA.
[4] T. R. Hamid, “Opposition Based Learning: A New Scheme for Machine Intelligence”, Pattern Analysis
and Machine Intelligence Lab, System Design Engineering, University of Waterloo, Ontario, Canada.
[5] M. John, “Testing Intrusion Detection Systems: A Critique of the 1998 and 1999 DARPA Intrusion
Detection System Evaluations as Performed by Lincoln Laboratory”, Carnegie Mellon University,
Pittsburgh.
[6] C. Carlos, A. Enrique, “Evolutionary Algorithms”, Dept. Lenguajes y Ciencias de la Computacion, ETSI
informatica, University de Malaga, Campus de Teatinos, 29016 – Malaga – Spain, 2004.
[7] T. R. Hamid, “Opposition Based Reinforcement Learning”, Pattern Analysis and Machine Intelligence
Lab, System Design Engineering, University of Waterloo, Ontario, Canada.
[8] S. Chavan, K. Shah, N. Dave, S. Mukherjee, A. Abraham, and S. Sanyal, “Adaptive neuro-fuzzy intrusion
detection systems,” ITCC, vol. 01, p. 70, 2004.
[9] S. B. Idris, N. B. , “Artificial intelligence techniques applied to intrusion detection,” in INDICON, 2005
Annual IEEE, pp. 52–55, 11–13 December 2005.
[10] http://people.revoledu.com/kardi/tutorial/ReinforcementLearning/index.html, last visited: 20-11-2010.
[11] http://www.schonlau.net/intrusion.html (Masquerading dataset)