

Wired/Wireless Intrusion Detection System Using Heuristic-Based Principal Component Analysis

Abstract

Principal Component Analysis (PCA) is commonly used to detect intrusions by transforming a set of multivariate observations into a lower-dimensional space while retaining the variability of the original data. Although PCA is successful in reducing dimensionality, it does not take labels into account and fails to present the data in a form that is easy to analyze; moreover, wireless traffic is non-linear, which makes plain PCA unsuitable. In this research, Latent Semantic Analysis (LSA) is proposed to reveal the semantics among the variables in the data.

We intend to introduce a superior algorithm that frames Dynamic Principal Component Analysis (DPCA) in a heuristic fashion. This work will exploit properties of emerging platforms such as smartness and mobility, and DPCA will be merged with LSA to reveal semantics over variables.

So far, a group of algorithms has been created, and the testing and analysis included transferring both friendly and intruding packets. The simulation, using data mining and Artificial Intelligence (AI), showed how the intruding packets were detected and analyzed; this analysis covered stationary networks. The next stage of this research will take mobility into account, with all different speeds and directions.

Motivation of this Proposal

1- PCA works under the following restrictive assumptions:

• The distribution of events occurring within data flows is normal

• No auto-correlation exists among observations

• Variables are stationary

Mobile and wireless networks produce a dynamic environment in which auto-correlation among variables is possible; thus, time lags of the time series are incorporated within the vectors describing the observations.

2- Wireless and mobile network traffic contains observations that hold semantics among variables, and these semantics can be recruited to produce a smart PCA of the data set.

3- Wireless and mobile data flows are non-stationary, especially at handoff and resumption points, which weakens the reliability of results obtained by PCA and DPCA.

4- PCA and dynamic PCA do not take into account the semantic interpretation of the variables describing the observation, while newly emerged wireless and mobile networks operate in a smart environment; this smart environment imposes semantic relationships among variables.

Introduction

With the explosive expansion of computers over the last decade or so, their security has become an important issue; security matters in any environment. As large amounts of information are available on the network and can be shared through it, that information must be secure. Security is reasonably well defined in wired networks, but wireless networks face the great challenge of many different attacks. The process of monitoring the events occurring in a computer system and analyzing them to identify intrusions is known as intrusion detection, and the system that performs it is known as an intrusion detection system (IDS). [1]

An Intrusion Detection System (IDS) is an important detection mechanism used as a countermeasure to preserve data integrity and system availability against attacks. An IDS is a combination of software and hardware that attempts to perform intrusion detection: the process of gathering intrusion-related knowledge by monitoring events and analyzing them for signs of intrusion. It raises an alarm when a possible intrusion occurs in the system. The network data sources of intrusion detection consist of large amounts of textual information, which is difficult to comprehend and analyze.[2]

Intrusion detection in wireless networks has gained considerable attention in the last few years.

Wireless networks are not only susceptible to TCP/IP-based attacks native to wired networks, they are also subject to a wide array of 802.11-specific threats. Such threats range from passive eavesdropping to more devastating denial of service attacks. To detect these intrusions, classifiers are built to distinguish between normal and anomalous traffic.[3]

Principal Component Analysis (PCA) is a multivariate statistical method which models the linear correlation structure of a multivariate process from nominal historical data. PCA transforms a set of multivariate observations into a lower-dimensional orthogonal space, retaining most of the variability of the original data. Because of the simplification and the orthogonality obtained with PCA, it has been used successfully for fault diagnosis.[2]

Theoretical Background

V̄_i = Σ_{j=1}^{N} attrib_j · u_j   ---- (1)

S = {v_1, v_2, ..., v_M}   ---- (2)

C = {c_1, c_2, ..., c_M}   ---- (3)

V̄_i : vector of attributes collected due to an event occurring within the problem world

attrib_j : scalar value of an attribute in the u_j direction

S : set of V̄_i

Such that:

∀(v ∈ S) ∃(c ∈ C) Classify(c, v)

Let Dim(v_i) = L and Dim(v_i)_PCA = P, which yields

→ PCA: L → P, where P < L, which yields

→ ∀attack ∃attribute (Variance(attribute) > threshold)

→ Signature(attack, attribute), and

PCA_heuristics: L → K, where K < P < L OR (P < K < L → IDS_heuristics(K) > IDS(P))
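One way to read this formulation concretely is the following minimal numpy sketch, in which the synthetic attribute vectors, the variance threshold, and the choice of P are all assumptions for illustration: attributes whose variance exceeds the threshold are flagged as candidate attack signatures, and PCA maps the observation space from L dimensions down to P.

```python
import numpy as np

# Minimal sketch of the formulation above, under assumed names and synthetic
# data: rows of `observations` are attribute vectors V_i of dimension L,
# attributes whose variance exceeds `threshold` are flagged as candidate
# attack signatures, and PCA maps the representation from L down to P.
rng = np.random.default_rng(0)
L = 10                                    # Dim(v_i) = L
observations = rng.normal(size=(200, L))  # the set S of attribute vectors
observations[:, 2] *= 3.0                 # inflate one attribute to mimic an
                                          # attack-related high-variance attribute

threshold = 1.5
variances = observations.var(axis=0)
signature_attrs = np.flatnonzero(variances > threshold)   # Variance(attribute) > threshold

# PCA: L -> P (P is chosen here purely for illustration)
P = 4
centered = observations - observations.mean(axis=0)
eigvals, eigvecs = np.linalg.eigh(np.cov(centered, rowvar=False))
order = np.argsort(eigvals)[::-1]          # largest-variance directions first
projection = centered @ eigvecs[:, order[:P]]

print("candidate signature attributes:", signature_attrs)
print("reduced observation shape:", projection.shape)     # (200, P)
```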

Facts and Axioms

• PCA is a statistical orthogonal transformation.

• PCA is combined with knowledge guidance to reduce noise and probabilistic behavior (e.g., PCA + ANN).

• PCA is successful in reducing the dimensionality of data sets, but it does not take the parameters or labels into account; in this way it fails to represent the data in a form that simplifies the interpretation of the underlying parameters.

• Attributes in KDD are divided into three groups: basic features, content features, and statistical features of the network connection.

• Classes in the KDD dataset are mainly categorized into five classes: Normal, denial of service (DoS), remote to user (R2L), user to root (U2R), and probing.

• New classes of threats are added by emerging platforms such as the cloud and smartphones.

Hypotheses to be Investigated and Pursued

1- Hypothesis 1: A reduced set of attributes is a local, non-complete set over the network attack domain (hint: PCA and heuristic methodology are domain specific).

2- Hypothesis 2: Mobile networks add new dimensions to the vector of security attributes due to the dynamic architecture they impose.

3- Hypothesis 3: Time-series analysis of occurrences is crucial in perceiving network threats and events (hint: dynamic principal component analysis).

4- Hypothesis 4: Application-level attributes are crucial in detecting intrusions in application-level distributed systems (e.g., web services over the cloud).

5- Hypothesis 5: LSA (Latent Semantic Analysis) increases the performance of the DPCA algorithm and produces more reliable results (hint: LSA is executed in parallel with DPCA).

The Proposed Scheme

Considering the limitations of conventional PCA, figure (1) presents the proposed scheme, in which dynamic PCA (DPCA) is suggested to monitor the non-stationary network data and conduct on-line mean estimation; this is combined with LSA (Latent Semantic Analysis), which is proposed to reveal semantics over the variables and to change the objective function of DPCA according to the revealed semantics. DPCA extracts the time-dependent relationships in the measurements by augmenting the measured data matrix with time-lagged measured variables, as sketched in the example after figure (1).

Figure (1): The proposed scheme. A traffic tracer and sampler captures the traffic and passes a heuristic attribute vector to the dynamic PCA block; an LSA-based attribute estimator, supported by an ontology, initial training samples, and an on-line mean estimator, provides configuration and control; training examples from the initial dataset are fed to ID3 to generate the decision tree.
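As a concrete reading of the lag-augmentation step, the following minimal Python sketch (an illustration under assumed variable names and synthetic traffic, not the project's implementation) builds the time-lag-augmented data matrix and projects it onto its leading principal components; the on-line mean estimator of the scheme is approximated here by a simple batch mean.

```python
import numpy as np

# Minimal sketch of the DPCA data preparation described above, under assumed
# variable names and synthetic traffic: the measured data matrix is augmented
# with time-lagged copies of its variables before conventional PCA is applied.
def lag_augment(X: np.ndarray, lags: int) -> np.ndarray:
    """Stack X(t), X(t-1), ..., X(t-lags) column-wise."""
    n, _ = X.shape
    blocks = [X[lags - k: n - k, :] for k in range(lags + 1)]
    return np.hstack(blocks)               # shape: (n - lags, m * (lags + 1))

def dpca_scores(X: np.ndarray, lags: int, n_components: int) -> np.ndarray:
    Xa = lag_augment(X, lags)
    Xa = Xa - Xa.mean(axis=0)              # stand-in for on-line mean estimation
    eigvals, eigvecs = np.linalg.eigh(np.cov(Xa, rowvar=False))
    order = np.argsort(eigvals)[::-1]
    return Xa @ eigvecs[:, order[:n_components]]

# Usage with synthetic traffic measurements (5 variables, 500 samples):
rng = np.random.default_rng(1)
traffic = rng.normal(size=(500, 5))
scores = dpca_scores(traffic, lags=2, n_components=3)
print(scores.shape)                        # (498, 3)
```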

1-KDD 99 Data Set


The KDD Cup 1999 intrusion detection contest data was prepared by the DARPA intrusion detection evaluation program, which simulated a true Air Force environment but peppered it with multiple attacks. The raw data was processed into connection records. Most researchers use this KDD99 data set as input to their approaches.

There are four main attack categories in the KDD99 dataset (a small label-mapping sketch follows the list below).

1) Denial of Service Attack (DoS): is an attack in which the attacker makes some computing or memory resource too busy or too full to handle legitimate requests, or denies legitimate users access to a machine.

2) Remote to Local Attack (R2L): occurs when an attacker who has the ability to send packets to a machine over a network but who does not have an account on that machine exploits some vulnerability to gain local access as a user of that machine.


3) User to Root Attack (U2R): is an attack in which the attacker starts out with access to a normal user account on the system and is able to exploit some vulnerability to gain root access to the system.

4) Probe Attack: is an attempt to gain access to a computer and its files through a known or probable weak point in the computer system.[4,5]
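As a small illustration of how these categories are typically used in practice, the sketch below maps raw KDD99 connection labels to the four attack classes plus Normal; the label spellings follow the commonly used KDD99 naming and should be verified against the actual data file.

```python
# A small sketch of how raw KDD99 connection labels are commonly grouped into
# the four attack categories above plus "Normal". The label spellings follow
# the usual KDD99 naming and should be verified against the actual data file.
CATEGORY = {
    "normal": "Normal",
    # DoS
    "smurf": "DoS", "neptune": "DoS", "back": "DoS",
    "teardrop": "DoS", "pod": "DoS", "land": "DoS",
    # Probe
    "satan": "Probe", "ipsweep": "Probe", "nmap": "Probe", "portsweep": "Probe",
    # R2L
    "guess_passwd": "R2L", "ftp_write": "R2L", "imap": "R2L", "phf": "R2L",
    "multihop": "R2L", "warezmaster": "R2L", "warezclient": "R2L", "spy": "R2L",
    # U2R
    "buffer_overflow": "U2R", "rootkit": "U2R", "loadmodule": "U2R", "perl": "U2R",
}

def label_category(raw_label: str) -> str:
    """Map a raw KDD99 label such as 'smurf.' to its attack category."""
    return CATEGORY.get(raw_label.rstrip("."), "Unknown")

print(label_category("smurf."), label_category("normal."))   # DoS Normal
```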

2-Data Mining

Data mining is the art and science of intelligent data analysis. The aim is to discover meaningful insights and knowledge from data. Discoveries are often expressed as models, and we often describe data mining as the process of building models. A model captures, in some formulation, the essence of the discovered knowledge. A model can be used to assist in our understanding of the world. Models can also be used to make predictions.

For the data miner, the discovery of new knowledge and the building of models that nicely predict the future can be quite rewarding. Indeed, data mining should be exciting and fun as we watch new insights and knowledge emerge from our data. With growing enthusiasm, we meander through our data analyses, following our intuitions and making new discoveries all the time, discoveries that will continue to help change our world for the better. Data mining has been applied in most areas of endeavor: there are data mining teams working in business, government, financial services, biology, medicine, risk and intelligence, science, and engineering. Anywhere we collect data, data mining is being applied and feeding new knowledge into human endeavor. One of the important data mining methods is the decision tree.[6]

2.1 Decision tree

A decision tree is one of the most widely used supervised learning methods for data exploration. It is easy to interpret and can be re-represented as if-then-else rules. A decision tree consists of nodes and branches connecting the nodes. The nodes located at the bottom of the tree are called leaves and indicate classes.

A decision tree aids in data exploration in the following manner:

•It reduces a volume of data by transformation into a more compact form that preserves the essential characteristics and provides an accurate summary.

•It discovers whether the data contains well-separated classes of patterns, such that the classes can be interpreted meaningfully in the context of a substantive theory.

•It maps data in the form of a tree so that prediction values can be generated by backtracking from the leaves to the root. This may be used to predict the outcome for new data or a query.[7]

The most popular decision tree algorithm is ID3. The following subsection explains the basic concepts of the ID3 algorithm.

2.1.1 ID3 Algorithm

Based on Hunt’s algorithm, Quinlan developed an algorithm called ID3, in which he used Shannon’s entropy as a criterion for selecting the most significant/discriminatory feature:

Entropy(S) = -Σ_{i=1}^{c} p_i · log2(p_i)   (1)

where p_i is the proportion of the patterns belonging to the ith class.

The uncertainty in each node is reduced by choosing the feature that most reduces its entropy (via the split). To achieve this, Information Gain (InfoGain), which measures the expected reduction in entropy caused by knowing the value of a feature F_j, is used:

InfoGain(S, F_j) = Entropy(S) - Σ_{v_i ∈ V_{F_j}} (|S_{v_i}| / |S|) · Entropy(S_{v_i})   (2)

where V_{F_j} is the set of all possible values of feature F_j and S_{v_i} is the subset of S for which feature F_j has value v_i.


The InfoGain is used to select the best feature (the one reducing the entropy by the largest amount) at each step of growing a decision tree. To compensate for the bias of the InfoGain toward features with many outcomes, a measure called the Gain Ratio is used:

๐ผ๐‘›๐‘“๐‘œ๐บ๐‘Ž๐‘–๐‘›(๐‘†,๐น ๐‘—

)

GR(S, F_j) =

๐‘†๐‘๐‘™๐‘–๐‘ก ๐ผ๐‘›๐‘“๐‘œ๐‘Ÿ๐‘š๐‘Ž๐‘ก๐‘–๐‘œ๐‘›(๐‘†,๐น ๐‘—

)

(3) where

Split Information(S,F_j) =

๐ถ ๐‘–=1

|๐‘† ๐‘–

|

|S|

. log

2

(

|๐‘† ๐‘–

|S|

|

)

(4)

The Split Information is the entropy of S with respect to the values of feature F_j. When two or more features have the same value of InfoGain, the feature that has the smaller number of values is selected. Use of the GR results in the generation of smaller trees.[8]
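A direct transcription of equations (1)-(4) into Python may help make the selection criterion concrete; the toy feature values and class labels below are invented for illustration.

```python
import math
from collections import Counter

# Equations (1)-(4): entropy, information gain, split information, and gain
# ratio, evaluated on a small made-up labelled sample.
def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def info_gain(feature_values, labels):
    total = len(labels)
    gain = entropy(labels)
    for v in set(feature_values):
        subset = [c for f, c in zip(feature_values, labels) if f == v]
        gain -= (len(subset) / total) * entropy(subset)
    return gain

def split_information(feature_values):
    total = len(feature_values)
    return -sum((n / total) * math.log2(n / total)
                for n in Counter(feature_values).values())

def gain_ratio(feature_values, labels):
    si = split_information(feature_values)
    return info_gain(feature_values, labels) / si if si > 0 else 0.0

# Toy example: one categorical feature against normal/attack labels.
protocol = ["tcp", "tcp", "udp", "udp", "icmp", "icmp"]
classes  = ["normal", "attack", "normal", "normal", "attack", "attack"]
print(round(gain_ratio(protocol, classes), 3))   # 0.421
```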

Algorithm (2.2) ID3

Input: S, a set of training examples.

Output: A decision tree.

Steps:

1. Create the root node containing the entire set S.

2. If all examples are positive, or all are negative, then stop: the decision tree has one node.

3. Otherwise (the general case):

Select the feature F_j that has the largest GR value.

For each value v_i from the domain of feature F_j:

(a) add a new branch corresponding to this best feature value v_i, and a new node which stores all the examples that have value v_i for feature F_j;

(b) if the node stores examples belonging to one class only, then it becomes a leaf node; otherwise add a new subtree below this node, and go to step 3.

4. End
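The steps above can be sketched as a short recursive function; the following is a minimal illustration (not the project's implementation) that uses the gain ratio for feature selection, with an invented toy training set.

```python
import math
from collections import Counter

# Compact recursive sketch of Algorithm (2.2) using the gain ratio for feature
# selection. Examples are dicts of categorical feature values; the sample data
# at the bottom is invented for illustration only.
def entropy(labels):
    total = len(labels)
    return -sum((n / total) * math.log2(n / total) for n in Counter(labels).values())

def gain_ratio(examples, labels, feature):
    total, gain, split_info = len(labels), entropy(labels), 0.0
    for value in {e[feature] for e in examples}:
        subset = [c for e, c in zip(examples, labels) if e[feature] == value]
        p = len(subset) / total
        gain -= p * entropy(subset)
        split_info -= p * math.log2(p)
    return gain / split_info if split_info > 0 else 0.0

def id3(examples, labels, features):
    # Step 2 / stopping: one class left (or no features) -> leaf node.
    if len(set(labels)) == 1 or not features:
        return Counter(labels).most_common(1)[0][0]
    # Step 3: pick the feature with the largest gain ratio ...
    best = max(features, key=lambda f: gain_ratio(examples, labels, f))
    tree = {best: {}}
    # ... and add one branch (and subtree) per value of that feature.
    for value in {e[best] for e in examples}:
        idx = [i for i, e in enumerate(examples) if e[best] == value]
        tree[best][value] = id3([examples[i] for i in idx],
                                [labels[i] for i in idx],
                                [f for f in features if f != best])
    return tree

# Toy training set: two categorical features against normal/attack labels.
X = [{"protocol": "tcp",  "flag": "SF"},
     {"protocol": "tcp",  "flag": "REJ"},
     {"protocol": "icmp", "flag": "SF"},
     {"protocol": "icmp", "flag": "SF"}]
y = ["normal", "attack", "attack", "attack"]
# Splits first on 'protocol', then on 'flag' for the tcp branch.
print(id3(X, y, ["protocol", "flag"]))
```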

3-Principal Component Analysis

Principal component analysis (PCA) is a statistical method for analyzing data effectively. Its aim is to find a set of directions that explain as much of the variance in the data as possible, projecting the original high-dimensional data onto a lower-dimensional space while retaining the main information in the data so that it can be handled easily.

Principal component analysis is a feature selection and feature extraction process. Its main goal is to find, in a large input search space, a suitable set of characteristic vectors and to extract all the main features. The selection process is a mapping from the input space to a feature space; the key to this process is to select feature vectors such that projecting the input onto them extracts the features while keeping the error variance as small as possible. For a given M-dimensional random vector

X = [x_1, x_2, ..., x_m]^T with mean E[X] = 0, the covariance is expressed as follows:

C_X = E[(X − E[X])(X − E[X])^T]   (1)

Because E[X] = 0, the covariance matrix is therefore the autocorrelation matrix:

C_X = E[X X^T]   (2)

Calculating the eigenvalues λ_1, λ_2, ..., λ_m of C_X and the corresponding normalized eigenvectors ω_1, ω_2, ..., ω_m gives the following equation:

C_X ω_j = λ_j ω_j ,   j = 1, 2, ..., m   (3)

where ω_j = [ω_j1, ω_j2, ..., ω_jm]^T. The eigenvectors ω_1, ω_2, ..., ω_m satisfy the conditions on the input, and the eigenvalues are ordered λ_1 ≥ λ_2 ≥ ... ≥ λ_m. Projecting the input onto the eigenvectors gives Y_i = ω_i^T X, i = 1, 2, ..., m, expressed in matrix form as:

Y = ω^T X   (4)

X can be reconstructed with a linear combination of the eigenvectors, as in the following formula:

X = ωY = Σ_{i=1}^{m} Y_i ω_i   (5)

Feature selection yields all the principal components; in the feature extraction step, the main components are then selected to achieve dimensionality reduction.

The mean of the vector Y is:

E[Y] = E[ω^T X] = ω^T E[X] = 0   (6)

Since the covariance matrix C_Y is the autocorrelation matrix of Y:

C_Y = E[Y Y^T] = E[ω^T X X^T ω] = ω^T E[X X^T] ω   (7)

Because ω is the matrix of eigenvectors of C_X, C_Y is diagonal:

C_Y = diag(λ_1, λ_2, ..., λ_m)   (8)

When truncating Y, it is necessary to ensure that the cut-off is optimal in the mean-square-error sense. Of λ_1, λ_2, ..., λ_m, only the first L largest eigenvalues need be considered; reconstructing X with these components gives the following estimate:

X̂ = Σ_{i=1}^{L} Y_i ω_i   (9)

The corresponding error variance is:

e_L = E[(X − X̂)^2] = Σ_{i=L+1}^{M} λ_i   (10)

According to formula (10), the larger L is, the smaller the mean square error that can be achieved. Also:

Σ_{i=1}^{m} λ_i = Σ_{i=1}^{m} q_ii   (11)

where q_ii is the ith diagonal element of C_X. The contribution rate of variance is then the ratio of the retained eigenvalues to the total:

φ(L) = Σ_{i=1}^{L} λ_i / Σ_{i=1}^{m} λ_i

When φ(L) is large enough, the first L eigenvectors ω_1, ω_2, ..., ω_L can be taken as a low-dimensional projection space, thus completing the dimensionality reduction.[9]
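Formulas (1)-(10) translate almost line by line into numpy; the sketch below (the synthetic data and the 95% contribution-rate cut-off are assumptions for illustration) centres the data, eigendecomposes the covariance matrix, selects the first L components, and checks that the reconstruction error matches the sum of the discarded eigenvalues.

```python
import numpy as np

# Minimal numpy sketch of formulas (1)-(10): centre the data, build the
# covariance/autocorrelation matrix, eigendecompose it, keep the L components
# with the largest eigenvalues, and check the contribution rate of variance.
rng = np.random.default_rng(2)
X = rng.normal(size=(300, 8))
X = X - X.mean(axis=0)                    # enforce E[X] = 0

C_x = (X.T @ X) / len(X)                  # sample estimate of C_x = E[X X^T]
eigvals, eigvecs = np.linalg.eigh(C_x)    # C_x w_j = lambda_j w_j
order = np.argsort(eigvals)[::-1]         # lambda_1 >= lambda_2 >= ...
eigvals, eigvecs = eigvals[order], eigvecs[:, order]

contribution = np.cumsum(eigvals) / eigvals.sum()
L = int(np.searchsorted(contribution, 0.95) + 1)   # smallest L with phi(L) >= 0.95

Y = X @ eigvecs[:, :L]                    # projection Y = w^T X (formula (4))
X_hat = Y @ eigvecs[:, :L].T              # reconstruction (formula (9))
mse = np.mean(np.sum((X - X_hat) ** 2, axis=1))
# mse should match the sum of the discarded eigenvalues (formula (10)):
print(L, round(mse, 4), round(eigvals[L:].sum(), 4))
```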

4-Latent Semantic Analysis

LSA is an algebraic model of information retrieval proposed by Landauer, Dumais, et al. It is a computational theory and method for knowledge acquisition and representation that has been applied to information retrieval and question answering systems.[10]

LSA has been widely used to analyze the latent semantics of documents in an unsupervised way by exploring the relationships among a set of terms, a corpus of documents, and a set of latent topics. The mathematical formulation of LSA is based on the singular value decomposition (SVD) of matrices, which imposes the restriction that all latent topics are mutually orthogonal, a restriction that is not always proper or reasonable for real-world applications.[11]

LSA derives the meaning of terms by approximating the structure of term usage among documents through SVD. This underlying relationship between terms is believed to be mainly due to transitive relationships: terms are similar if they co-occur with the same terms within files.

Constructing a latent semantic space model relies on the process of Singular Value Decomposition (SVD), which can be expressed by the following formula:


X = V S D^T   (1)

where X is an m-by-n matrix whose rows denote the features and whose columns denote the documents; m is the number of features and n is the number of documents in the training corpus. The three matrices V, S, D on the right of the equation are the results of the SVD of the matrix X. S is a diagonal matrix whose diagonal values are the singular values (the positive square roots of the eigenvalues of the XX^T (or X^T X) matrix), arranged on the diagonal in descending order. V is composed of the eigenvectors of the XX^T matrix, ordered to correspond to the singular values; D is composed of the eigenvectors of the X^T X matrix, ordered in the same way. If we use r to denote the number of positive eigenvalues of XX^T (or X^T X), V will be an m-by-r matrix and D will be an n-by-r matrix. We view V as a matrix which describes the features in the latent semantic space, and D as a matrix which describes the documents in the latent semantic space. When we truncate the three matrices to k dimensions, we obtain a model with lower dimensions.

The dimensions in the latent semantic space model represent latent concepts, so both the features and the documents are described by latent concepts.

When we describe a new document Q in this space, we can use formulas (2)-(5):

X = V S D^T → D^T = (VS)^{-1} X   (2)

Because both V and D are orthogonal matrices, we have


V^{-1} = V^H   (3)

Then

D^T = S V^H X   (4)

So, the new document can be mapped into latent semantic space through (5):

Q = X V S   (5)

It’s clear that Q can be directly mapped into latent semantic space by using the VS matrix.[12]
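The construction above can be illustrated with a few lines of numpy: decompose a small term-document matrix, truncate to k dimensions, and fold a new document into the latent space. The tiny matrix and the value of k are invented, and the fold-in below uses the standard q^T V_k S_k^{-1} form rather than the VS mapping quoted in formula (5), so it may differ in scaling from the cited method.

```python
import numpy as np

# Sketch of the latent semantic space construction: SVD of an m-by-n
# term-document matrix, truncation to k dimensions, and fold-in of a new
# document. Names follow the paper's X = V S D^T convention.
X = np.array([[2., 0., 1., 0.],           # rows: terms (m = 3)
              [0., 3., 0., 1.],           # columns: documents (n = 4)
              [1., 0., 2., 0.]])

V, s, Dt = np.linalg.svd(X, full_matrices=False)   # X = V S D^T
k = 2
V_k, S_k, D_k = V[:, :k], np.diag(s[:k]), Dt[:k, :].T

# Documents in the k-dimensional latent space (rows of D_k scaled by S_k):
doc_coords = D_k @ S_k

# Fold a new document (term-count vector q) into the same space using the
# standard q^T V_k S_k^{-1} formulation (an assumption, see lead-in above):
q = np.array([1., 0., 1.])
q_coords = q @ V_k @ np.linalg.inv(S_k)
print(doc_coords.shape, q_coords)
```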


Related Works

In [1], two drawbacks of PCA have been investigated: the first is the assumption that linear relationships exist among process variables, and the second is the challenge of process dynamics, which is not considered because PCA was created for analyzing steady-state processes and thus cannot handle process dynamics. The authors presented a PCA-based multivariate time-series segmentation method which addressed the first drawback, and a dynamic extension of multivariate time-series segmentation was developed to segment these series based on changes in the process dynamics.

In [2], a modification to the DPCA algorithm for fault detection has been proposed, in which an appropriate standardization with respect to on-line estimated statistical parameters is carried out if simple healthy relations between variables can be obtained. This idea makes it possible to deal with non-stationary signals and to significantly reduce the rate of false alarms. A series of tests showed the effectiveness of the proposed fault detection algorithm in distinguishing between normal changes in signals and variations due to the presence of faults.

In [3], Baig M. N. et al. present a model for feature selection that uses the information gain ratio measure to compute the relevance of each feature, and a k-means classifier to select the optimal set of MAC-layer features that can improve the accuracy of intrusion detection systems while reducing the learning time of their learning algorithm. They study the effect of optimizing the feature set for wireless intrusion detection systems on the performance and learning time of different types of classifiers based on neural networks. Experimental results with three types of neural network architectures clearly show that the optimization of a wireless feature set has a significant impact on the efficiency and accuracy of the intrusion detection system. In [13], Neelakantan N. P. et al. observe that in 802.11 networks the features used for training and testing intrusion detection systems consist of basic information related to the TCP/IP header, with little attention to the features associated with lower-level protocol frames. The resulting detectors were efficient and accurate in detecting network attacks at the network and transport layers, but were unfortunately not capable of detecting 802.11-specific attacks such as deauthentication attacks or MAC-layer DoS attacks. In [14], Al-Janabi S. T. et al. develop an anomaly-based intrusion detection system (IDS) that can promptly detect and classify various attacks. Anomaly-based IDSs need to be able to learn the dynamically changing behavior of users or systems, and the authors experiment with packet behavior as parameters in anomaly intrusion detection. There are several methods to assist IDSs in learning a system's behavior; their proposed IDS uses a back-propagation artificial neural network (ANN) to learn the system's behavior. They used the KDD'99 data set in their experiments, and the obtained results satisfy the work's objective.

In [15], Reddy E. K. et al. observe that network security technology has become crucial in protecting government and industry computing infrastructure. Modern intrusion detection applications face complex problems: they are required to be reliable and extensible, easy to manage, and to have low maintenance cost. In recent years, data mining-based intrusion detection systems (IDSs) have demonstrated high accuracy, good generalization to novel types of intrusion, and robust behavior in a changing environment. Still, significant challenges exist in the design and implementation of production-quality IDSs; instrumenting components such as data transformations, model deployment, and cooperative distributed detection is a complex engineering endeavor. In [16], Suebsing A. et al. note that in previous research on feature selection, the criteria and methods for selecting features from raw data are mostly difficult to implement; therefore, their work presents an easy and novel method for feature selection which can be used to correctly separate normal and attack patterns of computer network connections. The goal of their work is to effectively apply the Euclidean distance for selecting a subset of robust features, using smaller storage space and achieving higher intrusion detection performance. Experimental results show that the proposed approach based on the Euclidean distance can improve the true-positive intrusion detection rate, especially for detecting known attack patterns. In [17], Bensefia H. et al. propose a new approach for IDS adaptability by integrating a Simple Connectionist Evolving System (SECOS) and a Winner-Takes-All (WTA) hierarchy of XCS (eXtended Classifier System). This integration highlights an adaptive hybrid intrusion detection core that builds adaptability into the IDS as an intrinsic and native functionality. In [18], Saad K. Majeed et al. present a proposed Hybrid Multilevel Network Intrusion Detection System (HMNIDS): it is hybrid because it uses both misuse and anomaly techniques for intrusion detection, and multilevel since it applies the two detection techniques hierarchically on two levels. The first level applies the anomaly detection technique using a Support Vector Machine (SVM) to classify traffic as either normal or intrusive; if normal, the traffic is passed, otherwise the intrusive traffic is fed to the second level to determine the class of intrusion, where the misuse detection technique is applied using Artificial Neural Networks (ANN). The proposal is a data-mining-based HMNIDS, since mining provides an iterative process: if the results do not reach an optimal solution, the mining steps continue until the results correspond to the intended outcome. For training and testing of the HMNIDS, the NSL-KDD data set was used; it solves some of the inherent problems of KDD'99. Like KDD99, its connections contain 41 features and are labeled as either normal or an attack type, and many of these features are irrelevant to the classification process. Principal Component Analysis (PCA) is used for feature extraction to reduce the number of features and avoid excessive time in training and real-time detection. PCA yields 8 features as a subset of correlated intrinsic features that form the basis of classification; the feature set resulting from PCA and the full feature set are both fed to the HMNIDS. The results show that the accuracy rates of the SVM and ANN classifiers are both high separately, but they are higher with the PCA (8) features than with all (41) features. The confusion matrix of the HMNIDS gives high detection rates and a low false alarm rate, and these are also better with the (8) PCA features than with all (41) features.

In [19], Saad K. Majeed et al. present a proposed Wireless Network Intrusion Detection System (WNIDS) which uses both misuse and anomaly techniques for intrusion detection. The proposal is a data-mining-based WNIDS, since mining provides an iterative process: if the results do not reach an optimal solution, the mining steps continue until the results correspond to the intended outcome. For training and testing of the WNIDS, a collected dataset called Wdataset was used; the collection was done on an organized 802.11 WLAN consisting of 5 machines and involved frames of all types (normal, the four known intrusions, and unknown intrusions). The collected connections contain features that appear directly in the header of 802.11 frames, plus one added feature (casting), since it is critical in distinguishing among intrusions. These connections are labeled as either normal or an attack type, and many of these features are irrelevant to the classification process. A Support Vector Machine (SVM) classifier is proposed for feature extraction to reduce the number of features and avoid excessive time in training and real-time detection. The SVM yields 8 features as a subset of correlated intrinsic features that form the basis of classification; the feature set resulting from the SVM and the full feature set are both fed to the WNIDS. The results obtained from the WNIDS show that the accuracy rates of the ANN and ID3 classifiers are both higher with the SVM (8) features than with the set of all features, and ANN accuracy is higher than ID3 with both feature sets.

References


1- Zoltan Banko, Laszlo Dobos, and Janos Abonyi, "Dynamic Principal Component Analysis in Multivariate Time-Series Segmentation", 2011.

2- Jesus Mina and Cristina Verde, "Fault Detection for Large Scale Systems Using Dynamic Principal Components Analysis with Adaptation", International Journal of Computers, Communications & Control, Vol. II, 2007.

3- Baig M. N. and Kumar K. K., "Intrusion Detection in Wireless Networks Using Selected Features", (IJCSIT) International Journal of Computer Science and Information Technologies, Vol. 2 (5), 2011, pp. 1887-1893.

4- Vidit Pathak and Dr. Ananthanarayana V. S., "A Novel Multi-Threaded K-Means Clustering Approach for Intrusion Detection", IEEE, 2012.

5- "KDD Cup 1999 Data", The UCI KDD Archive, Information and Computer Science, University of California, Irvine, 1999, available at: http://kdd.ics.uci.edu/databases/kddcup99/kddcup99.html.

6- Chetan R and Ashoka D.V., "Data Mining Based Network Intrusion Detection System: A Database Centric Approach", 2012 International Conference on Computer Communication and Informatics (ICCCI-2012), Jan. 10-12, 2012, Coimbatore, India.

7- Krzysztof J. Cios, Witold Pedrycz, Roman W. Swiniarski, and Lukasz A. Kurgan, "Data Mining: A Knowledge Discovery Approach", Springer, 2007.

8- Jiawei Han and Micheline Kamber, "Data Mining: Concepts and Techniques", Morgan Kaufmann Publishers, 2006.

9- Chen Yu, Zhang Jian, Yi Bo, and Chen Deyun, "A Novel Principal Component Analysis Neural Network Algorithm for Fingerprint Recognition in Online Examination System", 2009 Asia-Pacific Conference on Information Processing.

10- Wei Song and Soon Cheol Park, "Analysis of Web Clustering Based on Genetic Algorithm with Latent Semantic Indexing Technology", Sixth International Conference on Advanced Language Processing and Web Information Technology, IEEE, 2007, DOI 10.1109/ALPIT.2007.77.

11- Sheng-Yi Kong and Lin-Shan Lee, "Semantic Analysis and Organization of Spoken Documents Based on Parameters Derived From Latent Topics", IEEE Transactions on Audio, Speech, and Language Processing, Vol. 19, No. 7, September 2011.

12- Dongfeng Cai, Liwei Chang, and Duo Ji, "Latent Semantic Analysis Based on Space Integration", Proceedings of IEEE CCIS 2012, IEEE, 2012.

13- Neelakantan N. P., Nagesh C., and Tech M., "Role of Feature Selection in Intrusion Detection Systems for 802.11 Networks", International Journal of Smart Sensors and Ad Hoc Networks (IJSSAN), Volume 1, Issue 1, 2011.

14- Al-Janabi S. T. and Saeed H. A., "A Neural Network Based Anomaly Intrusion Detection System", 2011 Developments in E-systems Engineering, IEEE Computer Society, pp. 221-226.

15- Reddy E. K., Reddy V. N., and Rajulu P. G., "A Study of Intrusion Detection in Data Mining", Proceedings of the World Congress on Engineering 2011, Vol. III, WCE 2011, July 6-8, 2011, London, U.K.

16- Suebsing A. and Hiransakolwong N., "Euclidean-based Feature Selection for Network Intrusion Detection", 2009 International Conference on Machine Learning and Computing, IPCSIT Vol. 3 (2011), IACSIT Press, Singapore.

17- Bensefia H. and Ghoualmi N., "A New Approach for Adaptive Intrusion Detection", 2011 Seventh International Conference on Computational Intelligence and Security, 2011.

18- Saad K. Majeed, Soukaena H. Hashem, and Ikhlas K. Gbashi, "Propose HMNIDS Hybrid Multilevel Network Intrusion Detection System", IJCSI International Journal of Computer Science Issues, Vol. 10, Issue 5, No. 2, September 2013.

19- Saad K. Majeed, Soukaena H. Hashem, and Ikhlas K. Gbashi, "Proposal to WNIDS Wireless Network Intrusion Detection System", IJSR - International Journal of Scientific Research, Volume 2, Issue 10, October 2013, ISSN No. 2277-8179.
