Evolutionary Neural Networks
Soft Computing Lab.

Backgrounds
Why NN+EC?
• “Evolving brains”: biological neural networks compete and evolve
  – The way that intelligence was created
• Global search
  [Figure: a population of samples explores the weight space globally, escaping the local maximum that a single choice of initial weights would converge to, and reaching the optimal solution]
• Adaptation to dynamic environments without human intervention
  – Architecture evolution

General Framework of EANN
[Figure from X. Yao: the general framework of evolutionary artificial neural networks]

Evolution of Connection Weights
1. Encode each individual neural network's connection weights into a chromosome
2. Calculate the error function and determine each individual's fitness
3. Reproduce children based on the selection criterion
4. Apply genetic operators
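
A minimal sketch of this loop in Python (the function names and parameters here are illustrative, not from the original):

    import numpy as np

    def evolve_weights(fitness, n_weights, pop_size=20, generations=100,
                       sigma=0.1, elite=2):
        """Evolve real-valued weight vectors: evaluate fitness, keep the
        best (selection), and create children by Gaussian mutation."""
        pop = np.random.uniform(-1, 1, (pop_size, n_weights))      # 1. encode
        for _ in range(generations):
            scores = np.array([fitness(ind) for ind in pop])       # 2. evaluate
            order = np.argsort(scores)[::-1]
            parents = pop[order[:pop_size // 2]]                   # 3. select
            picks = np.random.randint(len(parents), size=pop_size - elite)
            children = parents[picks] + np.random.normal(          # 4. mutate
                0.0, sigma, (pop_size - elite, n_weights))
            pop = np.vstack([pop[order[:elite]], children])
        return pop[np.argmax([fitness(ind) for ind in pop])]

    # toy usage: the fitness would normally be the negative NN error
    best = evolve_weights(lambda w: -np.sum((w - 0.5) ** 2), n_weights=6)
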
Representation
• Binary representation
  – Weights are represented by binary digits
    • e.g. 8 bits can represent connection weights between -127 and +127
  – Limited representation precision
    • too few bits → some numbers cannot be approximated
    • too many bits → training might be prolonged
• To overcome the limits of binary representation, some proposed using real numbers
  – i.e., one real number per connection weight
• Standard genetic operators such as crossover are not directly applicable to this representation
  – However, some argue that evolutionary computation is possible with mutation only
  – Fogel, Fogel and Porto (1990) adopted one genetic operator: Gaussian random mutation
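
A short sketch of that single-operator scheme (sigma and the weight range are illustrative):

    import numpy as np

    def gaussian_mutation(weights, sigma=0.05):
        """Perturb every real-valued connection weight with zero-mean
        Gaussian noise; no crossover is used for this representation."""
        return weights + np.random.normal(0.0, sigma, size=weights.shape)

    parent = np.random.uniform(-1, 1, 10)   # one real number per connection
    child = gaussian_mutation(parent)
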
Evolution of Architectures
1. Encode each individual neural network's architecture into a chromosome
2. Train each neural network with a predetermined learning rule
3. Calculate the error function and determine each individual's fitness
4. Reproduce children based on the selection criterion
5. Apply genetic operators

Direct Encoding
• All information is represented by binary strings, i.e. each connection and node is specified by binary bits
• An N × N matrix C = (c_{ij})_{N \times N} can represent the connectivity of N nodes, where

    c_{ij} = \begin{cases} 1, & \text{if the connection is ON} \\ 0, & \text{if the connection is OFF} \end{cases}

• Does not scale well, since a large NN needs a big matrix to represent it
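
A small illustration of direct encoding and its scaling problem (hypothetical code, not from the original):

    import numpy as np

    N = 5                                       # number of nodes
    C = np.random.randint(0, 2, size=(N, N))    # c_ij = 1 iff connection ON

    genome = C.flatten()                        # binary genome of length N*N
    C_back = genome.reshape(N, N)               # decoding is just reshaping
    assert (C == C_back).all()
    # The genome length grows as N*N, which is why direct encoding
    # does not scale to large networks.
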
Indirect Encoding
• Only the most important parameters or features of an architecture are represented; other details are left to the learning process to decide
  – e.g. specify the number of hidden nodes and let the learning process decide how they are connected (e.g. fully connected)
• More biologically plausible: according to the discoveries of neuroscience, it is impossible for the genetic information encoded in humans to specify the whole nervous system directly

Evolution of Learning Rules
1. Decode each individual into a learning rule
2. Construct a neural network (either pre-determined or random) and train it with the decoded learning rule
   • this means adapting the learning function itself: the connection weights are updated by the decoded, adaptive rule
3. Calculate the error function and determine each individual's fitness
4. Reproduce children based on the selection criterion
5. Apply genetic operators
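
A common formulation in the literature realizes step 1 by parameterizing a local weight-update rule with evolved coefficients; the exact linear form below is an assumption for illustration:

    import numpy as np

    def decode_rule(theta):
        """Decode a chromosome theta into a local learning rule:
        delta_w = theta0*x + theta1*y + theta2*x*y + theta3*w."""
        def step(w, x, y):
            return w + theta[0]*x + theta[1]*y + theta[2]*x*y + theta[3]*w
        return step

    rule = decode_rule(np.array([0.0, 0.0, 0.1, -0.01]))  # Hebbian-like, with decay
    w_new = rule(w=0.5, x=1.0, y=0.8)                     # one update step
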
Two Case Studies
• Evolving intrusion detector
• Evolving classifier for DNA microarray data

Evolutionary Learning Program's Behavior in Neural Networks for Anomaly Detection

Motivation (1)
• Attacker's strategy: cause malfunctions by exploiting a program's bugs
  – The exploited program shows behavior different from the normal one
• Anomaly detection
  – Learns a program's normal behavior from audit data
  – Classifies programs whose behavior differs from the normal one as intrusions
  – Adopted in many host-based intrusion detection systems
• System audit data and machine learning techniques
  – Basic Security Module (BSM)
  – Rule-based learning, neural networks and HMMs

Motivation (2)
• Machine learning methods such as neural networks (NN) and HMMs
  – Effective for intrusion detection based on a program's behavior
• Architecture of the classifier
  – The most important factor in classification
  – Searching for an architecture appropriate to the problem is crucial
    • NN: the number of hidden neurons and the connection information
    • HMM: the number of states and the connection information
• Traditional method: trial-and-error
  – Train 90 neural networks [Ghosh99]
  → It took too much time because the audit data is so large
⇒ Optimize architectures as well as connection weights

Related Works
• S. Forrest (1998, 1999)
  – First intrusion detection by learning a program's behavior
  – HMM performed better than the other methods
• J. Stolfo (1997): rule-based learning (RIPPER)
• N. Ye (2001)
  – Probabilistic methods: decision tree, chi-square multivariate test and first-order Markov chain model (1998 IDEVAL data)
• Ghosh (1999, 2000)
  – Multi-layer perceptrons and Elman neural networks
  – The Elman neural network performed best (1999 IDEVAL data)
• Vemuri (2003)
  – kNN and SVM (1998 IDEVAL data)

The Proposed Method
• Architecture
  – System call audit data and evolutionary neural networks
[Figure: the BSM audit facility feeds audit data to a preprocessor; a GA modeler builds one neural network per monitored program (NN_ps, NN_su, NN_login, NN_at, ..., NN_ping) as the normal profile; the detector matches observed behavior against the profile and raises an ALARM]

Normal Behavior Modeling
• Evolutionary neural networks
  – Simultaneously learn weights and architectures using a genetic algorithm
  – Partial training: back-propagation algorithm
  – Representation: N × N matrix
• Rank-based selection, crossover and mutation operators
• Fitness evaluation: recognition rate on training data (mixing real normal sequences and artificial intrusive sequences)
⇒ Generates neural networks with optimal architectures for learning a program's behavior

ENN (Evolutionary Neural Network) Algorithm
[Flowchart, rendered as steps:]
1. Separate the BSM data into training data and test data
2. Generate the initial ANNs
3. Train the ANNs partially
4. Compute the fitness
5. Rank-based selection
6. Apply crossover and mutation
7. Generate the new generation; if the stopping criterion is not met, go to step 3
8. Train the best ANN fully and evaluate it on the test data
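
A self-contained sketch of this loop (the population size of 20 and 100 generations follow the experiment settings later in the slides; the fitness stand-in replaces partial BP training plus recognition-rate evaluation):

    import numpy as np

    rng = np.random.default_rng(0)

    def fitness(g):                 # stand-in for: partial training + recognition rate
        return -np.abs(g).mean()

    def crossover(a, b):            # swap randomly chosen node rows
        mask = rng.random(a.shape[0]) < 0.5
        child = a.copy()
        child[mask] = b[mask]
        return child

    def mutate(g, sigma=0.05):      # perturb the genotype matrix
        return g + rng.normal(0.0, sigma, g.shape)

    pop = [rng.uniform(-1, 1, (7, 7)) for _ in range(20)]   # initial ANN genotypes
    for gen in range(100):
        pop.sort(key=fitness, reverse=True)                 # rank the individuals
        parents = pop[:10]                                  # rank-based selection
        pop = parents + [mutate(crossover(parents[rng.integers(10)],
                                          parents[rng.integers(10)]))
                         for _ in range(10)]
    best = max(pop, key=fitness)    # finally: train fully with BP, evaluate on test data
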
Representation
• One chromosome is an N × N matrix over the nodes (here I1, H1, H2, H3, O1): entries above the diagonal hold the connectivity (1.0 = connection ON), and the mirrored entries below the diagonal hold the corresponding connection weights
• Example genotype (rows/columns ordered I1, H1, H2, H3, O1):

         I1   H1   H2   H3   O1
    I1   0.0  1.0  1.0  0.0  1.0
    H1   0.4  0.0  0.0  0.0  1.0
    H2   0.5  0.0  0.0  1.0  1.0
    H3   0.0  0.0  0.1  0.0  1.0
    O1   0.1  0.7  0.2  0.7  0.0

  – e.g. the connection I1 → H1 is ON (row I1, column H1 = 1.0) and its weight is 0.4 (row H1, column I1); O1 receives connections from I1, H1, H2, H3 with weights 0.1, 0.7, 0.2, 0.7
[Figure: generation of the neural network from the genotype matrix — input node I1, hidden nodes H1–H3, output node O1, with the upper triangle labeled "Connectivity" and the lower triangle labeled "Weight"]
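
Decoding this genotype is mechanical; a sketch assuming the layout above (upper triangle connectivity, lower triangle weights):

    import numpy as np

    # Example genotype, rows/columns ordered I1, H1, H2, H3, O1
    G = np.array([[0.0, 1.0, 1.0, 0.0, 1.0],
                  [0.4, 0.0, 0.0, 0.0, 1.0],
                  [0.5, 0.0, 0.0, 1.0, 1.0],
                  [0.0, 0.0, 0.1, 0.0, 1.0],
                  [0.1, 0.7, 0.2, 0.7, 0.0]])

    conn = np.triu(G, 1)           # conn[i, j] = 1 iff edge i -> j is ON
    W = conn * np.tril(G, -1).T    # weight of edge i -> j is stored at G[j, i]
    print(W[0, 1])                 # I1 -> H1: 0.4
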
Crossover (1)
[Figure: crossover of two parent networks — the connections (with their weights) belonging to selected nodes are exchanged between the parents, producing two offspring networks]

Crossover (2)
[Figure: the same crossover shown on the genotypes — corresponding entries of the two parents' matrices (rows/columns I1, H1, H2, H3, O1) are exchanged, producing two offspring matrices]
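
The slide's exact exchange scheme is not recoverable from the figure, but a node-wise row swap on the genotype matrices is one plausible reading:

    import numpy as np

    def crossover(g1, g2, rng=np.random.default_rng()):
        """Swap randomly chosen node rows between two parent genotype
        matrices, producing two complementary offspring."""
        mask = rng.random(g1.shape[0]) < 0.5
        c1, c2 = g1.copy(), g2.copy()
        c1[mask], c2[mask] = g2[mask], g1[mask]
        return c1, c2
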
Mutation
[Figure: mutation on an example network — "Delete Connection" removes an existing connection, "Add Connection" inserts a new connection into the network]
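
A sketch of these two mutation operators on the genotype matrix (the probabilities and the random-weight range are assumptions):

    import numpy as np

    def mutate(g, p_add=0.05, p_del=0.05, rng=np.random.default_rng()):
        """Flip connectivity bits above the diagonal: delete existing
        connections or add new ones with a fresh random weight."""
        g = g.copy()
        n = g.shape[0]
        for i in range(n):
            for j in range(i + 1, n):
                if g[i, j] == 1.0 and rng.random() < p_del:
                    g[i, j] = 0.0                       # delete connection
                elif g[i, j] == 0.0 and rng.random() < p_add:
                    g[i, j] = 1.0                       # add connection
                    g[j, i] = rng.uniform(-1.0, 1.0)    # new random weight
        return g
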
Anomaly Detection (1)
• 280 system calls appear in the BSM audit data
  – The 45 frequently occurring calls are indexed 0–44, including: exit, fcntl, ioctl, munmap, fork, rename, pipe, seteuid, creat, mkdir, setuid, putmsg, unlink, fchdir, utime, getmsg, chown, open(read), setgid, auditon, access, open(write), mmap, memcntl, stat, open(write,creat), audit, sysinfo, lstat, open(write,trunc), setgroups, close, readlink, open(write,creat,trunc), setpgrp, getaudit, execve, open(read,write), chdir, pathconf, vfork, open(read,write,creat)
  – All remaining calls are indexed as 45
• 10 input nodes, 15 hidden nodes (the maximum number of hidden nodes), 2 output nodes
  – Input values are normalized between 0 and 1
  – Output nodes: normal and anomaly

Anomaly Detection (2)
• The output value rises sharply when an intrusion occurs
  – Detecting locally continuous anomalous sequences is important
[Figure: output value of a network over time — the abnormal trace spikes while the normal trace stays low]
• Considering previous values:

    \varepsilon_t = w_1 \cdot \varepsilon_{t-1} + w_2 \cdot o_t^1 + w_3 \cdot o_t^2

  – o_t^1, o_t^2: the values of the two output nodes at time t
• Normalizing the output values so that the same threshold applies to all neural networks:

    \varepsilon'_t = (\varepsilon_t - m) / d

  – m: average output value on the training data, d: std(\varepsilon_t)
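
A small sketch of this post-processing (the smoothing weights w are illustrative; o1 and o2 are the two output-node traces):

    import numpy as np

    def evaluation_values(o1, o2, w=(0.5, 0.3, 0.2)):
        """eps_t = w1*eps_{t-1} + w2*o1_t + w3*o2_t : smooths the raw
        outputs so that locally continuous anomalies accumulate."""
        eps, prev = [], 0.0
        for a, b in zip(o1, o2):
            prev = w[0] * prev + w[1] * a + w[2] * b
            eps.append(prev)
        return np.array(eps)

    def normalize(eps, m, d):
        """eps'_t = (eps_t - m) / d, with m and d the mean and std of the
        evaluation value on training data (one threshold for all NNs)."""
        return (np.asarray(eps) - m) / d
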
Experimental Design
• 1999 DARPA IDEVAL data provided by MIT Lincoln Lab
  – Denial of Service, Probe, Remote-to-Local (R2L), User-to-Root (U2R)
  – Main focus: detection of U2R attacks
    • They leave clear traces in the audit data
• Monitor the behavior of programs that carry the SETUID privilege
  – The main target of U2R attacks; the monitored programs are listed below
  at, rsh, sendmail, deallocate, atq, su, utmp_update, list_devices, atm, uptime, accton, ffbconfig, chkey, w, xlock, ptree, crontab, yppasswd, ff.core, pwait, eject, volcheck, kcms_configure, ssh, fdformat, ct, kcms_calibrate, sulogin, login, nispasswd, mkcookie, admintool, newgrp, top, allocate, passwd, quota, mkdevalloc, whodo, ps, ufsdump, mkdevmaps, pt_chmod, rcp, ufsrestore, ping, rlogin, rdist, exrecover, sacadm

Experimental Design (2)
• 1999 IDEVAL: audit data for 5 weeks
  – Weeks 1 and 3 (attack-free) → training data
  – Weeks 4–5 → test data
• The test data includes 11 attacks in total, of 4 U2R types:

    Name       Description                                               Times
    eject      exploiting a buffer overflow in the 'eject' program         2
    ffbconfig  exploiting a buffer overflow in the 'ffbconfig' program     2
    fdformat   exploiting a buffer overflow in the 'fdformat' program      3
    ps         race condition attack in the 'ps' program                   4

• Setting of the genetic algorithm
  – Population size: 20, crossover rate: 0.3, mutation rate: 0.08, maximum generation: 100
  – The best individual of the last generation is used

Evolution Results
• The fitness converges to about 0.8 near generation 100
[Figure: average, minimum and maximum fitness over generations 1–100]

Learning Time
• Environments
  – Intel Pentium Xeon 2.4 GHz dual processor, 1 GB RAM
  – Solaris 9 operating system
• Data
  – login program, 1905 sequences in total
• Parameters
  – Learning for 5000 epochs
  – Average of 10 runs

    Type  Hidden Nodes  Running Time (sec)
    MLP        10             235.5
    MLP        15             263.4
    MLP        20             454.2
    MLP        25             482
    MLP        30             603.6
    MLP        35             700
    MLP        40             853.6
    MLP        50            1216
    MLP        60            1615
    ENN        15            4460

Detection Rates
• 100% detection rate at 0.7 false alarms per day
[Figure: detection rate vs. false alarms per day]
• The Elman NN, which had shown the best performance on the 1999 IDEVAL data, reaches a 100% detection rate only at 3 false alarms per day
⇒ Shows the effectiveness of the evolutionary NN for IDS

Results Analysis – Architecture of NN
• The best individual for learning the behavior of the ps program
  – Effective for system call sequences and more complex than a general MLP
[Figure: architecture of the best evolved network]

Comparison of Architectures
• Comparison of the number of connections between the ENN evolved for 100 generations on ps program data and an MLP
• They have a similar number of connections (180 vs. 187)
• However, the ENN has different types of connections and a more sophisticated architecture

    MLP:                                  ENN:
    FROM╲TO  Input  Hidden  Output        FROM╲TO  Input  Hidden  Output
    Input      0     150       0          Input      0      86      15
    Hidden     0       0      30          Hidden     0      67      19
    Output     0       0       0          Output     0       0       0

Evolving Artificial Neural Networks for DNA Microarray Analysis

Motivation
• Colon cancer: second only to lung cancer as a cause of cancer-related mortality in Western countries
• The development of microarray technology has supplied a large volume of data to many fields
• It has been applied to the prediction and diagnosis of cancer, and is expected to help predict and diagnose cancer accurately
• Proposed method
  – Feature selection + evolutionary neural network (ENN)
  – ENN: no restriction on the architecture (designed without human prior knowledge)

What is a Microarray?
• Microarray technology
  – Enables the simultaneous analysis of thousands of DNA sequences for genetic and genomic research and for diagnostics
• Two major techniques
  – Hybridization methods
    • cDNA microarray / oligonucleotide microarray
  – Sequencing methods
    • SAGE

Acquiring Gene Expression Data
• Samples labeled with the Cy3 and Cy5 dyes are hybridized on the DNA microarray and read with an image scanner
• The expression level of each gene is the log ratio of the two intensities:

    \log_2 \frac{\mathrm{Int}(Cy5)}{\mathrm{Int}(Cy3)}

• The result is a gene expression data matrix of genes × samples
[Figure: DNA microarray → image scanner → gene expression data (genes × samples)]
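
The ratio computation itself is one line (the intensities below are hypothetical):

    import numpy as np

    int_cy5 = np.array([1200.0, 300.0, 800.0])   # scanner intensities, channel Cy5
    int_cy3 = np.array([ 600.0, 600.0, 800.0])   # scanner intensities, channel Cy3
    expression = np.log2(int_cy5 / int_cy3)      # -> [ 1., -1.,  0.]
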
Machine Learning for DNA Microarray
• Pipeline: microarray → expression data → feature selection → cancer predictor → tumor / normal
• Feature selection: Pearson's correlation coefficient, Spearman's correlation coefficient, Euclidean distance, cosine coefficient, information gain, mutual information, signal-to-noise ratio
• Cancer predictors: 3-layered MLP with backpropagation, k-nearest neighbor, support vector machine, structure-adaptive self-organizing map, ensemble classifier
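
As an example of the feature-selection step, a sketch of the signal-to-noise ratio criterion from the list above (the API shape and the tie-breaking epsilon are mine):

    import numpy as np

    def signal_to_noise(X, y):
        """Score each gene by |mu_1 - mu_0| / (sigma_1 + sigma_0).
        X: samples x genes expression matrix, y: 0/1 class labels (array)."""
        a, b = X[y == 1], X[y == 0]
        return np.abs(a.mean(0) - b.mean(0)) / (a.std(0) + b.std(0) + 1e-12)

    def select_genes(X, y, k=30):
        """Keep the k top-scoring genes (k = 30 in the experiments below)."""
        return np.argsort(signal_to_noise(X, y))[::-1][:k]
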
Related Works

    Authors         Feature                        Classifier                  Accuracy (%)
    Furey et al.    Signal-to-noise ratio          SVM                             90.3
    Li et al.       Genetic algorithm              KNN                             94.1
    Ben-Dor et al.  All genes, TNoM score          Nearest neighbor                80.6
                                                   SVM with quadratic kernel       74.2
                                                   AdaBoost                        72.6
    Nguyen et al.   Principal component analysis   Logistic discriminant           87.1
                                                   Quadratic discriminant          87.1
                    Partial least square           Logistic discriminant           93.5
                                                   Quadratic discriminant          91.9

Overview
[Flowchart, rendered as steps:]
1. Separate the microarray data into training, validation and test data
2. Feature selection
3. Generate the initial ANNs
4. Train the ANNs partially
5. Compute the fitness on the validation data
6. Rank-based selection
7. Apply crossover and mutation
8. Generate the new generation; if the stopping criterion is not met, go to step 4
9. Train the best ANN fully and evaluate it on the test data

Colon Cancer Dataset
• Alon's data
• The colon dataset consists of 62 samples of colon epithelial cells taken from colon-cancer patients
  – 40 of the 62 samples are colon cancer samples and the remaining 22 are normal samples
• Each sample contains 2000 gene expression levels
• Samples were taken from tumors and from normal, healthy parts of the colons of the same patients, and measured using high-density oligonucleotide arrays
• Training data: 31 of the 62 samples, test data: the other 31

Experimental Setup
• Feature size: 30
• Parameters of the genetic algorithm
  – Population size: 20
  – Maximum generation number: 200
  – Crossover rate: 0.3
  – Mutation rate: 0.1
• Fitness function: recognition rate on the validation data
• Learning rate of BP: 0.1

Performance Comparison
[Figure: accuracy by classifier — 1: EANN, 2: MLP, 3: SASOM, 4: SVM (linear), 5: SVM (RBF), 6: KNN (cosine), 7: KNN (Pearson); EANN reaches the highest accuracy, 0.94, while the others range between 0.71 and 0.81]

Sensitivity/Specificity
• Sensitivity = 100%
• Specificity = 81.8%
• Cost comparison
  – The cost of classifying a cancer patient as normal is higher than that of classifying a normal person as having cancer

    EANN                  Predicted
    Actual           0 (Normal)   1 (Cancer)
    0 (Normal)            9            2
    1 (Cancer)            0           20
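
Both figures follow directly from the confusion matrix:

    # rows = actual, columns = predicted (from the table above)
    tn, fp = 9, 2      # normal samples: 9 correct, 2 flagged as cancer
    fn, tp = 0, 20     # cancer samples: 0 missed, 20 detected

    sensitivity = tp / (tp + fn)   # 20/20  = 1.000 -> 100%
    specificity = tn / (tn + fp)   # 9/11  ~= 0.818 -> 81.8%
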
Architecture Analysis
[Figure: the whole evolved architecture, and the connections from input to hidden neurons]

Architecture Analysis (2)
• The input-to-output relationship is useful to analyze
[Figure: connections from input to output, from hidden neurons to the output neuron, and between hidden neurons]