Internet Worm Detection and Classification with Data Mining

advertisement
Internet Worm Detection and Classification with Data Mining Approaches
Narubordee Sarnsuwan1, Naruemon Wattanapongsakorn1,* and Chalermpol Charnsripinyo2
1
Computer Engineering Department, King Mongkut’s University of Technology Thonburi,
126 Pracha-Utid, Tung-Kru, Bangkok 10140 Thailand,
2
Network Technology Laboratory, National Electronics and Computer Technology Center,
Klong Luang, Pathumthani, 10120 Thailand
*Corresponding author: naruemon@cpe.kmutt.ac.th
Abstract
Presently, trend of many malwares focuses on
network end-point with diverse behaviors. In this
paper, we present techniques to detect and classify
many types of internet worm at network end-point by
using data mining approaches which are Bayesian
network, C4.5 Decision tree and Random forest. We
use port and protocol profiles to train and test our
detection models. Our results show that the detection
rates of classification and detection known worms
are at least 98.5% while the unknown worm detection
rate is about 97% with Decision tree and Random
forest, and 80% with Bayesian network.
Key Words: Internet worm, Data mining, Bayesian
network, Decision tree, Random forest, Network endpoint
1. Introduction
Internet worm is a malicious code or program that
exploits security holes on a network without human
interference. Internet worm is self propagating, and
fast spreading [1]. In 1988, internet worm was
released the first time and over hundred hosts were
infected. After that the threat of internet worm has
been growing and causing more damage to network
systems. To detect the internet worm, many
approaches were proposed and based on signatures
for misuse detection that can’t detect unknown/new
worms. Most commercial programs are normally
based on the signature approach where the worm
signature is available after the attack to the network
system for several days or weeks. Moreover,
signature extraction must be considered by using
expert knowledge. Thus, network anomaly detection
and real times detection was become a great
challenge.
Network anomaly detection requires benign and
infected behavior. This detection can deal with
unknown worm. However, normal behavior of some
applications (e.g., peer to peer protocol) is difficult to
define and handle. So, this approach has high false
detection rate.
Recently, it has been reported [2, 3] that the
security at the network endpoint is more efficient to
set up shield for detecting virus and worm. The
authors in [2] proposed Kullback-Leibler divergence
to find frequency of source and destination ports that
worm uses to propagate. Most worms spread on some
fixed ports that are used by some vulnerable services.
In this paper, we propose techniques to detect and
classify internet worm by using data mining
approaches without feature extraction [4]. We
consider source ports and destination ports as well as
some protocols that worm attempts to propagate [2].
Source and destination ports and protocols of normal
and worm behavior are trained on data mining
models. Moreover, known worm and unknown worm
are classified by these models with high detection
rate and low false alarm rate. Our detection approach
focuses on detection at the network end point with
detection rate over 99% and false alarm rate around
1% on known worm and over 94% detection rate
with false alarm rate close to 0% on unknown worm
without feature extraction and K-L divergence.
The remainder of this paper is organized as
follows. In section 2, we present related approaches
of internet worm detection. In section 3, we describe
information and characteristics of internet worm that
are used in our approach. In section 4, we give details
of our approach to detect internet worm. Then, we
provide results of our approach and conclusion in
section 5 and 6, respectively.
2. Related Work
Misuse detection is a type of network intrusion
detection based on signature such as Snort and Bro
[5]. Misuse detection must have information of
signature or pattern of each attack before it can
classify data and detect the attack. Thus, the misuse
approach can’t detect unknown/new attack because it
has no information and experience of new attack.
Internet worm can propagate very fast on a network
before a human expert can extract the signature.
Therefore, misuse detection is not sufficient to detect
the problem.
Unlike misuse detection, anomaly detection
requires normal network behavior to associate and
compare with anomaly behavior of the network
traffic. This detection can detect unknown/new-type
of worm. However, it is complicate to differentiate
normal behavior and abnormal behavior of network
traffic. Some benign traffic may be falsely classified
as attack. Examples of network activities that are
difficult to classify are peer-to-peer protocols and
some media applications [2].
Khayam et al. [2] proposed
K-L
divergence
measures to characterize perturbation in source and
destination ports that worms attempt to propagate.
They observed abnormal behavior from results of KL divergence and trained their model with only fixed
ports on Support Vector Machine (SVM) using
random multiple instances of benign traffic profile.
They obtained good results with over 90% of
detection rate for every end point and with false
alarm rate close to 0. However, they used fixed ports
from the results of K-L divergence to train and test
on SVM.
In [3], it proposed trend of attacks that focus on
network endpoint and showed efficient solutions to
detect malware on network end-point.
Data mining approach is a good choice that can
detect unknown internet worm. Siddigui et al. [6]
proposed solutions to detect unknown worm by
extracting features from cleaned program and
infected program. They built a data mining model by
train these features and presented results of unknown
malware detection with overall accuracy around
95.6% and false positive rate around 3.8%.
Other data mining approaches [7] were used to
extract features of malware behavior and build a
model by training with these features. They presented
comparison of various data mining techniques to
detect unknown worm. The results showed that the
average detection rate is over 90% and the false
alarm rate is around 6.67% with n gram extraction.
Although the feature extraction gave high detection
rate, in some cases [4] of feature extraction as n
gram, it took long time and consumed high memory
to process. For example, only n gram extraction with
n equal 1, 2 and 3 for integer array, it consumed
memory of 1KB, 256KB and 64MB, respectively.
Moreover, this process did not include memory
consumption to build the model.
3. Worm Lists
In this section, we describe information of various
worms that are used in our experiments. Several
characteristics of each worm including Port profiles,
and rate of scan per second used by worm to infect
new hosts, are shown in Table 1.
Table 1. Worm Characteristics
Worm
Scan per
second
Port
CodeRedII
4.95
TCP 80
Zotob.G
39.34
TCP 135,445,UDP 137
SoBig.E
21.57
TCP 135,UDP 53
Sdbot-AFR
28.26
TCP 445
Rbot-AQJ
0.68
TCP 139,769
Rbot.CCC
9.7
TCP 139,445
Forbot-FU
Blaster
32.53
10.5
TCP 445
TCP 135,444 UDP 69
Code red II uses a buffer overflow to exploit
vulnerability on Microsoft IIS web servers. After the
worm propagates itself to any host, it sends DOS
attack and provides backdoors to attackers. Then, this
worm will find new hosts to infect with port 80 on
TCP.
Sobig.E worm is attached with email or spam
mail
from
bil@Microsoft.com
and
support@yahoo.com. If any user opens this file, the
worm will start its process. This worm spreads to
other hosts with port 135 on TCP protocol and port
53 on UDP protocol.
Zotob.G exploits buffer over flow vulnerability
on MS Windows Plug and Play and provides
backdoors to attackers with ports 135, 445 on TCP
protocol and port 137 on UDP protocol.
Rbot.AQJ worm provides backdoors and allows
attackers to remotely access on the vulnerable
computer via IRC channels on Windows platform
with ports 139 and 769 on TCP protocol.
Rbot.CCC worm also provides backdoors and
allows attackers to remotely access on the vulnerable
computer via IRC channels on Windows platform.
However, this worm propagates itself with ports 139
and 445.
Forbot-FU propagates itself to other hosts with
Trojan/Optix on Windows. This worm exploits buffer
overflow vulnerability of Windows and provides
backdoor to attackers with port 445 on TCP protocol.
Blaster worm exploits a buffer overflow
vulnerability of DCOM RPC on Windows XP and
Windows 2000 by connecting to ports 135 and 4444
on TCP protocol and port 69 on UDP protocol. This
worm can download and operate itself. After that, the
worm sends DOS attacks to prevent patch update by
sending SYN flood to the destination port 80.
Sdbot-AFR worm exploits a buffer overflow
vulnerability of Windows and provides a backdoor to
attackers with port 445 on TCP protocol. Unlike
Forbot-FU worm, this worm has a higher rate of scan
per second.
4. Data Sets
We use input datasets from [8] which were
collected from 13 different network endpoints. The
datasets were collected over 12-months period. Each
network end-point has different behavior from each
other. Each end host was installed with actual worm
(i.e., Zotob.G, Forbot-FU, Sdbot-AFR, Blaster,
Rbot.CCC and Rbot.AQJ) and simulated worm (i.e.,
CodeRedII). The datasets were collected by using
“argus” program. Each instance of dataset has 7
attributes as follows
Session id: 20-byte SHA-1 hash of the concatenated
hostname and remote IP address
Direction: one byte flag indicating as outgoing
unicast, incoming unicast, outgoing broadcast or
incoming broadcast packets
Protocol: transport-layer protocol of the packet
Source port: source port of the packet
Destination port: destination port of the packet
Timestamp: millisecond-resolution time of session
initiation in UNIX time format
Virtual key code: one byte virtual key code that
identifies the data if it is normal data or worm.
The datasets from [8] were separated into several
categories in terms of normal data and type of worm.
In addition, datasets from each end point were
collected into different groups, for example, 13 end
points have 13 groups. Example of datasets is shown
in Table 2.
Session ID
Direction
Protocol
Src Port
Des port
Time
Stamp
Key code
Table 2. Example of a data profile from [8]
Sha-1 code
4
6
8704
60419
1145362465
0
Sha-1 code
3
6
80
1113
1148139379
d9
From the table, the direction column has one byte
flag represented by an integer where 1 represents
“incoming broadcast packets”, 2 represents “outgoing
broadcast”, 3 represents “outgoing broadcast” and 4
represents “incoming unicast”. The protocol column
represents transport-layer protocols using an integer
such as 6 represents “TCP” and 17 represents “UDP”.
The Key code column is one byte virtual key code
that identifies the data types such as “d9” represents
worm behavior and others represent normal behavior.
We select datasets from [8] because these datasets
were collected from the network end points such as
homes, offices and universities. Thus, the datasets
have various behaviors. In addition, some end points
run peer to peer applications.
There are many port numbers used in normal
class data such as port numbers 22, 53, 80, 123, 135,
137, 138, 443, 445, 993 and 995 which are known
ports (0:1023) and registered ports (1024:65535) for
specific applications such as on-line Games and peer
to peer applications. Moreover some well known
ports are used by privilege users assigned by Internet
Corporation for Assigned Names and Numbers
(ICANN), while the registered ports are not assigned
by the ICANN.We downloaded the datasets from this
website [8] and selected only Direction, Protocol, Src
Port, Des Port, Time Stamp and Key Code columns
to be used as our datasets. In addition, we changed
the worm key code from “d9” to “Worm” and change
the normal data key code to “Normal”, accordingly.
We collected datasets for 3 cases of experiments
by sampling all datasets from the 13 end-points. We
explain the preprocessed datasets of each case next.
First case:
We make training and testing datasets by
inserting normal data and all types of worm (i.e.,
Zotob.G, CodeRedII, Blaster, Rbot.CCC, Rbot.AQJ,
Sdbot, Sobig and Forbot-FU). In training dataset,
there are 750 normal profiles and 150 profiles for
each type of worms. With 8 types of worms, the total
number of worm profiles used for training is 1200
profiles. The testing dataset has 1750 normal profiles
and 350 profiles for each type of worms. We consider
9 output classes. The classification model is
presented in Section 5.
Second case:
We insert 750 normal profiles and 1200 worm
profiles for the training dataset and 1750 normal
profiles and 2800 worm profiles for the testing
dataset. In addition, worm class consists of all types
of worm profiles (i.e., Zotob.G, CodeRedII, Blaster,
Rbot.CCC, Rbot.AQJ, Sdbot-AFR, SoBig.E and
Forbot-FU profiles). We combine these types of
worm into worm class. There are totally 2 classes in
these datasets as normal and worm class.
Third case:
We make 2 datasets. The training dataset is
composed of 750 normal profiles and 1200 worm
profiles by sampling 7 worm types, except one worm
which is used as unknown worm. The testing dataset
is composed of 1700 normal profiles and 2800 worm
profiles. We add one unknown/untrained worm-type
profiles into the testing dataset (resulting to the total
of 8 types of worms). There are 2 output classes
which are normal and worm. In this case, we perform
8 experiments where one unknown worm-type is
considered at a time. For example, in the first
experiment, we make a training dataset without
Blaster worm. Then we add the Blaster worm into the
testing dataset. In the next experiment, a different
worm-type is excluded from the training dataset but
is included for testing. After completing these 8
experiments, we find their average detection results.
The details of the datasets used in our
experiments are shown in Table 3.
Table 3.
Case
Training and testing datasets
Training Data
Testing Data
Output
Normal
Worm
Normal
Worm
1
750
150 x 8
1750
350 x 8
9 Classes
2
750
1200
1750
2800
2 Classes
(8 types)
3
(8 exp)
750
1200
(7 types)
2800
2 Classes
(8 types)
5. Worm Detection Models
We propose to compare three different data
mining models which are C4.5 Decision tree,
Random forest and Bayesian network for Internet
worm detection and classification. These models are
well known for data mining. We train these data
mining models to detect and classify worm by using
port and protocol profiles as shown in Figure 1.
Figure 1.
Random forest is an effective tool for
classification because it can deal with over-fitting of
large dataset and is also fast for large dataset with
many features. In addition, the Random forest is
robust with noise. A tree is built from learning
sampling dataset with replacement; about one third of
this dataset was not used to train. This model can
evaluate importance factors used in classification and
un-pruned rules that are created and evaluated by the
training dataset.
5.3. Bayesian Network
(8 types)
1750
5.2. Random forest
Bayesian network is a combination of graphical
model and probabilistic model. A Bayesian network
has several nodes or states that are correlated with
probability. The Bayesian network learns casual
relation from the training dataset to predict or classify
unknown instances. Moreover, it can avoid overfitting with large data.
We provide solutions to classify various types of
worms in the first case, to distinguish known worms
from normal data in the second case, and to detect
unknown worms in the third case.
First case:
To show the detection and classification
accuracy, we use a training dataset with 8 types of
worms and normal data to train our model and
classify the data type as shown in figure 2.
Our worm detection model
5.1. C4.5 Decision tree
The C4.5 Decision tree is an algorithm that builds
tree by using a divide-and-conquer algorithm. A
Decision tree is approximated with discrete dataset
and can avoid over-fitting on large dataset. The
Decision tree is produced by a training/learning
dataset and built from rules that are created during
the training. These rules are used to predict and
classify sample/ or later datasets. To classify an
unknown instance, the Decision tree will start at the
root and traverse to a leave node. The result of
classification and prediction occurs at the leave node.
Figure 2.
Case 1: normal and worm classification
Second case:
To identify known worms and normal data, we
use a combination of 8 worms and 1 normal dataset
to train our model and classify data into normal or
worm as shown in figure 3.
Third case:
To detect unknown worms, we use 2 datasets to
test and train our model in this case. The training
dataset and testing dataset each has 2 classes which
are normal and worm in the class attributes.
The training dataset consists of 7 types of worms
and the testing dataset consists of 8 types of worms.
We train all models with same training dataset and
test these models with the same testing dataset.
worms that we consider in case 3, showing that our
algorithms can detect unknown worms with overall
detection rate over 91%. In particular, the Decision
tree can detect unknown worms with the averaged
detection rate over 97%, while the Bayesian network
and Random forest have average detection rates from
all experiments over 96% and 80%, respectively.
Table 5.
Figure 3.
Cases 2 & 3: worm detection
6. Experimental Results
In our experiments, the performance of each
detection and classification model is measured and
compared by using the detection rates which are True
Positive (TP), False Positive (FP), True Negative
(TN), and False Negative (FN).
The performance parameters (TP, FP, TN, and
FN) are defined in Table 4 and described as follows:
• True Positive (TP): an algorithm classifies
Worm according to the actual data (Worm)
• False Positive (FP): an algorithm classifies
Worm opposite from the actual data
(Normal)
• True Negative (TN): an algorithm classifies
Normal according to the actual data (Normal)
• False Negative (FN): an algorithm classifies
Normal opposite from the actual data
(Worm)
TP + TN
The overall accuracy =
TP + FP + TN + FN
To calculate the detection rate, let NWorm be the
total number of Worm profiles and NNormal be the
total number of Normal profiles in the testing dataset.
Thus, we have
The detection rate = NWorm × TP + N Normal × TN
N Worm + N Normal
Table 4.
Performance parameters for classification
Actual
Worm
Normal
Worm
TP: True Positive
FP: False Positive
Normal
FN: False
Negative
TN: True
Negative
Result
From experiments, our models can classify and
detect known worms and unknown worms with high
detection rates without feature extraction and without
using some fixed ports from K-L divergence. The
results from cases 1 & 2 are shown in tables 5 and 6,
where the detection rates of all models are over 98%.
Table 7 presents the detection results with unknown
Results of case 1 (worm classification)
Detection
rate (%)
TP
(%)
TN
(%)
FP
(%)
FN
(%)
Bayesian
Network
98.50
98.1
99.0
1.0
1.9
Decision
tree C4.5
99.00
99.2
99.1
0.9
0.8
Random
forest
99.50
99.6
99.2
0.8
0.4
Model
Table 6.
Results of case 2 (known worm detection)
Detection
rate (%)
TP
(%)
TN
(%)
FP
(%)
FN
(%)
Bayesian
Network
99.2
99.1
99.4
0.6
0.9
Decision
tree C4.5
98.7
98.4
98.9
1.1
1.6
Random
forest
99.0
98.4
99.3
0.7
1.6
Model
Comparing the three approaches which are
Bayesian network, Decision tree and Random forest,
in Case 1, the detection rate of Random forest is the
best while in Cases 2, the detection rate of Bayesian
Network is the highest. In Case 3, the detection rate
of Decision tree is the highest. In this last case, there
is a problem in experimenting with Sdbot-AFR and
Zotob.G because these worms have port profiles
similar to those in the normal profiles. Bayesian
Network can’t handle this problem but Decision tree
and Random forest can do. Moreover, some other
worms have port profiles similar to those in the
Normal port profiles but all models can handle this
issue, as shown in Table 7. Essentially, Table 7
shows detection rate of each model with 8
experiments. These models except Bayesian Network
not only can classify many types of worms, but also
have the capability to detect unknown worms by
using port and protocol profiles on network endpoints. In our experiments, the algorithms spent less
than 1 second and consumed about 5 MB of memory
during the training and testing each model. This is
because our models use a few attributes and do not
use feature extraction to build the models.
7. Conclusion
In this paper, we present three simple and
efficient data mining techniques which are Bayesian
Network, C4.5 Decision tree and Random forest for
Internet worm detection and classification. We also
compare the performance of each data mining model
in various cases. We found that our models except
the Bayesian Network model are efficient to detect
unknown worm. They are good candidates for realtime worm detection because the models can be built
Table 7.
Zotob.G
SoBig.E
Sdbot-AFR
Rbot-AQJ
Rbot.CCC
Forbot-FU
Blaster
Average
Results of case 3 (unknown worm detection)
Code RedII
Detection rate
(%)
quickly while consuming low memory and use only
port and protocol profiles for training and testing
data. In particular, the Decision tree model is suitable
in all cases that we investigate, while other models
are suitable in some cases. Some worms have port
profiles similar to those in the normal data profiles
that may cause difficulty for worm detection. From
our experiments, the Bayesian network model is not
suitable for unknown worm detection.
Bayesian Network
96.4
40.3
95.8
38.3
94.3
99.2
98.7
81.3
80.5
Decision tree C4.5
99.3
98.6
98.8
99.0
94.4
93.8
98.8
98.2
97.6
Random forest
86.5
94.5
97.9
99.5
98.8
99.3
98.9
98.7
96.8
Average
94.1
77.8
97.5
78.9
95.8
97.4
98.8
92.7
91.6
Model
8. References
[1] N. Weaver, V. Paxson, S. Staniford and R.
Cunningham, "Taxonomy of computer worms," Proc of
the ACM workshop on Rapid malcode, WORM03,
2003, pp. 11-18
[2] S. A. Khayam, H. Radha and D. Loguinov, "Worm
Detection at Network Endpoints Using InformationTheoretic Traffic Perturbations", IEEE Inter Conf on
Communications (ICC), 2008, pp. 1561-1565.
[3] Symantec Internet Security Threat Report XI – Trends
for July – December 07,” 2007.
[4] O. Sharma, M. Girolami and J. Sventek, "Detecting
Worms Variants Using Machine Learning", Proc of the
ACM CoNEXT conference, 2007
[5] C. Smith, A. Matrawy, S. Chow and B. Abdelaziz,
"Computer Worms: Architecture, Evasion Strategies,
and Detection Mechanisms," J. of Information
Assurance and Security, 2009, pp. 69-83
[6] M. Siddiqui, M. C. Wang and J, Lee, "Detecting
Internet Worms Using Data Mining Techniques",
Cybernetics and Information Technologies, Systems
and Applications: CITSA, 2008.
[7] X. Wang, W. Yu, A. Champion, X. Fu and D. Xuan,
"Detecting Worms via Mining Dynamic Program
Execution", Security and Privacy in Communications
Networks and the Workshops 2007, Nice, France, June
24, 2008
[8] Wireless and Secure Networks (WiSNet) Research Lab
at the NUST School of Electrical Engineering and
Computer Science (SEECS), http://wisnet.seecs.edu.pk/
Download