Reporter: 林佳宜
Email: M98570015@mail.ntou.edu.tw
2020/4/11
1
Learning to Detect and Classify Malicious
Executables in the Wild. J. Zico Kolter,
Marcus A. Maloof, JMLR 2006.
2
Introduction
Classification Methodology
Experimental Design
Experimental Results
Conclusion
3
Malicious code can
cause harm or subvert the system’s intended function
Malicious executables fall into three categories:
viruses, worms, and Trojan horses
Describes the use of machine learning and data mining to
detect and classify malicious executables
4
Detect and classify malicious executables
Use techniques from text classification
Present empirical results
from an extensive study of inductive methods for detecting and classifying malicious executables
Show that the methods achieve high detection rates
even on completely new, previously unseen malicious executables
5
Implemented in the Waikato Environment for Knowledge Analysis (WEKA)
IBk
naive Bayes
support vector machine (SVM)
J48
Used the AdaBoost.M1 algorithm to
boost SVMs, J48, and naive Bayes
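The boosting step can be illustrated with a minimal AdaBoost.M1 sketch for two classes. This is not the WEKA implementation used in the study: the weak learner here is a one-feature decision stump over Boolean attributes, a stand-in assumed for brevity (the study boosted SVMs, J48, and naive Bayes), and all function names are hypothetical.

```python
import math

def stump_predict(stump, x):
    # A stump predicts class 1 when the chosen Boolean feature equals `polarity`.
    feature, polarity = stump
    return 1 if x[feature] == polarity else 0

def train_stump(X, y, w):
    # Weak learner (assumed stand-in): the stump with the lowest weighted error.
    best_err, best_stump = None, None
    for feature in range(len(X[0])):
        for polarity in (0, 1):
            stump = (feature, polarity)
            err = sum(wi for xi, yi, wi in zip(X, y, w)
                      if stump_predict(stump, xi) != yi)
            if best_err is None or err < best_err:
                best_err, best_stump = err, stump
    return best_err, best_stump

def adaboost_m1(X, y, rounds=10):
    n = len(X)
    w = [1.0 / n] * n              # start with uniform example weights
    ensemble = []                  # list of (vote weight, stump)
    for _ in range(rounds):
        err, stump = train_stump(X, y, w)
        if err >= 0.5:             # weak learner no better than chance: stop
            break
        if err == 0.0:             # perfect stump: keep it with a large finite vote
            ensemble.append((math.log(1e10), stump))
            break
        beta = err / (1.0 - err)
        # Shrink the weights of correctly classified examples, then renormalize,
        # so the next round concentrates on the examples that were missed.
        w = [wi * (beta if stump_predict(stump, xi) == yi else 1.0)
             for xi, yi, wi in zip(X, y, w)]
        total = sum(w)
        w = [wi / total for wi in w]
        ensemble.append((math.log(1.0 / beta), stump))
    return ensemble

def predict(ensemble, x):
    # Weighted vote: each stump votes for its predicted class with weight log(1/beta).
    votes = {0: 0.0, 1: 0.0}
    for alpha, stump in ensemble:
        votes[stump_predict(stump, x)] += alpha
    return 1 if votes[1] > votes[0] else 0
```

On a toy "majority of three bits" problem no single stump is correct on every example, but three boosted stumps are, which is the effect boosting is meant to provide.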
6
Gathered this collection in early 2003
◦ Benign executables
1971
from Windows 2000 and XP operating systems
SourceForge
download.com
◦ Malicious executables
1651
from the VX Heavens Web site
from MITRE Corporation, the sponsor of this project
Recently, obtained 291 malicious executables
from VX Heavens
7
To evaluate the approach and methods
stratified ten-fold cross-validation
randomly partitioned the executables into ten disjoint sets of equal size
used one as a testing set
combined the remaining nine to form a training set
Extracted n-grams from the executables in the training and testing sets
Selected the most relevant features from the training data
Conducted ROC analysis for each method
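The partitioning step can be sketched as follows. This is a minimal illustration, assuming class labels are given as a list (0 = benign, 1 = malicious); the function names are hypothetical and this is not the tooling used in the study.

```python
import random

def stratified_folds(labels, k=10, seed=0):
    """Split example indices into k disjoint folds, keeping class proportions roughly equal."""
    rng = random.Random(seed)
    by_class = {}
    for idx, label in enumerate(labels):
        by_class.setdefault(label, []).append(idx)
    folds = [[] for _ in range(k)]
    offset = 0
    for indices in by_class.values():
        rng.shuffle(indices)
        # Deal each class round-robin; the running offset keeps fold sizes balanced.
        for pos, idx in enumerate(indices):
            folds[(offset + pos) % k].append(idx)
        offset += len(indices)
    return folds

def train_test_splits(labels, k=10, seed=0):
    """Yield (train_indices, test_indices): one fold for testing, the other k-1 for training."""
    folds = stratified_folds(labels, k, seed)
    for i in range(k):
        test = folds[i]
        train = [idx for j, fold in enumerate(folds) if j != i for idx in fold]
        yield train, test
```

Within each split, n-gram selection would be done on the training indices only, so that no information from the testing set leaks into the selected features.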
8
Evaluated how well the learning methods detected malicious executables
in three experimental studies
The first was a pilot study to determine the
size of words and n-grams
the number of n-grams relevant for prediction
The second experiment consisted of applying all of the classification methods to
a small collection of executables
The third then involved applying the methodology to
a larger collection of executables
9
Pilot studies to determine three parameters
the size of n-grams
the size of words
the number of selected features
Extracted bytes from
476 malicious executables, 561 benign executables
Produced n-grams for n = 4
Selected the best 10, 20, ..., 100, 200, ..., 1000, 2000, ..., 10,000 n-grams
Selecting 500 n-grams produced the best results
10
Fixed the number of n-grams
at 500
varied n, the size of the n-grams
Evaluated the same methods for n = 1, 2, ..., 10
n = 4 produced the best results
Varied the size of the words (one byte, two bytes, etc.)
single bytes produced better results
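With single-byte words and n = 4, extracting n-grams amounts to sliding a four-byte window over the file. The sketch below assumes overlapping windows and hex-encoded n-grams; the function names are hypothetical.

```python
def byte_ngrams(data: bytes, n: int = 4) -> set:
    """Return the distinct overlapping n-grams of `data`, each as a hex string."""
    return {data[i:i + n].hex() for i in range(len(data) - n + 1)}

def ngrams_from_file(path: str, n: int = 4) -> set:
    """Read an executable as raw bytes and extract its distinct n-grams."""
    with open(path, "rb") as handle:
        return byte_ngrams(handle.read(), n)
```

For example, the five bytes `4d 5a 90 00 03` yield two 4-grams: `4d5a9000` and `5a900003`.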
11
To form training examples,
used the n-grams extracted from the executables,
viewing each n-gram as a Boolean attribute
Selected the most relevant attributes by
computing the information gain (IG) for each
Selected the top 500 n-grams
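The selection step can be sketched as below. It assumes a common form of information gain for a Boolean attribute, IG = sum over v in {present, absent} and classes C of P(v, C) log2(P(v, C) / (P(v) P(C))), with probabilities estimated as simple frequencies over the training set. The function names are hypothetical, and scoring every distinct n-gram this way is only practical for small vocabularies.

```python
import math
from collections import Counter

def information_gain(presence, labels):
    """IG (in bits) between a Boolean attribute and the class label."""
    n = len(labels)
    joint = Counter(zip(presence, labels))
    count_v = Counter(presence)
    count_c = Counter(labels)
    ig = 0.0
    for (v, c), count in joint.items():
        p_vc = count / n
        ig += p_vc * math.log2(p_vc / ((count_v[v] / n) * (count_c[c] / n)))
    return ig

def select_top_ngrams(ngram_sets, labels, k=500):
    """Rank every n-gram seen in the training set by IG and keep the top k."""
    vocabulary = set().union(*ngram_sets)
    scored = []
    for gram in vocabulary:
        presence = [gram in s for s in ngram_sets]
        scored.append((information_gain(presence, labels), gram))
    scored.sort(key=lambda item: (-item[0], item[1]))  # ties broken alphabetically
    return [gram for _, gram in scored[:k]]

def to_boolean_vector(ngram_set, selected):
    """Encode one executable as Boolean attributes over the selected n-grams."""
    return [gram in ngram_set for gram in selected]
```

An n-gram that appears in every malicious file and in no benign file (with balanced classes) scores 1 bit; an n-gram that appears equally often in both classes scores 0.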
12
The executables produced 68,744,909 distinct n-grams
Areas under the ROC curves (AUC) with 95% confidence intervals
the boosted methods performed well
Naive Bayes did not perform as well
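For readers unfamiliar with the AUC, one way to compute it from classifier scores is the rank-based (Mann-Whitney) estimate sketched below. This is an illustrative substitute, not necessarily the ROC software used in the study, and it does not produce confidence intervals. Labels are assumed to be 1 for malicious and 0 for benign.

```python
def auc(scores, labels):
    """Probability that a random malicious file scores higher than a random benign one."""
    pos = [s for s, label in zip(scores, labels) if label == 1]
    neg = [s for s, label in zip(scores, labels) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one example of each class")
    # Count pairwise wins; a tie counts as half a win.
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0
               for p in pos for q in neg)
    return wins / (len(pos) * len(neg))
```

An AUC of 1.0 means every malicious executable is scored above every benign one; 0.5 corresponds to random ranking.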
15
This collection consisted of
1971 benign executables
1651 malicious executables
over 255 million distinct n-grams of size four
The areas under these curves with 95% confidence intervals
boosted J48 outperformed all other methods
18
Classify malicious executables based on
the function of their payload
Present results for three functional categories:
opening a backdoor, mass-mailing, executable virus
Aim: reduce the effort needed to analyze previously undiscovered malicious executables
21
Compared the actual detection rates
on the larger collection vs. the 291 new malicious executables
Selected three desired false-positive rates
0.01, 0.05, 0.1
Boosted J48 detected about 98% of the new malicious executables
at a desired false-positive rate of 0.05
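The comparison can be sketched as follows: fix a decision threshold that gives the desired false-positive rate, then measure the detection rate on the new malicious executables at that threshold. The sketch assumes the threshold is chosen directly from the scores of benign files, which simplifies reading the operating point off an ROC curve; the function names are hypothetical.

```python
def threshold_for_fp_rate(benign_scores, desired_fp_rate):
    """Pick a threshold so that at most the desired fraction of benign files score above it."""
    ranked = sorted(benign_scores, reverse=True)
    allowed = int(desired_fp_rate * len(ranked))   # benign files we may flag
    if allowed >= len(ranked):
        return float("-inf")
    return ranked[allowed]   # only scores strictly above this value are flagged

def rate_above(scores, threshold):
    """Fraction of files flagged as malicious at the given threshold."""
    return sum(1 for s in scores if s > threshold) / len(scores)
```

Usage: compute `t = threshold_for_fp_rate(benign_scores, 0.05)`, then `rate_above(new_malicious_scores, t)` gives the actual detection rate on the new executables and `rate_above(benign_scores, t)` gives the actual false-positive rate.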
23
Detecting and classifying unknown malicious executables using
machine learning, data mining, and text classification
Detecting malicious executables
boosted J48 produced the best detector with an area under the ROC curve of 0.996
Classifying malicious executables based on the payload’s function
boosted J48 produced the best detectors with areas under the ROC curve around 0.9