A Behavior-based Methodology for Malware Detection

Malware Detection Based on Malicious Behaviors Using Artificial Neural Network
Student: Hsun-Yi Tsai
Advisor: Dr. Kuo-Chen Wang
2012/05/28
Copyright © 2012, MBL@CS.NCTU
Outline
• Introduction
• Problem Statement
• Related Work
• Design Approach
– Sandboxes
– Behaviors
– Proposed Algorithm
– Weight Training
– Malicious Degree
• Evaluation
• Conclusion and Future Work
• References
Introduction
• In recent years, malware has become a severe threat to cyber security
– Viruses, worms, Trojan horses, botnets, …
• Traditional signature-based malware detection algorithms [15] [17]
• Drawbacks of signature-based malware detection algorithms
– Need human effort and time to approve new signatures
– Need frequent updates of the malicious signature digest
– Easily bypassed by obfuscation methods
– Cannot detect zero-day malware
– Increase the false negative rate
Introduction (Cont.)
• To overcome the shortcomings of signature-based malware detection algorithms, behavior-based malware detection algorithms were proposed
• Behavior-based malware detection algorithms [14] [19]
– Detect unknown malware and variants of known malware
– Decrease the false negative rate (FNR)
– Increase the false positive rate (FPR)
• To decrease the FPR, we propose a behavior-based malware detection algorithm using an artificial neural network
Problem Statement
• Given
– Several sandboxes
– l known malware samples M = {M1, M2, …, Ml} for training
– m known malware samples S = {S1, S2, …, Sm} for testing
• Objective
– n behaviors B = {B1, B2, …, Bn}
– n weights W = {W1, W2, …, Wn}
– MD (malicious degree)
Related Work
• MBF [14]
– File, process, network, and registry actions
– 16 malicious behavior features (MBF)
– Three malicious degrees: high, warning, and low
• RADUX [19]
– Reverse Analysis for Detecting Unsafe eXecution (RADUX)
– Collects 9 common malicious behaviors
– Adjusts behavior weights with Bayes’ theorem
Related Work (Cont.)
| Approach | MBF [14] | RADUX [19] | Our Scheme |
|---|---|---|---|
| Main idea | Analyze behavior features | Analyze API calls | Analyze malicious behaviors |
| Number of malicious behaviors | 16 | 9 | 13 |
| Calculation of malicious degree | Non-weighted algorithm | Weighted algorithm | Weighted algorithm |
| Adjustment of weights | None | Bayes’ theorem | Artificial neural network (ANN) |
| False positive rate | Low | High | Low |
| False negative rate | Not available | High | Low |
| Accuracy rate | High | Low | High |
Background - Sandboxes
• Dynamic analysis system
• Isolated environment
• Interact with malware
• Record runtime behaviors
Background - Sandboxes (Cont.)
• Web-based sandboxes
– GFI Sandbox [1]
– Norman Sandbox [2]
– Anubis Sandbox [3]
• PC-based sandboxes
– Avast Sandbox [4]
– Buster Sandbox Analyzer [5]
Design Approach-Behaviors
• Malware Host Behaviors
– Creates Mutex
– Creates Hidden File
– Starts EXE in System
– Checks for Debugger
– Starts EXE in Documents
– Windows/Run Registry Key Set
– Hooks Keyboard
– Modifies Files in System
– Deletes Original Sample
– More than 5 Processes
– Opens Physical Memory
– Deletes Files in System
– Auto Start
• Malware Network Behaviors
– Makes Network Connections
  • DNS Query
  • HTTP Connection
  • File Download
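The slides feed the detected behaviors into the network as inputs x_i but do not say how a sandbox report becomes those inputs. A minimal sketch, assuming each behavior is a binary presence indicator and assuming the 13 host behaviors above are the 13 inputs in the listed order (the slides state neither); `HOST_BEHAVIORS` and `encode_behaviors` are hypothetical names:

```python
# Hypothetical input order: the 13 host behaviors listed on the slide.
# The slides do not say exactly which behaviors form the 13 inputs or in what order.
HOST_BEHAVIORS = [
    "Creates Mutex",
    "Creates Hidden File",
    "Starts EXE in System",
    "Checks for Debugger",
    "Starts EXE in Documents",
    "Windows/Run Registry Key Set",
    "Hooks Keyboard",
    "Modifies Files in System",
    "Deletes Original Sample",
    "More than 5 Processes",
    "Opens Physical Memory",
    "Deletes Files in System",
    "Auto Start",
]


def encode_behaviors(observed, feature_order=HOST_BEHAVIORS):
    # x_i = 1.0 if behavior i appears in the sandbox report, else 0.0.
    # Names outside feature_order (e.g. the network behaviors) are ignored here.
    seen = set(observed)
    return [1.0 if name in seen else 0.0 for name in feature_order]


if __name__ == "__main__":
    report = {"Creates Mutex", "Auto Start", "DNS Query"}  # made-up sandbox report
    print(encode_behaviors(report))
```

Passing a different `feature_order` lets the same helper cover a feature set that also includes the network behaviors.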
Design Approach-Behaviors (Cont.)
[Table: behaviors reported by each sandbox (GFI [1], Norman [2], Anubis [3], Avast [4], BSA [5]) for the host behaviors above plus DNS Query, HTTP Connection, and File Download; V marks a reported behavior]
Design Approach-Behaviors (Cont.)
Ulrich Bayer et al. [10]
Design Approach-Proposed Algorithm
Design Approach – Weight Training
• Using an artificial neural network (ANN) to train the weights
Design Approach – Weight Training (Cont.)
• Neuron for ANN hidden layer
$$n_1 = \sum_{i=1}^{13} \omega_{i,1} x_i - b_1, \qquad a_1 = f_1(n_1) = \frac{e^{n_1} - e^{-n_1}}{e^{n_1} + e^{-n_1}}$$
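The hidden-layer transfer function is the hyperbolic tangent, so it can be computed with a standard tanh routine. The sketch below (function names are illustrative, not from the slides) evaluates one hidden neuron and compares the slide's exponential form with `math.tanh`. The literal form calls `math.exp`, which raises `OverflowError` once |n| exceeds roughly 709, while `math.tanh` simply saturates at ±1.

```python
import math


def transfer_literal(n: float) -> float:
    # f(n) = (e^n - e^-n) / (e^n + e^-n), written exactly as on the slide.
    # math.exp raises OverflowError once |n| exceeds roughly 709.
    return (math.exp(n) - math.exp(-n)) / (math.exp(n) + math.exp(-n))


def transfer(n: float) -> float:
    # The same function via math.tanh, which saturates at +/-1 instead of overflowing.
    return math.tanh(n)


def hidden_neuron(x, weights, bias):
    # n_j = sum_i omega_{i,j} * x_i - b_j ;  a_j = f(n_j)
    n = sum(w * xi for w, xi in zip(weights, x)) - bias
    return transfer(n)


if __name__ == "__main__":
    for n in (-2.0, 0.0, 0.5, 3.0):
        print(n, transfer_literal(n), transfer(n))
    print(transfer(1000.0))  # saturates; transfer_literal(1000.0) would overflow
```

The output neuron has the same form, with the hidden activations a_i as inputs and ω'_i, b' as its weights and bias.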
Design Approach – Weight Training (Cont.)
• Neuron for ANN output layer
$$n' = \sum_{i=1}^{10} \omega'_i a_i - b', \qquad f_2(n') = \frac{e^{n'} - e^{-n'}}{e^{n'} + e^{-n'}}$$
Design Approach – Weight Training (Cont.)
• Delta learning process
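– Mean square error: $E = \frac{1}{2}(d - O)^2$
d: expected target value; O: actual network output
– Weight set: $\Omega = \{\omega_{i,j} \mid 1 \le i \le 13,\ 1 \le j \le 10\} \cup \{\omega'_k \mid 1 \le k \le 10\}$
– Update for each $\omega \in \Omega$:
$$\frac{\partial E}{\partial \omega} = \frac{\partial E}{\partial n}\, x, \qquad \omega_{\text{new}} = \omega_{\text{old}} - \eta \frac{\partial E}{\partial \omega}$$
η: learning factor; x: input value feeding ω (x_i for hidden-layer weights, a_i for output-layer weights); n: net input of the neuron that ω belongs to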
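The update rule above is gradient descent on E. A minimal backpropagation sketch for the 13-10-1 tanh network, assuming one training sample per update and assuming the biases are trained along with the weights (the slide's weight set lists only ω). The function names are hypothetical, and this is not the MATLAB toolbox's training routine:

```python
import math
import random

N_IN, N_HID = 13, 10  # 13 behavior inputs, 10 hidden neurons (from the slide's index ranges)


def forward(x, W, b, Wp, bp):
    # Hidden: a_j = tanh(sum_i W[i][j] * x_i - b_j); output: O = tanh(sum_j Wp[j] * a_j - bp)
    a = [math.tanh(sum(W[i][j] * x[i] for i in range(N_IN)) - b[j]) for j in range(N_HID)]
    out = math.tanh(sum(Wp[j] * a[j] for j in range(N_HID)) - bp)
    return a, out


def loss(x, d, W, b, Wp, bp):
    # E = 1/2 (d - O)^2 for one sample
    _, out = forward(x, W, b, Wp, bp)
    return 0.5 * (d - out) ** 2


def train_step(x, d, W, b, Wp, bp, eta):
    # One update omega_new = omega_old - eta * dE/domega for every weight (and bias).
    a, out = forward(x, W, b, Wp, bp)
    # dE/dn' at the output neuron: -(d - O) * f'(n'), where f'(n) = 1 - tanh(n)^2
    delta_out = -(d - out) * (1.0 - out * out)
    # dE/dn_j at each hidden neuron, propagated back through the old output weights
    delta_hid = [delta_out * Wp[j] * (1.0 - a[j] * a[j]) for j in range(N_HID)]
    # dE/domega = (dE/dn) * (input feeding that weight): x_i for hidden, a_j for output
    new_W = [[W[i][j] - eta * delta_hid[j] * x[i] for j in range(N_HID)] for i in range(N_IN)]
    new_Wp = [Wp[j] - eta * delta_out * a[j] for j in range(N_HID)]
    # Biases enter as "- b", so dE/db = -dE/dn and the update flips sign (assumed trained)
    new_b = [b[j] + eta * delta_hid[j] for j in range(N_HID)]
    new_bp = bp + eta * delta_out
    return new_W, new_b, new_Wp, new_bp


if __name__ == "__main__":
    rng = random.Random(0)
    W = [[rng.uniform(-0.5, 0.5) for _ in range(N_HID)] for _ in range(N_IN)]
    b = [rng.uniform(-0.5, 0.5) for _ in range(N_HID)]
    Wp = [rng.uniform(-0.5, 0.5) for _ in range(N_HID)]
    bp = rng.uniform(-0.5, 0.5)
    x, d = [1.0, 0.0] * 6 + [1.0], 1.0  # hypothetical behavior vector and target
    for _ in range(5):
        print(loss(x, d, W, b, Wp, bp))
        W, b, Wp, bp = train_step(x, d, W, b, Wp, bp, eta=0.1)
```

Each step returns new parameter lists rather than modifying the old ones, which makes it easy to compare the error before and after an update.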
Design Approach-Malicious Degree
• Malicious Degree
– Malicious behaviors: $X = \{x_i \mid 1 \le i \le 13\}$
– Weights: $\Omega = \{\omega_{i,j} \mid 1 \le i \le 13,\ 1 \le j \le 10\} \cup \{\omega'_k \mid 1 \le k \le 10\}$
– Bias: $B = \{b_j \mid 1 \le j \le 10\} \cup \{b'\}$
– Transfer function: $f(n) = \dfrac{e^{n} - e^{-n}}{e^{n} + e^{-n}}$
– Malicious degree:
$$MD = f\left(\sum_{j=1}^{10} \omega'_j \times f\left(\sum_{i=1}^{13} \omega_{i,j} x_i - b_j\right) - b'\right)$$
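Putting the two layers together, MD can be evaluated directly from the formula. A sketch, assuming the weights are stored as nested lists with W[i][j] = ω_{i,j} (an illustrative layout, not from the slides):

```python
import math


def malicious_degree(x, W, b, Wp, bp):
    # MD = f( sum_j omega'_j * f( sum_i omega_{i,j} x_i - b_j ) - b' ), with f = tanh.
    # x: behavior inputs (13); W[i][j] = omega_{i,j} (13 x 10); b: hidden biases b_j (10);
    # Wp: output weights omega'_j (10); bp: output bias b'.
    n_hidden = len(b)
    hidden = [
        math.tanh(sum(W[i][j] * x[i] for i in range(len(x))) - b[j])
        for j in range(n_hidden)
    ]
    return math.tanh(sum(Wp[j] * hidden[j] for j in range(n_hidden)) - bp)


if __name__ == "__main__":
    W = [[0.0] * 10 for _ in range(13)]
    W[0][0] = 1.0                      # only behavior 1 feeds hidden neuron 1
    Wp = [0.0] * 10
    Wp[0] = 1.0                        # only hidden neuron 1 feeds the output
    x = [1.0] + [0.0] * 12             # only behavior 1 observed
    print(malicious_degree(x, W, [0.0] * 10, Wp, 0.0))  # equals tanh(tanh(1))
```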
Evaluation
• Find the MD threshold that brings both FPR and FNR close to 0
[Figure: MD threshold scale separating the Benign, Ambiguous, and Malicious regions]
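The slide splits the MD scale into three regions but does not give both cut points (the experiments later use a single threshold of 0.44). A sketch with two hypothetical parameters, `benign_below` and `malicious_from`; treating an MD exactly equal to a cut point as the higher class is an assumption:

```python
def classify(md: float, benign_below: float, malicious_from: float) -> str:
    # Hypothetical cut points: MD < benign_below -> Benign,
    # MD >= malicious_from -> Malicious, anything in between -> Ambiguous.
    if benign_below > malicious_from:
        raise ValueError("benign_below must not exceed malicious_from")
    if md < benign_below:
        return "Benign"
    if md >= malicious_from:
        return "Malicious"
    return "Ambiguous"


if __name__ == "__main__":
    for md in (0.2, 0.5, 0.7):
        print(md, classify(md, 0.4, 0.6))
```

With a single threshold, as in the experiments, `classify(md, 0.44, 0.44)` leaves the Ambiguous region empty and yields a binary decision.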
Evaluation (Cont.)
• MATLAB 7.11.0
• Initial weights and biases: random, generated by the function initnw [8]
• Transfer function: hyperbolic tangent sigmoid function
• Architecture of ANN (MATLAB 7.11.0): 13 inputs, 10 hidden neurons, 1 output
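MATLAB's initnw is the Neural Network Toolbox's Nguyen-Widrow layer initialization function. The sketch below implements the textbook Nguyen-Widrow rule for the 13-input, 10-neuron hidden layer; MATLAB's own implementation differs in details (for example, how the biases are spread), so treat this as an illustration of the idea rather than a reproduction of initnw. The function name is hypothetical.

```python
import math
import random


def nguyen_widrow_init(n_inputs, n_hidden, rng):
    # Textbook Nguyen-Widrow rule (not MATLAB's exact initnw code).
    # Scale factor beta = 0.7 * H^(1/n) for H hidden neurons and n inputs.
    beta = 0.7 * n_hidden ** (1.0 / n_inputs)
    # Start from small uniform weights; W[i][j] = omega_{i,j}.
    W = [[rng.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_inputs)]
    # Rescale each hidden neuron's incoming weight vector to length beta.
    for j in range(n_hidden):
        norm = math.sqrt(sum(W[i][j] ** 2 for i in range(n_inputs)))
        for i in range(n_inputs):
            W[i][j] *= beta / norm
    # Biases drawn uniformly from [-beta, beta].
    b = [rng.uniform(-beta, beta) for _ in range(n_hidden)]
    return W, b


if __name__ == "__main__":
    W, b = nguyen_widrow_init(13, 10, random.Random(0))
    print([round(v, 3) for v in b])
```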
Evaluation (Cont.)
• Malicious sample source: Blast’s Security [6] and VX Heaven [7] websites
• Benign sample source: Portable Executable (PE) files under Windows XP SP2
• Training data and testing data

| | Malicious | Benign | Total |
|---|---|---|---|
| Training | 500 | 500 | 1000 |
| Testing | 500 | 500 | 1000 |
Evaluation (Cont.)
• Mean square error: 0.19
• Execution time: 2 seconds
• MD threshold (according to training data)
[Figure: number of training samples vs. malicious degree (0 to 1), with the range of candidate thresholds marked]
Evaluation (Cont.)
• Choose threshold
[Figure: false positive rate and false negative rate (%) vs. malicious degree threshold, for thresholds from 0.35 to 0.80]
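The threshold choice can be framed as a sweep over candidate MD values. A sketch, assuming a sample is flagged malicious when MD ≥ threshold and assuming the selection criterion is the smallest FPR + FNR (the slides only say both rates should approach 0); `rates_at` and `choose_threshold` are hypothetical names, and the MD lists in the usage are made-up values, not the experiment's data:

```python
def rates_at(threshold, benign_md, malicious_md):
    # FPR: fraction of benign samples flagged malicious; FNR: fraction of malicious samples missed.
    # Assumption: a sample is flagged malicious when MD >= threshold.
    fp = sum(1 for m in benign_md if m >= threshold)
    fn = sum(1 for m in malicious_md if m < threshold)
    return fp / len(benign_md), fn / len(malicious_md)


def choose_threshold(candidates, benign_md, malicious_md):
    # Assumed criterion: minimize FPR + FNR; ties go to the smaller threshold.
    def score(t):
        fpr, fnr = rates_at(t, benign_md, malicious_md)
        return (fpr + fnr, t)

    return min(candidates, key=score)


if __name__ == "__main__":
    benign = [0.05, 0.12, 0.30, 0.41, 0.47]      # made-up MD values
    malicious = [0.38, 0.52, 0.66, 0.81, 0.93]   # made-up MD values
    grid = [round(0.35 + 0.01 * k, 2) for k in range(46)]  # 0.35 to 0.80, as in the figure
    best = choose_threshold(grid, benign, malicious)
    print(best, rates_at(best, benign, malicious))
```

Other criteria (for example, capping FPR first and then minimizing FNR) only require changing `score`.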
Evaluation (Cont.)
• Experiment results

| TP | TN | FP | FN | FPR | FNR | Accuracy |
|---|---|---|---|---|---|---|
| 483 | 494 | 6 | 17 | 1.2% | 3.4% | 97.7% |
[Figure: number of benign and malicious samples vs. malicious degree (0 to 1), with MD threshold = 0.44]
Evaluation (Cont.)
| Approach | TP / (TP + FN) | FN / (TP + FN) | FP / (FP + TN) | TN / (FP + TN) |
|---|---|---|---|---|
| Our Scheme | 96.6% | 3.4% | 1.2% | 98.8% |
| MBF [14] | Not Available | Not Available | 2.13% | 97.87% |
| RADUX [19] | 95.6% | 4.4% | 9.8% | 90.2% |
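The rates in the table follow directly from the confusion-matrix counts. A small helper (the function name and the TPR/TNR labels are illustrative; the formulas are the table's column headers):

```python
def detection_metrics(tp: int, tn: int, fp: int, fn: int) -> dict:
    # Column definitions from the comparison table, plus overall accuracy.
    return {
        "TPR": tp / (tp + fn),  # TP / (TP + FN)
        "FNR": fn / (tp + fn),  # FN / (TP + FN)
        "FPR": fp / (fp + tn),  # FP / (FP + TN)
        "TNR": tn / (fp + tn),  # TN / (FP + TN)
        "Accuracy": (tp + tn) / (tp + tn + fp + fn),
    }


if __name__ == "__main__":
    print(detection_metrics(tp=483, tn=494, fp=6, fn=17))
```

With the experiment's counts (TP = 483, TN = 494, FP = 6, FN = 17), it gives TPR 0.966, FNR 0.034, FPR 0.012, TNR 0.988, and accuracy 0.977, matching the Our Scheme row and the reported accuracy.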
Evaluation (Cont.)
| Weights in Hidden Layer | Weights in Output Layer | Accuracy Rate |
|---|---|---|
| Random | Random | 98.8% |
| Frequency | Random | 98% |
| 1 | 1 | 92.42% |
| 0.5 | 0.5 | 91% |
| Without ANN | None | 91.36% |
Conclusion and Future Work
• Conclusion
– Collected several common malware behaviors
– Composed a malicious degree (MD) formula
– Brought the false positive rate and false negative rate close to 0
– Can detect unknown malware
• Future work
– Automate the system
– Implement PC-based sandboxes
– Add more malware network behaviors
– Classify malware according to its typical behaviors
References
[1] GFI Sandbox. http://www.gfi.com/malware-analysis-tool
[2] Norman Sandbox. http://www.norman.com/security_center/security_tools
[3] Anubis Sandbox. http://anubis.iseclab.org/
[4] Avast Sandbox. http://www.avast.com/zh-cn/index
[5] Buster Sandbox Analyzer (BSA). http://bsa.isoftware.nl/
[6] Blast's Security. http://www.sacour.cn
[7] VX Heaven. http://vx.netlux.org/vl.php
[8] Neural Network Toolbox. http://dali.feld.cvut.cz/ucebna/matlab/toolbox/nnet/initnw.html
[9] “A malware tool chain: active collection, detection, and analysis,” NBL, National Chiao Tung University.
[10] U. Bayer, I. Habibi, D. Balzarotti, E. Kirda, and C. Kruegel, “A view on current malware behaviors,”
Proceedings of the 2nd USENIX Workshop on Large-Scale Exploits and Emergent Threats: Botnets,
Spyware, Worms, and More, pp. 1 - 11, Apr. 22-24, 2009.
[11] U. Bayer, C. Kruegel, and E. Kirda, “TTAnalyze: a tool for analyzing malware,” Proceedings of 15th
European Institute for Computer Antivirus Research, Apr. 2006.
[12] M. Egele, C. Kruegel, E. Kirda, H. Yin, and D. Song, “Dynamic spyware analysis,” Proceedings of
USENIX Annual Technical Conference, pp. 233 - 246, Jun. 2007.
[13] H. J. Li, C. W. Tien, C. W. Tien, C. H. Lin, H. M. Lee, and A. B. Jeng, "AOS: An optimized sandbox
method used in behavior-based malware detection," Proceedings of Machine Learning and Cybernetics
(ICMLC), Vol. 1, pp. 404-409, Jul. 10-13, 2011.
References (Cont.)
[14] W. Liu, P. Ren, K. Liu, and H. X. Duan, “Behavior-based malware analysis and detection,”
Proceedings of Complexity and Data Mining (IWCDM), pp. 39 - 42, Sep. 24-28, 2011.
[15] M. Christodorescu and S. Jha, “Static analysis of executables to detect malicious patterns,”
Proceedings of the 12th conference on USENIX Security Symposium, Vol. 12, pp. 169 - 186, Aug.
4-8, 2003.
[16] A. Moser, C. Kruegel, and E. Kirda, “Exploring multiple execution paths for malware analysis,”
Proceedings of 2007 IEEE Symposium on Security and Privacy, pp. 231 - 245, May 20-23, 2007.
[17] J. Rabek, R. Khazan, S. Lewandowski, and R. Cunningham, “Detection of injected,
dynamically generated, and obfuscated malicious code,” Proceedings of the 2003 ACM workshop
on Rapid malcode, pp. 76 - 82, Oct. 27-30, 2003.
[18] K. Rieck, T. Holz, C. Willems, P. Dussel, and P. Laskov, “Learning and Classification of
Malware Behavior,” in Detection of Intrusions and Malware, and Vulnerability Assessment, Vol.
5137, pp. 108-125, Oct. 9, 2008.
[19] C. Wang, J. Pang, R. Zhao, W. Fu, and X. Liu, “Malware detection based on suspicious
behavior identification,” Proceedings of Education Technology and Computer Science, Vol. 2, pp.
198 - 202, Mar. 7-8, 2009.
[20] C. Willems, T. Holz, and F. Freiling, “Toward automated dynamic malware analysis using
CWSandbox,” IEEE Security and Privacy, Vol. 5, No. 2, pp. 32 - 39, Mar.-Apr. 2007.
[21] Y. Zhang, J. Pang, R. Zhao, and Z. Guo, “Artificial neural network for decision of software
maliciousness,” Proceedings of Intelligent Computing and Intelligent Systems (ICIS), Vol. 2, pp.
622 - 625, Oct. 29-31, 2010.