Malware Detection Based on Malicious Behaviors Using Artificial Neural Network
Student: Hsun-Yi Tsai
Advisor: Dr. Kuo-Chen Wang
2012/05/28
Copyright © 2012, MBL@CS.NCTU

Outline
• Introduction
• Problem Statement
• Related Work
• Design Approach
  – Sandboxes
  – Behaviors
  – Proposed Algorithm
  – Weight Training
  – Malicious Degree
• Evaluation
• Conclusion and Future Work
• References

Introduction
• In recent years, malware has become a severe threat to cyber security
  – Viruses, worms, Trojan horses, botnets, …
• Traditional detection relies on signature-based malware detection algorithms [15] [17]
• Drawbacks of signature-based malware detection algorithms
  – Require human effort and time to approve new signatures
  – The malicious digest (signature database) must be updated frequently
  – Easily bypassed by obfuscation methods
  – Cannot detect zero-day malware
  – Higher false negative rate

Introduction (Cont.)
• To overcome the shortcomings of signature-based algorithms, behavior-based malware detection algorithms were proposed [14] [19]
  – Detect unknown malware and variants of known malware
  – Decrease the false negative rate (FNR)
  – But increase the false positive rate (FPR)
• To decrease the FPR, we propose a behavior-based malware detection algorithm that uses an artificial neural network

Problem Statement
• Given
  – Several sandboxes
  – l known malware samples M = {M1, M2, …, Ml} for training
  – m known malware samples S = {S1, S2, …, Sm} for testing
• Objective: determine
  – n behaviors B = {B1, B2, …, Bn}
  – n weights W = {W1, W2, …, Wn}
  – MD (malicious degree)

Related Work
• MBF [14]
  – File, process, network, and registry actions
  – 16 malicious behavior features (MBF)
  – Three malicious degrees: high, warning, and low
• RADUX [19]
  – Reverse Analysis for Detecting Unsafe eXecution (RADUX)
  – Collects 9 common malicious behaviors
  – Uses Bayes' theorem

Related Work (Cont.)
Approach | MBF [14] | RADUX [19] | Our Scheme
Main idea | Analyze behavior features | Analyze API calls | Analyze malicious behaviors
Number of malicious behaviors | 16 | 9 | 13
Calculation of malicious degree | Non-weighted algorithm | Weighted algorithm | Weighted algorithm
Weight adjustment | None | Bayes' theorem | Artificial neural network (ANN)
False positive rate | Low | High | Low
False negative rate | Not available | High | Low
Accuracy rate | High | Low | High

Background - Sandboxes
• Dynamic analysis system
• Isolated environment
• Interacts with malware
• Records runtime behaviors

Background - Sandboxes (Cont.)
• Web-based sandboxes
  – GFI Sandbox [1]
  – Norman Sandbox [2]
  – Anubis Sandbox [3]
• PC-based sandboxes
  – Avast Sandbox [4]
  – Buster Sandbox Analyzer [5]

Design Approach - Behaviors
• Malware host behaviors
  – Creates Mutex
  – Creates Hidden File
  – Starts EXE in System
  – Checks for Debugger
  – Starts EXE in Documents
  – Windows/Run Registry Key Set
  – Hooks Keyboard
  – Modifies Files in System
  – Deletes Original Sample
  – More than 5 Processes
  – Opens Physical Memory
  – Deletes Files in System
  – Auto Start
• Malware network behaviors
  – Makes Network Connections
    · DNS Query
    · HTTP Connection
    · File Download

Design Approach - Behaviors (Cont.)
[Table: behaviors reported by each sandbox: GFI [1], Norman [2], Anubis [3], Avast [4], and BSA [5]]

Design Approach - Behaviors (Cont.)
[Figure: malware behaviors, from Ulrich Bayer et al. [10]]

Design Approach - Proposed Algorithm
[Figure: proposed algorithm]

Design Approach - Weight Training
• An artificial neural network (ANN) is used to train the weights

Design Approach - Weight Training (Cont.)
• Neuron j of the ANN hidden layer (1 ≤ j ≤ 10), whose inputs are the behaviors x_1, …, x_13:
$$n_j = \sum_{i=1}^{13} \omega_{i,j} x_i - b_j, \qquad y_j = f(n_j) = \frac{e^{n_j} - e^{-n_j}}{e^{n_j} + e^{-n_j}}$$

Design Approach - Weight Training (Cont.)
• Neuron of the ANN output layer:
$$n' = \sum_{j=1}^{10} \omega'_j y_j - b', \qquad y' = f(n') = \frac{e^{n'} - e^{-n'}}{e^{n'} + e^{-n'}}$$

Design Approach - Weight Training (Cont.)
• Delta learning process
  – Mean square error: $E = \frac{1}{2}(d - O)^2$, where d is the expected target value and O is the network output y'
  – Weight set: $W = \{\omega_{i,j} \mid 1 \le i \le 13,\ 1 \le j \le 10\} \cup \{\omega'_j \mid 1 \le j \le 10\}$
  – For every $\omega \in W$: $\Delta\omega = \eta \frac{\partial E}{\partial \omega}$, where η is the learning factor; by the chain rule, ∂E/∂ω is proportional to the input value x that ω multiplies (for example, $\partial n_j / \partial \omega_{i,j} = x_i$)
  – Update: $\omega_{\text{new}} = \omega_{\text{old}} - \Delta\omega$

Design Approach - Malicious Degree
• Malicious Degree
  – Malicious behaviors: $X = \{x_i \mid 1 \le i \le 13\}$
  – Weights: $W = \{\omega_{i,j} \mid 1 \le i \le 13,\ 1 \le j \le 10\} \cup \{\omega'_j \mid 1 \le j \le 10\}$
  – Biases: $B = \{b_j \mid 1 \le j \le 10\} \cup \{b'\}$
  – Transfer function: $f(n) = \frac{e^{n} - e^{-n}}{e^{n} + e^{-n}}$
  – Malicious degree:
$$MD = f\left(\sum_{j=1}^{10} \omega'_j \, f\left(\sum_{i=1}^{13} \omega_{i,j} x_i - b_j\right) - b'\right)$$

Evaluation
• Goal: find the optimal MD threshold, one that drives both the FPR and the FNR close to 0
[Figure: MD threshold separating benign, ambiguous, and malicious samples]

Evaluation (Cont.)
• Matlab 7.11.0
• Initial weights and biases: random, generated by the function initnw [8]
• Transfer function: tangent-sigmoid function
• Architecture of the ANN (Matlab 7.11.0):
[Figure: ANN architecture]

Evaluation (Cont.)
• Malicious sample sources: Blast's Security [6] and VX Heaven [7] websites
• Benign sample source: portable executable (PE) files under Windows XP SP2
• Training data and testing data:
  Data set | Malicious | Benign | Total
  Training | 500 | 500 | 1000
  Testing | 500 | 500 | 1000
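The MD computation and the delta learning process from the Design Approach, together with the threshold search, can be illustrated with a small plain-Python sketch of the 13-10-1 tangent-sigmoid network. It assumes binary (0/1) behavior inputs, target d = 1 for malicious and d = 0 for benign samples, per-sample gradient-descent updates, and uniform random initial weights in [-0.5, 0.5] as a stand-in for Matlab's initnw. The function names, the learning factor η = 0.05, the epoch count, the 0.01 threshold grid, the choice to minimize FPR + FNR, and the synthetic data generator are all hypothetical; this is not the authors' Matlab 7.11.0 implementation.

```python
# Illustrative sketch only: names, initializer, eta, epochs, and toy data are assumptions.
import math
import random

N_IN, N_HID = 13, 10  # 13 behavior inputs and 10 hidden neurons, as in the slides


def f(n):
    """Tangent-sigmoid transfer function (e^n - e^-n) / (e^n + e^-n), i.e. tanh(n)."""
    return math.tanh(n)


def init_params(rng):
    """Assumed initializer: uniform in [-0.5, 0.5], a stand-in for Matlab's initnw."""
    def u():
        return rng.uniform(-0.5, 0.5)

    return {
        "w": [[u() for _ in range(N_HID)] for _ in range(N_IN)],  # omega_{i,j}
        "b": [u() for _ in range(N_HID)],                          # b_j
        "w2": [u() for _ in range(N_HID)],                         # omega'_j
        "b2": u(),                                                 # b'
    }


def forward(p, x):
    """Return the hidden outputs y_j and MD = f(sum_j omega'_j * y_j - b')."""
    y = [f(sum(p["w"][i][j] * x[i] for i in range(N_IN)) - p["b"][j])
         for j in range(N_HID)]
    md = f(sum(p["w2"][j] * y[j] for j in range(N_HID)) - p["b2"])
    return y, md


def train_step(p, x, d, eta):
    """One update w_new = w_old - eta * dE/dw for E = 1/2 * (d - O)^2."""
    y, o = forward(p, x)
    delta_out = -(d - o) * (1 - o * o)  # dE/dn', using f'(n) = 1 - f(n)^2
    # Hidden-layer deltas use the output weights before this update.
    delta_hid = [delta_out * p["w2"][j] * (1 - y[j] * y[j]) for j in range(N_HID)]
    for j in range(N_HID):
        p["w2"][j] -= eta * delta_out * y[j]  # dE/d omega'_j = delta' * y_j
        p["b"][j] += eta * delta_hid[j]        # dn_j/db_j = -1
        for i in range(N_IN):
            p["w"][i][j] -= eta * delta_hid[j] * x[i]  # dE/d omega_{i,j} = delta_j * x_i
    p["b2"] += eta * delta_out                  # dn'/db' = -1
    return 0.5 * (d - o) ** 2


def choose_threshold(mds, labels):
    """Scan MD thresholds 0.00..1.00 and keep the first one minimizing FPR + FNR."""
    pos = sum(labels)
    neg = len(labels) - pos
    best_score, best_t = float("inf"), 0.5
    for k in range(101):
        t = k / 100
        fp = sum(1 for m, lab in zip(mds, labels) if m >= t and lab == 0)
        fn = sum(1 for m, lab in zip(mds, labels) if m < t and lab == 1)
        score = fp / neg + fn / pos
        if score < best_score:
            best_score, best_t = score, t
    return best_t


def toy_samples(rng, n, p_on, label):
    """Hypothetical data: each of the 13 behaviors is observed with probability p_on."""
    return [([1.0 if rng.random() < p_on else 0.0 for _ in range(N_IN)], label)
            for _ in range(n)]


def demo(seed=0, epochs=30, eta=0.05):
    """Train on synthetic data, pick a threshold, and return (threshold, test accuracy)."""
    rng = random.Random(seed)
    train = toy_samples(rng, 100, 0.7, 1) + toy_samples(rng, 100, 0.1, 0)
    test = toy_samples(rng, 100, 0.7, 1) + toy_samples(rng, 100, 0.1, 0)
    p = init_params(rng)
    for _ in range(epochs):
        rng.shuffle(train)
        for x, d in train:
            train_step(p, x, d, eta)
    t = choose_threshold([forward(p, x)[1] for x, _ in train], [d for _, d in train])
    correct = sum((forward(p, x)[1] >= t) == (d == 1) for x, d in test)
    return t, correct / len(test)


if __name__ == "__main__":
    threshold, accuracy = demo()
    print(f"MD threshold = {threshold:.2f}, test accuracy = {accuracy:.1%}")
```

Running the module prints the chosen threshold and the accuracy on the synthetic test set. Those numbers only exercise the sketch; they say nothing about the 1000-sample evaluation reported in these slides.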
• Training results: mean square error 0.19; execution time 2 seconds
• MD threshold (determined from the training data)
[Figure: histogram of training samples by malicious degree (x axis: MD from 0 to 1; y axis: number of samples), marking the range of candidate thresholds]

Evaluation (Cont.)
• Choosing the threshold
[Figure: false positive rate and false negative rate (%) versus MD threshold, for thresholds from 0.35 to 0.80]

Evaluation (Cont.)
• Experiment results
  TP | TN | FP | FN | FPR | FNR | Accuracy
  483 | 494 | 6 | 17 | 1.2% | 3.4% | 97.7%
[Figure: histogram of benign and malicious samples by malicious degree, with MD threshold = 0.44]

Evaluation (Cont.)
Approach | TPR = TP / (TP + FN) | FNR = FN / (TP + FN) | FPR = FP / (FP + TN) | TNR = TN / (FP + TN)
Our Scheme | 96.6% | 3.4% | 1.2% | 98.8%
MBF [14] | Not available | Not available | 2.13% | 97.87%
RADUX [19] | 95.6% | 4.4% | 9.8% | 90.2%

Evaluation (Cont.)
Weights in hidden layer | Weights in output layer | Accuracy rate
Random | Random | 98.8%
Frequency | Random | 98%
1 | 1 | 92.42%
0.5 | 0.5 | 91%
Without ANN | None | 91.36%

Conclusion and Future Work
• Conclusion
  – Collected several common malware behaviors
  – Composed the Malicious Degree (MD) formula
  – Achieved low false positive (1.2%) and false negative (3.4%) rates
  – Can detect unknown malware
• Future work
  – Automate the system
  – Implement PC-based sandboxes
  – Add more malware network behaviors
  – Classify malware according to its typical behaviors

References
[1] GFI Sandbox.
    http://www.gfi.com/malware-analysis-tool
[2] Norman Sandbox.
    http://www.norman.com/security_center/security_tools
[3] Anubis Sandbox.
    http://anubis.iseclab.org/
[4] Avast Sandbox.
    http://www.avast.com/zh-cn/index
[5] Buster Sandbox Analyzer (BSA).
    http://bsa.isoftware.nl/
[6] Blast's Security.
    http://www.sacour.cn
[7] VX Heaven.
    http://vx.netlux.org/vl.php
[8] Neural Network Toolbox.
    http://dali.feld.cvut.cz/ucebna/matlab/toolbox/nnet/initnw.html
[9] “A malware tool chain: active collection, detection, and analysis,” NBL, National Chiao Tung University.
[10] U. Bayer, I. Habibi, D. Balzarotti, E. Kirda, and C. Kruegel, “A view on current malware behaviors,” Proceedings of the 2nd USENIX Workshop on Large-Scale Exploits and Emergent Threats: Botnets, Spyware, Worms, and More, pp. 1-11, Apr. 22-24, 2009.
[11] U. Bayer, C. Kruegel, and E. Kirda, “TTAnalyze: a tool for analyzing malware,” Proceedings of the 15th European Institute for Computer Antivirus Research, Apr. 2006.
[12] M. Egele, C. Kruegel, E. Kirda, H. Yin, and D. Song, “Dynamic spyware analysis,” Proceedings of the USENIX Annual Technical Conference, pp. 233-246, Jun. 2007.
[13] H. J. Li, C. W. Tien, C. W. Tien, C. H. Lin, H. M. Lee, and A. B. Jeng, “AOS: An optimized sandbox method used in behavior-based malware detection,” Proceedings of Machine Learning and Cybernetics (ICMLC), Vol. 1, pp. 404-409, Jul. 10-13, 2011.
[14] W. Liu, P. Ren, K. Liu, and H. X. Duan, “Behavior-based malware analysis and detection,” Proceedings of Complexity and Data Mining (IWCDM), pp. 39-42, Sep. 24-28, 2011.
[15] M. Christodorescu and S. Jha, “Static analysis of executables to detect malicious patterns,” Proceedings of the 12th USENIX Security Symposium, Vol. 12, pp. 169-186, Aug. 4-8, 2003.
[16] A. Moser, C. Kruegel, and E. Kirda, “Exploring multiple execution paths for malware analysis,” Proceedings of the 2007 IEEE Symposium on Security and Privacy, pp. 231-245, May 20-23, 2007.
[17] J. Rabek, R. Khazan, S. Lewandowski, and R. Cunningham, “Detection of injected, dynamically generated, and obfuscated malicious code,” Proceedings of the 2003 ACM Workshop on Rapid Malcode, pp. 76-82, Oct. 27-30, 2003.
[18] K. Rieck, T. Holz, C. Willems, P. Dussel, and P. Laskov, “Learning and classification of malware behavior,” Detection of Intrusions and Malware, and Vulnerability Assessment, Vol. 5137, pp. 108-125, Oct. 9, 2008.
[19] C. Wang, J. Pang, R. Zhao, W. Fu, and X. Liu, “Malware detection based on suspicious behavior identification,” Proceedings of Education Technology and Computer Science, Vol. 2, pp. 198-202, Mar. 7-8, 2009.
[20] C. Willems, T. Holz, and F. Freiling, “Toward automated dynamic malware analysis using CWSandbox,” IEEE Security and Privacy, Vol. 5, No. 2, pp. 32-39, Mar.-Apr. 2007.
[21] Y. Zhang, J. Pang, R. Zhao, and Z. Guo, “Artificial neural network for decision of software maliciousness,” Proceedings of Intelligent Computing and Intelligent Systems (ICIS), Vol. 2, pp. 622-625, Oct. 29-31, 2010.