
Metamorphic Malware detection using Nonnegative Matrix Factorization

Abstract of thesis presented to the Senate of Universiti Putra Malaysia in fulfilment of the
requirement for the degree of Doctor of Philosophy
EFFECTIVE METAMORPHIC MALWARE DETECTION WITH
STRUCTURAL FEATURES AND NONNEGATIVE MATRIX
FACTORIZATION
By
LING YEONG TYNG
Month and 2021
Chairman: Associate Professor Nor Fazlida Mohd Sani, Ph.D.
Faculty: Computer Science and Information Technology
Metamorphic malware is well known for evading signature-based detection. It uses code obfuscation techniques to disguise its malicious behavior and to modify its own syntax or structure.
It can therefore generate new variants within a malware family at each propagation, which makes
detection difficult. Although several approaches have been proposed over the years, their
results have been less than ideal for metamorphic malware detection. In addition, the tools
used to extract file features from a binary file are platform dependent, which restricts their scope.
Moreover, the number of file features is large, which increases the workload of the detection
system and delays its detection response.
To overcome these issues, this research proposes a framework for metamorphic malware detection that consists of three main parts. The first part proposes a spectral-based feature
reduction method, Nonnegative Matrix Factorization, for metamorphic malware detection.
The second part proposes five alternative feature representations computed on the raw bytes of a
binary file: compression ratio, entropy, Jaccard similarity coefficient on hexadecimal bytes, Jaccard
similarity coefficient on integer bytes, and the Chi-square statistical test. These representations reduce
the prior knowledge required during the feature engineering step while improving detection results.
The third part applies Nonnegative Matrix Factorization and the new feature
representations to structural similarity-based detection, machine learning based detection using
Random Forest with Conditional Inference Trees, and Hidden Markov Model based detection.
Because the proposed approach uses the raw bytes of executable files regardless of file format,
it has the flexibility to be applied on other platforms.
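As a rough illustration of the raw-byte feature representations named above, the following Python sketch computes entropy, compression ratio, a Jaccard similarity coefficient, and a Chi-square statistic directly from bytes. It is not the thesis implementation. The function names, the use of zlib as the compressor, the set-of-distinct-byte-values form of the Jaccard coefficient, and the uniform reference distribution for the Chi-square statistic are all assumptions made for this example, and the sketch does not distinguish the hexadecimal and integer variants of the Jaccard feature.

```python
import math
import zlib
from collections import Counter


def shannon_entropy(data: bytes) -> float:
    """Shannon entropy of the byte histogram, in bits per byte (0.0 to 8.0)."""
    if not data:
        return 0.0
    n = len(data)
    counts = Counter(data)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def compression_ratio(data: bytes) -> float:
    """Compressed size divided by original size (zlib is an assumed choice)."""
    if not data:
        return 0.0
    return len(zlib.compress(data)) / len(data)


def jaccard_bytes(a: bytes, b: bytes) -> float:
    """Jaccard coefficient between the sets of distinct byte values (assumed form)."""
    set_a, set_b = set(a), set(b)
    if not set_a and not set_b:
        return 1.0
    return len(set_a & set_b) / len(set_a | set_b)


def chi_square_uniform(data: bytes) -> float:
    """Chi-square statistic of the byte histogram against a uniform distribution."""
    if not data:
        return 0.0
    expected = len(data) / 256
    counts = Counter(data)
    return sum((counts.get(v, 0) - expected) ** 2 / expected for v in range(256))


def feature_vector(data: bytes) -> tuple:
    """Hypothetical helper: per-file features that need no file-format parsing."""
    return (shannon_entropy(data), compression_ratio(data), chi_square_uniform(data))
```

Because every function takes plain `bytes`, the same code applies to any executable format, which reflects the platform independence claimed for the approach.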
Experimental evaluation of the proposed approach on existing datasets achieved satisfactory
results. In structural similarity analysis, the accuracy ranged from 95% to 100% when
Nonnegative Matrix Factorization with a low rank of 1 or 2 was applied to the compression ratio
and entropy features. The Random Forest classifier demonstrated the efficiency of the proposed
approach with an accuracy of 98% to 99% for metamorphic malware families. The Hidden Markov
Model, using the entropy feature representation, achieved an accuracy of 96% to 99%. Based
on these results, this study demonstrates the effectiveness of non-entropic feature representation
with machine learning algorithms for metamorphic malware detection.
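To make the low-rank reduction mentioned above concrete, here is a minimal pure-Python sketch of Nonnegative Matrix Factorization using the standard multiplicative update rules. It approximates a nonnegative matrix V (assumed here to hold one file per row and one feature value per column) as the product W H with a small rank such as 1 or 2. The matrix layout, function names, iteration count, and random initialization are assumptions for illustration, not the configuration used in the thesis. A library implementation such as scikit-learn's `NMF` would normally be preferred in practice.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Transpose a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def nmf(v, rank, iterations=200, seed=0, eps=1e-9):
    """Factor nonnegative v (m x n) into w (m x rank) and h (rank x n)."""
    rng = random.Random(seed)
    m, n = len(v), len(v[0])
    # Random positive initialization (an assumed choice).
    w = [[rng.random() + eps for _ in range(rank)] for _ in range(m)]
    h = [[rng.random() + eps for _ in range(n)] for _ in range(rank)]
    for _ in range(iterations):
        # H <- H * (W^T V) / (W^T W H)
        wt = transpose(w)
        num = matmul(wt, v)
        den = matmul(matmul(wt, w), h)
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(n)]
             for i in range(rank)]
        # W <- W * (V H^T) / (W H H^T)
        ht = transpose(h)
        num = matmul(v, ht)
        den = matmul(w, matmul(h, ht))
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(m)]
    return w, h


def frobenius_error(v, w, h):
    """Frobenius norm of the reconstruction error v - w h."""
    approx = matmul(w, h)
    return sum((v[i][j] - approx[i][j]) ** 2
               for i in range(len(v)) for j in range(len(v[0]))) ** 0.5
```

With rank 1 or 2, each row of W is a one- or two-number summary of a file's feature vector, so a downstream similarity measure or classifier works with far fewer values than the original feature set.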