Background Research Paper
October 21, 2014
Group 08 - Intelligent Malware Detection
Members: Kevin Hao, Dominique Amos, Alexander Finkelstein, Michael Hite, Joshua Suess
Mentor: Dr. Yanfang Ye
Instructor: Dr. Yenumula V. Reddy

Background of Malware:
Malware, shorthand for malicious software, refers to any software designed to damage or disrupt a system. In the early days of malware, most infectious programs were written by amateurs seeking attention or amusement. Modern malware, however, is often created by professional criminals, hackers, and government institutions to steal personal, financial, and other confidential information, or to sabotage and disrupt systems. Malware causes billions of dollars of damage annually. It is also extremely prevalent: an estimated 1 in 14 downloads from the internet contains infectious code [1]. Users can easily be infected by opening or running a file that contains infectious code, even one that appears benign. Certain system vulnerabilities can also be exploited to infect users, often without their action or knowledge. Anti-malware programs have become more widespread to combat these threats, but they may not catch all malware, especially malware that uses newer obfuscation methods such as polymorphism and metamorphism.

Malware is an umbrella term for all malicious software, including viruses, worms, Trojans, spyware, adware, and rootkits, among others. Malware is usually categorized by its behavior and its method of propagation. The most prevalent categories are Trojan horses at 69.90%, viruses at 16.82%, and worms at 7.77%; together the three make up almost 95% of all malware samples [2]. A computer virus is code attached to other software that, when executed by the user, copies itself to other programs and files on the machine, “infecting” them.
Computer worms, on the other hand, are standalone programs that spread through computer networks by exploiting system vulnerabilities, replicating themselves to reach other computers. Worms are often more dangerous than viruses because they do not need to attach to an existing program and can spread without any user action, provided the machine is vulnerable. Trojan horses are non-replicating malicious programs that commonly pose as legitimate software or documents.

Malware spreads in many ways. In a 2011 study of malware propagation techniques, Microsoft found that 44.8% of the malware detected by its Malicious Software Removal Tool required user interaction to spread; a further 26% spread through USB autorun and 17.2% through network autorun [2].

To evade anti-malware software, malware authors use a variety of concealment techniques. Malware can be encrypted, in which case the encryption algorithm, encryption keys, malicious code, and decryption algorithm are packaged together. The malicious code stays encrypted to evade anti-malware software and later decrypts itself using the packaged decryption algorithm and keys. Malware can also be packed, meaning the executable is compressed (sometimes more than once) so that it appears different to anti-malware programs. Malware code can also be obfuscated by changing the code without changing the program's logic, for example by inserting garbage instructions or unnecessary jumps. The most sophisticated concealment methods are polymorphism and metamorphism. Polymorphic code changes its form every time it runs without changing its function, while metamorphic code completely rewrites itself every time it runs, so that the new instance bears no resemblance to the original [3]. Malware often combines several of these techniques to fool anti-malware software.

Malware infects 2-5 million computers every day.
An average of $345,000 is lost or stolen in each incident in which account credentials and passwords are stolen. In 2014, malware is expected to cost enterprises an estimated $500 billion and consumers an estimated $25 billion, and to consume 1.2 billion man-hours as people deal with its effects [4]. These figures make the need for adequate anti-malware software clear.

Background of Anti-Malware Software
Anti-malware software has existed alongside malware since the early 1980s, but much of the malware of that era was not explicitly malicious; it focused on self-reproduction and had no damage routine. As more people became aware of malware's malicious potential, the amounts of both malware and anti-malware software grew steadily. The number of malware samples did not explode until 2007-2008 [5], the period when most households switched from dial-up internet to always-on broadband connections. Data from the Kingsoft Cloud Security Center illustrate this growth: it collected 240,156 malware samples in 2006, 283,084 in 2007, 13,899,717 in 2008, 20,684,223 in 2009, and 41,265,082 in 2013. This huge increase in malware, together with new concealment methods, has made it increasingly difficult for users to ensure that they are protected, and anti-malware software needs constant updates to guard against ever-evolving malware. Combating malware is a cat-and-mouse game: malware attempts to bypass the anti-malware programs that attempt to stop it, and each side continually develops countermeasures against the other in a never-ending process.

Although no anti-malware algorithm can detect all threats, it is possible to achieve a high detection rate. The most commonly used method of malware detection is signature-based detection, which identifies malicious code by matching signatures: short byte sequences unique to a known malicious program.
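The signature matching described above, and the way encryption and packing defeat it, can be illustrated with a minimal Python sketch. Everything in it is hypothetical: the `SIGNATURES` table, the `scan` and `xor_encode` helpers, the signature bytes, and the sample file contents are invented for illustration. A single-byte XOR stands in for real encryption and `zlib` compression stands in for a real packer; actual scanners use far larger databases and more elaborate matching.

```python
from __future__ import annotations

import zlib

# Hypothetical signature database: name -> byte sequence assumed to be unique to
# one known malicious sample. Real databases are far larger and support wildcards,
# offsets, and hashes rather than plain substring matches.
SIGNATURES = {
    "Demo.Sample.A": b"DEMO-SIGNATURE-1234",
}


def scan(data: bytes) -> list[str]:
    """Return the names of all signatures that occur anywhere in `data`."""
    return [name for name, sig in SIGNATURES.items() if sig in data]


def xor_encode(data: bytes, key: int) -> bytes:
    """Toy single-byte XOR 'encryption'; applying it again with the same key decodes."""
    return bytes(b ^ key for b in data)


# Harmless stand-in for the bytes of a known malicious file.
sample = b"...header...DEMO-SIGNATURE-1234...rest of file..."

modified = sample.replace(b"DEMO", b"DEM0")  # one-byte change ("O" -> zero)
encrypted = xor_encode(sample, 0x5A)         # concealment by (toy) encryption
packed = zlib.compress(sample)               # concealment by packing (compression)

print(scan(sample))     # ['Demo.Sample.A']
print(scan(modified))   # []
print(scan(encrypted))  # []
print(scan(packed))     # []

# After decrypting or unpacking, the original signature is visible again.
print(scan(xor_encode(encrypted, 0x5A)))  # ['Demo.Sample.A']
print(scan(zlib.decompress(packed)))      # ['Demo.Sample.A']
```

Scanning the decoded or decompressed bytes finds the signature again, which is one reason many scanners try to unpack or emulate a file before matching against it.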
However, modified and previously unknown executables pose a problem: signatures are extracted from samples that have already been analyzed, so a new or altered executable may not match any existing signature. Heuristic-based detection, another commonly used method, looks for characteristics typical of malware within a file, providing some protection against new and previously unknown variants. Effective anti-malware software typically employs these methods, among others, to protect against malicious programs [6]. Today, common problems plaguing anti-malware software include degraded system performance, false positives, and limited detection effectiveness. Advances in technology and methodology have improved the situation, but it is still far from perfect. One of the newest approaches to malware detection uses data mining and machine learning: features are extracted from a file and used to classify it as malicious or benign. Combined with cloud-based technology, this approach can reduce the system load on users’ machines while increasing the effectiveness of malware detection.

Stakeholders
There is a wide range of stakeholders that will be affected by this product. Everyone who uses anti-malware software, including all businesses and individuals who use computers and want adequate protection from malicious programs, is a primary stakeholder. Producers of other anti-malware software will also be affected, especially if this product proves to be more effective. The general public is a secondary stakeholder. If this product succeeds in reducing the $500 billion and 1.2 billion man-hours that malware costs businesses, the savings may be passed on to the general public. In addition, preventing sabotage attacks helps keep the targeted systems available to their users.

Product
Although many anti-malware products exist, malware is still a persistent problem that causes a large amount of damage.
Despite new products and constant updates to existing ones, malware attacks grow in number every year. Although it may be impossible to protect against all malware, it is possible to achieve greater protection with higher efficiency. The use of data mining techniques in malware detection is a rising area that has not been fully explored. A cloud-based product using data mining and machine learning algorithms to detect malware may prove to be more effective and efficient than current products.

To be competitive, the product must have three essential functionalities. The final product will be an anti-malware suite that can (1) determine whether a file is malicious or benign, (2) run in the background and scan files before they are allowed onto the system, and (3) scan files already on the computer, detecting and removing malicious ones. In addition, the product will attempt to meet the objectives below, ranked in order of importance.

Needs             Rank
Reliability       1
Functionality     2
Lightweight       3
Usability         4
Cost Efficiency   5

Reliability refers to the likelihood of failure-free operation. This is especially important for anti-malware software: the computer needs to be protected from attacks at all times, so the product must be reliable. The product must also implement the three essential functionalities at an acceptable level. Ideally, benign files should be allowed onto the system while malicious files are flagged; it is important that no benign files are flagged and that most, if not all, malicious files are caught. Next, the product should be lightweight and should not hog system resources. A common issue with today’s anti-malware software is that it can slow down users’ systems. Implementing a cloud-based product will help alleviate this by transferring the resource load from the user to the server. Since the product will be available to a wide audience, it needs to be easy to use.
The interface and all user-accessible features should be user-friendly.

References
[1] Rooney, Ben (2011-05-23). "Malware Is Posing Increasing Danger". Wall Street Journal.
[2] "Evolution of Malware - Malware Trends". Microsoft Security Intelligence Report - Featured Articles. Microsoft.com. Retrieved 28 April 2013.
[3] Wong, Wing; Stamp, M. (2006). "Hunting for Metamorphic Engines". Journal in Computer Virology. Department of Computer Science, San Jose State University.
[4] "The Link between Pirated Software and Cybersecurity Breaches" (Mar. 2014). National University of Singapore and IDC.
[5] "F-Secure Quarterly Security Wrap-up for the first quarter of 2008". F-Secure. 31 March 2008. Retrieved 25 April 2008.
[6] "Antivirus Research and Detection Techniques". ExtremeTech. Archived from the original on 27 February 2009. Retrieved 24 February 2009.
[7] Du X.; Sumeet D. (2011-04-25). "Data Mining and Machine Learning in Cybersecurity".