Background Research Paper
October 21, 2014
Group 08 - Intelligent Malware Detection
Members:
Kevin Hao
Dominique Amos
Alexander Finkelstein
Michael Hite
Joshua Suess
Mentor:
Dr. Yanfang Ye
Instructor:
Dr. Yenumula V. Reddy
Background of Malware:
Malware, shorthand for malicious software, refers to any software designed to damage or
disrupt a system. When malware was in its infancy, most infectious programs
were written by amateurs seeking attention or amusement. However, modern malware is often
created by professional criminals, hackers, and government institutions with the goal of stealing
personal, financial, and other confidential information, as well as sabotaging and
disrupting systems. Malware causes billions of dollars of damage annually. Malware is
extremely prevalent today, with an estimated 1 in 14 downloads from the internet containing
infectious code [1]. Users can easily be infected if they mistakenly open or run any file
containing infectious code, even if it seems benign. Certain system vulnerabilities can also be
exploited to infect users, oftentimes without their action or knowledge. Anti-malware programs
have become more prevalent in recent times in order to combat these malicious programs, but
they may not catch all malware, especially malware that uses newer methods of obfuscation such as
polymorphism and metamorphism.
Malware is an umbrella term for all malicious software, including viruses, worms,
Trojans, spyware, adware, and rootkits, among others. Malware is often categorized by
the behavior of the software in addition to its method of propagation. The most
prevalent categories of malware are Trojan horses at 69.90%, viruses at 16.82%, and worms at
7.77%, with the three making up almost 95% of all malware samples [2]. A computer virus is
code attached to software that, when executed by the user, copies itself to other
programs and files on the machine, “infecting” them. Computer worms, on the other hand, are
standalone programs that spread through computer networks by exploiting system
vulnerabilities, replicating themselves in order to reach other computers. Worms are often more
dangerous than viruses, as they do not require attachment to an existing program and can
spread without any action by users, provided their machines are vulnerable. Trojan horses are
non-replicating malicious programs that commonly pose as legitimate software or
documents. There are many ways for malware to spread. In a 2011 study of malware
propagation techniques, Microsoft found that 44.8% of malware detected by their Malicious
Software Removal Tool required user interaction. A further 26% was due to USB autorun and
17.2% was due to network autorun [2].
In order to evade anti-malware software, many concealment techniques are used in the
design of malicious software. Malware can be encrypted, where the encryption algorithm,
encryption keys, malicious code, and decryption algorithm are packaged together. The code is
encrypted to evade anti-malware software and later decrypts itself using the decryption
algorithm and encryption keys. Malware can also be packed, where the executable is compressed
(at times more than once) so that it appears different to anti-malware programs. Malware code can be
obfuscated by changing the code without changing the logic of the program, for example by
adding garbage commands or unnecessary jumps. The most sophisticated
methods of malware concealment are polymorphism and metamorphism. Polymorphic code
is designed to change its code every time it is run without changing its function, while
metamorphic code completely rewrites itself every time it is run, so that the new instance bears no
resemblance to the original [3]. Combinations of concealment techniques are often used to fool
anti-malware software.
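The effect of encryption on byte-level matching can be illustrated with a minimal sketch. It assumes a harmless placeholder string in place of real code and a single-byte XOR as a stand-in for the encryption algorithm; the names xor_bytes, signature, and original are hypothetical and chosen only for illustration.

```python
def xor_bytes(data: bytes, key: int) -> bytes:
    """XOR every byte with a single-byte key; applying it twice restores the input."""
    return bytes(b ^ key for b in data)


# Harmless placeholder for a byte sequence that a scanner might look for.
signature = b"EXAMPLE-PAYLOAD"
original = b"header..." + signature + b"...footer"

# "Encrypt" the whole file with an assumed key of 0x5A.
encoded = xor_bytes(original, 0x5A)

# The fixed byte string is present before encoding but not after it.
signature_in_original = signature in original
signature_in_encoded = signature in encoded

# Decoding with the same key recovers the original bytes exactly.
decoded_matches = xor_bytes(encoded, 0x5A) == original

print(signature_in_original, signature_in_encoded, decoded_matches)
```

The stored bytes no longer contain the searched-for sequence, yet the program logic is fully recoverable at run time, which is why a scanner that only compares stored bytes can be fooled.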
Malware infects 2-5 million computers every day. Each incident of stolen accounts and
passwords results in an average of $345,000 lost or stolen. In 2014, malware is expected to cost enterprises an
estimated $500 billion and consumers an estimated $25 billion, in addition to
wasting 1.2 billion man-hours spent dealing with its effects [4]. With these figures, it is not difficult to
see the necessity of adequate anti-malware software.
Background of Anti-Malware Software
While anti-malware software has existed alongside malware since the early 1980s, much of
the malware of that time was not explicitly malicious in nature and instead focused on
self-reproduction with no damage routine. As more people became aware of the potential malicious
uses of such programs, the amount of both malware and anti-malware software grew steadily. It was
not until 2007-2008 that the number of malware samples exploded [5]. It was during this time that most
households switched from dial-up internet to always-on broadband connections. According to
data from Kingston Cloud Security Center, the center acquired 240,156 malware samples in 2006,
283,084 in 2007, 13,899,717 in 2008, 20,684,223 in 2009, and
41,265,082 in 2013. With this huge increase in malware numbers, in addition to new
methods of concealment, it has become increasingly difficult for users to ensure
that they are protected. Anti-malware software needs constant updates to guard against
ever-evolving malware. Combating malware can be described as a cat-and-mouse game: malware attempts
to bypass an anti-malware program that in turn attempts to stop it. Countermeasures are taken by both
sides to defeat each other in a never-ending process.
Although no anti-malware algorithm can detect all threats, it is possible to achieve a good
rate of detection. The most commonly used method of malware detection is signature-based
detection, where signatures, short strings of bytes unique to the program, are used to identify the
malicious code. However, modified and unknown executables pose a problem: a signature must
first be extracted from a known sample, so a program that has been altered or never seen before
may not match any existing signature. Heuristic-based detection is
another commonly used method that attempts to detect characteristics typical of malware in a
file, providing protection against previously unknown malware and new variants. Effective
anti-malware software typically employs these methods, among others, to protect against malicious
programs [6].
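Signature-based detection as described above can be sketched in a few lines. This is a simplified illustration, not how any particular product works: the signature names and byte strings in SIGNATURES are invented for the example, and a real scanner would use a far larger database and faster matching.

```python
from typing import Dict, List

# Hypothetical signature database: detection name -> byte string to look for.
SIGNATURES: Dict[str, bytes] = {
    "Demo.Sample.A": bytes.fromhex("deadbeef0102"),
    "Demo.Sample.B": b"\x90\x90\xcc\xcc\x13\x37",
}


def scan(data: bytes, signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Return the sorted names of all signatures found anywhere in the data."""
    return sorted(name for name, sig in signatures.items() if sig in data)


clean = b"just an ordinary document"
infected = b"\x00\x01" + bytes.fromhex("deadbeef0102") + b"\x02"
# Same content with the last signature byte changed from 0x02 to 0x03.
modified = b"\x00\x01" + bytes.fromhex("deadbeef0103") + b"\x02"

print(scan(clean))     # []
print(scan(infected))  # ['Demo.Sample.A']
print(scan(modified))  # []
```

The last case shows the weakness noted above: changing a single byte inside the matched region is enough for the modified sample to pass unnoticed.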
Today, common problems plaguing anti-malware software include degradation of
system performance, false positives, and limited effectiveness. Advances in technology and
methodology have improved the situation, but it is still far from perfect. One of the newest
approaches to malware detection involves data mining and machine learning, where features
are extracted from a file and used to classify it as malicious or benign. In conjunction with cloud-based
technology, this approach can reduce system load for users while increasing the effectiveness of malware
detection.
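The extract-then-classify idea can be sketched as follows. This is an assumption-laden toy, not the method this project will use: it takes byte 2-gram frequencies as the features and a nearest-centroid rule with cosine similarity as the classifier, and the training samples are made-up byte strings rather than real files.

```python
from collections import Counter
from math import sqrt
from typing import Dict, List

Features = Dict[bytes, float]


def extract_features(data: bytes, n: int = 2) -> Features:
    """Relative frequency of each byte n-gram in the data."""
    grams = Counter(data[i:i + n] for i in range(len(data) - n + 1))
    total = sum(grams.values())
    return {g: c / total for g, c in grams.items()} if total else {}


def centroid(samples: List[Features]) -> Features:
    """Average feature vector of a non-empty list of samples."""
    total: Features = {}
    for feats in samples:
        for gram, value in feats.items():
            total[gram] = total.get(gram, 0.0) + value
    return {gram: value / len(samples) for gram, value in total.items()}


def cosine(a: Features, b: Features) -> float:
    """Cosine similarity between two sparse feature vectors."""
    dot = sum(v * b.get(g, 0.0) for g, v in a.items())
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def classify(data: bytes, centroids: Dict[str, Features]) -> str:
    """Label the data with the class whose centroid is most similar."""
    feats = extract_features(data)
    return max(centroids, key=lambda label: cosine(feats, centroids[label]))


# Made-up training data standing in for labeled benign and malicious files.
training = {
    "benign": [b"hello world hello", b"world hello world"],
    "malicious": [b"\x90\x90\x90\xcc\xcc\x90\x90", b"\xcc\x90\x90\x90\xcc\xcc"],
}
centroids = {
    label: centroid([extract_features(s) for s in samples])
    for label, samples in training.items()
}

print(classify(b"hello hello", centroids))
print(classify(b"\x90\x90\xcc\x90\x90", centroids))
```

In a cloud-based design, the training step and the centroids would live on the server, so the client would only need to send a file or its features for classification.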
Stakeholders
There is a wide range of stakeholders that will be affected by this product. All those who
use anti-malware software are considered primary stakeholders. All businesses and individuals
who use computers and want adequate protection from malicious programs are included. All
those who produce other anti-malware software will also be affected, especially if this product
proves to be more effective.
The general public is considered a secondary stakeholder. If this product is successful and helps
mitigate the $500 billion and 1.2 billion man-hour loss suffered by businesses at the hands
of malware, the savings may be passed on to the general public. In addition, sabotage attacks
caused by malware may leave the target unavailable to its users, so the public also benefits when such attacks are prevented.
Product
Although a large number of anti-malware products exist, malware is still a persistent
issue that is causing a large amount of damage. Despite new products and constant updates of
current products, malware attacks grow in number every year. Although it may be impossible to
protect against 100% of all malware, it is possible to achieve greater protection with higher
efficiency. The use of data mining techniques in malware detection is a growing area that has not
been fully explored. A cloud-based product utilizing data mining and machine learning
algorithms to detect malware may prove to be more effective and efficient than current products.
In order to be competitive, the product must have three essential
functionalities. The final product will be an anti-malware suite that is able to determine whether
a file is malicious or benign, run in the background and scan files before they are allowed
on the system, and scan files already on the computer, detecting and removing malicious ones.
In addition, there are objectives that the product will attempt to meet. They are ranked below in
order of importance.
Needs              Rank
Reliability        1
Functionality      2
Lightweight        3
Usability          4
Cost Efficiency    5
Reliability refers to the likelihood of failure-free operation. This is especially important
for anti-malware software; the computer needs to be constantly protected from attacks, so it is
important that the product is reliable. The product must also implement the three essential
functionalities at an acceptable level. Ideally, benign files are allowed through the system
while malicious files are flagged. It is important that no benign files are flagged and that most, if not
all, malicious files are caught. Next, the product should be lightweight and not hog system
resources. A common issue with today’s anti-malware software is that it may slow down the
user's system. Implementing a cloud-based product will help alleviate the
resource load by transferring it from the user to the server. Since the product will be available to
a wide audience, it needs to be easy to use. The interface and all user-accessible features should
be user-friendly.
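The goal of flagging no benign files while catching most malicious ones can be expressed as two measurable rates. The sketch below assumes a small hand-made set of labels; the function name detection_metrics and the sample data are hypothetical and serve only to show how the two rates would be computed.

```python
from typing import List, Tuple


def detection_metrics(actual: List[bool], flagged: List[bool]) -> Tuple[float, float]:
    """Return (detection_rate, false_positive_rate).

    actual[i] is True if file i is really malicious;
    flagged[i] is True if the product flagged file i.
    """
    tp = sum(a and f for a, f in zip(actual, flagged))          # malicious, caught
    fn = sum(a and not f for a, f in zip(actual, flagged))      # malicious, missed
    fp = sum(f and not a for a, f in zip(actual, flagged))      # benign, flagged
    tn = sum(not a and not f for a, f in zip(actual, flagged))  # benign, allowed
    detection_rate = tp / (tp + fn) if (tp + fn) else 0.0
    false_positive_rate = fp / (fp + tn) if (fp + tn) else 0.0
    return detection_rate, false_positive_rate


# Illustrative data: 4 malicious files followed by 5 benign files.
actual = [True] * 4 + [False] * 5
flagged = [True, True, True, False, True, False, False, False, False]

rate, fpr = detection_metrics(actual, flagged)
print(f"detection rate: {rate:.2f}, false positive rate: {fpr:.2f}")
# detection rate: 0.75, false positive rate: 0.20
```

Under the stated objective, the ideal product would reach a false positive rate of 0 with a detection rate as close to 1 as possible.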
References
[1] Rooney, Ben (2011-05-23). "Malware Is Posing Increasing Danger". Wall Street Journal.
[2] "Evolution of Malware-Malware Trends". Microsoft Security Intelligence Report-Featured
Articles. Microsoft.com. Retrieved 28 April 2013.
[3] Wong, Wing; Stamp, M. (2006). "Hunting for Metamorphic Engines". Journal in Computer
Virology. Department of Computer Science, San Jose State University.
[4] "The Link between Pirated Software and Cybersecurity Breaches" (Mar. 2006). National
University of Singapore and IDC.
[5] "F-Secure Quarterly Security Wrap-up for the first quarter of 2008". F-Secure. 31 March
2008. Retrieved 25 April 2008.
[6] "Antivirus Research and Detection Techniques". ExtremeTech. Archived from the original on
27 February 2009. Retrieved 2009-02-24.
[7] Du X.; Sumeet D. (2011-04-25). "Data Mining and Machine Learning in Cybersecurity".