Chapter 3
Viruses

Virus Definition
  Recall definition from Chapter 2…
  Self-replicating: yes
  Population growth: positive
  Parasitic: yes
  When executed, tries to replicate itself into other executable code
    o So, it relies in some way on other code
  Does not propagate via a network

Virus
  3 parts to a virus
  Infection mechanism --- how it spreads
    o Multipartite virus uses multiple means
  Trigger --- decides when/how to deliver payload
  Payload --- what it does other than spread
    o Either intentional or accidental

Virus Pseudocode
  Without infection mechanism…
    o It’s not a virus, it’s a logic bomb
  But trigger and payload are optional
  Generic virus pseudocode

    def virus():
        infect()
        if trigger() is true:
            payload()

Infection Pseudocode
  Targets must be “local”
  Don’t select already infected targets
    o Can be a double edged sword

    def infect():
        repeat k times:
            target = select_target()
            if no target:
                return
            infect_code(target)

Virus Classification
  Possible to classify in many ways
  Here, we classify in 2 ways:
  Target
    o What/where does the virus infect?
  Concealment strategy
    o What does it do to remain undetected?

Classification by Target
  Briefly consider 3 cases
  Boot-sector infectors
  Executable file infectors
  Data file infectors
    o Macro viruses

Boot Sequence
  Generic boot sequence
  1. Power on
  2. ROM-based instructions run
    o Self-test, device detection, initialization
    o Boot device IDed, boot block read from it
    o Control transferred to the loaded code --- this step known as primary boot

Boot Sequence Continued
  3. Code loaded in primary boot step loads larger, fancier program
    o This is secondary boot
  4. Secondary boot loads/runs OS kernel

Boot Sector Infector
  Why infect boot sector?
  A boot-sector infector (BSI)
    o Infects by copying itself to boot block
  May copy boot block elsewhere
    o Could be tricky, require lots of code
    o So a fixed “safe” location chosen
    o Different viruses may use same “safe” location (e.g., Stoned and Michelangelo)

Boot Sector Infector
  BSI once popular, not so much now
  Why?
    o Machines don’t reboot so often
    o Much harder to infect, due to better defenses

Multiple Infections

File Infectors
  OS views some files as executable
    o Like “exe” and similar
  Files that can be run by a command-line "shell" also considered executable
    o Batch files, shell scripts, …
  File infector --- infects executable file
    o Exe files and shell scripts both count as executable
    o Binary executable is most common target

File Infectors
  Two main issues…
  1. Where to put the virus within file?
  2. How to execute the virus when infected file is run?
  Consider these two (interrelated) questions in next few slides

Beginning of File
  Older exe formats (e.g., .COM) treat entire file as chunk of code and data
    o Entire file loaded into memory
    o Execution starts by jumping to the beginning of the loaded file
  Can put virus at start of such a file
    o That is, prepend the virus code

Prepended Virus

End of File
  Append a virus (even easier?)
  Then how does virus get executed?
  Some possibilities…
  Replace first instruction(s) with a jump to viral code --- save overwritten code
  Later, transfer control back to code
    o How to do this?

End of File
  How to transfer control back to code?
    o Run saved instructions in saved location
    o Restore the infected code back to its original state and run it
  Many exe file formats specify start location in file header
    o If so, virus can change start location to point to its own code and jump to the original start location when done

Appended Virus

Overwritten into File
  Virus places itself atop original code
  Can avoid changes in file size
  Easy for virus to get control
  But… overwriting code will break the original code
    o Making virus easier to discover
  Is it possible to overwrite without breaking the code?

Overwritten into File
  Smart ways to overwrite?
  Overwrite repeated data
    o May be trickier to execute virus
  Save overwritten data (like BSI)
  Use over-allocated space in a file
  Compress code to make space
  For these to work, virus must be small

Merged with File
  Could try to merge virus with target
  I.e., intermixing virus/target code
  Difficult
    o So, it’s “rarely seen”
  But, supposedly, Zmist does this
    o So, apparently it is possible
    o That’s impressive…

Not in File
  Companion virus --- separate from, but naturally executed before target
  No modification to infected code
  May take advantage of process used by OS or shell to search for exe files
  Like a Trojan horse but it’s a virus…
    o …since it’s self-replicating

Companion Virus
  Virus is earlier in the search path
    o Same name as the target file, almost…
  E.g., MS-DOS searches for “foo” by
  1. Look for foo.com
  2. Look for foo.exe
  3. Look for foo.bat
  If the target file is a foo.exe, companion virus is in file foo.com

Companion Virus
  Windows registry associates file types with applications
  Can modify registry so that companion virus runs instead of exe
    o Then companion can transfer control to the corresponding exe
  In effect, all exes infected at once!
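
Aside: the MS-DOS search order from the companion virus slides can be turned into a defender-side check. The sketch below is a simulation over a list of file names, not how MS-DOS itself is implemented. The names resolve and find_shadowed and the sample files are made up for illustration. It reports any file that would run ahead of a same-named file later in the search order.

```python
# Extension order the slides give for MS-DOS: .com, then .exe, then .bat.
SEARCH_ORDER = (".com", ".exe", ".bat")


def resolve(command, directory_listing):
    """Return the file a DOS-style lookup would pick for `command`, or None."""
    names = {name.lower() for name in directory_listing}
    for ext in SEARCH_ORDER:
        candidate = command.lower() + ext
        if candidate in names:
            return candidate
    return None


def find_shadowed(directory_listing):
    """Return (runs_first, shadowed) pairs of files that share a base name."""
    names = sorted({name.lower() for name in directory_listing})
    pairs = []
    for name in names:
        base, dot, ext = name.rpartition(".")
        if not dot or (dot + ext) not in SEARCH_ORDER:
            continue  # not something the lookup above would ever run
        winner = resolve(base, names)
        if winner != name:
            pairs.append((winner, name))
    return pairs


if __name__ == "__main__":
    listing = ["FOO.EXE", "foo.com", "bar.exe", "readme.txt"]
    print(resolve("foo", listing))   # foo.com
    print(find_shadowed(listing))    # [('foo.com', 'foo.exe')]
```

A pair in the output is only a hint that something deserves a look; legitimate software can also ship same-named .com and .exe files.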

Companion Virus
  ELF file format used on recent Unix systems
  Has "interpreter" specified in each exe file header
    o Points to run-time linker
  Companion virus can replace the run-time linker
    o As above, effect is that all exe files infected at once

Companion Virus
  Companion viruses possible in GUI
  App’s icon can be overwritten with the icon for the companion virus
  When a user clicks on “app” icon…
    o Companion virus runs instead

Macro Virus
  Some apps allow data files to have macros embedded in them
  Macros are short snippets of “code” interpreted by the application
  Such languages often provide enough functionality to write a virus

Macro Virus
  Macros often run automatically when file is loaded
    o Easy to write compared to low-level code
  First proof of concept in 1989
  Hit “mainstream” in 1995
    o Virus known as Concept
    o Targeted Microsoft Word (of course)
    o Installed in “global macros”
    o Infected all edited documents

Macro Virus: Concept
  Targeted Word Docs
  AutoOpen macro --- runs automatically when file opened
    o How you get the virus from infected file
  FileSaveAs --- when “file save as” selected from menu
    o So the virus can infect other docs

Macro Virus: Concept

Classification by Concealment Strategy
  Most viruses try to hide
    o Why?
  So, how do they hide?
    o Encryption
    o Polymorphism
    o Etc., etc.
  Yet another way to classify viruses…

No Concealment
  Do nothing to hide
  This is easiest for virus writer…
    o …but also easiest to detect, analyze

Encryption
  Why encrypt?
  Virus body is “hidden” from view
    o In particular, the signature is hidden
  Distinguish between strong encryption and obfuscation
  Viruses usually only obfuscated
    o Very weak encryption

Encrypted Virus

Encryption
  How to encrypt?
    o Let me count the ways…
  1. Simple encryption
    o Rotate, increment, negate, etc.
  2. Static encryption key
    o E.g., XOR fixed byte to all bytes
  3. Variable encryption key
    o Like static, but key changes

Encryption (Continued)
  4. Substitution cipher
    o Permute the bytes
    o Could be via lookup table
    o Could even have multiple ciphertexts decrypt to same plaintext
  5. Strong encryption
    o DES, AES, RC4, etc.
    o Might use crypto libraries

Stealth
  Tries to hide the infection
    o Not just hide the virus signature
  Examples of stealth techniques
    o Change timestamp and/or other file info to pre-infection values
    o Intercept I/O calls to hide presence (in MS-DOS user-accessible interrupts)
    o Hijack secondary boot loader

Stealth
  Stealth viruses “overlap” rootkits
  Rootkit --- installed on compromised machine so attacker can use it
    o Stealth is critical to rootkit success
  Some malware use rootkits
    o For example, Ryknos Trojan hid itself using a rootkit designed for DRM

Reverse Stealth Virus
  What is “reverse stealth”?
  Make everything look infected!
  Why is this malicious?
    o Damage may be done by AV software trying to disinfect

Oligomorphism
  Oligomorphic or semi-polymorphic
  Code is encrypted
  Decryptor code is morphed
    o But not too many different decryptors
  For example
    o Whale had 30 different decryptors
    o Memorial had 96 decryptors
  How to detect?

Polymorphism
  Like oligomorphic, but lots more decryptors
  Essentially, an infinite number
  For example
    o Tremor has almost 6 billion decryptors
  So, AV software cannot have a signature for each decryptor

Polymorphism
  2 problems for polymorphic writer…
  How to generate decryptors?
    o Use a mutation engine
    o Engine is part of encrypted virus
  How to detect previous infections?
    o Data “hiding”: timestamp, file size, file system features, external storage, …
    o “Inoculate” system by faking infection?

Mutation Engine
  1. Equivalent instruction substitution
    o One or more instructions
  2. Instruction reordering
  3. Register swap
  4. Reorder data
  5. Spaghetti code
  6. Insert junk code
  7. Run-time code modification/generation

Mutation Engine
  8. Subroutine permutation
  9. DIY virtual machine
  10. Concurrency --- threads
  11. Inlining/outlining
  12. “Threaded” code --- not threads
    o Jump directly from one subroutine to another, without returning
  13. Subroutine interleaving

Mutation Engine
  Many, many other possibilities
  Possible overlap with optimizing compilers?
    o Seems more like de-optimizing…

Equivalent Instructions
  All of these lines set register r1 to 0

    clear r1
    xor r1,r1
    and 0,r1
    move 0,r1

Concurrency Example

    r1 = 12
    r2 = 34
    r3 = r1 + r2

  =>

    start thread T
    r1 = 12
    wait for signal
    r3 = r1 + r2
    ...
    T: r2 = 34
       send signal
       exit thread T

Concurrency
  Aside: Concurrency may be very effective anti-reversing technique
    o Use multiple threads
    o Intentional deadlock
    o “Junk” threads
  Described in masters project: Improved software activation using multithreading

Mutation
  Mutation also can be used for good
  1. Makes reverse engineering attacks more difficult
  2. Makes software more “diverse”

Metamorphism
  Apply polymorphism to virus body
    o Aka, “body polymorphic”
  No encryption/decryption needed
  Body must change a lot
    o Goal is to have no common signature
  Mutation code must be mutated too!
    o Otherwise, a signature will exist
    o Different from polymorphic (why?)

Metamorphism
  Two types of metamorphic generators
    o Both types difficult to produce
  1. Standalone
    o Apply generator offline
    o Easy to make old malware into “new”
  2. Malware “carries its own generator”
    o Necessary if self-propagating
    o A much more difficult problem

Metamorphism: Apparition
  Apparition --- metamorphic virus
  Delivered in source code (Pascal)
  If compiler is present…
    o Insert junk code and compile
  A very lame approach
  Real metamorphism must be done in assembly or (better yet) machine code

Metamorphism: Simile
  Simile --- metamorphic virus
  Simile’s metamorphic generator
    o 12,000 lines of assembly
    o Translate Simile to intermediate form
    o Then remove all old transformations
    o Obtains a base form of virus
    o Apply new set of transformations
    o Generate new (morphed) machine code

Metamorphism: MetaPHOR
  Metamorphic Permutating High-Obfuscating Reassembler
    o That is, MetaPHOR
  Described in How I Made Metaphor and What I’ve Learnt by The Mental Driller
  Complex expander/shrinker strategy
  Almost impossible to analyze

Metamorphism: MWOR
  Metamorphic Worm, i.e., MWOR
  Experimental metamorphic malware designed by former masters student
  Modeled on MetaPHOR, but…
    o Easier to understand
    o Better for experiments and testing
    o A useful research tool
  How to detect?

Metamorphism
  The bottom line…
  Metamorphics difficult to detect
    o Machine learning works well on hacker malware, but can be defeated
  Metamorphics also difficult to write
    o Most “metamorphic” generators aren’t
  Current state of the art?
    o “Undetectable” metamorphic viruses

Strong Encryption
  What is strong encryption?
  Use a real cipher
  For this to be useful, must not store key with code
    o Why not?
  But must decrypt the virus
  How to get the key to the code?

Strong Encryption: Key
  Store key on the web
    o Then must go fetch the key
    o But then how to get the key?
  Binary virus --- 2 parts
    o Low probability that both parts arrive
  “Environmental” key generation
    o Key based on machine-specific info
    o Key derived at runtime
    o Harder to analyze
  Other???
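
Aside: a rough sketch of “environmental” key generation, to show why it is harder to analyze. The fields hostname and volume_serial, the function names, and the choice of SHA-256 are all assumptions made for illustration; the slides do not say which values or hash any particular virus uses. The point is that only a check value is stored, while the key itself can be rebuilt only on a machine with the right environment.

```python
import hashlib


def derive_key(environment):
    """Hash machine-specific values (hypothetical fields) into a 32-byte key."""
    # Sort the fields so the same environment always yields the same key.
    material = "\n".join(
        f"{name}={environment[name]}" for name in sorted(environment)
    )
    return hashlib.sha256(material.encode("utf-8")).digest()


def check_value(key):
    """Value that can be stored openly: a hash of the key, not the key."""
    return hashlib.sha256(key).hexdigest()


def key_if_right_machine(environment, stored_check):
    """Return the key only when this environment reproduces the stored check."""
    key = derive_key(environment)
    return key if check_value(key) == stored_check else None


if __name__ == "__main__":
    target = {"hostname": "lab-pc-17", "volume_serial": "1A2B-3C4D"}
    analyst = {"hostname": "analysis-vm", "volume_serial": "0000-0000"}
    stored = check_value(derive_key(target))
    print(key_if_right_machine(target, stored) is not None)   # True
    print(key_if_right_machine(analyst, stored) is not None)  # False
```

An analyst who has only the code and the stored check value still has to guess the environment, which is the sense in which this is “harder to analyze” than a key stored with the code.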

Virus Kits
  Many malware construction kits
    o See VX Heavens
  Many kits claim to be metamorphic
    o Or polymorphic, or encrypted, or …
    o You should be very skeptical of claims
    o Some have nice GUI interface
  Success is failure?
    o The more successful, the more likely it has been studied and can be detected