Malware
Original slides provided by Prof. Vern Paxson, University of California, Berkeley

Host-Based Intrusion Detection Systems (HIDS)
(also known as anti-virus software)
• URL/Web access blocking:
  – Prevent users from going to known bad locations
• Protocol scanning of network traffic
  – Detect & block known attacks (e.g., port scans)
  – Detect & block known malware communication (e.g., contacting a botnet)
• Payload scanning
  – Detect & block known malware
• (Auto-update of signatures for these)
• Cloud queries regarding reputation
  – Who else has run this executable and with what results?
  – What’s known about the remote host / domain / URL?

Inside a Modern HIDS, cont’d
• Sandbox execution
  – Run selected executables in constrained/monitored environment
  – Analyze:
    • System calls
    • Changes to files / registry
    • Self-modifying code (polymorphism/metamorphism)
• File scanning
  – Look for known malware that installs itself on disk
• Memory scanning
  – Look for known malware that never appears on disk
• Runtime analysis
  – Apply heuristics/signatures to execution behavior

Network Intrusion Detection Systems (NIDS)
• Deployment inside network as well as at border
  – Greater visibility, including tracking of user identity
• Full protocol analysis
  – Including extraction of complex embedded objects
  – In some systems, 100s of known protocols
• Signature analysis (also behavioral)
  – Known attacks, malware communication, blacklisted hosts/domains
  – Known malicious payloads
  – Sequences/patterns of activity
• Shadow execution (e.g., Flash, PDF programs, in sandbox)
• Extensive logging (in support of forensics)
• Auto-update of signatures, blacklists

NIDS vs. HIDS
• NIDS benefits:
  – Can cover a lot of systems with single deployment
    • Much simpler management
  – Easy to “bolt on” / no need to touch end systems
  – Doesn’t consume production resources on end systems
  – Harder for an attacker to subvert / less to trust
• HIDS benefits:
  – Can have direct access to semantics of activity
    • Better positioned to block (prevent) attacks
    • Harder to evade
  – Can protect against non-network threats
  – Visibility into encrypted activity
  – Performance scales much more readily (no chokepoint)
    • No issues with “dropped” packets

The Problem of Malware
• Malware = malicious code that runs on a victim’s system
• How does it manage to run?
  – Attacks a network-accessible vulnerable service
  – Vulnerable client connects to remote system that sends over an attack (a driveby)
  – Social engineering: trick user into running/installing
  – “Autorun” functionality (esp. from plugging in USB device)
  – Slipped into a system component (at manufacture; compromise of software provider; substituted via MITM)
  – Attacker with local access downloads/runs it directly
    • Might include using a “local root” exploit for privileged access

What Can Malware Do?
• Pretty much anything
  – Payload generally decoupled from how it manages to run
  – Only subject to permissions under which it runs
• Examples:
  – Brag or exhort or extort (pop up a message/display)
  – Trash files (just to be nasty)
  – Damage hardware (Stuxnet?)
  – Launch external activity (spam, click fraud, DoS)
  – Steal information (exfiltrate)
  – Keylogging; screen / audio / camera capture
    • Robbins v. Lower Merion School District
  – Encrypt files (ransomware)
• Possibly delayed until condition occurs
  – “time bomb” / “logic bomb”

Malware That Automatically Propagates
• Virus = code that propagates (replicates) across systems by arranging to have itself eventually executed, creating a new additional instance
  – Generally infects by altering stored code
  – Typically with the help of a user
• Worm = code that self-propagates/replicates across systems by arranging to have itself immediately executed (creating new addl. instance)
  – Generally infects by altering running code
  – No user intervention required
• (Note: line between these isn’t always so crisp; plus some malware incorporates both styles)

The Problem of Viruses
• Opportunistic = code will eventually execute
  – Generally due to user action
    • Running an app, booting their system, opening an attachment
• Separate notions: how it propagates vs. what else it does when executed (payload)
• General infection strategy: find some code lying around, alter it to include the virus
• Have been around for decades …
  – … resulting arms race has heavily influenced evolution of modern malware
http://vxheaven.org/lib/vml01.html

Propagation
• When virus runs, it looks for an opportunity to infect additional systems
• One approach: look for USB-attached thumb drive, alter any executables it holds to include the virus
  – Strategy: when drive later attached to another system & altered executable runs, it locates and infects executables on new system’s hard drive
  – (Autorun is handy here!)
• Or: when user sends email w/ attachment, virus alters attachment to add a copy of itself
  – Works for attachment types that include programmability
  – E.g., Word documents (macros), PDFs (JavaScript)
  – Virus can also send out such email proactively, using user’s address book + enticing subject (“ILOVEYOU”)
    • Attachment: LOVE-LETTER-FOR-YOU.txt.vbs

Original program instructions can be:
• Application the user runs
• Run-time library / routines resident in memory
• Disk blocks used to boot OS
• Autorun file on USB device
• …
[Figure: program layouts before and after infection. Labels: Entry point; Original Program Instructions; Virus; 1. Entry point; 2. JMP; 3. JMP]
Other variants are possible; whatever manages to get the virus code executed

Detecting Viruses
• Signature-based detection
  – Look for bytes corresponding to injected virus code
  – High utility due to replicating nature
    • If you capture a virus V on one system, by its nature the virus will be trying to infect many other systems
    • Can protect those other systems by installing recognizer for V
• Drove development of multi-billion $$ AV industry (AV = “antivirus”)
  – So many endemic viruses that detecting well-known ones becomes a “checklist item” for security audits
• Using signature-based detection also has de facto utility for (glib) marketing
  – Companies compete on number of signatures …
    • … rather than their quality (harder for customer to assess)
(will check your file with 40+ antivirus programs)

Virus Writer / AV Arms Race
• If you are a virus writer and your beautiful new creations don’t get very far because each time you write one, the AV companies quickly push out a signature for it …
  – … What are you going to do?
• Need to keep changing your viruses …
  – … or at least changing their appearance!
• How can you mechanize the creation of new instances of your viruses …
  – … so that whenever your virus propagates, what it injects as a copy of itself looks different?
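
The signature-based detection described on the "Detecting Viruses" slide can be sketched as a simple byte-pattern search. This is a minimal illustration, not how any particular AV product works: the signature names (`Example.V`, `Example.W`), the placeholder byte patterns, and the single-byte XOR re-encoding are all assumptions made up for this example. The last call hints at why "changing appearance" matters: the same bytes under a trivial re-encoding no longer match.

```python
from typing import Dict, List

# Hypothetical signature database: name -> byte pattern.
# These are harmless placeholder bytes, not real malware signatures.
SIGNATURES: Dict[str, bytes] = {
    "Example.V": b"\xde\xad\xbe\xef\x13\x37",
    "Example.W": b"HELLO-I-AM-A-TEST-PATTERN",
}


def scan(data: bytes, signatures: Dict[str, bytes] = SIGNATURES) -> List[str]:
    """Return the sorted names of all signatures whose bytes occur in data."""
    return sorted(name for name, pattern in signatures.items() if pattern in data)


def xor_encode(data: bytes, key: int) -> bytes:
    """Toy transformation (an assumption for illustration): XOR every byte with key."""
    return bytes(b ^ key for b in data)


clean = b"ordinary program bytes"
infected = clean + SIGNATURES["Example.V"]   # file with the known pattern appended
reencoded = xor_encode(infected, 0x5A)       # same content, different appearance

print(scan(clean))      # []
print(scan(infected))   # ['Example.V']
print(scan(reencoded))  # []
```

A recognizer of this kind is cheap to deploy once a specimen has been captured, which is the "high utility" noted above, but it only matches the exact bytes it was given.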

Polymorphic Code
• We’ve already seen technology for creating a representation of data apparently completely unrelated to the original: encryption!
• Idea: every time your virus propagates, it inserts a newly encrypted copy of itself
  – Clearly, encryption needs to vary
    • Either by using a different key each time
    • Or by including some random initial padding (like an IV)
  – Note: weak (but simple/fast) crypto algorithm works fine
    • No need for truly strong encryption, just obfuscation
• When injected code runs, it decrypts itself to obtain the original functionality

[Figure: structure of a polymorphic virus]
• Instead of this … [Virus | Original Program Instructions]
• … virus has this initial structure: [Key | Decryptor | Encrypted Glob of Bits], attached to the Original Program Instructions
• When executed, decryptor applies key to decrypt the glob … [Key | Decryptor | Main Virus Code]
• … and jumps (Jmp) to the decrypted code once stored in memory

Polymorphic Propagation
[Figure: Key | Decryptor | Encrypted Glob of Bits; after Jmp: Key | Decryptor | Main Virus Code with Encryptor; new instance: Key2 | Decryptor | Different Encrypted Glob of Bits]
• Once running, virus uses an encryptor with a new key to propagate
• New virus instance bears little resemblance to original

Arms Race: Polymorphic Code
• Given polymorphism, how might we then detect viruses?
• Idea #1: use narrow sig. that targets decryptor
  – Issues?
    • Less code to match against ⇒ more false positives
    • Virus writer spreads decryptor across existing code
• Idea #2: execute (or statically analyze) suspect code to see if it decrypts!
  – Issues?
    • Legitimate “packers” perform similar operations (decompression)
    • How long do you let the new code execute?
      – If decryptor only acts after lengthy legit execution, difficult to spot
• Virus-writer countermeasures?

Metamorphic Code
• Idea: every time the virus propagates, generate semantically different version of it!
  – Different semantics only at immediate level of execution; higher-level semantics remain same
• How could you do this?
• Include with the virus a code rewriter:
  – Inspects its own code, generates random variant, e.g.:
    • Renumber registers
    • Change order of conditional code
    • Reorder operations not dependent on one another
    • Replace one low-level algorithm with another
    • Remove some do-nothing padding and replace with different do-nothing padding (“chaff”)
      – Can be very complex, legit code … if it’s never called!

Polymorphic Code In Action
At some point in time, one can take a snapshot of the completely decrypted virus.
Hunting for Metamorphic, Szor & Ferrie, Symantec Corp., Virus Bulletin Conference, 2001
https://www.symantec.com/avcenter/reference/hunting.for.metamorphic.pdf

Metamorphic Code In Action
Zmist virus hides itself inside original program, just as T-1000 can hide in the floor.
(Source: Szor & Ferrie, 2001, cited above)

Detecting Metamorphic Viruses?
• Need to analyze execution behavior
  – Shift from syntax (appearance of instructions) to semantics (effect of instructions)
• Two stages: (1) AV company analyzes new virus to find behavioral signature; (2) AV software on end systems analyzes suspect code to test for match to signature
• What countermeasures will the virus writer take?
  – Delay analysis by taking a long time to manifest behavior
    • Long time = await particular condition, or even simply clock time
  – Detect that execution occurs in an analyzed environment and if so behave differently (e.g., VW emissions controls activation)
    • E.g., test whether running inside a debugger, or in a Virtual Machine
• Counter-countermeasure?
  – AV analysis looks for these tactics and skips over them
• Note: attacker has edge as AV products supply an oracle

How Much Malware Is Out There?
• A final consideration re polymorphism and metamorphism:
  – Presence can lead to mis-counting a single virus outbreak as instead reflecting 1,000s of seemingly different viruses
• Thus take care in interpreting vendor statistics on malcode varieties
  – (Also note: public perception that many varieties exist is in the vendors’ own interest)
https://www.av-test.org/en/statistics/malware/

Infection Cleanup
• Once malware detected on a system, how do we get rid of it?
• May require restoring/repairing many files
  – This is part of what AV companies sell: per-specimen disinfection procedures
• What if the malware executed with administrator privileges?
  – “nuke the entire site from orbit. It's the only way to be sure” (Aliens, https://www.youtube.com/watch?v=aCbfMkh940Q)
  – i.e., rebuild system from original media + data backups
• Malware may include a rootkit: kernel patches to hide its presence (its existence on disk, processes)

Infection Cleanup, cont’d
• If we have complete source code for system, we could rebuild from that instead, couldn’t we?
• No!
• Suppose forensic analysis shows that virus introduced a backdoor in /bin/login executable
  – (Note: this threat isn’t specific to viruses; applies to any malware)
• Cleanup procedure: rebuild /bin/login from source …

[Figure sequence: rebuilding /bin/login with a compiler]
• Regular compilation process of building login binary from source code: login source → Compiler → /bin/login executable
• Compiler infected (perhaps a long time ago) recognizes when it’s compiling /bin/login source and inserts extra back door when seen
• No problem: first step, rebuild the compiler so it’s uninfected (hoped-for result: correct compiler executable)
• Oops: infected compiler recognizes when it’s compiling its own source and inserts the infection! (actual result: infected compiler)
• No amount of careful source-code scrutiny can prevent this problem.
Reflections on Trusting Trust, Turing Award Lecture, Ken Thompson, 1983
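
The compiler problem above can be modeled with a toy in which "source" and "binary" are plain strings. This is only a sketch of the idea under stated assumptions, not Thompson's actual construction: the names `clean_compile`, `infected_compile`, and `run_compiler`, the marker strings, and the `BIN[...]` format are all hypothetical. It shows that rebuilding both the compiler and login from clean source still yields a backdoored login when the only available compiler binary is the infected one.

```python
from typing import Callable

# Clean, fully audited "source code" (toy strings, assumed for illustration).
LOGIN_SOURCE = "program login: check_password(user)"
COMPILER_SOURCE = "program compiler: translate(source)"

# Hypothetical markers standing in for the inserted back door and the infection.
BACKDOOR = " + accept_secret_password"
INFECTION = " + trojan"


def clean_compile(source: str) -> str:
    """Honest toy compiler: the 'binary' is just a tagged copy of the source."""
    return "BIN[" + source + "]"


def infected_compile(source: str) -> str:
    """Toy infected compiler binary with the two recognizers from the slides."""
    if source.startswith("program login"):
        # Recognizes login source and inserts an extra back door.
        return clean_compile(source + BACKDOOR)
    if source.startswith("program compiler"):
        # Recognizes its own source and re-inserts the infection.
        return clean_compile(source + INFECTION)
    return clean_compile(source)


def run_compiler(compiler_binary: str) -> Callable[[str], str]:
    """Map a toy compiler 'binary' to the behavior it would have when run."""
    return infected_compile if INFECTION in compiler_binary else clean_compile


# Cleanup attempt: rebuild the compiler from clean source, but the only
# compiler available to do the rebuild is the infected binary.
rebuilt_compiler = infected_compile(COMPILER_SOURCE)
rebuilt_login = run_compiler(rebuilt_compiler)(LOGIN_SOURCE)

print(INFECTION in rebuilt_compiler)            # True
print(BACKDOOR in rebuilt_login)                # True
print(BACKDOOR in clean_compile(LOGIN_SOURCE))  # False
```

Neither `LOGIN_SOURCE` nor `COMPILER_SOURCE` contains anything suspicious, which is the point of the final slide: the infection lives only in the compiler binary, so inspecting source cannot reveal it.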