Malevolution: The Evolution of Evasive Malware

Giovanni Vigna
Department of Computer Science
University of California Santa Barbara
http://www.cs.ucsb.edu/~vigna
Lastline, Inc.
http://www.lastline.com

Well, I had it all planned out…. Until this guy came out with his story!

Malware can take many forms…

Who Is He?
• One of the top security researchers in Europe
  – Hire him!
• Came to Berlin’s airport
• Guy told him he was in the right taxi line
• ‘Hey you don’t have a display with the money’
  – Do not worry: The German government is creating a taxi-tracking program based on GPS so that no taxi driver needs a billing device: awesome!!!
  – Nick: GPS?!? Tracking!?! No money!?! Awesome!!!!
• Scam cost Nick 200 EUR (normal charge would be 30)

The Taxi

Cyberattack (R)Evolution
[Figure: damage ($, from hundreds to billions) plotted over time, progressing from cybervandalism to cybercrime to targeted attacks and cyberwarfare]

Nobody Is Safe…
Targeted attacks are mainstream news. Every week, new breaches are reported.
In the last few months alone …

Drive-by-download Attack
[Figure: drive-by-download attack. Sites shown: www.semilegit.com, www.bank.com, www.badware.com, www.grayhat.com, www.evilbastard.com. Injected content: <iframe src="http://semilegit.com" height="0" width="0"></iframe>. Request shown: POST /update?id=5’,’<iframe>..’)-. Label: Personal Data, Docs]

Arms Race(s)
[Figure: two parallel arms races.
 Binaries: malicious binary vs. signature-based anti-virus; obfuscated/polymorphic malicious binary vs. behavior-based anti-malware (sandbox); evasive malicious binary.
 JavaScript: malicious JavaScript vs. signature-based web gateways; obfuscated/polymorphic malicious JavaScript vs. behavior-based anti-malware (honeyclient); evasive malicious JavaScript.]

An Evasion Framework
[Figure: components of the framework: Producer; Artifact, Provenance; Analysis System (Labels/Blocks; Known Malicious Artifacts, Provenance; Known Benign Artifacts, Provenance); Target System (Executes/Displays); Consumer (Activates)]

An Evasion Framework

                        Analysis System   Target System   Consumer
  SPAM                  X                 N/A             N/A
  Phishing              X                 N/A             X
  Social Engineering    N/A               N/A             X
  Malware Installs      N/A (*)           N/A             X
  Malicious Documents   X                 X               X
  Malicious Web Pages   X                 X               N/A
  Malicious Binaries    X                 N/A             N/A

  (*) First downloader

PBKAC: Make the user smarter
• Evasion of the user’s good judgment
• (SPAM: please don’t go!)
• PHISHING: educate about provenance
• MALWARE INSTALLS: educate about Fake AV, codecs
  – The “Can I haz kittens?” problem
• MALICIOUS DOC: don’t open (good luck with that)
  – Anything with “budget”, “salary”, etc.
    WILL BE OPENED

Harden The Target
• Evasion of the mechanisms to limit/control execution
• Windows 2023 Ultimate Edition will be able to identify things that just should not be executed
• MS Office Professional 56.2 will actually prevent documents from executing arbitrary code
• Internet Explorer 23 will detect memory corruption attacks

Analysis Systems
• Evasion of detection/labeling
• Determine if an artifact is malicious based on previous history
• Leverage both static and dynamic analysis
• Additional information can be leveraged if other components need to be evaded as well

Evading Static Analysis
• Static analysis techniques can be evaded by making the (relevant) code unavailable
  – Packing
  – Delayed inclusion of code
• Static analysis techniques can be evaded by exploiting differences in the parsing capabilities of the target system vs. the analysis system
  – Parsing the executable (target is OS)
  – Parsing the document (target is office application)

Evading Static Analysis
Source: Binary-Code Obfuscations in Prevalent Packer Tools, Tech Report, University of Wisconsin, 2012

Evading Dynamic Analysis
• Dynamic analysis techniques can be evaded by fingerprinting the environment (and not executing)
  – Detection of modified environment (instrumented libs)
  – Detection of specific HW/SW configurations
    • Devices
    • Users
    • File names

Evading Dynamic Analysis
• Dynamic analysis techniques can be evaded by exploiting differences in the execution capabilities of the target system vs.
  analysis system
  – Semantics (virtualization/emulation introduces differences)
  – Speed (dynamic systems are usually slower)
  – Available resources (analysis has a finite, limited time)
    • Sleeping
    • Stalling loops
  – User activity monitoring

Evading Dynamic Analysis
• Dynamic evasion – stalling loops

Combating Evasion
• Static analysis
  – Use availability and parsing failures as a signal for detection
    • Benign software is packed
    • Benign software is obfuscated
    • Artifacts are often generated in a benign, wrong way
  – Modify the sample to make it harmless
    • Normalize
    • Remove functionality that cannot be analyzed
    • Might break functionality

Combating Evasion
• Dynamic analysis
  – Reduce differences between analysis and target environment
    • Run on bare metal
    • Exploit hardware-supported virtualization
    • Use out-of-the-VM instrumentation
  – Detect environment checks
    • Identify conditional execution based on triggers
    • Return non-static information about the environment
  – Modify the sample to make it run
    • Multipath execution

Combating Evasion
• Exploit the characteristics of multiple evasions
  – Phishing pages need to evade detection by the analysis system AND by the user
    • If the page does not look like the impersonated organization, the attack will fail
  – Malicious documents need to evade detection by the analysis system, the target platform, AND the user
    • If the attachment does not look interesting, it will not be activated

Why Do I Care?
[Figure: analysis pipeline. Components shown: Crawler, EvilSeed, Feature Extractor, Terms Extractor, Prophiler, Honeyclients, Anubis, Wepawet, Public Portal, Cloud, Threat Intel, Block. Page sets shown: Malicious Pages, Possibly Malicious Pages, Benign Pages. Example sites shown: http://www.easymoney.com, http://cheapfarma.ru, http://rateyourcar.com, http://nudecelebrities.it, Exploit Site, C&C Site]

A Few Stats
• ANUBIS
  – Number of unique IPs that submitted to Anubis: 433,290
  – Number of files analyzed by Anubis: 59,199,463 (unique files: 45,730,419)
  – Registered users: 25,404
• WEPAWET
  – Number of unique IPs that submitted to Wepawet: 141,463
  – Number of pages visited and analyzed by Wepawet: 67,424,459
  – Number of pages identified as malicious: 2,239,335

An Example: Detecting Split Personalities
• Detect when a malware sample exhibits multiple personalities
• Signature-based techniques are impractical
• Behavior-based techniques seem more promising...
  – Different behaviors are reliable indicators for split personalities

The Idea
• Definition: Two systems are execution-equivalent if all programs start with the same initial state and receive exactly the same inputs
  – “Initial state” means same OS components, memory and registers are initialized with the same values
  – “Same inputs” means the access to disk, network, registry, time, and IPC returns the same value
• Hypothesis: When a program is executed in two execution-equivalent systems, it should exhibit the same behavior
  – “Same behavior” is output and sequence of system calls

Split Personalities
• A program that has different behavior on two execution-equivalent systems implies that:
  – Some instruction yielded some observable effects
  – The program used (intentionally or not) these effects to follow a different execution path
  – This is likely the consequence of an attack based on CPU semantics or timing
• The hard part is providing exactly the same inputs…
  – Efficient Detection of Split Personalities in Malware
    • Davide Balzarotti, Marco Cova, Christoph
      Karlberger, Christopher Kruegel, Engin Kirda, Giovanni Vigna
      in Proceedings of the Network and Distributed System Security Symposium (NDSS), San Diego, CA, February 2010.

The Approach: Log and Replay
[Figure: Reference System (Windows, Log Driver) and Analysis System (Windows, Replay Driver), each running the (malware) sample; a syscall log connects the two; outcome label: Split personality]

Some Caveats
• Not everything can be replayed
  – Some operations have results that must be consistent with the internal state of the operating system
    • Memory allocation
  – Some operations use handles that were created by pass-through system calls
• The definition of “same behavior” needs to be relaxed to tolerate small, temporary deviations

Results

An Example: Wepawet and Revolver
• State-of-the-art in honeyclients
  – High-interaction honeyclients visit web pages and record modifications to the underlying system (file system, registry, processes)
  – Unexpected changes are attributed to attacks
• Limitations
  – Defenders need to know in advance the components that will be targeted by attacks
  – Configuration can be complex and incomplete
    • Some of the vulnerable components are incompatible with each other
  – Limited explanatory power

Wepawet
• Characterizes the behavior of the browser as it visits web pages
  – Monitors events that occur during visit
  – Characterizes properties of these events with features
  – Uses statistical models to determine if feature values are normal or anomalous
    • In the training phase, learns the characteristics of benign pages
    • In the detection phase, flags as suspicious pages that result in anomalous behavior
  – Detection and Analysis of Drive-by-Download Attacks and Malicious JavaScript Code
    Marco Cova, Christopher Kruegel, Giovanni Vigna
    in Proceedings of the World Wide Web Conference (WWW), Raleigh, NC, April 2010

Wepawet Features
• Exploit preparation
  – Number of bytes allocated (heap spraying)
  – Number of likely shellcode strings
• Exploit attempt
  – Number of instantiated plugins and ActiveX controls
  – Values of
    attributes and parameters in method calls
  – Sequences of method calls
• Redirections and cloaking
  – Number and target of redirections
  – Browser personality- and history-based differences
• Obfuscation
  – String definitions/uses
  – Number of dynamic code executions
  – Length of dynamically executed code

Wepawet Extensions
• PDF analyzer
  – Analyzes the JavaScript within PDF documents
• Flash component analyzer
  – Uses execution tracing to identify both malicious behavior and other network endpoints
• Java Applet analyzer
  – Uses execution tracing to identify known exploits
• Shellcode analyzer
  – Uses emulation to extract URLs pointing to additional malware

0-day Detection
• “Aurora” attack
• 0-day exploit against IE6
• Use-after-free vulnerability
• Successfully compromised Google and other companies
• Posted to Wepawet before having been made public
• Soon after incorporated into Metasploit

Practical Impact
• Routinely used for takedown requests and further analysis
• Used to generate blacklist of malicious sites

Impact on Attackers

Revolver: Detecting Evasions in Web-based Malware
• Providing an oracle available to the public has drawbacks
  – Malware can be tested before deployment
• Exploitation of discrepancies leads to failed detection
  – Revolver: An Automated Approach to the Detection of Evasive Web-based Malware
    A. Kapravelos, Y. Shoshitaishvili, M. Cova, C. Kruegel, G. Vigna
    in Proceedings of the USENIX Security Symposium, Washington, D.C., August 2013

Evasion: Scope Handling

  function foo() {
    ...
    //W6Kh6V5E4 is filled with non-alphanumeric data
    Bm2v5BSJE="";
    W6Kh6V5E4 = W6Kh6V5E4.replace(/\W/g,Bm2v5BSJE);
    ...
    // W6Kh6V5E4 now contains valid JavaScript
  }

  function foo(){
    ...
    var enryA = mxNEN+F7B07;
    F7B07 = eval; {}
    enryA = F7B07('enryA.rep' + 'lace(/\\W/g,CxFHg)');
    ...
  }

Evasion: Interpreter Idioms

  OlhG='evil_code'
  wTGB4=eval
  wTGB4(OlhG)

  OlhG='evil_code'
  wTGB4="this"["eval"] // Only works in Adobe’s JS
  wTGB4(OlhG)

Evasion: Exception Paths

  function deobfuscate(){
    ...
    // Define variable xorkey
    // and compute its value
    for(...) {
      ...
      // XOR decryption with xorkey
    }
    eval(deobfuscated_string);
  }
  try {
    eval('deobfuscate();')
  }
  catch (e){
    alert('err');
  }

  function deobfuscate(){
    try {
      ... // is variable xorkey defined?
    }
    catch(e){
      xorkey=0;
    }
    ... // Compute value of xorkey
    VhplKO8 += 1; // throws exception first time
    for(...) {
      ... // XOR decryption with xorkey
    }
    eval(deobfuscated_string);
  }
  try {
    eval('deobfuscate();') // 1st call
  }
  catch (e){
    // Variable VhplKO8 is not defined
    try {
      VhplKO8 = 0; // define variable
      eval('deobfuscate();'); // 2nd call
    }
    catch (e){
      alert('err');
    }
  }

Evasion: Liberal Configuration

  var nop="%uyt9yt2yt9yt2";
  var nop=(nop.replace(/yt/g,""));
  var sc0="%ud5db%uc9c9%u87cd...";
  var sc1="%"+"yutianu"+"ByutianD"+ ...;
  var sc1=(sc1.replace(/yutian/g,""));
  var sc2="%"+"u"+"54"+"FF"+ "%u"+"BE"+...+"A"+"8"+"E"+"E";
  var sc2=(sc2.replace(/yutian/g,""));
  var sc=unescape(nop+sc0+sc1+sc2);

  try {
    new ActiveXObject("yutian");
  } catch (e) {
    var nop="%uyt9yt2yt9yt2";
    var nop=(nop.replace(/yt/g,""));
    var sc0="%ud5db%uc9c9%u87cd...";
    var sc1="%"+"yutianu"+"ByutianD"+ ...;
    var sc1=(sc1.replace(/yutian/g,""));
    var sc2="%"+"u"+"54"+"FF"+ "%u"+"BE"+...+"A"+"8"+"E"+"E";
    var sc2=(sc2.replace(/yutian/g,""));
    var sc=unescape(nop+sc0+sc1+sc2);
  }

Detecting Evasion: Challenges
• Code is obfuscated
• Code is generated on-the-fly
• Code might probe for arcane versions of a browser
• Not all code changes are relevant

Revolver
[Figure: Revolver pipeline. Stages shown: Web, Pages, Oracle, ASTs (example nodes: IF … VAR <= NUM), Similarity computation, Candidate pairs {bi, mj}. Output categories: Malicious evolution, Data-dependency, JavaScript infections, Evasions]

Optimizations
• The comparison step requires determining the edit distance between n benign scripts and m malicious scripts (which is usually infeasible)
• We eliminate duplicate ASTs
• We compute sequence summaries, which are vectors with the frequencies of the possible 88 operations
• We extract the k nearest neighbors sequence summaries and we
  apply the similarity over the associated ASTs

Classification
• Data-dependency: categorizes script differences that are associated with transforming data into code
  – The same packer usually produces different code: if the generating code is the same and the generated code is very different, do not flag as evasion
• Injection: categorizes script differences that are due to addition of code to a previously benign script
  – Site gets compromised and attacker adds code to well-known JavaScript libraries (e.g., jQuery)
• Evasion: categorizes script differences that are mostly composed of control-flow nodes added to the previously malicious script
  – Control-flow decisions are made to avoid executing the malicious functionality

Evaluation: Evasion
• Collected 6,468,623 pages, of which 265,692 malicious
• Extracted 20,732,766 benign scripts, and 186,032 malicious scripts
• Derived 705,472 unique ASTs and 55,701 malicious ASTs
• For each benign AST, found ~70 malicious neighbors
• Computed 208K candidate pairs
  – 6,996 Injections (701 classes)
  – 101,039 Data dependencies (475 classes)
  – 4,147 Evasions (155 classes)
  – 2,490 Evolutions (273 classes)

Limitations
• If we only see the evasive version of the code, we cannot detect it (and identify the evasion)
• This approach can only operate on client-side evasion
• If an evasion is performed before unpacking/eval-ing of code, similarity to other malicious code cannot be computed
  – However, the attacker has to “expose” their evasion technique, instead of hiding it in the malicious code

http://revolver.cs.ucsb.edu
• Revolver is a service accessible to the public
  – You need to be vetted to access the service
• We would like to make the evasion of the anti-evasion system harder
• Please sign up and let us know what you think!
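
The pre-filtering described under Optimizations (eliminate duplicate ASTs, summarize each AST as a vector of node frequencies, keep only the k nearest malicious neighbors of each benign script) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not Revolver's implementation: the seven node types in NODE_TYPES stand in for the 88 mentioned in the talk, the Euclidean metric is an assumption, and all function names are hypothetical. The more expensive AST similarity step that would run on the surviving pairs is not shown.

```python
from collections import Counter
from math import sqrt

# Hypothetical node-type vocabulary; the talk mentions 88 possible
# operations, this sketch uses seven for illustration.
NODE_TYPES = ["IF", "VAR", "NUM", "CALL", "ASSIGN", "TRY", "STRING"]


def sequence_summary(nodes):
    """Return a frequency vector over NODE_TYPES for a flattened AST."""
    counts = Counter(nodes)
    return [counts[t] for t in NODE_TYPES]


def distance(a, b):
    """Euclidean distance between two summaries (an assumed metric)."""
    return sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def dedupe(asts):
    """Drop ASTs whose flattened node sequence was already seen."""
    seen, unique = set(), {}
    for name, nodes in asts.items():
        key = tuple(nodes)
        if key not in seen:
            seen.add(key)
            unique[name] = nodes
    return unique


def nearest_malicious(benign_nodes, malicious_asts, k):
    """Return the names of the k malicious ASTs closest to benign_nodes."""
    target = sequence_summary(benign_nodes)
    ranked = sorted(
        dedupe(malicious_asts).items(),
        key=lambda item: distance(target, sequence_summary(item[1])),
    )
    return [name for name, _ in ranked[:k]]


# Toy data: m1 is the benign script plus a few control-flow nodes,
# m4 is an exact duplicate of m3 and is removed before ranking.
benign = ["VAR", "ASSIGN", "CALL", "STRING"]
malicious = {
    "m1": ["VAR", "ASSIGN", "CALL", "STRING", "IF", "VAR", "NUM"],
    "m2": ["TRY", "TRY", "TRY", "CALL", "CALL", "CALL", "NUM", "NUM"],
    "m3": ["VAR", "ASSIGN", "CALL", "STRING", "STRING"],
    "m4": ["VAR", "ASSIGN", "CALL", "STRING", "STRING"],
}
print(nearest_malicious(benign, malicious, 2))  # -> ['m3', 'm1']
```

Comparing fixed-length vectors is cheap, so only the few candidate pairs that survive this filter need the costly tree comparison.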
Conclusions
• Malicious code is in continuous evolution
• Evasion of dynamic analysis-based detection has become prevalent
  – Humans cannot keep up
• Next steps in the arms race:
  – Automatic detection of evasion attempts in binaries
    • Possibly without re-execution
  – Automatic detection of evasion attempts in web-malware
    • See revolver.cs.ucsb.edu
  – Automated evasion remediation

Questions?

EvilSeed
• Challenge: Find the needle in the haystack
• Approach: Search the web in a smart way
• The goal of EvilSeed is to generate a URL input stream with “high toxicity”
• EvilSeed starts with a set of malicious web pages and uses “gadgets” to find likely additional malicious web pages
  – Links gadget
  – Content dork gadget
  – Popular terms gadget
  – SEO gadget
  – DNS queries gadget
• Some level of random crawling is still necessary to find completely new malicious web pages

Prophiler
• Quick identification of possible drive-by-download web pages
  – Each web page is deemed benign or possibly malicious
  – Detection models derived through supervised machine learning
• System as filter between a crawler and a more costly (and more precise) dynamic analysis system
  – The filter can allow high FP rates, as they are later discarded by the dynamic analysis system

Learning Approach
• 77 static features are extracted from each URL and web page
  – HTML (19): web page content
  – JavaScript (25): web page code
  – URL and host-based (33): URL and URLs included in the content, taking into account host characteristics (WHOIS, DNS)
• Supervised machine learning
  – Learning: the system is fed with a labeled dataset
    • Both known malicious and benign samples
  – A model is generated by the system
  – 10-fold cross validation is used to evaluate the effectiveness of each model
  – The models can then be used for detection

Anubis and Wepawet
• Web pages and binary components need to be analyzed
  – To identify their nature (malicious, benign)
  – To identify their relationships with other
    components (e.g., C&C sites, distribution sites, malware components)
• Anubis: Binary program analyzer
  – Available at http://anubis.cs.ucsb.edu
• Wepawet: Web page analyzer
  – Available at http://wepawet.cs.ucsb.edu
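
The 10-fold cross validation mentioned under Learning Approach can be sketched in Python as follows. This is a generic illustration and not Prophiler's code: the single numeric feature, the midpoint-threshold "model", and the names k_fold_indices, cross_validate, train, and predict are all assumptions made for the example. Each fold is held out once while a model is trained on the other nine, and the accuracies are averaged.

```python
def k_fold_indices(n_samples, k=10):
    """Split range(n_samples) into k contiguous folds of near-equal size."""
    if n_samples < k:
        raise ValueError("need at least k samples")
    base, extra = divmod(n_samples, k)
    folds, start = [], 0
    for i in range(k):
        size = base + (1 if i < extra else 0)
        folds.append(list(range(start, start + size)))
        start += size
    return folds


def cross_validate(samples, labels, train, predict, k=10):
    """Return the mean held-out accuracy over k folds."""
    scores = []
    for held_out in k_fold_indices(len(samples), k):
        held = set(held_out)
        train_x = [s for i, s in enumerate(samples) if i not in held]
        train_y = [l for i, l in enumerate(labels) if i not in held]
        model = train(train_x, train_y)
        correct = sum(predict(model, samples[i]) == labels[i] for i in held_out)
        scores.append(correct / len(held_out))
    return sum(scores) / len(scores)


# Toy stand-in for a real classifier: the model is a threshold halfway
# between the mean feature value of each class (hypothetical feature).
def train(xs, ys):
    malicious = [x for x, y in zip(xs, ys) if y]
    benign = [x for x, y in zip(xs, ys) if not y]
    return (sum(malicious) / len(malicious) + sum(benign) / len(benign)) / 2


def predict(threshold, x):
    return x > threshold


# Ten benign samples (feature 0..9) and ten malicious ones (100..109).
samples = list(range(10)) + list(range(100, 110))
labels = [False] * 10 + [True] * 10
print(cross_validate(samples, labels, train, predict))  # -> 1.0
```

Real datasets should be shuffled (and ideally stratified by label) before folding; the contiguous split here only keeps the sketch short.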