Rozzle: De-Cloaking Internet Malware
Presenter: Yinzhi Cao
Slides by Ben Livshits with Clemens Kolbitsch, Ben Zorn, Christian Seifert, Paul Rebriy
Microsoft Research

Static – Dynamic Analysis Spectrum
• Entirely static: + High scalability, + High coverage, - Low precision
• Symbolic execution (DART, SAGE, KLEE): + High precision, + High coverage, - May not scale
• Multi-execution: + High precision, + Scales reasonably well, ? High coverage, - Watch out for resource usage
• Entirely runtime: + High precision, - High overhead, - Low coverage

Blacklisting Malware in Search Results

Motivation
Haha, I cannot believe this guy actually does this!! LOL

Drive-by Malware Detection Landscape
Axes: offline (honey-monkey) vs. online (browser-based); runtime vs. static
Nozzle [Usenix Security ’09] (runtime)
• Instrumented browser
• Looks for heap sprays
• Moderately high overhead
Zozzle [Usenix Security ’11] (static)
• Mostly static detection
• Low overhead, high reach
• Can be deployed in browser

Search Engine Crawling

Malware Cloaking
<script>
if (navigator.userAgent.indexOf('IE 6')>=0) {
  var x=unescape('%u4149%u1982%u90 […]');
  eval(x);
}
</script>
Server side: detect crawler
• Source IP
• Request (User-Agent, Browser ID)
Client side: detect vulnerable target
• Fingerprint browser & plugin versions
• Do this using JavaScript

Client-side Cloaking Defense
Traditional
Rozzle
• Single browser, one visit
• Appear as vulnerable as possible

Background and Related Work
• Drive-by Download
– Exploits a client-side browser vulnerability and thereby triggers a malware download onto the client
– It can be divided into six steps.
• Description of Six Steps
• Comparing with JShield

Step One: Visiting a malicious web site
• At this stage, a benign user visits a malicious web site.
• Defense mechanism: if the web site is detected as malicious, a warning can be shown to keep the user from visiting it (as Google Safe Browsing does).
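The Step One defense can be sketched as a lookup against a list of known-malicious hosts before navigation proceeds. This is a minimal illustration only: the blocklist contents, the `isBlocked` and `checkNavigation` names, and the example hosts are all hypothetical, and real services such as Google Safe Browsing use their own protocols and data formats, which are not shown here.

```javascript
// Hypothetical blocklist of hosts previously flagged as malicious.
// In practice such a list would be fed by a crawler-based detection pipeline.
const blockedHosts = new Set(["malicious.example", "exploit-kit.example"]);

// Returns true if the URL's host, or one of its parent domains
// (excluding the bare top-level label), is on the blocklist.
function isBlocked(url) {
  const host = new URL(url).hostname.toLowerCase();
  const labels = host.split(".");
  for (let i = 0; i < labels.length - 1; i++) {
    if (blockedHosts.has(labels.slice(i).join("."))) return true;
  }
  return false;
}

// Decide what the browser should do before loading the page.
function checkNavigation(url) {
  return isBlocked(url) ? "show-warning" : "allow";
}

console.log(checkNavigation("http://cdn.malicious.example/landing.html"));
console.log(checkNavigation("http://benign.example/index.html"));
```

A list like this only helps if the malicious site was detected beforehand, which is why cloaking against the crawler matters for the rest of the talk.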
Step Two: Executing JavaScript
• After the content is downloaded to the client, its JavaScript gets executed.
• Related work: current approaches try to fully execute JavaScript to achieve a better detection rate.
– Zozzle ([Usenix, 2011]): Executing JavaScript
– Wepawet: Executing JavaScript and extracting JavaScript from PDF
– Rozzle ([Oakland, 2012]): Symbolic execution + fingerprinting JavaScript

Step Three: Heap Spraying
• JavaScript fills the heap with shellcode.
• Defense mechanism:
– Zozzle ([Usenix, 2011]): Machine learning
– Nozzle ([Usenix, 2009]): Detecting whether heap objects contain executable code

Step Four: Exploiting a certain vulnerability
• JavaScript code uses a specific pattern to trigger a native browser vulnerability.
• Defense mechanism:
– BrowserShield: Rewrites JavaScript and checks whether an operation will trigger the vulnerability, for example, by checking the length of an operation.

Step Five: Downloading malware
• After the native browser vulnerability is exploited, malware is downloaded onto the client.
• Defense mechanism:
– Blade ([CCS, 2010]): Checks whether a download was initiated through the normal download GUI; if not, the downloaded file is rejected.

Step Six: Executing malware
• After the malware is downloaded, it gets executed.
• Related work:
– SpyProxy ([Usenix, 07])
– WebShield ([NDSS, 11])
– Provos et al. ([Usenix, 08])

Overview
• Background & Motivation: Cloaking
• Detecting Internet Malware
• Rozzle: Fighting Evasion
• Experiments

Detecting Internet Malware
Nozzle: A Defense Against Heap-spraying Code Injection Attacks
• Scan heap allocated objects to identify valid x86 code sequences
Zozzle: Low-overhead Mostly Static JavaScript Malware Detection [Usenix Security 2011]
• Bayesian classification of hierarchical features of the JavaScript abstract syntax tree.
• In the browser (after unpacking)
Nozzle [Usenix Security 2009]: Dynamic Detection
Zozzle: Static Detection

Nozzle: Runtime Heap Spraying Detection
[Figure: normalized attack surface (NAS), ranging from good to bad]

Zozzle: Static/Statistical Detection
// Shellcode
var shellcode=unescape('%u9090%u9090%u9090%u9090%uceba%u11fa%u291f%ub1c9%udb33[…]');
bigblock=unescape("%u0D0D%u0D0D");
headersize=20;shellcodesize=headersize+shellcode.length;
while(bigblock.length<shellcodesize){bigblock+=bigblock;}
heapshell=bigblock.substring(0,shellcodesize);
nopsled=bigblock.substring(0,bigblock.length-shellcodesize);
while(nopsled.length+shellcodesize<0x25000){nopsled=nopsled+nopsled+heapshell}
// Spray
var spray=new Array();
for(i=0;i<500;i++){spray[i]=nopsled+shellcode;}
// Trigger
function trigger(){
  var varbdy = document.createElement('body');
  varbdy.addBehavior('#default#userData');
  document.appendChild(varbdy);
  try {
    for (iter=0; iter<10; iter++) {
      varbdy.setAttribute('s',window);
    }
  } catch(e){ }
  window.status+='';
}
document.getElementById('butid').onclick();
Highlighted features: shellcode, unescape, bigblock, shellcodesize, shellcode.length, bigblock.substring, bigblock.length, nopsled, nopsled.length, heapshell, spray, varbdy.addBehavior, #default#userData, document.appendChild, varbdy.setAttribute, butid

Overview
• Background & Motivation: Cloaking
• Detecting Internet Malware
• Rozzle: Fighting Evasion
• Experiments

Environment Fingerprinting Prevents Detection
Is this a practical problem for our malware detectors (Nozzle, Zozzle)?
• In 7.7% of JS files, code gets a reference to environment
• In 1.2%, code branches on such sensitive values
<script>
var adobe=new ActiveXObject('AcroPDF.PDF');
var adobeVersion=adobe.GetVariable('$version');
if (navigator.userAgent.indexOf('IE 6')>=0 && adobeVersion == '9.1.3')
{
  var x=unescape('%u4149%u1982%u90 […]');
  eval(x);
}
</script>
• 89.5% of malicious JS branches on such sensitive values

Typical Malware Cloaking

More Complex Fingerprinting
Fingerprint: Q0193807F127J14

Avoiding Dynamic Crawlers

Avoiding Static Detection

How to Allocate Detection Resources?
[Figure: combinations of browser and plugin versions: 1.4, 1.5, 2.0; 9.0, 9.1, 10.0; 8, 9, 10, …]
• Clearly does not scale
• How many resources should be allocated to filter malicious sites?
• What if the site is simply not malicious?

Rozzle
Multi-path execution framework for JavaScript
What it is/does
• Multiple browser profiles on single machine: similar to running multiple browsers in parallel
• Branch on environment-sensitive checks
– No forking
– No snapshotting
• Execute individual branches sequentially to increase coverage
What it is not
• Cluster of machines: too resource consuming
• Symbolic execution: reverting to a previous state
• Static analysis: Retain much of runtime precision

Multi-Execution in Rozzle
<script>
var adobe=new ActiveXObject('AcroPDF.PDF');
var adobeVersion=adobe.GetVariable('$version');
if (navigator.userAgent.indexOf('IE 7')>=0 && adobeVersion == '9.1.3') {
  var x=unescape('%u4149%u1982%u90 […]');
  eval(x);
} else if (adobeVersion == '8.0.1') {
  var x=unescape('%u4073%u8279%u77 […]');
  eval(x);
}
…
</script>

Challenges
Consistent updates of variables
Introduce concept of Symbolic Memory:
• Multiple concrete values associated with one variable
• New JavaScript data type Symbolic
• 3 subtypes
• symbolic value / formula / conditional
• Weak updates for conditional assignments

Symbolic Memory
<script>
var userAgentString=0;
userAgentString = navigator.userAgent;
var isIE;
isIE = (userAgentString.indexOf('IE')>=0);
…
Variable: userAgentString
Value: 0, then < navigator.userAgent >
Symbolic: no, then yes
Hooks into engine, return symbolic values for
• Sensitive global objects: navigator.userAgent,
navigator.platform, …
• Sensitive functions: ScriptEngine(), allocation of ActiveXObject, …
Variable: isIE
Value: < navigator.userAgent.indexOf('IE') >= 0 >
Symbolic: yes

Symbolic Memory
<script>
var isIE=false;
var isIE7=false;
if (navigator.userAgent.indexOf('IE')>=0) {
  isIE=true;
  if (navigator.userAgent.indexOf('IE 7')>=0) {
    isIE7=true;
  }
}
if (isIE7) {
…
Variable: isIE
Value: false, then < nav.userAgent.indexOf(…)>=0 > ? true : false
Symbolic: no, then yes
Variable: isIE7
Value: false, then <…>
Symbolic: no, then yes
Current path predicate (inside the outer if)
Value: < nav.userAgent.indexOf(..)>=0 >
Symbolic: yes
Current path predicate (inside the inner if)
Value: < nav.userAgent.indexOf(..)>=0 > && < nav.userAgent.indexOf(..)>=0 >
Symbolic: yes

Symbolic Memory
Variable: isIE
Value: < nav.userAgent.indexOf(…)>=0 > ? true : false
Symbolic: yes
[Figure: expression tree for the value: a conditional (?) node with children >=, true, and false; >= has children indexOf and 0; indexOf has children navigator.userAgent and 'IE']

Challenges
Consistent updates of variables
Indirect control flow: Exception handling
• try-blocks regularly used to test availability of plugins (ActiveXObjects)
• catch-blocks set default values, cannot be ignored
• Execute catch-statement similar to else branch, add virtual if-condition: “ActiveX supported”
I/O
• Handling symbolic values when they are…
– … written to the DOM
– … sent to a remote server
– … executed (as part of eval)
• Lazy evaluation to concrete values (only when needed)
Handling loops
• Loop condition might be symbolic, number of iterations unknown!
• Unroll k iterations (currently k=1)
• Instruction pointer checks (endless loops/recursion)

Experiments
Offline
• Controlled Experiment
• 7x more Nozzle detections
Online
• Similar to Bing crawling
• Almost 4x more Nozzle detections
• 10.1% more Zozzle detections
Overhead
• 1.1% runtime overhead
• 1.4% memory overhead

Offline
• Exploits hosted on our server
• Minimize external influences
• 70,000 known malicious scripts (flagged by Zozzle)
• Fully unrolled/de-obfuscated exploits, wrapped in HTML
[Chart: shared detections, new detections, and errors; value shown: 10,381; +595% runtime detections]

Online
• Dedicated machine for crawling the web
• Clone of the Bing malware crawler
• List of URLs recently crawled by Bing
• Pre-filtering: Increase likelihood of finding malicious sites
• 57,000 URLs over the last week
[Charts: Nozzle detections and Zozzle detections; values shown: 225, 24, 156, 50, 174, 2,510; +203% runtime detections]

Overhead
• Average numbers of 3 repeated runs per configuration
• Base runs (cookie setup)
• 500 randomly selected URLs crawled by Bing
• Slightly biased towards malicious sites (pre-filtering)
Runtime Overhead: Median: 0.0%, 80th Percentile: 1.1%
Memory Overhead: Median: 0.6%, 80th Percentile: 1.4%

Overhead Numbers
[Histogram: number of sites per overhead ratio, x-axis from 0.700 to 2.140]

Take Away
• For most sites, virtually no overhead
• Tremendous impact on runtime detector due to increased path coverage
• Visible impact on static detector
• More important with growing trend to obfuscation
• Also improves other existing tools: Exposes detectors to additional site content

Online
… an example pulled from
our DB…
if (navigator.userAgent.toLowerCase().indexOf("\x6D"+"\x73\x69\x65"+"\x20\x36")>0)
  document.write("<iframe src=x6.htm></iframe>");
if (navigator.userAgent.toLowerCase().indexOf("\x6D"+"\x73"+"\x69"+"\x65"+"\x20"+"\x37")>0)
  document.write("<iframe src=x7.htm></iframe>");
try {
  var a;
  var aa=new ActiveXObject("Sh"+"ockw"+"av"+"e"+"Fl"+[…]);
} catch(a) { }
finally {
  if (a!="[object Error]")
    document.write("<iframe src=svfl9.htm></iframe>");
}
try {
  var c;
  var f=new ActiveXObject("O"+"\x57\x43"+"\x31\x30\x2E\x53"+[…]);
} catch(c) { }
finally {
  if (c!="[object Error]") {
    aacc = "<iframe src=of.htm></iframe>";
    setTimeout("document.write(aacc)", 3500);
  }
}
Decoded strings:
• "\x6D"+"\x73\x69\x65"+"\x20\x36" = "msie 6"
• "\x6D"+"\x73"+"\x69"+"\x65"+"\x20"+"\x37" = "msie 7"
• "O"+"\x57\x43"+"\x31\x30\x2E\x53"+"pr"+"ea"+"ds"+"he"+"et" = "OWC10.Spreadsheet"

Summary
• Rozzle: Multi-profile execution
– Look as vulnerable as possible
– Improve existing malware detectors
• Implementation:
– Implemented on top of IE9’s JavaScript engine
– Still some flaws, promising results
• Idea of multi-execution is promising in other contexts

Static – Dynamic Analysis Spectrum
• Entirely static: + High scalability, + High coverage, - Low precision
• Symbolic execution (DART, SAGE): + High precision, + High coverage, - May not scale
• Multi-execution: + High precision, + Scales reasonably well, ? High coverage, - Watch out for resource usage
• Entirely runtime: + High precision, - High overhead, - Low coverage
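The weak updates and path predicates described on the Symbolic Memory slides can be illustrated with a small sketch. It is not Rozzle's implementation (which, per the summary, lives inside IE9's JavaScript engine): here a symbolic value is modeled as a plain object holding a predicate over a browser profile, and the names `conditional`, `weakUpdate`, and `concretize`, as well as the sample user-agent strings, are hypothetical.

```javascript
// Illustration only: models symbolic memory in plain JavaScript.
// All function names and profiles below are hypothetical.

// A conditional symbolic value: "cond ? thenValue : elseValue", where cond is
// a predicate over a browser profile (the environment being fingerprinted).
function conditional(cond, thenValue, elseValue) {
  return { symbolic: true, cond, thenValue, elseValue };
}

// Weak update: an assignment made under a symbolic path predicate keeps the
// old value for every profile that would not have taken this path.
function weakUpdate(pathPredicate, newValue, oldValue) {
  return conditional(pathPredicate, newValue, oldValue);
}

// Resolve a (possibly nested) symbolic value for one concrete profile.
function concretize(value, profile) {
  while (value !== null && typeof value === "object" && value.symbolic) {
    value = value.cond(profile) ? value.thenValue : value.elseValue;
  }
  return value;
}

// Path predicates for the two nested environment checks.
const isIEPred = (p) => p.userAgent.indexOf("IE") >= 0;
const isIE7Pred = (p) => p.userAgent.indexOf("IE 7") >= 0;
const and = (a, b) => (p) => a(p) && b(p);

// var isIE=false; var isIE7=false;
let isIE = false;
let isIE7 = false;
// if (navigator.userAgent.indexOf('IE')>=0) { isIE=true;
isIE = weakUpdate(isIEPred, true, isIE);
//   if (navigator.userAgent.indexOf('IE 7')>=0) { isIE7=true; } }
isIE7 = weakUpdate(and(isIEPred, isIE7Pred), true, isIE7);

// Each assignment ran once; every profile still resolves to its own values.
const profiles = [
  { userAgent: "Mozilla/4.0 (compatible; MSIE 7.0)" },
  { userAgent: "Mozilla/4.0 (compatible; MSIE 6.0)" },
  { userAgent: "Mozilla/5.0 Firefox/10.0" },
];
for (const p of profiles) {
  console.log(p.userAgent, concretize(isIE, p), concretize(isIE7, p));
}
```

The point of the sketch is that both assignments are recorded in a single pass, without forking or snapshotting, and a concrete value is only computed when one is actually needed.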