Polymorphic Malware Detection
Connor Schnaith, Taiyo Sogawa
9 April 2012

Motivation
• “5000 new malware samples per day”
  --David Perry of Trend Micro
• Large variance between attacks
• Polymorphic attacks
  o Perform the same function
  o Altered immediate values or addressing
  o Added extraneous instructions
• Current detection methods insufficient
  o Signature-based matching not accurate
  o Behavioral-based detection requires human analysis and engineering

Malware Families
• Classified into related clusters (families)
• Tracking of development
• Correlating information
• Identifying new variants
• Based on similarity of code
• Koobface
• Bredolab
• PoisonIvy
• Conficker (7 mil. infected)
Source: Carrera, Ero, and Peter Silberman. "State of Malware: Family Ties." Media.blackhat.com. 2010. Web. 7 Apr. 2012. <https://media.blackhat.com/bh-eu-10/presentations/Carrera_Silberman/BlackHat-EU-2010-Carrera-Silberman-State-of-Malwareslides.pdf>.
~300 samples of malware with 60% similarity threshold

Current Research
• Techniques for identifying malicious behavior
  o Mining and clustering
  o Building behavior trees
• Industry
  o ThreatFire and Sana Security developing behavioral-based malware detection

Design challenges
• Discerning malicious portions of code
  o Dynamic program slicing
  o Accounting for control flow dependencies
• Reliable automation
  o Must be reliable w/o human intervention
  o Minimal false positives

Holmes: Main Ideas
• Two major tasks
  o Mining significant behaviors from a set of samples
  o Synthesizing an optimally discriminative specification from multiple sets of samples
• Key distinction in approach
  o "positive" set - malicious
  o "negative" set - benign
  o Malware: fully described in the positive set, while not fully described in the negative set

Main Ideas: behavior mining
• Extracts portions of the dependence graphs of programs from the positive set that correspond to behaviors that are significant to the programs’ intent.
• The algorithm determines what behaviors are significant (next slide)
• Can be thought of as contrasting the graphs of positive programs against the graphs of negative programs, and extracting the subgraphs that provide the best contrast.

Main ideas: behavior mining
• A "behavior" is a data dependence graph G = (V, E, a, B)
  o V is the set of vertices, which correspond to operations (system calls)
  o E is the set of edges, which correspond to dependencies between operations
  o a is the labeling function that associates nodes with the operations they represent
  o B is the labeling function that associates edges with the logic that represents the dependencies

Main ideas: behavior mining
• A program P exhibits a behavior G if it can produce an execution trace T with the following properties
  o Every operation in the behavior corresponds to an operation invocation whose arguments satisfy certain logical constraints
  o The logic formula on each edge connecting behavior operations is satisfied by a corresponding pair of operation invocations in the trace
• Must capture information flow in dependence graphs
  o Two key characteristics
    i. The path taken by the data in the program
    ii. Security labels assigned to the data source and the data sink

Security Label                  Description
NameOfSelf                      The name of the currently executing program
IsRegistryKeyForBootList        A Windows registry key listing software set to start on boot
IsRegistryKeyForWindows         A registry key that contains configuration settings for the operating system
IsSystemDirectory               The Windows system directory
IsRegistryKeyForBugfix          The Windows registry key containing the list of installed bugfixes and patches
IsRegistryKeyForWindowsShell    The Windows registry key controlling the shell
IsDevice                        A named kernel device
IsExecutableFile                Executable file

Main ideas: behavior mining
• Information gain is used to determine if a behavior is significant.
• A behavior that is not significant is ignored when constructing the dependency graph
• Information gain is defined in terms of Shannon entropy: it measures how much additional information improves the accuracy of determining whether a graph G is in G+ or G-
• Shannon entropy
  o H(G+ U G-) corresponds to the uncertainty about whether a graph G belongs to G+ or G-
  o Partition G+ and G- into smaller subsets to decrease that uncertainty
  o The partition is made by testing subgraph isomorphism (whether a candidate subgraph occurs in each graph)

Main ideas: behavior mining
• A significant behavior g is a subgraph of a dependence graph in G+ such that Gain(G+ U G-, g) is maximized
• Information gain is used as the quality measure to guide the behavior mining process
• Some non-significant actions can get passed as significant
  o These actions may or may not throw off the algorithm that determines if the program is malicious

Main ideas: behavior mining
• Significant behaviors mined from malware Ldpinch
  o Leaking bugfix information over the network
  o Adding a new entry to the system autostart list
  o Bypassing firewall to allow for malicious traffic
• Could say any program that exhibits all three of these behaviors should be flagged malicious
  o This is too specific of a statement
    i. Doesn't account for variations within a family
    ii. It is known that smaller subsets of behaviors that only include one of these actions could still be malicious
    iii. Need discriminative specifications

Main ideas: discriminative specifications
• Creates clusters of behaviors that can be classified as characteristic subsets
  o Program matches specification if it matches all of the behaviors in a subset
  o "Discriminative" in that it matches the malicious but not the benign programs

Main ideas: discriminative specifications
• Each subset of behaviors induces a cluster of samples
  o Malicious and benign samples are mined and organized into these clusters
  o Goal: find an optimal clustering technique to organize the malicious into the positive subset and the benign into the negative subset

Main ideas: discriminative specifications
• Three part algorithm
  o Formal concept analysis
  o Simulated annealing
  o Constructing optimal specifications
• Formal concept analysis
  o O is a cluster of samples
  o A is the set of mined behaviors in O
  o A concept is the pair (A, O)
• Set of concepts: {c1, c2, c3, ... , cN}
• Behavior specification: S(c1, c2, c3, ... , cN)

Main ideas: discriminative specifications
Formal Concept Analysis (continued)
• Begins by constructing all concepts and computing the pairwise intersection of the intent sets of these concepts
• Repeated until a fixpoint is reached and no new concepts can be constructed
• When the algorithm terminates, we are left with an explicit listing of all of the sample clusters that can be specified in terms of one or more mined behaviors
• Goal is to find {c1, c2, c3, ... , cN} such that S(c1, c2, c3, ... , cN) is optimal (based on threshold)

Main ideas: discriminative specifications
Simulated annealing
• Probabilistic technique for finding an approximate solution to a global optimization problem
• At each step, a candidate solution i is examined and one of its neighbors j is selected for comparison
• The algorithm moves to j with some probability
• A cooling parameter T is reduced throughout the process; when it reaches a minimum, the process stops

Main ideas: discriminative specifications
Constructing Optimal Specifications
• Inputs: threshold t, a set containing positive and negative samples, and a set of behaviors mined with the previous process
• Called SpecSynth
  o Constructs full set of concepts
  o Removes redundant concepts
  o Runs simulated annealing until convergence, then returns the best solution

Holmes: Mining and Clustering

Evaluation and Results: Holmes
• Used six malware families to develop specifications
• Tested final product against 19 malware families
• Collected 912 malware samples and 49 benign

Holmes Continued
• Experiments carried out over varying threshold values (t)
• System accuracy is highly sensitive to t
• Perhaps only efficient for a specific subset of malware

Holmes Scalability
• Worst-case complexity is exponential
• Behaviors of repeated executions (Stration and Delf) took 12-48 hours to analyze
• Scalability for Holmes is a nightmare!
“scary and scaled”

USENIX
• The Advanced Computing Systems Association
• (Unix Users Group)
• 2009 article: automatic behavior matching
  o Behavior graphs (slices)
  o Tracking data and control dependencies
  o Matching functions
  o Performance evaluations
Source: Kolbitsch, Clemens, et al. "Effective and Efficient Malware Detection at the End Host." Usenix Security Symposium (2009). Web. 8 Apr. 2012. <http://www.iseclab.org/papers/usenix_sec09_slicing.pdf>.
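
Looking back at the Holmes simulated annealing step, the loop described there (examine a candidate, pick a neighbor, move with some probability, cool T until a minimum) can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not the Holmes implementation: the toy CONCEPTS, POSITIVE and NEGATIVE data, the score (true-positive rate minus false-positive rate), the "toggle one concept" neighbor move, and the geometric cooling schedule are all hypothetical choices made for the example.

```python
import math
import random

# Hypothetical toy data: each concept is a set of behaviors; samples are behavior sets.
CONCEPTS = [
    frozenset({"autostart", "leak_bugfix"}),
    frozenset({"bypass_firewall"}),
    frozenset({"read_file"}),
]
POSITIVE = [{"autostart", "leak_bugfix"}, {"bypass_firewall", "autostart"}, {"bypass_firewall"}]
NEGATIVE = [{"read_file"}, {"read_file", "autostart"}]


def matches(solution, sample):
    """A sample matches if it exhibits every behavior of at least one chosen concept."""
    return any(CONCEPTS[i] <= sample for i in solution)


def score(solution):
    """Assumed quality measure: true-positive rate minus false-positive rate."""
    tp = sum(matches(solution, s) for s in POSITIVE) / len(POSITIVE)
    fp = sum(matches(solution, s) for s in NEGATIVE) / len(NEGATIVE)
    return tp - fp


def neighbor(solution, rng):
    """Assumed neighbor move: add or remove one randomly chosen concept."""
    return solution ^ {rng.randrange(len(CONCEPTS))}


def simulated_annealing(initial, t_start=1.0, t_min=1e-3, cooling=0.95, seed=0):
    rng = random.Random(seed)
    current = best = initial
    current_score = best_score = score(initial)
    t = t_start
    while t > t_min:
        candidate = neighbor(current, rng)
        delta = score(candidate) - current_score
        # Always accept improvements; accept worse moves with probability exp(delta / t).
        if delta >= 0 or rng.random() < math.exp(delta / t):
            current, current_score = candidate, current_score + delta
            if current_score > best_score:
                best, best_score = current, current_score
        t *= cooling  # cooling parameter shrinks every step
    return best, best_score


start = frozenset()
best, best_score = simulated_annealing(start)
print(sorted(best), round(best_score, 3))
```

Early on, while T is large, worse neighbors are accepted often, which lets the search escape local optima; as T shrinks the search becomes nearly greedy. Keeping the best solution seen so far matches the "return the best solution" step listed for SpecSynth.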

USENIX: Producing Behavior Graphs
• Instruction log
  o Trace instruction dependencies
  o Slicing doesn't reflect stack manipulation
• Memory log
  o Access memory locations
Partial behavior graph of Netsky (Kolbitsch et al.)

USENIX: Behavior Slices to Functions
• Use instruction and memory log to determine input arguments
• Identify repeated instructions as loops
• Include memory read functions
• We can now compare to known malware

Evaluation
• Six families used for development (mostly mass-mailing worms)
• Expanded test set

Performance Evaluation
• Installed Internet Explorer, Firefox, Thunderbird, Putty, and Notepad on Windows XP test machine
• Single-core, 1.8 GHz, 1GB RAM, Pentium 4 processor

USENIX Limitations
• Evading system emulator
  o USENIX detector uses Qemu emulator
  o Delays
  o Time-triggered behavior
  o Command and control mechanisms
• Modifying algorithm behavior
  o A more fundamental change, but cannot be detected using same signatures
• End-host based system
  o Cannot track network activity

Questions/Discussion
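
Supplementary sketch for the "Producing Behavior Graphs" slide: tracing instruction dependencies through an instruction log amounts to a backward slice from the value of interest (for example, a system call argument). The snippet below is a simplified illustration only. The Instr record, the register and "mem:" location names, and the sample log are hypothetical, and the sketch tracks data dependencies only (no control dependencies or stack handling), so it is not the tool described by Kolbitsch et al.

```python
from typing import List, NamedTuple, Tuple


class Instr(NamedTuple):
    """One hypothetical instruction-log entry: what it writes and what it reads."""
    index: int
    text: str
    writes: Tuple[str, ...]
    reads: Tuple[str, ...]


def backward_slice(log: List[Instr], target: str) -> List[Instr]:
    """Return the instructions that the final value of `target` depends on."""
    wanted = {target}  # locations whose defining instruction we still need
    kept = []
    for instr in reversed(log):
        hit = wanted.intersection(instr.writes)
        if hit:
            kept.append(instr)
            wanted -= hit                # these locations are now explained
            wanted |= set(instr.reads)   # ...by whatever this instruction read
    kept.reverse()
    return kept


log = [
    Instr(0, "mov eax, [0x1000]", ("eax",), ("mem:0x1000",)),
    Instr(1, "mov ecx, 5", ("ecx",), ()),
    Instr(2, "add eax, 4", ("eax",), ("eax",)),
    Instr(3, "mov [0x2000], eax", ("mem:0x2000",), ("eax",)),
    Instr(4, "mov edx, ecx", ("edx",), ("ecx",)),
]

slice_for_arg = backward_slice(log, "mem:0x2000")
for instr in slice_for_arg:
    print(instr.index, instr.text)
```

Instructions 1 and 4 drop out because the value stored at 0x2000 never depends on ecx or edx; the remaining chain is the kind of slice that a behavior graph node's arguments would be derived from.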