Code Obfuscation
Yaakob Iyun
Guided by Eitan Koch
2009-2010

Table of Contents
1. Preface
2. Literature Survey Summary
3. Opaque Predicate
   a. Definition of Opaque Predicate
   b. Manufacturing of Opaque Predicate
   c. Randomness
   d. Advantages and Disadvantages
4. Obfuscator Application features
5. Implementation Details
6. Obfuscation Quality and Measurements
7. Obfuscation Example

1) Preface
Defending the intellectual property of source code has become more and more difficult. Most of the programming languages in use today (such as Java and C#) compile to a byte code that is easy to reverse engineer and that is translated to machine code only later, depending on the machine the application runs on.
One obvious method to protect the intellectual property from reverse engineering is to encrypt the byte code, so that only a client that has the decryption key and code can decrypt the application and run it. The problem with encryption is the need for password and key exchange, and decryption requires more time and resources. So another method was proposed: code obfuscation.
The purpose of obfuscating a given code is to make it unreadable, confusing and misleading. Obfuscation as an idea covers many techniques, which are presented in the Literature Survey Summary section. Note that obfuscating a given code does not guarantee that an attacker will fail to reverse engineer it, but the more obfuscated the code is, the harder it is for the attacker to reverse engineer it.
The purpose of this project is to create an application that, for a given C++ source code, generates an obfuscated one that has exactly the same functionality as the original.

2) Literature Survey Summary
What Is Code Obfuscation
Code obfuscation is a transformation from original code P to P' for which the following hold:
1. Original code P and obfuscated code P' have the same functionality.
2.
Obfuscated code P' is much more difficult to understand, for a human reader or for an automatic deobfuscator, than the original code P.

Code Obfuscation Measurement
Potency: how much obscurity is added to the program.
Cost: how much computation and memory overhead is added to the original program.
Stealth: how closely the added obscuring code resembles the original code.
Resilience: how difficult it is for an automated deobfuscator to deobfuscate the code.

Code Obfuscation Operations

Obfuscating inheritance data
Modifying inheritance relations: one can artificially lengthen the inheritance chain by adding unnecessary classes, merge two classes that have no relationship, or split a class into several classes, all while maintaining the functionality. Modifying inheritance relations increases the program complexity.

Obfuscating Procedural Abstractions
Inline Methods: remove the function abstraction and insert the function's statements at the place where the function is called.
Clone Methods: the obfuscator can clone a method into several methods that behave the same but are called differently, and so confuse the deobfuscator into deducing that they perform different operations.

Obfuscating built-in data types
Convert static to procedural data: static data, such as constant integers and strings, appears in large amounts in any program. The obfuscator can add methods that produce these constants at run time, so that the deobfuscator cannot deduce their values by static analysis.
Split or Merge scalar variables: for a data type with a limited range, the obfuscator can create new merged data types or new split data types, which suggests a different use of the data type to the deobfuscator.

Obfuscating Meaningful Strings
Many meaningful strings are used in an application, from variable names to method names and class names. The obfuscator can change those names to other, less meaningful names.
For example the function name "GetDatabasName()" can be changed to "__()".

Obfuscating Control-flow Transformations
The main problem of every reverse engineering program is to deduce the control flow of a program. By inserting more predicates and by changing and complicating if and while statements, the obfuscator can increase the workload and computation needed to recover the control flow.
One efficient way to obfuscate code is to insert opaque predicates, variables and constructs into the code. The values of those opaque predicates are known to the obfuscator and unknown to the deobfuscator, so the outcome of each opaque predicate is hard to deduce.
For example, we can split a set of commands, S, into S1 – P – S2 – S3, where P is an opaque predicate that, based on an application invariant, always evaluates to true and continues the program at S3. Because the deobfuscator does not know any of the application invariants, it can evaluate P only at run time, and it may conclude that S2 is part of the program when it is actually a misleading piece of code.

Christian Collberg, Clark Thomborson, and Douglas Low : Manufacturing cheap, resilient, and stealthy opaque constructs
In their article, Christian Collberg, Clark Thomborson, and Douglas Low emphasize the need for code obfuscation and design an obfuscator for Java object code.
Basic obfuscator operation:
1. Getting input parameters: Java object code, required level of obfuscation (potency), maximum time/space cost penalty (cost) and Java profiling data.
2. Understanding the program, by building the symbol table, inheritance graph and control flow graphs that represent the application.
3. Continuous iterations, in which the obfuscator inserts code transformations until the required potency is reached or the cost limit is exceeded.
4. Outputting new obfuscated object code that has the same functionality as the original code.
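The S1 – P – S2 – S3 split described earlier in this survey can be sketched in C++ as follows. This is only an illustration under stated assumptions: the predicate used here is a simple arithmetic identity (the product of two consecutive integers is even), not the graph-based construct of the article, and the names opaque_true, run_split_block and self_check are hypothetical.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Opaque predicate (assumed for illustration): x * (x + 1) is a product of two
// consecutive integers, so it is always even. Unsigned arithmetic wraps modulo
// a power of two, which preserves parity, so the result is true for every x.
bool opaque_true(unsigned x) {
    return (x * (x + 1u)) % 2u == 0u;
}

// Runs S1 - P - S2 - S3 and records which blocks were executed.
// S2 is the misleading block: it is guarded by the negation of P and never runs.
std::vector<std::string> run_split_block(unsigned runtime_value) {
    std::vector<std::string> trace;
    trace.push_back("S1");                 // original code, first part
    if (!opaque_true(runtime_value)) {     // P is always true, so this branch is dead
        trace.push_back("S2");             // bogus code inserted by the obfuscator
    }
    trace.push_back("S3");                 // original code, remainder
    return trace;
}

// Self-check on a few sample values, including the wrap-around edge case.
void self_check() {
    for (unsigned x : {0u, 1u, 2u, 12345u, 4294967295u}) {
        assert(opaque_true(x));
    }
}
```

For any argument, run_split_block returns the trace S1, S3. An arithmetic identity like this one is cheap but easy for a deobfuscator to recognize, which is why the article surveyed here builds its predicates on pointer and graph invariants instead.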
Obfuscation code transformation:
An obfuscation code transformation is a transformation of the code, T(Program) = Obfuscated Program, for which the following holds: if the Program fails to terminate or terminates with an error condition, then the Obfuscated Program may or may not terminate. Otherwise the Obfuscated Program must terminate and produce the same result as the Program.
Among the most used code transformations are the control transformations that use opaque predicates. The value of an opaque predicate is known to the obfuscator and unknown to the deobfuscator, so its outcome is hard to deduce. With opaque predicates we can build the following methods and many more:
1. Insert dead or irrelevant code: consider a basic block S = s1, s2, s3, …, sn. We can split the block and insert dead or wrong code into it, guarded by a predicate that always evaluates to false, so that the inserted code can never be reached.
2. Extended loop condition: we can add an opaque predicate that always evaluates to true to the loop condition with an AND operator. This complicates the condition but does not change the number of iterations, so the functionality stays the same.

Manufacturing Opaque Constructs:
As mentioned, the result of an opaque predicate needs to be easy for the obfuscator to deduce and hard for the deobfuscator to deduce. In addition, opaque predicates should be stealthy (blend with the code) and should not have a high computational cost.
One way to manufacture opaque predicates is to exploit the general difficulty of the alias analysis problem and the limits of current algorithms for detecting recursive data structures. The basic technique is:
1. Add obfuscating code that builds a set of dynamic structures.
2. Keep a set of pointers to those structures.
3. The inserted code should update the dynamic structures and the locations the pointers refer to, while maintaining certain invariants on the pointers and the dynamic structures (such as "there will always be a path from p1 to p2").
4.
Construct an opaque predicate that depends on the invariants.

The article defines a Graph ADT with operations on it. The operations include adding a node, deleting a node, merging two graphs, splitting a graph and more. The graph code and the statements that change it should be embedded in the program code.
The graph code should comply with these three conditions to disguise itself better:
1. The obfuscator should keep a large library of Graph variants, and use several variants in several parts of the code.
2. The added code (the graph class and the statements that use it) should be obfuscated too.
3. The Graph ADT should be included in a user class to blend it with the original code.
One useful improvement relies on the thread concurrency capability of the Java language. Threads and their scheduling mechanism are hard to analyze, both because scheduling is not a language-dependent mechanism and because the actual scheduling of a thread is driven by asynchronous events (user interaction, network state, etc.).

Drawbacks of opaque predicates from dynamic structures
Code size increase: the code size increases dramatically because of the added dead or irrelevant code, the dynamic graph code and the instructions that use it. The increased size can harm web-deployed applications and mobile applications, which require the application size to be small.
Dynamic data handling increase: because a dynamic type is added and operations such as add and remove are performed on it, the obfuscated program makes heavy use of dynamic allocation, which increases the time overhead and the memory needed to hold those data structures.

3) Opaque Predicate
In order to obfuscate our code we will insert if statements whose conditions hold the opaque predicates.
a.
Definition of Opaque Predicate
Opaque predicate: an expression that evaluates to either "true" or "false", whose outcome is known to the obfuscator and is very difficult, or even impossible, for the deobfuscator to evaluate without running the application itself.

b. Manufacturing of Opaque Predicate
In order to manufacture the conditions of the if statements, we use opaque predicates. First, an obfuscated Graph class is introduced into the code. We then introduce code that manipulates (builds and changes) an object of the Graph type. Next we build opaque predicates, using names supplied by the user, according to specific invariants that the graph structure holds (for example, that two pointers travelling over the graph object always have a path between them). Finally we add randomness to the if conditions, together with the predicate.
For example (see the accompanying figure), we split the original program (S1; S2; ...; SN;) and add a predicate that always evaluates to true.

c. Randomness
So that the compiler will not identify and remove a block as dead code that is never entered (because the predicate is always false), I inserted a randomness mechanism.

tune1: int tune2 = rand() % 2;
if (melody->__volume(tune) || tune2)
{
    tune->singer(); melody->singer(); tune->singer();
    tune = tune->__singer();
    tune->singer(); melody->singer();
    melody = melody->__singer();
    tune->___tune();
    melody->singer(); melody->singer();
    goto tune1;
    melody->___tune();
}
else
{
    tune->singer(); melody->singer(); melody->singer();
    melody->___tune();
}

We can see that when the program enters the first branch of the if, it jumps back to the "tune1" label, and until the value of "tune2" changes it remains in that loop and executes misleading code (not part of the original program). In this way the obfuscated program can actually enter the if branch and execute it.
The code in that branch will therefore not be considered dead code, neither by the compiler (which would remove it) nor by the naked eye of the attacker.

d. Advantages and Disadvantages
The obfuscation method of inserting opaque predicates into a given code has its advantages and disadvantages.
Advantages:
I. The attacker has more predicates to deduce, and therefore needs to use more resources.
II. Dead and irrelevant code is inserted into the code in order to confuse the attacker.
III. The introduced Graph class code, and the code that manipulates the graph, blend well with the source code because they use names supplied by the user.
IV. The predicate names are user defined, so they resemble the given code.
V. At run time the behavior is not deterministic, because of the randomness mechanism.
Disadvantages:
I. At run time the application needs more memory for the graph that is built and used in the opaque predicate conditions.
II. The application executes more lines of code (dead code, graph manipulation code), so the program needs more time to run.
These disadvantages can harm an application that must use its resources carefully (such as cellular applications).

4) Obfuscator Application features
The referenced application is a C++ obfuscator. Its purpose is to convert a given C++ code to an obfuscated C++ code with the same functionality as the original. The application is written in C++.
User Interface:
The application is a command line application. The user needs to move all the files to be obfuscated to the "Working Folder" specified in the configuration file. The user can choose which obfuscation methods are performed on the code (from those listed below).
The Obfuscation methods are:
Opaque -------> Obfuscating Control-flow Transformations - Adding Opaque Predicates and dead code
Strings -------> Obfuscating Meaningful Strings
Example:
Obfuscate.exe config.txt Opaque Strings

Obfuscation methods used by the application (an explanation can be found in the Literature Survey section):
1. Obfuscating Control-flow Transformations - Adding Opaque Predicates and dead code: "Opaque" option
2. Obfuscating Meaningful Strings: "Strings" option

Config file example:
/*************************************************/
/* Configuration File For Koby's Obfuscation App */
/*************************************************/
/* Working Folder */
WorkingFolder=C:\Users\koby\Desktop\New folder
/* Source Files */
SourceFileList=folder.cpp
/**********************************/
/* Parameters for "Opaque" Option */
/**********************************/
/* Related Project Names */
RelatedProjectNames=melody,tune,volume,artist,singer
/* Number Of Predicates To Add */
NumberOfPredicatesToAdd=2
/* Function Names To Insert The Predicates */
FunctionNamesToInsertThePredicates=Play_Song
/***********************************/
/* Parameters for "Strings" Option */
/***********************************/
/* Function Names To Be Obuscated */
FunctionNamesToBeObuscated=Play_Song,Print_Folder_Songs
/* Variable Names */
VariableNames=Folder,song_name,temp_song

RelatedProjectNames must not contain any of the following strings: goto, to, go, random.

5) Implementation Details
The obfuscator is written in C++ and is compiled with the Visual Studio 2008 compiler.
Source code details:
The source code for the obfuscator includes the following files:
1. ControlFlowTransformations.cpp , ControlFlowTransformations.h: Define and implement the "Opaque" option of the application.
2. MeaningfulStringsObfuscation.cpp , MeaningfulStringsObfuscation.h: Define and implement the "Strings" option of the application.
3.
FileTokenizer.cpp , FileTokenizer.h: Define and implement a file handling class that supports inserting tokens into a file, replacing tokens, etc.
4. Database.cpp , Database.h: Database class that parses the input config file supplied by the user and holds the parameters the obfuscator needs in order to work.
5. main.cpp: The main module, which initiates the application options according to the user input.
6. graphManipulationCode folder: Holds templates for building and manipulating the graph, and templates for the opaque predicate code.
7. GraphLibFolder - _include_.cpp , _include_.h: The Graph class that holds the invariants on which the predicates are based.

6) Obfuscation Quality and Measurements
In this project, to measure the obfuscation quality, I used the McCabe metric (cyclomatic complexity), which is a known metric that tries to measure code complexity.
The cyclomatic complexity of a section of source code is the number of linearly independent paths through that source code.
How to compute the cyclomatic complexity:
Build the control flow graph of the program.
Compute: E − N + 2P
Where
E = the number of edges of the graph
N = the number of nodes of the graph
P = the number of connected components
An example (E = 9, N = 8, P = 1): Cyclomatic complexity = 9 - 8 + 2 = 3
Note that for a program without any IF, WHILE, FOR or CASE statements the cyclomatic complexity is 1, because E=0, N=1, P=1.
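As a minimal sketch of the E − N + 2P computation above, the following C++ function takes a control flow graph as a node count and an edge list, counts the connected components with a union-find structure, and returns the cyclomatic complexity. The function names and the sample graph are hypothetical; the sample is simply one graph with 8 nodes and 9 edges (an if/else followed by a loop), not the graph from the report's example, and this is not how the CCCC tool computes its figures.

```cpp
#include <cassert>
#include <numeric>
#include <utility>
#include <vector>

// Finds the representative of node v in the union-find forest (path halving).
int find_root(std::vector<int>& parent, int v) {
    while (parent[v] != v) {
        parent[v] = parent[parent[v]];
        v = parent[v];
    }
    return v;
}

// Cyclomatic complexity M = E - N + 2P for a control flow graph given as
// a node count N and a list of directed edges (from, to).
// P is counted by treating the edges as undirected connections.
int cyclomatic_complexity(int node_count,
                          const std::vector<std::pair<int, int>>& edges) {
    assert(node_count >= 1);
    std::vector<int> parent(node_count);
    std::iota(parent.begin(), parent.end(), 0);   // every node starts alone
    int components = node_count;
    for (const auto& e : edges) {
        assert(e.first >= 0 && e.first < node_count);
        assert(e.second >= 0 && e.second < node_count);
        int a = find_root(parent, e.first);
        int b = find_root(parent, e.second);
        if (a != b) {                              // edge joins two components
            parent[a] = b;
            --components;
        }
    }
    return static_cast<int>(edges.size()) - node_count + 2 * components;
}

// Hypothetical sample graph: node 1 is an if/else, node 5 is a loop header.
std::vector<std::pair<int, int>> example_edges() {
    return {{0, 1}, {1, 2}, {1, 3}, {2, 4}, {3, 4},
            {4, 5}, {5, 6}, {6, 5}, {5, 7}};
}
```

Calling cyclomatic_complexity(8, example_edges()) gives 9 - 8 + 2 = 3, and a single node with no edges gives 0 - 1 + 2 = 1, matching the two cases worked out in the text.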
7) Obfuscation Example
Config File:
/*************************************************/
/* Configuration File For Koby's Obfuscation App */
/*************************************************/
/* Working Folder */
WorkingFolder=C:\Users\koby\Desktop\New folder
/* Source Files */
SourceFileList=folder.cpp
/**********************************/
/* Parameters for "Opaque" Option */
/**********************************/
/* Related Project Names */
RelatedProjectNames=melody,tune,volume,artist,singer
/* Number Of Predicates To Add */
NumberOfPredicatesToAdd=2
/* Function Names To Insert The Predicates */
FunctionNamesToInsertThePredicates=Play_Song
/***********************************/
/* Parameters for "Strings" Option */
/***********************************/
/* Function Names To Be Obuscated */
FunctionNamesToBeObuscated=Play_Song,Print_Folder_Songs
/* Variable Names */
VariableNames=Folder,song_name,temp_song

Original code:
bool Folder::Print_Folder_Songs (const string& curr_artist_name)
{
    Iterator<Song> itr =songs_list.GetIterator();
    Song temp_song;
    if (curr_artist_name=="temp_not_given")
    {
        while (itr.HasNext())
        {itr.Next().Print_Song_Name();}
    }
    else
    {
        while (itr.HasNext())
        {
            temp_song=itr.Next();
            if (temp_song.Get_Artist_Name()==curr_artist_name)
            {temp_song.Print_Song_Name();}
        }
    }
    return true;
}
bool Folder::Play_Song (const string& song_name)
{
    Song curr_song;
    Iterator<Song> itr =songs_list.GetIterator();
    bool found=false;
    while (itr.HasNext()&&(!found))
    {
        curr_song=itr.Next();
        if (curr_song.song_name==song_name)
        {curr_song.Play_Song(); found=true;}
    }
    return found;
} // not checked

Obfuscated Code – Opaque Option only:
bool Folder::Print_Folder_Songs (const string& curr_artist_name)
{Iterator<Song> itr =songs_list.GetIterator();
Song temp_song;
if (curr_artist_name=="temp_not_given")
{while (itr.HasNext()) {itr.Next().Print_Song_Name();}}
else
{while (itr.HasNext()) {temp_song=itr.Next(); if (temp_song.Get_Artist_Name()==curr_artist_name) {temp_song.Print_Song_Name();}}}
return
true;} bool Folder::Play_Song (const string& song_name) {Song curr_song; srand (time(NULL) ); artist* melody = new artist(); melody->singer(); melody= melody->__singer(); melody->singer(); melody= melody->__singer(); melody->singer(); melody->singer(); melody->___tune(); melody= melody->__singer(); melody->singer(); melody->singer(); melody->___tune(); melody= melody->__singer(); melody->singer(); melody->singer(); melody->___tune(); artist* tune; tune=melody->__artist(); tune->singer(); tune = tune->__singer(); tune->singer(); melody->singer();tune->singer(); tune = tune->__singer(); tune->singer(); // not checked melody->singer(); melody= melody->__singer(); tune->___tune(); melody->singer(); melody->singer(); melody->___tune(); if (melody->__volume(tune) ) { artist* melody = new artist(); melody->singer(); melody= melody->__singer(); melody->singer(); melody= melody->__singer(); melody->singer(); melody->singer(); melody->___tune(); melody= melody->__singer(); melody->singer(); melody->singer(); melody->___tune(); melody= melody->__singer(); melody->singer(); melody->singer(); melody->___tune(); artist* tune; tune=melody->__artist(); tune->singer(); tune = tune->__singer(); tune->singer(); melody->singer();tune->singer(); tune = tune->__singer(); tune->singer(); melody->singer(); melody= melody->__singer(); tune->___tune(); melody->singer(); melody->singer(); melody->___tune(); tune1: int tune2 = rand() % 2; if (melody->__volume(tune) || tune2) { tune->singer(); melody->singer();tune->singer(); tune = tune->__singer(); tune->singer(); melody->singer(); melody= melody->__singer(); tune->___tune(); melody->singer(); melody->singer(); goto tune1; melody->___tune(); } tune->singer(); melody->singer(); melody->singer(); melody->___tune(); } Iterator<Song> itr =songs_list.GetIterator(); bool found=false; while (itr.HasNext()&&(!found)) {curr_song=itr.Next(); if (curr_song.song_name==song_name) {curr_song.Play_Song(); found=true;}} return found;} Obfuscated Code – 
Opaque Option + String Option: #include "folder.h" #include "_include_.h" /* implementation of class ____18960____ */ //private methods bool ____18960____::____17385____ (const string& curr_____28814_____name) {Iterator<Song> itr =songs_list.GetIterator(); Song ____237____; if (curr_____28814_____name=="temp_not_given") {while (itr.HasNext()) {itr.Next().Print_Song_Name();}} else {while (itr.HasNext()) {____237____=itr.Next(); if (____237____.Get_Artist_Name()==curr_____28814_____name) {____237____.Print_Song_Name();}}} return true;} bool ____18960____::____4251____ (const string& ____8667____) {Song curr_song; ____28814____* ____15463____ = new ____28814____(); ____15463____->____13731____(); ____15463____= ____15463____->______13731____(); ____15463____->____13731____(); ____15463____= ____15463____->______13731____(); ____15463____->____13731____(); ____15463____->____13731____(); ____15463____->_______9449____(); ____15463____= ____15463____->______13731____(); ____15463____->____13731____(); ____15463____->____13731____(); ____15463____->_______9449____(); ____15463____= ____15463____->______13731____(); ____15463____->____13731____(); ____15463____->____13731____(); ____15463____->_______9449____(); ____28814____* ____9449____; ____9449____=____15463____->______28814____(); ____9449____->____13731____(); ____9449____ = ____9449____->______13731____(); ____9449____->____13731____(); ____15463____->____13731____();____9449____->____13731____(); ____9449____ = ____9449____->______13731____(); ____9449____->____13731____(); ____15463____->____13731____(); ____15463____= ____15463____->______13731____(); ____9449____->_______9449____(); ____15463____->____13731____(); ____15463____->____13731____(); ____15463____->_______9449____(); if (____15463____->______10535____(____9449____)) { ____28814____* ____15463____ = new ____28814____(); ____15463____->____13731____(); ____15463____= ____15463____->______13731____(); ____15463____->____13731____(); ____15463____= 
____15463____->______13731____(); ____15463____->____13731____(); ____15463____->____13731____(); ____15463____->_______9449____(); ____15463____= ____15463____->______13731____(); ____15463____->____13731____(); ____15463____->____13731____(); // not checked ____15463____->_______9449____(); ____15463____= ____15463____->______13731____(); ____15463____->____13731____(); ____15463____->____13731____(); ____15463____->_______9449____(); ____28814____* ____9449____; ____9449____=____15463____->______28814____(); ____9449____->____13731____(); ____9449____ = ____9449____->______13731____(); ____9449____->____13731____(); ____15463____->____13731____();____9449____->____13731____(); ____9449____ = ____9449____->______13731____(); ____9449____->____13731____(); ____15463____->____13731____(); ____15463____= ____15463____->______13731____(); ____9449____->_______9449____(); ____15463____->____13731____(); ____15463____->____13731____(); ____15463____->_______9449____(); __9928814__: int __69449__ = rand() % 2; if (____15463____->______10535____(____9449____) || __69449__ ) { ____9449____->____13731____(); return 0; ____15463____->____13731____();____9449____->____13731____(); ____9449____ = ____9449____->______13731____(); ____9449____->____13731____(); ____15463____->____13731____(); ____15463____= ____15463____->______13731____(); ____9449____->_______9449____(); ____15463____->____13731____(); ____15463____->____13731____(); ____15463____->_______9449____(); goto __9928814__; } else ____9449____->____13731____(); return 0; ____15463____->____13731____();____9449____->____13731____(); ____9449____ = ____9449____->______13731____(); ____9449____->____13731____(); ____15463____->____13731____(); ____15463____= ____15463____->______13731____(); ____9449____->_______9449____(); ____15463____->____13731____(); ____15463____->____13731____(); return -1; ____15463____->_______9449____(); } Iterator<Song> itr =songs_list.GetIterator(); bool found=false; while 
(itr.HasNext()&&(!found)) {curr_song=itr.Next(); if (curr_song.____8667____==____8667____) {curr_song.____4251____(); found=true;}} return found;}
____15463____->_______9449____();
____28814____* ____9449____;
____9449____=____15463____->______28814____();
____9449____->____13731____();

Conclusions:
1. We can see that the code grows dramatically.
2. With the "Opaque" option alone, we can see that the added code blends well with the original code.
3. With the "Opaque" option alone, we can see the randomness mechanism that ensures the compiler will not identify the added code as dead code.
4. We can see that after the "Strings" option was applied the code became unreadable.

Obfuscation Quality and Measurements
Experiments Results
I used the "CCCC" freeware to compute the cyclomatic complexity of the obfuscation example.
[Bar chart: cyclomatic complexity (vertical axis, 0 to 200) for 2, 20, 50 and 70 added predicates; the bar labels read 40, 64, 122 and 161.]