Code Obfuscation
Yaakob Iyun
Guided by Eitan Koch
2009-2010
Table of Contents
1. Preface
2. Literature Survey Summary
3. Opaque Predicate
a. Definition of Opaque Predicate
b. Manufacturing of Opaque Predicate
c. Randomness
d. Advantages and Disadvantages
4. Obfuscator Application features
5. Implementation Details
6. Obfuscation Quality and Measurements
7. Obfuscation Example
1) Preface
Defending the intellectual property embodied in source code has become more
and more difficult.
Many of the programming languages in use today (such as Java and C#) compile
to a byte code that is easy to reverse engineer; the byte code is compiled
to machine code later, depending on the machine the application runs on.
One obvious method of protecting intellectual property from reverse
engineering is to encrypt the byte code, so that only a client that holds the
decryption key and code can decrypt the application and run it.
The problem with encryption is that it needs a password and a key exchange,
and decryption costs additional time and resources. So another method was
proposed: code obfuscation.
The purpose of obfuscating a given piece of code is to make it unreadable,
confusing and misleading.
Obfuscation is a general idea with many concrete techniques, which are
described in the Literature Survey Summary section.
Note that obfuscating code does not guarantee that an attacker will fail to
reverse engineer it, but the more obfuscated the code is, the more difficult
reverse engineering becomes.
The purpose of this project is to create an application that, for a given C++
source code, generates an obfuscated version with exactly the same
functionality as the original.
2) Literature Survey Summary
What Is Code Obfuscation
Code obfuscation is a transformation from original code P to P' such that:
1. The original code P and the obfuscated code P' have the same functionality.
2. The obfuscated code P' is much more difficult than the original code P
for a human reader or an automatic deobfuscator to understand.
Code Obfuscation Measurement
• Potency: how much obscurity is added to the program.
• Cost: how much computation and memory overhead is added to
the original program.
• Stealth: how closely the added obscuring code resembles the original code.
• Resilience: how difficult it is for an automated deobfuscator to
deobfuscate the code.
Code Obfuscation Operations
• Obfuscating inheritance data
Modifying inheritance relations: one can artificially lengthen the
inheritance chain by adding unnecessary classes, merge two classes that
have no relationship, or split a class into several classes, all while
maintaining the functionality. Modifying inheritance relations increases
the program's complexity.
• Obfuscating procedural abstractions
Inline methods: remove the function abstraction and simply insert the
function's statements at the place where the function is called.
Clone methods: the obfuscator can clone a method into several methods that
behave the same but are called under different names, so that the
deobfuscator is misled into deducing that they perform different operations.
• Obfuscating built-in data types
Convert static data to procedural data: static data, such as constant
integers and strings, appears in large amounts in any program. The
obfuscator can add methods that build these constants at run time, so
that the deobfuscator cannot deduce their values by static analysis.
Split or merge scalar variables: for a data type with a limited range, the
obfuscator can create new merged or split data types, suggesting to the
deobfuscator that a different data type is in use.
• Obfuscating meaningful strings
An application uses many meaningful strings, from variable names to method
names and class names. The obfuscator can change those names to less
meaningful ones. For example, the function name "GetDatabasName()" can be
changed to "__()".
• Obfuscating control flow
The main problem for any reverse engineering program is to deduce the
control flow of a program. By inserting more predicates and by changing and
complicating if and while statements, the obfuscator increases the
work and computation needed to recover the control flow.
One efficient way to obfuscate code is to insert opaque predicates,
variables and constructs into the code. The value of an opaque predicate is
known to the obfuscator but unknown to the deobfuscator, so its outcome is
hard to deduce.
For example, we can split a sequence of statements S into S1 – P – S2 – S3,
where P is an opaque predicate that, based on an application invariant,
always evaluates to true and continues the program at S3. Because the
deobfuscator does not know any of the application invariants, it can
determine P only at run time, and may deduce that S2 is part of the program
when it is actually a misleading piece of code.
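The S1 – P – S2 – S3 split can be sketched in C++ as follows. This is only an
illustration under stated assumptions: the function names are hypothetical,
and the predicate is a simple arithmetic identity (the product of two
consecutive integers is even) rather than an application invariant, so it is
far easier to see through than the graph-based predicates discussed later.

```cpp
#include <cstdint>

// Hypothetical opaque predicate: x * (x + 1) is always even.
// Unsigned arithmetic wraps modulo 2^32, which preserves parity,
// so the predicate holds for every input value.
bool opaque_true(std::uint32_t x) {
    return (x * (x + 1u)) % 2u == 0u;
}

// Original block S: returns 1 + 2 + ... + n.
std::uint32_t sum_original(std::uint32_t n) {
    std::uint32_t total = 0;                       // S1
    for (std::uint32_t i = 0; i < n; ++i) {        // S3
        total += i + 1u;
    }
    return total;
}

// Obfuscated block: S1 - P - S2 - S3.
std::uint32_t sum_obfuscated(std::uint32_t n) {
    std::uint32_t total = 0;                       // S1
    if (!opaque_true(n)) {                         // P is always true, so S2 is skipped
        total = n * 31u + 7u;                      // S2: misleading, never executed
    }
    for (std::uint32_t i = 0; i < n; ++i) {        // S3
        total += i + 1u;
    }
    return total;
}
```

Both functions return the same value for every n; only the reader has to work
out that the S2 branch can never run.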
Christian Collberg, Clark Thomborson, and Douglas Low:
Manufacturing cheap, resilient, and stealthy opaque constructs
In their article, Christian Collberg, Clark Thomborson, and Douglas Low
emphasize the need for code obfuscation and design an obfuscator for Java
object code.
Basic obfuscator operation:
1. Get the input parameters: the Java object code, the required level of
obfuscation (potency), the maximum time/space penalty (cost) and
Java profiling data.
2. Understand the program by building the symbol table, inheritance
graph and control flow graphs that represent the application.
3. Iterate, applying code transformations until the required potency is
reached or the cost limit is exceeded.
4. Output new obfuscated object code that has the same
functionality as the original code.
Obfuscation code transformation:
An obfuscation code transformation is a transformation of the code,
T(Program) = Obfuscated Program, for which the following holds:
• If the Program fails to terminate or terminates with an error condition,
then the Obfuscated Program may or may not terminate.
• Otherwise the Obfuscated Program must terminate and produce the
same result as the Program.
Among the most widely used code transformations are control transformations
that rely on opaque predicates. The value of an opaque predicate is known to
the obfuscator but unknown to the deobfuscator, so its outcome is hard to
deduce.
Opaque predicates enable the following methods, among many others:
1. Insert dead or irrelevant code:
Consider a basic block S = s1, s2, s3, …, sn. We can split the block and
insert dead or incorrect code guarded by a predicate that always evaluates
to false, so that the inserted code can never be reached.
2. Extended loop condition:
We can AND the loop condition with an opaque predicate that always
evaluates to true. This complicates the condition but does not change the
number of iterations, so the functionality stays the same.
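The extended loop condition can be illustrated with a short C++ sketch. The
function names and the predicate are assumptions chosen for the example (a
square is always congruent to 0 or 1 modulo 4); they are not taken from the
article.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical opaque predicate: x * x mod 4 is always 0 or 1.
// Wrap-around modulo 2^32 does not change the value modulo 4.
bool square_mod4_is_0_or_1(std::uint32_t x) {
    return (x * x) % 4u < 2u;
}

// Original loop: counts the positive elements.
int count_positive_original(const std::vector<int>& values) {
    int count = 0;
    for (std::size_t i = 0; i < values.size(); ++i) {
        if (values[i] > 0) ++count;
    }
    return count;
}

// Same loop with the condition extended by "&& <always true>".
// The number of iterations is unchanged, only the condition looks harder.
int count_positive_obfuscated(const std::vector<int>& values) {
    int count = 0;
    for (std::size_t i = 0;
         i < values.size() &&
         square_mod4_is_0_or_1(static_cast<std::uint32_t>(i) + 3u);
         ++i) {
        if (values[i] > 0) ++count;
    }
    return count;
}
```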
Manufacturing Opaque Constructs:
As mentioned, the outcome of an opaque predicate needs to be easy for the
obfuscator to deduce and hard for the deobfuscator to deduce. In addition,
opaque predicates should be stealthy (blend with the code) and cheap to
compute.
One way to manufacture opaque predicates is to exploit the general difficulty
of the alias analysis problem and the limits of current algorithms for
analyzing recursive data structures.
The basic technique is:
1. Add obfuscating code that builds a set of dynamic structures.
2. Keep a set of pointers into those structures.
3. The inserted code should update the dynamic structures and the
pointers into them, while maintaining certain invariants on the
pointers and structures (such as "there will always be a path from
p1 to p2").
4. Construct opaque predicates that depend on those invariants.
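The four steps above can be sketched in C++ as follows. This is a minimal
illustration, not the article's Graph ADT: the structure is a plain circular
linked list, and the names Node, Ring, advance and reachable are hypothetical.
The invariant is that two pointers into the same ring can always reach each
other, while pointers into different rings never can.

```cpp
#include <cstddef>

struct Node {
    Node* next;
};

// Step 1: a dynamic structure, here a ring of heap-allocated nodes.
class Ring {
public:
    explicit Ring(std::size_t n) : head_(new Node{nullptr}) {
        head_->next = head_;                       // a single node points to itself
        for (std::size_t i = 1; i < n; ++i) insert_after(head_);
    }
    ~Ring() {
        Node* cur = head_->next;
        while (cur != head_) {                     // free every node except the head
            Node* next = cur->next;
            delete cur;
            cur = next;
        }
        delete head_;
    }
    Ring(const Ring&) = delete;
    Ring& operator=(const Ring&) = delete;

    Node* head() const { return head_; }

    // Step 3: an update that keeps the ring closed, so the invariant survives.
    static void insert_after(Node* pos) { pos->next = new Node{pos->next}; }

private:
    Node* head_;
};

// Steps 2 and 3: pointers into the structure are moved around at will.
Node* advance(Node* p, std::size_t steps) {
    while (steps-- > 0) p = p->next;
    return p;
}

// Step 4: predicate built on the invariant. It is true whenever p and q
// point into the same ring and false when they point into different rings.
bool reachable(const Node* p, const Node* q) {
    const Node* cur = p;
    do {
        if (cur == q) return true;
        cur = cur->next;
    } while (cur != p);
    return false;
}
```

The obfuscator knows which ring each pointer belongs to, so it knows the
outcome of reachable(p, q) in advance; a static analyzer would have to solve
an alias analysis problem to reach the same conclusion.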
The article defines a Graph ADT with operations on it. The operations range
from adding a node and deleting a node to merging two graphs, splitting a
graph and many more.
The graph code and the statements that change it should be embedded in the
code.
To disguise itself better, the graph code should comply with these three
conditions:
1. The obfuscator should keep a large library of Graph variants and use
several variants in different parts of the code.
2. The added code (the graph class and the statements that use it) should
itself be obfuscated too.
3. The Graph ADT should be included in a user class to blend it with the
original code.
One useful improvement relies on the thread concurrency capabilities of the
Java language. Threads and their scheduling are hard to analyze, both
because scheduling is not specified by the language and because the actual
scheduling of a thread depends on asynchronous events (user interaction,
network state, etc.).
Drawbacks of opaque predicates from dynamic structures
• Code size increase: the code size increases dramatically because of
the added dead or irrelevant code, the dynamic graph code and the
instructions that use it.
The increased size can harm web-deployed and mobile applications, which
require the application size to be small.
• Dynamic data handling increase: because a dynamic type is added and
operations such as add and remove are performed on it, the obfuscated
program makes heavy use of dynamic allocation, which increases the time
overhead and the memory needed to hold those data structures.
3) Opaque Predicate
In order to obfuscate our code we insert if statements whose conditions
contain opaque predicates.
a. Definition of Opaque Predicate
Opaque predicate: an expression that evaluates to either "true" or "false",
whose outcome is known to the obfuscator but is very difficult or even
impossible for the deobfuscator to determine without running the application
itself.
b. Manufacturing of Opaque Predicate
The conditions of the inserted if statements are built from opaque
predicates.
First, an obfuscated Graph class is introduced into the code.
Next, we introduce code that manipulates (builds and changes) a Graph
object. We then build an opaque predicate, named with user-supplied names,
from specific invariants that the graph structure maintains (for example,
that there is always a path between two pointers that travel through the
graph object).
Finally, we add randomness to the if conditions alongside the
predicate.
For example, we split the original program (S1; S2; …; SN;) and add a
predicate that always evaluates to "true".
c. Randomness
To keep the compiler from identifying and removing a block as dead code
that is never entered (because the predicate is always false), I
inserted a randomness mechanism.
tune1:
int tune2 = rand() % 2;
if (melody->__volume(tune) || tune2)
{
tune->singer();
melody->singer();tune->singer();
tune = tune->__singer();
tune->singer();
melody->singer();
melody= melody->__singer();
tune->___tune();
melody->singer();
melody->singer();
goto tune1;
melody->___tune();
}
else
{
tune->singer();
melody->singer();
melody->singer();
melody->___tune();
}
We can see that when the program enters the first if section it jumps back
to the "tune1" label, and it remains in that loop, executing misleading code
(not part of the original program), until "tune2" takes the value 0.
In this way the obfuscated program really can enter the if section and
execute it, so the code in that section is not treated as dead code, either
by the compiler or by the naked eye of the attacker.
d. Advantages and Disadvantages
Obfuscating code by inserting opaque predicates has advantages and
disadvantages.
Advantages:
I. The attacker has more predicates to deduce → the attacker needs to
spend more resources.
II. Dead and irrelevant code is inserted into the program in order to
confuse the attacker.
III. The introduced Graph class and the code that manipulates the
graph blend well with the source code because they use user-supplied
names.
IV. The predicate names are user defined, so they resemble the given code.
V. At run time the behavior is not deterministic, because of the
randomness mechanism.
Disadvantages:
I. At run time the application needs more memory for the graph that is
built and used in the opaque predicate conditions.
II. The application executes more lines of code (dead code and
graph manipulation code) → the program takes longer to run.
These disadvantages can harm an application that must use its resources
carefully (such as mobile phone applications).
4) Obfuscator Application features
The application is a C++ obfuscator. Its purpose is to convert given
C++ code into obfuscated C++ code with the same functionality as the
original.
The application is written in C++.
User Interface:
The application is a command line application.
The user needs to move all the files to be obfuscated into
the "Working Folder" specified in the configuration file.
The user can choose which obfuscation methods are
performed on the code (from those listed below).
The Obfuscation methods are:
Opaque  ------->  Obfuscating Control-flow Transformations - Adding
                  Opaque Predicates and dead code
Strings ------->  Obfuscating Meaningful Strings
Example: Obfuscate.exe config.txt Opaque Strings
Obfuscation methods used by the application (explanations can be
found in the Literature Survey Summary section):
1. Obfuscating Control-flow Transformations - Adding Opaque Predicates
and dead code – "Opaque" option
2. Obfuscating Meaningful Strings – "Strings" option
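As a rough illustration of the "Strings" option, the sketch below renames one
identifier to a name of the form ____<number>____, the pattern visible in the
sample output of the Obfuscation Example section. The function names and the
whole-word matching rule are assumptions made for this example; the sample
output suggests that the actual tool also rewrites names embedded in longer
identifiers, which this sketch deliberately does not do.

```cpp
#include <regex>
#include <string>

// Builds a replacement name such as "____4251____" (hypothetical helper).
std::string obfuscated_name(unsigned id) {
    return "____" + std::to_string(id) + "____";
}

// Replaces every whole-word occurrence of `name` in `source`.
// `name` is assumed to be a plain C++ identifier, so it contains no
// regular expression metacharacters.
std::string rename_identifier(const std::string& source,
                              const std::string& name,
                              unsigned id) {
    const std::regex whole_word("\\b" + name + "\\b");
    return std::regex_replace(source, whole_word, obfuscated_name(id));
}
```

For instance, rename_identifier(code, "Play_Song", 4251) rewrites
"Folder::Play_Song(" but leaves a longer name such as "Play_Song_Twice"
untouched, because the underscore counts as a word character.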
Config file example:
/*************************************************/
/* Configuration File For Koby's Obfuscation App */
/*************************************************/
/* Working Folder */
WorkingFolder=C:\Users\koby\Desktop\New folder
/* Source Files */
SourceFileList=folder.cpp
/**********************************/
/* Parameters for "Opaque" Option */
/**********************************/
/* Related Project Names */
RelatedProjectNames=melody,tune,volume,artist,singer
/* Number Of Predicates To Add */
NumberOfPredicatesToAdd=2
/* Function Names To Insert The Predicates */
FunctionNamesToInsertThePredicates=Play_Song
/***********************************/
/* Parameters for "Strings" Option */
/***********************************/
/* Function Names To Be Obuscated */
FunctionNamesToBeObuscated=Play_Song,Print_Folder_Songs
/* Variable Names */
VariableNames=Folder,song_name,temp_song
• RelatedProjectNames must not contain any of the following strings:
goto, to, go, random.
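A configuration file in this format could be read along the following lines.
This is a sketch under assumptions, not the project's Database class: the
function names are hypothetical, comment lines are assumed to start with
"/*", and every other non-empty line is assumed to have the form Key=Value.

```cpp
#include <map>
#include <sstream>
#include <string>
#include <vector>

// Parses "Key=Value" lines, skipping blank lines and /* ... */ comment lines.
std::map<std::string, std::string> parse_config(const std::string& text) {
    std::map<std::string, std::string> values;
    std::istringstream in(text);
    std::string line;
    while (std::getline(in, line)) {
        if (line.empty() || line.compare(0, 2, "/*") == 0) continue;
        const std::string::size_type eq = line.find('=');
        if (eq == std::string::npos) continue;     // not a Key=Value line
        values[line.substr(0, eq)] = line.substr(eq + 1);
    }
    return values;
}

// Splits a comma-separated value such as "melody,tune,volume".
std::vector<std::string> split_list(const std::string& value) {
    std::vector<std::string> items;
    std::istringstream in(value);
    std::string item;
    while (std::getline(in, item, ',')) items.push_back(item);
    return items;
}

// Checks a name against the strings that RelatedProjectNames may not use.
bool is_reserved_name(const std::string& name) {
    return name == "goto" || name == "to" || name == "go" || name == "random";
}
```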
5) Implementation Details
The obfuscator is written in C++ and is compiled with the Visual Studio 2008 compiler.
Source code details:
The source code for the obfuscator includes the following files:
1. ControlFlowTransformations.cpp , ControlFlowTransformations.h:
Defines and implements the "Opaque" option of the application.
2. MeaningfulStringsObfuscation.cpp , MeaningfulStringsObfuscation.h:
Defines and implements the "Strings" option of the application.
3. FileTokenizer.cpp , FileTokenizer.h:
Defines and implements a file handling class that supports
inserting tokens into a file, replacing tokens, etc.
4. Database.cpp , Database.h:
Database class that parses the input config file supplied by the user and
holds the relevant parameters for the obfuscator to work.
5. main.cpp :
This is the main module, which initiates the application options according
to the user input.
6. graphManipulationCode Folder:
Folder that holds templates for building and manipulating the Graph, and
templates for the opaque predicate code.
7. GraphLibFolder - _include_.cpp , _include_.h
The Graph class used for holding the invariants on which the predicates are
based.
6) Obfuscation Quality and Measurements
In this project, to measure the obfuscation quality, I used the McCabe
metric (cyclomatic complexity), which is a well-known metric that tries to
measure code complexity.
The cyclomatic complexity of a section of source code is the number of
linearly independent paths through the source code.
How to compute the cyclomatic complexity:
• Build the control flow graph of the program.
• Compute: E − N + 2P
where E = the number of edges of the graph
N = the number of nodes of the graph
P = the number of connected components
An example: for a graph with E = 9, N = 8 and P = 1, the cyclomatic
complexity is 9 − 8 + 2 = 3.
Note that for a program without any IF, WHILE, FOR or CASE statements
the cyclomatic complexity is 1, because E = 0, N = 1, P = 1.
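The formula can be checked with a small C++ sketch that computes E − N + 2P
from an edge list. The function names and the graph representation (nodes
numbered from 0, edges as pairs) are assumptions made for this illustration;
the measurements in this project were taken with an external tool, not with
this code.

```cpp
#include <cstddef>
#include <numeric>
#include <utility>
#include <vector>

// Union-find lookup with path halving (helper for counting components).
std::size_t find_root(std::vector<std::size_t>& parent, std::size_t v) {
    while (parent[v] != v) {
        parent[v] = parent[parent[v]];
        v = parent[v];
    }
    return v;
}

// Returns E - N + 2P for a control flow graph with nodes 0 .. node_count - 1.
// Edge direction does not matter for counting connected components.
int cyclomatic_complexity(
        std::size_t node_count,
        const std::vector<std::pair<std::size_t, std::size_t>>& edges) {
    std::vector<std::size_t> parent(node_count);
    std::iota(parent.begin(), parent.end(), 0);    // every node starts alone
    std::size_t components = node_count;
    for (const auto& edge : edges) {
        const std::size_t a = find_root(parent, edge.first);
        const std::size_t b = find_root(parent, edge.second);
        if (a != b) {                              // edge joins two components
            parent[a] = b;
            --components;
        }
    }
    return static_cast<int>(edges.size()) - static_cast<int>(node_count)
           + 2 * static_cast<int>(components);
}
```

A graph with 8 nodes and 9 edges in one component (for example an if/else
followed by a loop) yields 3, and a single node without edges yields 1,
matching the two cases described above.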
7) Obfuscation Example
Config File:
/*************************************************/
/* Configuration File For Koby's Obfuscation App */
/*************************************************/
/* Working Folder */
WorkingFolder=C:\Users\koby\Desktop\New folder
/* Source Files */
SourceFileList=folder.cpp
/**********************************/
/* Parameters for "Opaque" Option */
/**********************************/
/* Related Project Names */
RelatedProjectNames=melody,tune,volume,artist,singer
/* Number Of Predicates To Add */
NumberOfPredicatesToAdd=2
/* Function Names To Insert The Predicates */
FunctionNamesToInsertThePredicates=Play_Song
/***********************************/
/* Parameters for "Strings" Option */
/***********************************/
/* Function Names To Be Obuscated */
FunctionNamesToBeObuscated=Play_Song,Print_Folder_Songs
/* Variable Names */
VariableNames=Folder,song_name,temp_song
Original code:
bool Folder::Print_Folder_Songs (const string& curr_artist_name)
{Iterator<Song> itr =songs_list.GetIterator();
Song temp_song;
if (curr_artist_name=="temp_not_given")
{while (itr.HasNext())
{itr.Next().Print_Song_Name();}}
else
{while (itr.HasNext())
{temp_song=itr.Next();
if (temp_song.Get_Artist_Name()==curr_artist_name)
{temp_song.Print_Song_Name();}}}
return true;}
bool Folder::Play_Song (const string& song_name)
{Song curr_song;
Iterator<Song> itr =songs_list.GetIterator();
bool found=false;
while (itr.HasNext()&&(!found))
{curr_song=itr.Next();
if (curr_song.song_name==song_name)
{curr_song.Play_Song();
found=true;}}
return found;}
// not checked
Obfuscated Code – Opaque Option only:
bool Folder::Print_Folder_Songs (const string& curr_artist_name)
{Iterator<Song> itr =songs_list.GetIterator();
Song temp_song;
if (curr_artist_name=="temp_not_given")
{while (itr.HasNext())
{itr.Next().Print_Song_Name();}}
else
{while (itr.HasNext())
{temp_song=itr.Next();
if (temp_song.Get_Artist_Name()==curr_artist_name)
{temp_song.Print_Song_Name();}}}
return true;}
bool Folder::Play_Song (const string& song_name)
{Song curr_song;
srand (time(NULL) );
artist* melody = new artist();
melody->singer();
melody= melody->__singer();
melody->singer();
melody= melody->__singer();
melody->singer();
melody->singer();
melody->___tune();
melody= melody->__singer();
melody->singer();
melody->singer();
melody->___tune();
melody= melody->__singer();
melody->singer();
melody->singer();
melody->___tune();
artist* tune;
tune=melody->__artist();
tune->singer();
tune = tune->__singer();
tune->singer();
melody->singer();tune->singer();
tune = tune->__singer();
tune->singer();
melody->singer();
melody= melody->__singer();
tune->___tune();
melody->singer();
melody->singer();
melody->___tune();
if (melody->__volume(tune) )
{
artist* melody = new artist();
melody->singer();
melody= melody->__singer();
melody->singer();
melody= melody->__singer();
melody->singer();
melody->singer();
melody->___tune();
melody= melody->__singer();
melody->singer();
melody->singer();
melody->___tune();
melody= melody->__singer();
melody->singer();
melody->singer();
melody->___tune();
artist* tune;
tune=melody->__artist();
tune->singer();
tune = tune->__singer();
tune->singer();
melody->singer();tune->singer();
tune = tune->__singer();
tune->singer();
melody->singer();
melody= melody->__singer();
tune->___tune();
melody->singer();
melody->singer();
melody->___tune();
tune1:
int tune2 = rand() % 2;
if (melody->__volume(tune) || tune2)
{
tune->singer();
melody->singer();tune->singer();
tune = tune->__singer();
tune->singer();
melody->singer();
melody= melody->__singer();
tune->___tune();
melody->singer();
melody->singer();
goto tune1;
melody->___tune();
}
tune->singer();
melody->singer();
melody->singer();
melody->___tune();
}
Iterator<Song> itr =songs_list.GetIterator();
bool found=false;
while (itr.HasNext()&&(!found))
{curr_song=itr.Next();
if (curr_song.song_name==song_name)
{curr_song.Play_Song();
found=true;}}
return found;}
Obfuscated Code – Opaque Option + Strings Option:
#include "folder.h"
#include "_include_.h"
/* implementation of class ____18960____ */
//private methods
bool ____18960____::____17385____ (const string& curr_____28814_____name)
{Iterator<Song> itr =songs_list.GetIterator();
Song ____237____;
if (curr_____28814_____name=="temp_not_given")
{while (itr.HasNext())
{itr.Next().Print_Song_Name();}}
else
{while (itr.HasNext())
{____237____=itr.Next();
if (____237____.Get_Artist_Name()==curr_____28814_____name)
{____237____.Print_Song_Name();}}}
return true;}
bool ____18960____::____4251____ (const string& ____8667____)
{Song curr_song;
____28814____* ____15463____ = new ____28814____();
____15463____->____13731____();
____15463____= ____15463____->______13731____();
____15463____->____13731____();
____15463____= ____15463____->______13731____();
____15463____->____13731____();
____15463____->____13731____();
____15463____->_______9449____();
____15463____= ____15463____->______13731____();
____15463____->____13731____();
____15463____->____13731____();
____15463____->_______9449____();
____15463____= ____15463____->______13731____();
____15463____->____13731____();
____15463____->____13731____();
____15463____->_______9449____();
____28814____* ____9449____;
____9449____=____15463____->______28814____();
____9449____->____13731____();
____9449____ = ____9449____->______13731____();
____9449____->____13731____();
____15463____->____13731____();____9449____->____13731____();
____9449____ = ____9449____->______13731____();
____9449____->____13731____();
____15463____->____13731____();
____15463____= ____15463____->______13731____();
____9449____->_______9449____();
____15463____->____13731____();
____15463____->____13731____();
____15463____->_______9449____();
if (____15463____->______10535____(____9449____))
{
____28814____* ____15463____ = new ____28814____();
____15463____->____13731____();
____15463____= ____15463____->______13731____();
____15463____->____13731____();
____15463____= ____15463____->______13731____();
____15463____->____13731____();
____15463____->____13731____();
____15463____->_______9449____();
____15463____= ____15463____->______13731____();
____15463____->____13731____();
____15463____->____13731____();
____15463____->_______9449____();
____15463____= ____15463____->______13731____();
____15463____->____13731____();
____15463____->____13731____();
____15463____->_______9449____();
____28814____* ____9449____;
____9449____=____15463____->______28814____();
____9449____->____13731____();
____9449____ = ____9449____->______13731____();
____9449____->____13731____();
____15463____->____13731____();____9449____->____13731____();
____9449____ = ____9449____->______13731____();
____9449____->____13731____();
____15463____->____13731____();
____15463____= ____15463____->______13731____();
____9449____->_______9449____();
____15463____->____13731____();
____15463____->____13731____();
____15463____->_______9449____();
__9928814__:
int __69449__ = rand() % 2;
if (____15463____->______10535____(____9449____) || __69449__ )
{
____9449____->____13731____();
return 0;
____15463____->____13731____();____9449____->____13731____();
____9449____ = ____9449____->______13731____();
____9449____->____13731____();
____15463____->____13731____();
____15463____= ____15463____->______13731____();
____9449____->_______9449____();
____15463____->____13731____();
____15463____->____13731____();
____15463____->_______9449____();
goto __9928814__;
}
else
____9449____->____13731____();
return 0;
____15463____->____13731____();____9449____->____13731____();
____9449____ = ____9449____->______13731____();
____9449____->____13731____();
____15463____->____13731____();
____15463____= ____15463____->______13731____();
____9449____->_______9449____();
____15463____->____13731____();
____15463____->____13731____();
return -1;
____15463____->_______9449____();
}
Iterator<Song> itr =songs_list.GetIterator();
bool found=false;
while (itr.HasNext()&&(!found))
{curr_song=itr.Next();
if (curr_song.____8667____==____8667____) {curr_song.____4251____();
found=true;}}
return found;}
Conclusions:
1. We can see that the code grows dramatically.
2. With the "Opaque" option alone, we can see that the added code blends
well with the original code.
3. With the "Opaque" option alone, we can see the randomness mechanism
that ensures that the compiler will not identify the added code as dead
code.
4. We can see that once the "Strings" option is applied the code becomes
unreadable.
Obfuscation Quality and Measurements
Experiments Results
I used the "CCCC" freeware to compute the cyclomatic complexity of the
obfuscation example.
[Figure: bar chart of cyclomatic complexity (vertical axis from 0 to 200)
for 2, 20, 50 and 70 added predicates; bar values 40, 64, 122 and 161.]