Guarding Software Checkpoints
Yongdong Wu
wydong@i2r.a-star.edu.sg
Institute for Infocomm Research, Singapore
Summary:
This paper aims to hide the conditional branch instructions used
for software checkpoints. It presents methods such as inserting
auxiliary variables, encoding a secret into several symbols, and
indirect accessing. In addition, it delays the alert with dummy
code when an attacker tries to analyse the software control/data
flows. This technology can provide additional protection after a
handheld device is lost or stolen.
Key words:
Software protection, CRT
1 INTRODUCTION
Authentication allows one party to assure itself that a claimant
is as declared. The most popular technique is to check
whether the claimant knows something or owns something,
e.g., a password and/or a token. In the verification process, the
verifier usually executes some code to check for the
presence of a secret message. If the message is absent
or wrong, the claimant's request is refused. A typical
implementation of the checking code places a checkpoint
in the software to verify the secret. However, since the
location of the checking code is fixed, it can be found with
tools such as a debugger; thus, an attacker with ample physical
access can remove the defences after finding the checkpoints.
To defeat this kind of attempt, a tamper-resistant
mechanism should be applied, such as physical
protection, security through obscurity, or a hybrid [1,2,3,4].
Physical protection demands specific circuits, for instance a
crypto-processor, which is beyond the scope of this
paper. Obfuscation [5] is a popular software
tamperproofing technology which makes a program
unintelligible while preserving its functionality, so as to raise
the bar to a height sufficient to deter an attacker. An ideal
obfuscated program looks like a virtual black box.
Unfortunately, code obfuscation can never completely
protect an application from malicious reverse-engineering
efforts. Its best result is to make the attack so difficult
that in practice it is not worth performing, even though
it could eventually succeed. Any modification will,
with high probability, produce persistently nonsensical
behaviour. To ensure this, the techniques employed by an
obfuscator must be powerful enough to thwart attacks by
automatic deobfuscators that attempt to undo the
obfuscating transformations.
US 5,892,899 [6] produces a security-sensitive program
by distributing the secret in space as well as in time,
obfuscating the program, isolating its security-sensitive
functions, and deploying an interlocking trust mechanism.
US 6,205,550 [7] places checkpoints in the security-sensitive
module by checking the code integrity and the
elapsed execution time. To determine whether the module
is being observed, it checks the return address and the operation
mode that supports single-step execution. To inter-couple
the above techniques, some variables are combined to form
a new variable.
WO 0,114,953 [8] increases the tamper-resistance and
obscurity of software so that the observable operation of the
transformed software is dissociated from the intent of the
original code, and so that the functionality of the software is
extremely fragile when modified. These effects are achieved
by converting the control flow of the software into a data-driven
form, and increasing the complexity of the control flow by orders of magnitude.
US 5,748,741 [9] is an encoding technique that protects
intelligence from being tampered with or disclosed. The
encoding technique, e.g., cascading and intertwining of
blocks, employs the concept of program complexity so
that every output depends on all inputs.
To hide the data, Bhatkar et al. [13] obfuscate address
targets by randomizing (a) the absolute
locations of all code and data, and (b) the relative distances
between different data items.
This paper presents some techniques for protecting
software checkpoints, so as to defeat debugging of the
checkpoint positions. To this end, it encodes the secret and
binds the codeword symbols to the software entries. In
order to delay the alert, we further insert dummy code
so as to cheat the debugger.
The remainder of the paper is organized as follows.
Section 2 defines the problem for protecting checkpoints.
Section 3 introduces the proposed technologies, and Section
4 draws a conclusion.
2 PROBLEM DEFINITION
In a typical implementation of an authentication procedure,
shown in Figure 1, the admissible number of trials (here 3)
is set first; that is to say, a user can try 3 times. Afterwards, a
sample such as a keyed password or a fingerprint is captured.
Subsequently, the authentication code checks the validity of
the sample. If the verification report is positive, the user is
granted the right to do the appropriate work. Otherwise, the
admissible trial number is reduced by one. If the number is
greater than 0, the user can try again; otherwise, the code
rejects the user's request and terminates.
Usually, software may include some self-checking
code to detect tampering with the license-checking
mechanisms, for instance:
• Time checkpoint
It is well known that a time check can be used to defend
against reverse engineering. The defender selects a segment of
code and reads a clock at the entry point and another clock
at the exit point. Normally, the elapsed time should be
roughly fixed in a predefined environment. When
someone debugs the code, the elapsed time increases
dramatically. Thus, by evaluating the elapsed
time, the sensitive code can detect the malicious actions.
• Parent process checkpoint
Another popular technique to detect observation is to
check the state of the parent process. If the parent process
is not the operating system, it means that someone is interested
in the run-time state of the sensitive code.
• Secret checkpoint
This kind of checkpoint checks whether an input, such
as a product serial number, satisfies some properties, as
in the UNIX password check system.
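The time checkpoint above can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: the 0.5-second threshold and the helper names are illustrative assumptions, and in practice the threshold would be calibrated for the deployment environment.

```python
import time

def timed_section(sensitive_code, threshold=0.5):
    """Run sensitive_code; flag observation if it takes suspiciously long.

    threshold is an illustrative value; single-stepping in a debugger
    inflates the elapsed time far beyond normal execution.
    """
    start = time.perf_counter()            # clock at the entry point
    result = sensitive_code()
    elapsed = time.perf_counter() - start  # clock at the exit point
    if elapsed > threshold:
        raise RuntimeError("debugger suspected")
    return result

# Normal execution completes well under the threshold.
value = timed_section(lambda: sum(range(1000)))
```

Note that the `if elapsed > threshold` test is itself a conditional branch, which is exactly what Section 3.1 proposes to hide.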
As shown in Figure 1, this verification process compares
the pattern with a mapping of the input. Because the pattern
has to be stored together with the module, the attacker is
able to find it or render the comparison ineffective. Thus, this
self-checking mechanism can be defeated if the attacker can
find the self-checking code and disable it. Generally,
an attacker has two reverse-engineering approaches to
bypass the checkpoints: one is static analysis and the other is
dynamic, or run-time, analysis. Both approaches focus on
data flow and control flow so that the attacker can
understand the program and make profitable changes.
Therefore, the goal of the present paper is to raise the
attack barrier so that the attacker can only attack the
software at a higher cost.
NumberOfTrial ← 3
TryAgain:
    sample ← readInputDevice()
    pattern ← extractPattern(sample)
    If (pattern is valid) goto AccessGranted
    NumberOfTrial ← NumberOfTrial - 1
    If (NumberOfTrial is 0) goto End
    else goto TryAgain
AccessGranted:
    DoOtherWorks
End:
    Halt
Fig. 1: A typical implementation of authentication.
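The retry loop of Figure 1 can be sketched as follows. The helpers `read_input_device`, `extract_pattern` and `is_valid` are hypothetical stand-ins for the device- and scheme-specific parts, passed in here so the sketch stays self-contained.

```python
def authenticate(read_input_device, extract_pattern, is_valid, max_trials=3):
    """Return True if a valid pattern is supplied within max_trials attempts."""
    trials = max_trials                  # NumberOfTrial <- 3
    while trials > 0:
        sample = read_input_device()     # e.g. keyed password or fingerprint
        pattern = extract_pattern(sample)
        if is_valid(pattern):
            return True                  # AccessGranted: do other works
        trials -= 1                      # one admissible trial used up
    return False                         # End: reject and terminate

# Usage: a toy password check with a scripted sequence of inputs.
inputs = iter(["wrong", "wrong", "s3cret"])
ok = authenticate(lambda: next(inputs), lambda s: s, lambda p: p == "s3cret")
```

The `if is_valid(pattern)` line is the fixed-location checkpoint that the rest of the paper sets out to hide.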
3 THE GUARDING TECHNOLOGIES
A tamper-resistant system should build a defence
frontline so that the flow analyses are difficult, i.e., the
reverse-engineering process should be hard. This section
describes four means of protecting software from reverse
engineering: hiding branches, hiding data flow, hiding the
secret, and inserting dummy code.
3.1 Hiding branch
If new variables can be introduced, a program can be
expressed as one sequence of code and at most one post-test
loop. Therefore, the conditional branch can be hidden.
Technically, the conditional branch is cancelled and
substituted with sequential instructions as shown in Figure
2. If the expression Lvalue can only take a small
number of values, say the two logical values True or
False, the conditional branch can be tabulated. If there are
several values, it can be transformed into a multi-step
two-value procedure, while the simple case (true/false) can be
processed with simple sequential instructions. For example,
the branch instruction in the previous time checkpoint can
be protected with this means.
After the conditional branch is hidden, an attacker has to
analyze all the instructions rather than only the conditional
instructions. Furthermore, automatic analysis tools are
of no use in localizing the checkpoints. Clearly, the cost to
the attacker is increased. In order to misguide the attacker, it
is preferable to vary the process of transforming conditional
branch instructions, so as to prevent the analysis tools from
detecting the transformation pattern.
Before:
    If (Lvalue == 1) X = f1();
    else if (Lvalue == 0) X = f2();
After:
    Save the common variables
    X1 = f1()·Lvalue
    Recover the common variables
    X2 = f2()·(1-Lvalue)
    X = X1 + X2
Fig. 2: The process of transforming conditional branch
instructions.
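The Figure 2 transformation can be sketched directly, assuming Lvalue is restricted to 0 or 1 as in the text. Note that both f1 and f2 are always evaluated, which is why Figure 2 saves and recovers the common variables; the f1 and f2 below are illustrative pure functions, so that step is unnecessary here.

```python
def f1():
    return 10

def f2():
    return 20

def branchless_select(Lvalue):
    """Select between f1() and f2() with arithmetic, not a branch."""
    x1 = f1() * Lvalue        # X1: contributes only when Lvalue == 1
    x2 = f2() * (1 - Lvalue)  # X2: contributes only when Lvalue == 0
    return x1 + x2            # X = X1 + X2; no if/else on Lvalue appears
```

Since no conditional jump depends on Lvalue, an attacker scanning for branch instructions near the checkpoint finds nothing to patch.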
3.2 Hiding Data Flow
Since data flow has a tight relationship with control
flow, it leaks information about the checking process. For
example, in password-based authentication, an automatic
analysis tool is able to trace the input, and then
narrow the checkpoint down to a small number of
instructions. To increase the number of suspected
instructions, the data flow is protected with the
following methods.
• Data type hiding
The sensitive variables are split and distributed in
separate locations. When a variable is needed in an
instruction, a specific code segment combines the separated
parts to reconstruct the original one. After hiding
variables in this way, the attacker has to analyse the
control flow of the code in order to understand the data flow.
• Location hiding
A more sophisticated way is to calculate the variable's
address and access the variable dynamically; say, based
on the relative address between two local variables,
access them via that relative address. This may confuse
the attacker, and especially most automatic tools. The
price for this benefit may be increased difficulty in
maintenance and compatibility.
• Data alias
Both data-flow and control-flow analysis require
variable information. Aliasing the variables and
selecting them randomly increases the difficulty of
analysis [10,11]. For example, access a variable directly
or indirectly, or even duplicate or map a variable so that
access to the variable is blind.
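The data-type-hiding idea above can be sketched with additive shares. This is a minimal illustration under assumed names: a sensitive integer is split into two shares stored separately, and reconstructed only at the instruction that needs it, so neither stored value alone reveals the secret.

```python
import random

MOD = 1 << 32  # word size for the shares; illustrative choice

def split(secret):
    """Split secret into two additive shares to be stored apart."""
    r = random.randrange(MOD)
    return r, (secret - r) % MOD

def combine(share_a, share_b):
    """Reconstruct the original value at the use site."""
    return (share_a + share_b) % MOD

a, b = split(42)          # a and b live in separate locations
assert combine(a, b) == 42
```

An attacker tracing either share sees a random-looking word; only by following the control flow to the combining segment does the data flow become visible, which is the intended extra cost.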
3.3 Hiding Secret
In the trivial implementation shown in Figure 1, since the
secret message (pattern) is operated on directly, an attacker
can find it and bypass the verification checkpoint easily. In
order to hide this secret, the secret is encoded into several
symbols that represent program block entry addresses. For
example, the secret can be encoded with the CRT (Chinese
Remainder Theorem) [12] or some other technique into
several sub-elements. Technically, assume that the secret X is of
size m bits. Select n addresses A1, A2, …, An, which can
be the entries of basic blocks¹, and n pairwise relatively prime
moduli p1, p2, …, pn, where all the pi are of the same size, so that

X ≡ A1 (mod p1)
X ≡ A2 (mod p2)
…
X ≡ Ai (mod pi)
…
X ≡ An (mod pn)

hold simultaneously, and

p1 · p2 ⋯ pn ≥ 2^m.

¹ A basic block is a sequence of instructions that is entered only at
its first instruction and ends in a branch or return instruction.
The verification process calculates all the addresses Ai and
executes the code at the computed addresses sequentially,
instead of comparing X with the predefined value directly.
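The CRT encoding above can be sketched as follows. This is a minimal illustration with illustrative moduli: the residues A_i = X mod p_i play the role of the bound block entries, and standard CRT reconstruction recovers X when the product of the moduli exceeds it.

```python
def crt_encode(X, moduli):
    """Residues A_i = X mod p_i; these would be bound to block entries."""
    return [X % p for p in moduli]

def crt_decode(residues, moduli):
    """Standard CRT reconstruction of X from its residues."""
    M = 1
    for p in moduli:
        M *= p                          # M = p1*p2*...*pn, must exceed 2^m
    X = 0
    for a, p in zip(residues, moduli):
        Mi = M // p
        X += a * Mi * pow(Mi, -1, p)    # Mi * Mi^-1 ≡ 1 (mod p); Python 3.8+
    return X % M

moduli = [251, 253, 255, 256]           # pairwise coprime, similar size
A = crt_encode(123456, moduli)          # the symbols hidden in the program
assert crt_decode(A, moduli) == 123456
```

In the protected program the decoding never happens explicitly: the residues are consumed as addresses, so a wrong secret simply steers execution to wrong entries rather than failing a visible comparison.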
Another way to hide the secret is to encode it into a
codeword X1, X2, …, Xn, then bind the codeword symbols Xi
with the program entry addresses as shown in Figure 3. At
the verification stage, the program extracts the hidden
addresses (second row) from the table and computes the
correct entries for execution. With this method, the
verification process proceeds in multiple steps so as to
increase the difficulty of automatic analysis.

Normal Address:  A1      A2      …   An
Hiding Address:  A1⊕X1   A2⊕X2   …   An⊕Xn

Fig. 3: The address transformation. The first row represents
the normal addresses, and the second row represents the
transformed new addresses generated from the original
addresses and the input variable or its mapping X.
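The Figure 3 binding can be sketched as follows, assuming the binding operation is XOR (the text only says the hidden entries are generated from the addresses and the codeword). The addresses and symbols are illustrative values.

```python
# A1..A3: the real basic-block entry addresses (illustrative values).
normal_addresses = [0x4010, 0x4238, 0x45A0]
# X1..X3: the codeword symbols derived from the secret.
codeword = [0x1111, 0x2222, 0x3333]

# Only the hidden addresses Ai XOR Xi are stored in the program table.
hidden = [a ^ x for a, x in zip(normal_addresses, codeword)]

# At verification, the correct codeword recovers the correct entries...
recovered = [h ^ x for h, x in zip(hidden, codeword)]
assert recovered == normal_addresses
# ...while any wrong codeword yields wrong entries and execution strays.
```

XOR is a natural choice here because it is self-inverse, but any invertible mapping of (Ai, Xi) would serve the same purpose.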
3.4 Inserting dummy code
In standard programming, when a subroutine is called,
a return address (usually the address following the call) is
stored on the stack. Once the called subroutine
finishes, the return address is popped and the
instruction at that address is executed next.
However, if an attacker modifies the program so that the
control flow enters a code segment starting from a
wrong address, the program stack will underflow when it
returns. The program may then return not to the calling
address but to an unknown entry such as a wild address, and hence
the program will crash. That is to say, software
protected with the method in Subsection 3.3 will crash if
the attacker inputs a wrong secret X. In this case, the
attacker's analysis tool deduces that the location of the
checkpoint must be earlier than the crash position.
To misguide the attacker, the present technique reserves
a lot of memory on the stack and fills it with valid
addresses in advance. When the program stack underflows
and tries to return, these forged addresses may be used,
preventing the attacker from observing irregular program
behaviours. Therefore, from the viewpoint of the attacker, the
program runs "well". By the time the attacker finds that the
software has strayed from its target, the origin of the error has
long been passed. He has to look back and check it step by step.
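The idea can be illustrated with a toy simulation, assuming the call stack is modelled as a Python list (a real implementation would pre-fill actual stack memory). The decoy entries stand in for the valid addresses reserved in advance.

```python
# Valid-looking decoy return addresses planted below the live stack frame.
DECOY = ["decoy_entry_%d" % i for i in range(8)]

stack = list(DECOY)            # reserved memory, pre-filled with decoys

def call(return_addr):
    """Simulate a call: push the return address."""
    stack.append(return_addr)

def ret():
    """Simulate a return: an underflowing pop now yields a decoy,
    not an immediate crash."""
    return stack.pop()

call("after_caller")
assert ret() == "after_caller"      # balanced call/return behaves normally
extra = ret()                       # unbalanced return lands on a decoy
assert extra.startswith("decoy_")   # execution continues, crash is delayed
```

From the attacker's perspective the program keeps running plausibly past the checkpoint, so the crash position no longer points back to the checking code.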
4 CONCLUSION
This paper aims to obfuscate the control flow of a typical
authentication module to increase the difficulty of analysing
the software flow. To accomplish this, some of the software
entry addresses are generated from the input secrets to hide
the checkpoints. To further cheat the attacker, it adds many
return addresses to delay the crash, such that the attacker (or
a debugger-like tool) has difficulty localizing the checking
code.
REFERENCES
[1] Masahiro Mambo, Takanori Murayama and Eiji Okamoto,
"A Tentative Approach to Constructing Tamper-Resistant
Software", New Security Paradigms Workshop, pp. 23-33, 1997.
[2] Josep Domingo-Ferrer, "Software Run-Time Protection: A
Cryptographic Issue", Eurocrypt, LNCS 473, pp. 474-480, 1990.
[3] Gu Yuan, Stanley T. Chow and Harold J. Johnson, "Tamper
Resistant Software Encoding", WO 0077597, 2000.
[4] Scott A. Moskowitz and Marc Cooperman, "Method for
Stega-cipher Protection of Computer Code", US 5,745,569, 1998.
[5] Christian S. Collberg, Clark Thomborson and Douglas Low,
"Manufacturing Cheap, Resilient, and Stealthy Opaque
Constructs", ACM SIGPLAN-SIGACT Symposium on Principles
of Programming Languages, 1998.
http://www.cs.arizona.edu/~collberg/Research/Obfuscation/
[6] David Aucsmith and Gary Graunke, "Tamper Resistant Methods
and Apparatus", US 5,892,899, 1999.
[7] Richard L. Maliszewski and Richard P. Mangold, "Tamper
Resistant Methods and Apparatus", US 6,205,550, 2001.
[8] Gu Yuan and Stanley T. Chow, "Software Code Protection by
Obscuring Its Data-Driven Form", WO 0,114,953, 2001.
[9] Stanley T. Chow and Harold J. Johnson, "Encoding
Technique for Software and Hardware", US 5,748,741, 1998.
[10] G. Ramalingam, "The Undecidability of Aliasing", ACM
Transactions on Programming Languages and Systems,
16(5):1467-1471, 1994.
[11] Thomas Reps, "Undecidability of Context-Sensitive
Data-Independence Analysis", ACM Transactions on
Programming Languages and Systems, 22(1):162-186, Jan. 2000.
[12] A. Menezes, Paul C. van Oorschot and Scott A. Vanstone,
Handbook of Applied Cryptography, Chap. 14, CRC Press,
ISBN 0-8493-8523-7, October 1996.
http://www.cacr.math.uwaterloo.ca/hac/
[13] Sandeep Bhatkar, Daniel C. DuVarney and R. Sekar, "Address
Obfuscation: An Efficient Approach to Combat a Broad Range of
Memory Error Exploits", USENIX Security Symposium,
pp. 105-120, 2003.
Yongdong Wu received the B.A. and
M.S. in Automation Control from Beijing University of
Aeronautics and Astronautics in 1991 and 1994 respectively, and
the Ph.D. degree in Pattern Recognition and Intelligent Control
from the Institute of Automation, Chinese Academy of Sciences, in 1997.
He is currently an Associate Lead Scientist with the Infocomm
Security Department, Institute for Infocomm Research (I2R), A*STAR,
Singapore. His research interests include multimedia security,
e-business, digital rights management and network security. Dr.
Wu won the Tan Kah Kee Young Inventor award in 2004 and
2005.