Guarding Software Checkpoints

Yongdong Wu
wydong@i2r.a-star.edu.sg
Institute for Infocomm Research, Singapore

Summary: This paper aims to hide the conditional branch instructions used as software checkpoints. It presents methods such as inserting auxiliary variables, encoding a secret into several symbols, and indirect addressing. In addition, it delays the alert with dummy code when an attacker tries to analyse the software control/data flows. This technology can provide additional protection after a handheld device is stolen or lost.

Key words: Software protection, CRT

1 INTRODUCTION

Authentication allows one party to assure itself that a claimant is who it declares to be. The most popular technology is to check whether a claimant knows something or owns something, e.g., a password and/or a token. In the verification process, the verifier usually executes some code that checks for the presence of a secret message. If the message is absent or wrong, the request from the claimant is refused. A typical implementation of the checking code is to place a checkpoint in the software to verify the secret. However, since the location of the checking code is fixed, it can be found with tools such as a debugger; thus, an attacker with ample physical access can remove the defences after finding the checkpoints. To defeat this kind of attempt, a tamper-resistance mechanism should be applied, such as physical protection, security through obscurity, or a hybrid of the two [1,2,3,4]. Physical protection demands specific circuits, for instance a crypto-processor, which is beyond the scope of this paper. Obfuscation [5] is a popular software tamperproofing technology which makes a program unintelligible while preserving its functionality, so as to raise the bar to a height sufficient to deter an attacker. An ideal obfuscated program looks like a virtual black box. Unfortunately, code obfuscation can never completely protect an application from malicious reverse-engineering efforts.
Its best result is to make the difficulty of attack so great that in practice it is not worth performing, even though it could eventually succeed. Any modification will, with high probability, produce persistently nonsensical behaviour. To ensure this, the techniques employed by an obfuscator have to be powerful enough to thwart attacks by automatic deobfuscators that attempt to undo the obfuscating transformations. US 5,892,899 [6] produces a security-sensitive program by distributing the secret in space as well as in time, obfuscating the program, isolating its security-sensitive functions, and deploying an interlocking trust mechanism. US 6,205,550 [7] places checkpoints in the security-sensitive module by checking the code integrity and the elapsed execution time. To determine whether the module is being observed, it checks the return address and the operation mode that supports single-step execution. To inter-couple the above techniques, some variables are combined to form a new variable. WO 0,114,953 [8] increases the tamper-resistance and obscurity of software so that the observable operation of the transformed software is dissociated from the intent of the original code, and so that the functionality of the software is extremely fragile when modified. These effects are achieved by converting the control flow of the software into data-driven form, and by increasing the complexity of the control flow by orders of magnitude. US 5,748,741 [9] is an encoding technique that protects intelligence from being tampered with and disclosed. The encoding technique, e.g., cascading and intertwining of blocks, employs the concept of program complexity so that every output depends on all inputs. To hide the data, Bhatkar et al. [13] obfuscate addresses by randomizing (a) the absolute locations of all code and data, and (b) the relative distances between different data items.
This paper presents some technologies for protecting software checkpoints so as to defeat debugging of the checkpoint positions. To this end, it encodes the secret and binds the codeword symbols with software entry addresses. In order to delay the alert, we further insert dummy code so as to cheat the debugger. The remainder of the paper is organized as follows. Section 2 defines the problem of protecting checkpoints. Section 3 introduces the proposed technologies, and Section 4 draws a conclusion.

2 PROBLEM DEFINITION

In a typical implementation of an authentication procedure, shown in Figure 1, the admissible number of trials (here 3) is set first; that is to say, a user can try 3 times. Afterwards, a sample such as a keyed-in password or a fingerprint is captured. Subsequently, the authentication code checks the validity of the sample. If the verification report is positive, the user is granted the right to do the appropriate work. Otherwise, the admissible trial number is reduced by one. If the number is still greater than 0, the user can try again; otherwise, the code rejects the request from the user and terminates. Usually, software may include some self-checking code to detect tampering with the license-checking mechanisms, for instance:

Time checkpoint
It is well known that a time check can be used to defend against reverse engineering. The defender selects a segment of code and sets a clock at the entry point and another clock at the exit point. Normally, the elapsed time should be roughly fixed in a predefined environment. When someone debugs the code, the elapsed time increases dramatically. Thus, by evaluating the elapsed time, the sensitive code can detect the malicious actions.

Parent process checkpoint
Another popular technology for detecting observation is to check the state of the parent process. If the parent process is not the operating system, it means that someone is interested in the run-time state of the sensitive code.
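The two checkpoints above can be sketched as follows. This is a minimal illustration, not the paper's implementation: the elapsed-time threshold, the guarded computation, and the parent-process policy (here, simply comparing against the parent PID recorded at startup) are all placeholder assumptions.

```python
import os
import time

# Hypothetical threshold: in a predefined environment the guarded
# segment should finish well under this many seconds.
ELAPSED_LIMIT = 0.5

def guarded_segment():
    # Clock at the entry point.
    start = time.perf_counter()

    # ... the sensitive code segment (placeholder work) ...
    total = sum(i * i for i in range(10_000))

    # Clock at the exit point: single-stepping in a debugger
    # inflates the elapsed time dramatically.
    elapsed = time.perf_counter() - start
    if elapsed > ELAPSED_LIMIT:
        return None          # alarm: the code is likely being observed
    return total

def parent_is_expected(expected_ppid):
    # Parent process checkpoint: compare the current parent PID with
    # the one recorded at startup; being launched or re-parented by a
    # debugger changes this value.
    return os.getppid() == expected_ppid

expected = os.getppid()      # recorded once at startup
print(guarded_segment() is not None, parent_is_expected(expected))
```

In a real deployment the alarm branch itself would be hidden with the techniques of Section 3, since an explicit `if elapsed > ELAPSED_LIMIT` is exactly the kind of conditional an attacker searches for.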
Secret checkpoint
This kind of checkpoint checks whether an input such as a product serial number satisfies some property; the UNIX password check system is an example. As shown in Figure 1, this verification process compares the stored pattern with a mapping of the input. Because the pattern must be stored together with the module, the attacker is able to find it or render the comparison ineffective. Thus, this self-checking mechanism can be defeated if the attacker can find the self-checking code and disable it.

Generally, an attacker has two reverse-engineering approaches for bypassing the checkpoints. One is static analysis and the other is dynamic, or run-time, analysis. Both approaches focus on data flow and control flow so that the attacker can understand the program and make profitable changes. Therefore, the goal of the present paper is to raise the attack barrier so that the attacker can only attack the software at a much higher cost.

        NumberOfTrial ← 3
TryAgain:
        sample ← readInputDevice()
        pattern ← extractPattern(sample)
        If (pattern is valid) goto AccessGranted
        NumberOfTrial ← NumberOfTrial - 1
        If (NumberOfTrial is 0) goto End else goto TryAgain
AccessGranted:
        DoOtherWorks
End:
        Halt

Fig. 1: A typical implementation of authentication.

3 THE GUARDING TECHNOLOGIES

A tamper-resistant system should build a defensive front line so that the flow analyses are difficult, i.e., the reverse-engineering process should be hard. This section describes four means of protecting software from reverse engineering: hiding branches; hiding data flow; hiding the secret; and inserting dummy code.

3.1 Hiding branch

If new variables can be introduced, a program can be presented as one sequence of code with at most one post-test loop. Therefore, the conditional branch can be hidden. Technically, the conditional branch is cancelled by substituting sequential instructions for it, as shown in Figure 2.
If the expression Lvalue can be enumerated over a small number of values, say only the two logical values True and False, the conditional branch can be tabulated. If there are several values, it can be transformed into a multi-step two-valued procedure, while the simple (true/false) case can be processed with simple sequential instructions. For example, the branch instruction in the time checkpoint above can be protected with this means. After the conditional branch is hidden, an attacker has to analyze all the instructions, rather than only the conditional instructions. Furthermore, automatic analysis tools are of no use in localizing the checkpoints. Clearly, the cost to the attacker is increased. In order to misguide the attacker, it is preferable to vary the process of transforming conditional branch instructions, so as to prevent the analysis tools from detecting the transformation pattern.

    Original:
        If (Lvalue == 1) X = f1();
        else if (Lvalue == 0) X = f2();

    Transformed:
        Save the common variables
        X1 = f1() · Lvalue
        Recover the common variables
        X2 = f2() · (1 - Lvalue)
        X = X1 + X2

Fig. 2: The process of transforming conditional branch instructions.

3.2 Hiding Data Flow

Since data flow has a tight relationship with control flow, it leaks information about the checking process. For example, in password-based authentication, an automatic analysis tool is able to trace the input and then narrow the checkpoint down to a small number of suspect instructions. To increase the number of suspected instructions, the data flow is protected with the following methods.

Data type hiding
The sensitive variables are split and distributed in separate locations. When the variables are needed in an instruction, a specific segment combines the separated parts to reconstruct the original. After hiding variables in this way, the attacker has to analyse the control flow of the code in order to understand the data flow.
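The branch cancellation of Figure 2 can be sketched as follows; `f1`, `f2` and their return values are placeholders, not from the paper.

```python
def f1():
    return 10    # placeholder for the "True" branch

def f2():
    return 20    # placeholder for the "False" branch

def select_branchless(lvalue):
    # Both f1() and f2() are evaluated unconditionally; the value of
    # Lvalue (1 or 0) arithmetically masks out the unwanted result,
    # so no conditional branch appears in the checking code.
    x1 = f1() * lvalue
    x2 = f2() * (1 - lvalue)
    return x1 + x2

print(select_branchless(1))  # → 10, as if the True branch were taken
print(select_branchless(0))  # → 20, as if the False branch were taken
```

Note that because both branches always execute, f1 and f2 must be free of conflicting side effects; this is why Figure 2 saves and recovers the common variables between the two evaluations.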
Location hiding
A more sophisticated way is to calculate a variable's address and access the variable dynamically: say, based on the relative address between two local variables, access them via that relative address. This may confuse the attacker, and especially most automatic tools. The price of this benefit may be increased difficulty in maintenance and compatibility.

Data alias
Both data-flow and control-flow analysis require information about the variables. Aliasing the variables and selecting among the aliases randomly increases the difficulty of analysis [10,11]. For example, access a variable directly or indirectly, or even duplicate or remap a variable, so that access to the variable is blind.

3.3 Hiding Secret

In the trivial implementation shown in Figure 1, since the secret message (pattern) is operated on directly, an attacker can find it and bypass the verification checkpoint easily. In order to hide this secret, the secret is encoded into several symbols that represent program block entry addresses. For example, the secret can be encoded with the CRT (Chinese Remainder Theorem) [12] or some other technique into several sub-elements. Technically, assume that the secret X is of size m bits. Select n addresses A1, A2, ..., An, which can be the entries of basic blocks(1), and n pairwise relatively prime moduli p1, p2, ..., pn, all of roughly the same size, such that

    X ≡ A1 (mod p1)
    X ≡ A2 (mod p2)
    ...
    X ≡ An (mod pn)

hold simultaneously, and

    p1 · p2 · ... · pn ≥ 2^m.

(1) A basic block is a sequence of instructions. It is never entered except at its first instruction and ends in a branch or return instruction.

The verification process calculates all the addresses Ai and executes the code at the computed addresses sequentially, instead of comparing X with a predefined value directly. Another way to hide the secret is to encode the secret into a codeword X1, X2, ..., Xn, and then bind the codeword symbols Xi with the program entry addresses as shown in Figure 3.
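Before turning to the codeword variant of Figure 3, the CRT encoding above can be sketched with toy numbers. The secret size, moduli, and "addresses" here are illustrative only; in the scheme the residues would be genuine basic-block entry addresses.

```python
from math import prod

def crt_recover(residues, moduli):
    # Reconstruct X from the residues Ai = X mod pi via the Chinese
    # Remainder Theorem; the moduli must be pairwise coprime.
    m = prod(moduli)
    x = 0
    for a, p in zip(residues, moduli):
        n = m // p
        x += a * n * pow(n, -1, p)   # pow(n, -1, p): modular inverse
    return x % m

# Toy parameters: a 16-bit secret and three pairwise coprime moduli
# whose product exceeds 2**16, so X is uniquely determined.
secret = 0xBEEF                            # 48879, fits in m = 16 bits
moduli = [251, 253, 256]                   # 251·253·256 > 2**16
addresses = [secret % p for p in moduli]   # the "entry addresses" Ai

assert crt_recover(addresses, moduli) == secret
```

The verifier never stores `secret` itself: it only holds the moduli and recomputes the entry addresses from the user-supplied X, so a wrong input simply yields wrong addresses rather than failing an observable comparison.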
At the verification stage, the program extracts the hidden address (second row) from the table and computes the correct entries for execution. With this method, the verification process proceeds in multiple steps so as to increase the difficulty of automatic analysis.

    Normal Address:   A1        A2        ...   An
    Hiding Address:   A1 ⊕ X1   A2 ⊕ X2   ...   An ⊕ Xn

Fig. 3: The address transformation. The first row represents the normal addresses, and the second row represents the transformed new addresses generated from the original addresses and the input variable or its mapping X.

3.4 Inserting dummy code

In standard programming, when a sub-routine is called, a return address (usually the address next to the caller) is stored on the stack. Once the called sub-routine finishes, the return address is popped and execution continues at that address. However, if an attacker modifies the program so that the control flow enters a code segment at a wrong address, the program stack will underflow on return. The program then does not return to the calling address but to an unknown entry such as a wild address, and hence it crashes. That is to say, software protected with the method of Subsection 3.3 will crash if the attacker inputs a wrong secret X. In this case, the attacker's analysis tool deduces that the location of the checkpoint must be earlier than the crashing position. To misguide the attacker, the present technology reserves a large region of stack memory and fills it with valid addresses in advance. When the program stack underflows and the program tries to return, these forged addresses may be used, preventing the attacker from observing irregular program behaviour. Therefore, from the viewpoint of the attacker, the program runs "well". By the time the attacker finds that the software has strayed from its target, the origin of the error has long been passed. He has to look back and check the instructions one by one.
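A toy sketch combining the address binding of Figure 3 with the dummy-entry idea of Subsection 3.4. Everything here is illustrative: table indices stand in for entry addresses, the binding operation is assumed to be XOR, and the block names are invented.

```python
def step1(): return "s1"     # genuine basic block
def step2(): return "s2"     # genuine basic block
def dummy(): return "s?"     # forged entry: valid code, wrong work

# The dummies play the role of the padded return addresses of
# Subsection 3.4: a wrong landing spot still "runs well".
blocks = [step1, step2, dummy, dummy]
normal = [0, 1]              # Ai: the correct entry indices
codeword = [3, 2]            # Xi: symbols derived from the secret
hidden = [a ^ x for a, x in zip(normal, codeword)]   # row 2: Ai XOR Xi

def run(user_codeword):
    # Verification recomputes Ai = (Ai XOR Xi) XOR Xi' and executes
    # each block; a wrong codeword lands on a dummy (or wrong) block
    # instead of crashing, delaying the alert.
    return [blocks[h ^ x].__name__ for h, x in zip(hidden, user_codeword)]

print(run([3, 2]))   # correct secret → ['step1', 'step2']
print(run([1, 2]))   # wrong secret  → ['dummy', 'step2']
```

With the wrong codeword the program neither compares anything nor crashes at the checkpoint; the error only surfaces later, when the output of the misdirected blocks is finally used.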
4 CONCLUSION

This paper aims to obfuscate the control flow of a typical authentication module so as to increase the difficulty of analysing the software flow. To accomplish this, some of the software entry addresses are generated from the input secrets, hiding the checkpoints. To further cheat the attacker, it adds many return addresses to delay the crash, so that the attacker (or a debug-like tool) has difficulty localizing the checking code.

REFERENCES

[1] Masahiro Mambo, Takanori Murayama and Eiji Okamoto, "A Tentative Approach to Constructing Tamper-Resistant Software", New Security Paradigms Workshop, pp. 23-33, 1997.
[2] Josep Domingo-Ferrer, "Software Run-Time Protection: A Cryptographic Issue", Eurocrypt, LNCS 473, pp. 474-480, 1990.
[3] Gu Yuan, Stanley T. Chow and Harold J. Johnson, "Tamper Resistant Software Encoding", WO 0,077,597, 2000.
[4] Scott A. Moskowitz and Marc Cooperman, "Method for Stega-cipher Protection of Computer Code", US 5,745,569, 1998.
[5] Christian S. Collberg, Clark Thomborson and Douglas Low, "Manufacturing Cheap, Resilient, and Stealthy Opaque Constructs", ACM SIGPLAN-SIGACT Symposium on Principles of Programming Languages, 1998. http://www.cs.arizona.edu/~collberg/Research/Obfuscation/
[6] David Aucsmith and Gary Graunke, "Tamper Resistant Methods and Apparatus", US 5,892,899, 1999.
[7] Richard L. Maliszewski and Richard P. Mangold, "Tamper Resistant Methods and Apparatus", US 6,205,550, 2001.
[8] Gu Yuan and Stanley T. Chow, "Software Code Protection by Obscuring Its Data-Driven Form", WO 0,114,953, 2001.
[9] Stanley T. Chow and Harold J. Johnson, "Encoding Technique for Software and Hardware", US 5,748,741, 1998.
[10] G. Ramalingam, "The Undecidability of Aliasing", ACM Transactions on Programming Languages and Systems, 16(5):1467-1471, 1994.
[11] Thomas Reps, "Undecidability of Context-Sensitive Data-Dependence Analysis", ACM Transactions on Programming Languages and Systems, 22(1):162-186, Jan. 2000.
[12] A. Menezes, Paul C. van Oorschot and Scott A. Vanstone, Handbook of Applied Cryptography, CRC Press, ISBN 0-8493-8523-7, Chap. 14, October 1996. http://www.cacr.math.uwaterloo.ca/hac/
[13] Sandeep Bhatkar, Daniel C. DuVarney and R. Sekar, "Address Obfuscation: An Efficient Approach to Combat a Broad Range of Memory Error Exploits", USENIX Security Symposium, pp. 105-120, 2003.

Yongdong Wu received the B.A. and M.S. degrees in Automation Control from Beijing University of Aeronautics and Astronautics in 1991 and 1994 respectively, and the Ph.D. degree in Pattern Recognition and Intelligent Control from the Institute of Automation, Chinese Academy of Sciences, in 1997. He is currently an Associate Lead Scientist with the Infocomm Security Department, Institute for Infocomm Research (I2R), A*STAR, Singapore. His research interests include multimedia security, eBusiness, Digital Rights Management and network security. Dr. Wu won the Tan Kah Kee Young Inventor award in 2004 and 2005.