Title: Multi factored approach towards malware resistance

Authors: Raghunathan Srinivasan (corresponding author), Partha Dasgupta, Sujit Sanjeev, Jatin Lodhia, Vivek Iyer, Amit Kanitkar
Affiliation: School of Computing, Informatics and Decision Systems Engineering, Arizona State University, Tempe, AZ, USA
Email: raghus@asu.edu
Phone: (1) 480-965-5583
Fax: (1) 480-965-2751

Abstract: Protecting the integrity of consumer platforms in unmanaged consumer computing systems is a difficult problem. Attackers may execute buffer overflow attacks to gain access to systems, patch existing binaries to evade detection, and steal sensitive secrets. Every binary has inherent vulnerabilities that attackers may exploit. In this paper three orthogonal approaches are presented to improve the security of platforms. Each approach provides a level of assurance against malware attacks beyond that of virus detectors. The approaches can be used independently of each other as well as combined to achieve the desired level of protection, and they can be added on top of normal defenses. This work attempts to find alternate solutions to the problem of malware resistance. The approaches we use are: adding diversity or randomization to data address spaces, hiding critical data to prevent data theft, and using remote attestation to detect tampering with executable code.

Keywords: Computer Security, Attacks, Remote Attestation, Integrity Measurement, Virtual Machine Monitors, Secure Key Storage in Memory, Memory Randomization.

1. Introduction

The magnitude of the threat from malware, especially on “consumer computing” platforms, is well known and well understood. Malware today can hide from virus detectors, steal secrets, live stealthily for extended periods of time, effectively prevent removal efforts, and much more.
The ability to run sensitive applications and store sensitive data on consumer platforms without having to trust the platform (and without using trusted hardware modules) is critical. The platforms hosting such applications can be subject to a variety of attacks, leading to the danger of data leakage, information theft, modification of functionality, and a variety of possibly damaging losses. Attestation of client computers using hardware attestation modules, or using hypervisors to scan computers, has not had much success, owing to the cumbersomeness of these solutions. A smartly designed malware can have more power than any other application in the system [Srinivasan and Dasgupta, 2007]. A complete silver bullet solution to security problems is difficult to achieve [Basili and Perricone, 1984]. However, the risks can be practically mitigated if there are mechanisms that ensure verifiable executions, check the integrity of the application, and provide isolation techniques that make information stealing difficult.

We outline three orthogonal schemes in this paper, each of which provides a level of assurance against malware attacks beyond that of virus detectors. The approaches can be added on top of normal defenses and can be combined to tailor the level of protection desired. This work attempts to find alternate solutions to the problem of malware resistance. When combined, these techniques provide adequate guarantees against tampering with applications by existing malware. We implement, entirely in software, techniques to determine the integrity of an application on an untrusted machine, techniques to hide secrets in end user systems, and techniques to randomize the memory layout of binaries. These three scenarios cover the motivations for an attacker to compromise an end user system. By mitigating these three factors, we can increase the security of systems. The first approach is Remote Attestation.
Remote Attestation is a set of protocols that uses a trusted service to probe the memory of a client computer to determine whether one (or more) application has been tampered with. These techniques can be extended to determine whether the integrity of the system has been compromised. While the idea sounds simple, given the power of the adversary (malware), the design must be done very carefully to prevent the malware from falsely declaring to the server that the system is safe. Remote Attestation has been implemented in hardware and software. Hardware based schemes have their pros and cons; software based solutions for Remote Attestation involve taking a mathematical or cryptographic checksum over sections of the program. The solution in this research can provide tamper-proof measurements entirely in software from user space, even if a smart malware attempts to manipulate system call responses, masquerade, or infect multiple programs, including the kernel, on the client machine.

The second approach is to build obfuscation and shielding methodology to make stealing secrets from client machines harder. For example, memory in client applications holds encryption keys, passwords, sensitive data, and, in the case of PKI implementations, private keys. Stealing secrets by copying zones of memory is particularly simple (keys, for example, have high entropy). We provide two approaches to hide keys more effectively. The first method involves the use of a virtual machine monitor (VMM), and the second method scatters keys on the raw disk space.

The third approach is the ability to provide “software diversity” for legacy software. Currently a malware designer can perform offline analysis of an application to discover vulnerabilities in it. These vulnerabilities can be exploited to launch various kinds of attacks on multiple systems. Every copy of an application that is shipped to consumers is exactly the same, and contains the same weaknesses in the same binary locations.
Software diversity breaks up this uniformity, making each instance of the application different, so that attacks that work on one instance do not work on another. ASLR [Web link 1] is an example, but we take that idea to finer degrees of granularity on each stack frame and heap frame. The first technique randomizes the structure of the stack in each copy of an application binary to prevent stack overflow based attacks; the second technique randomizes the structure of allocated heap memory to prevent heap based overflow attacks. Our methods have been implemented completely in software. We opine that such approaches, judiciously combined with traditional malware prevention methods, can make computing safer without adding much overhead to the applications and operating systems. In the remainder of the paper, we present work related to our approaches and provide brief overviews and implementation details of our approaches. The rest of the paper is organized as follows: Section 2 presents the related work for all these topics, Section 3 presents the remote attestation technique, Section 4 presents key hiding, and Section 5 presents stack and heap randomization.

2. Related work

Integrity measurement involves checking whether the program code executing within a process or multiple processes is legitimate or has been tampered with. It has been implemented using hardware, virtual machine monitors, and software based detection schemes. Some hardware based schemes operate off the TPM chip provided by the Trusted Computing Group [Goldman et al., 2006; Sailer et al., 2004; Stumpf et al., 2006]. The hardware based schemes allow a remote agent to verify whether the integrity of all the programs on the client machine is intact. The kernel executing on the client takes measurements when a program is first executed and provides them to the TPM, which signs the values with its private key. The signed value is then sent to the remote agent, which verifies the signature and the values generated.
This scheme cannot measure malicious code that infects running programs without infecting the file system. Another drawback is that a compromised kernel may provide incorrect values to the TPM to sign. Hardware based schemes also suffer from the fact that checksums located in the hardware cannot be updated easily; the hardware has to be physically replaced. Hence, the use of a secure co-processor placed into a PCI slot of the platform has been recommended [Wang and Dasgupta, 2007]. Terra uses a trusted virtual machine monitor (TVMM) and partitions the hardware platform into multiple virtual machines that are isolated from one another [Garfinkel et al., 2003]. Hardware dependent isolation and virtualization are used by Terra to isolate the TVMM from the other VMs. Terra relies on the underlying TPM to take some measurements, and hence is unsuitable for legacy systems. In Pioneer [Seshadri et al., 2005] the integrity measurement is done without the help of hardware modules or a VMM. The verification code for the application resides on the client machine. The verifier (server) sends a random number (nonce) as a challenge to the client machine. The response to the challenge determines whether the verification code has been tampered with. The verification code then performs attestation on some entity within the machine and transfers control to it. This forms a dynamic root of trust in the client machine. Pioneer assumes that the challenge cannot be redirected to another machine on a network; however, in many real world scenarios a malicious program can attempt to redirect challenges to another machine which has a clean copy of the attestation code. In its checksum procedure, Pioneer incorporates the values of the Program Counter and Data Pointer, both of which hold virtual memory addresses. An adversary can load another copy of the client code to be executed in a sandbox-like environment and provide it the challenge.
This way an adversary can obtain the results of the computation that the challenge produces and return them to the verifier. Pioneer also assumes that the server knows the exact hardware configuration of the client for performing a timing analysis; this places a restriction on the client not to upgrade or change hardware components. In TEAS [Garay and Huelsbergen, 2006] the authors propose a remote attestation scheme in which the verifier generates program code to be executed by the client machine. Random code is incorporated in the attestation code to make analysis difficult for the attacker. The analysis provided by the authors proves that it is very unlikely that an attacker can clearly determine the actions performed by the verification code; however, no implementation is described in the research, and implementation details often determine the effectiveness of a particular solution.

Many approaches have been developed to ensure secure software based key management. Centralized key storage employs techniques where the generation, storage, distribution, revocation and management throughout the lifetime of the key happen on a single central machine. This offers advantages such as ease of backup and recovery, and securing a single machine secures the keys. One such commercially available product is Thales keyAuthority [Web link 2]. Secondary storage or a detachable device has also been used to hide keys by encrypting the key with a very strong password, but this can be attacked using a key logger that logs typed passwords on the system [Shamir and van Someren, 1999]. The same research presents another method of storing keys in the system: breaking the key into multiple parts and distributing them in different places. Distributing the key reduces the memory entropy footprint, making it harder to detect the pieces that comprise the key. Another solution for key management is distributed key storage using secret sharing [Canetti et al., 2000].
This could be an option for large organizations, but it is not feasible for normal end users of cryptography. Reducing the number of key copies present in memory is another method to protect cryptographic keys from memory disclosure attacks. This avoids caching of the key by the operating system and also disallows swapping of the memory area of the key. It has been concluded that even though the number of copies of the key is reduced, once a sufficiently large memory dump is obtained there is a high chance that it contains the single copy of the key. It is then suggested that in order to eliminate leakage of keys via memory disclosure attacks, consumers would have to resort to special hardware devices [Harrison, 2007]. Networked cryptographic devices, in which the cryptographic operations are performed by a remote server, have also been researched. Even if the networked device where the cryptographic operations are performed is compromised, the attacker cannot derive the entire key from it [MacKenzie, 2001]. The solutions proposed in this paper try to prevent key theft from memory disclosure attacks, irrespective of the number of copies of the key present in memory, and prevent key exposure altogether.

Buffer overflow is a very commonly used form of attack. The first known documented buffer overflow attack dates back to November 1988, when a worm attacked the Internet, which was, at that time, a collection of 60,000 computers implementing the TCP/IP protocol suite [Eichin and Rochlis, 1989]. It has been found that buffer overflows constitute more than 50% of all major security bugs published as advisories by CERT [Viega and McGraw, 2002]. There are several variants of the buffer overflow attack, such as stack overflows, heap corruption, format string attacks, integer overflows and so on [Foster et al., 2005]. C and C++ are very commonly used to develop applications; owing to their efficient “unmanaged” execution, these languages are not memory safe.
A vast majority of vulnerabilities occur in programs developed with these languages [Seacord, 2005]. Randomization is a technique to inject diversity into computer systems. The first known randomization of the stack frame was proposed by placing a pad of random bytes between the return address and local buffers [Forrest et al., 1997]. Random pads make it difficult to predict the distance between buffers and the return address on the stack. An attacker has to launch custom attacks for every copy of the randomized binary. Address obfuscation has extended the above idea by randomizing the base address of memory regions, permuting the order of elements in a binary, and introducing random gaps within memory regions [Bhatkar et al., 2003]. Address obfuscation does not require any changes to be made to the operating system or the compiler. Unlike the operating system randomization solutions, this technique is probabilistic in nature, i.e., it gives partial, but substantial, randomization of the code and data segments of a binary executable. The major randomizations proposed in this technique are (1) randomization of the base memory address of the stack and heap segments, dynamically loaded libraries, routines and static data, (2) permutations of the order of variables and routines and (3) introduction of random gaps between memory objects like stack frames, heap allocations and static variables.

Linux randomizes process address spaces by a methodology known as address space layout randomization (ASLR). ASLR changes the start address of the stack and the heap every time an application is loaded. However, the program structure and the data layout remain the same. The ASLR approach modifies the operating system so that the base addresses of various segments of a program are randomly relocated during program execution. Two such popular ASLR implementations, named PaX and ExecShield, are available for Linux as kernel patches.
These approaches do not require the modification of individual binaries, but necessitate that these binaries be compiled with a feature called Position Independent Executables (PIE). It has been demonstrated that PaX ASLR only marginally slows down the time taken to attack a system [Durden, 2002; Shacham et al., 2004]. The authors explain techniques for bypassing ASLR protection and demonstrate a de-randomization attack on a known buffer overflow vulnerability which takes barely 216 seconds to succeed. It has been observed that regular executables are faster than their PIE counterparts. This is because for PIE executables, relative offset calculations need to be done on-the-fly rather than being available in pre-computed and pre-linked form. In Transparent Run-time Randomization (TRR) [Xu et al., 2003] the loader relocates various executable segments, shared libraries and object modules in the user address space of the process. TRR initially allows the system to set up sections of the user address space like the stack, heap, data segment, and so on. TRR moves the heap from its original base to a new base by adding a random number of addresses to the original heap base. TRR relocates the stack by creating a new stack segment below the current one and moving the stack pointer from the old stack to this new stack. This procedure occurs every single time a process is executed, thus giving different amounts of randomization during each execution. Thus, by relocating various sections of the user address space, TRR makes exploiting vulnerabilities a more challenging task than before.

The solutions presented in this paper randomize the structure of the stack for every routine. This is done without access to the source code; we observe the disassembly of every binary and add a random pad in every function by increasing the stack allocation. This is done on a per-binary, per-function basis.
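As a simplified illustration of the pad idea (a sketch only: the real system rewrites the stack-allocation instructions found in the binary's disassembly, whereas here the pad is faked with `alloca()`; `pad_size`, its 16-to-256-byte policy, and the `rand_malloc` heap analogue are our own illustrative choices, not the paper's code):

```c
#include <alloca.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative pad policy (assumption): 16-256 bytes, 16-byte aligned.
   In the actual system the pad is fixed per binary and per function by
   rewriting the stack-allocation instruction. */
unsigned pad_size(unsigned seed) {
    return 16 + (seed % 16) * 16;
}

/* The pad shifts the offset of buf within the frame, so an overflow
   distance computed offline against one copy does not land on the
   return address in a copy built with a different pad. */
size_t padded_copy(const char *input, unsigned seed) {
    volatile char *pad = alloca(pad_size(seed));
    pad[0] = 0;                      /* keep the pad from being optimized out */
    char buf[64];
    strncpy(buf, input, sizeof buf - 1);
    buf[sizeof buf - 1] = '\0';
    return strlen(buf);
}

/* Heap analogue: each allocation receives a random extra slack, so the
   relative layout of heap objects differs across copies and runs. */
void *rand_malloc(size_t n) {
    size_t slack = 16 * (size_t)(1 + rand() % 16);
    return malloc(n + slack);
}
```

The same exploit payload must then be re-derived for every installed copy, which is the diversity property the paper is after.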
We also randomize the heap frame by allocating extra random memory on every instance of heap allocation.

3. Remote Attestation

Remote attestation is a framework for allowing a remote entity to obtain integrity measurements on an un-trusted client machine. In order for remote attestation to work, we need to have access to a remote verifier Trent that is a trusted (uncompromised) host and is accessible over a network. A single trusted host can be used for a large number of clients, ensuring that sensitive applications running on the clients are not tampered with by malicious code on the client machines. In the consumer computing scenario, we envision the deployment of “attestation servers”, where an end-user contracts with the service provider to test the safety of the applications on the end-platform. This is similar to the way virus detector updates are done today. Remote Attestation has traditionally been implemented with the help of hardware modules as discussed in section 2, and the use of a VMM [Sahita et al., 2007] has also been suggested. It involves the trusted server (Trent) communicating with the hardware device installed on the machine of the client (Alice). However, these modules are unsuitable for legacy platforms, and have the stigma of Digital Rights Management attached to them. The use of a VMM also requires greater hardware resources and compute power. In our framework remote attestation is implemented entirely in software without kernel support. Operating system support is not used in this framework, as it would require either a secure OS or a loadable kernel module that performs the attestation. The first scenario is unlikely to occur, and the second scenario would require frequent human interaction to load the kernel modules in the system (to prevent automatic exploits of the kernel module loading mechanism). The approach taken in this paper is designed to detect changes made to the code section of a process.
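To make the target of such a measurement concrete, the following sketch (assuming Linux and a readable /proc/self/maps; `fold_bytes` is an FNV-style stand-in for the protocol's hashes, and `measure_own_text` is our own illustrative helper, not the production attester) locates a process's own executable mapping and folds its bytes into a single value:

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* FNV-1a style fold; stands in for the hashes/checksums the attester computes. */
uint64_t fold_bytes(const uint8_t *p, size_t n) {
    uint64_t h = 1469598103934665603ULL;
    for (size_t i = 0; i < n; i++)
        h = (h ^ p[i]) * 1099511628211ULL;
    return h;
}

/* Find the first executable mapping of this process in /proc/self/maps
   and fold its bytes. Returns 0 on success, -1 on failure. */
int measure_own_text(uint64_t *out) {
    FILE *maps = fopen("/proc/self/maps", "r");
    if (!maps) return -1;
    char line[512];
    int rc = -1;
    while (fgets(line, sizeof line, maps)) {
        unsigned long lo, hi;
        char perms[8];
        if (sscanf(line, "%lx-%lx %7s", &lo, &hi, perms) == 3 &&
            strchr(perms, 'x')) {               /* executable segment */
            *out = fold_bytes((const uint8_t *)lo, (size_t)(hi - lo));
            rc = 0;
            break;
        }
    }
    fclose(maps);
    return rc;
}
```

Any patch applied to the code section changes the folded value, which is the property the verifier checks against a pristine copy.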
Trent is a trusted entity who knows the structure of an untampered copy of the process (P) to be verified. Trent provides executable code (C) to Alice, which Alice injects into P. C takes overlapping MD5 hashes over sub-regions of P and returns the results to Trent. A software protocol means that there exists an opportunity for an attacker (Mallory) to forge results. Mallory can perform a replay attack in which Trent is provided with results that are the response to a previous attestation challenge. Mallory may tamper with the results generated by the attestation code to provide the expected results to Trent. Mallory may redirect the challenge to another machine which executes a clean copy of the application P, or Mallory may execute the challenge inside a sandbox to determine its results and send the results to Trent. This paper addresses these attacks by incorporating specialized tests and generating randomized code. We obtain machine identifiers through system interrupts to determine whether the challenge was replayed. We take measurements on the client platform that determine whether the attestation code was executed in a sandbox. Lastly, we perform extensive randomization of the attestation code by changing the arithmetic operations and memory locations read by every instruction.

Remote attestation starts when P contacts Trent. Trent provides P with a binary attester code C (that is signed by Trent). C is generated each time it is “provided” and is composed of randomized and obfuscated binary code; hence it is difficult to reverse engineer C. Since C is downloaded code, Trent has to be trusted not to provide malware to Alice. P runs C; C hashes the memory space of P in random overlapping sections, encrypts the hashes with a nonce that is contained in C, and sends the result to Trent.
The nonce is located in a different location in each instance of C, making it impossible for a compromised P to mimic C's behavior. When Trent gets the results from C, it verifies that P has not been tampered with and that it is executing correctly. Once P is known to be correct, P can be entrusted to verify the integrity of the security sensitive applications that execute on Alice's machine. Figure 1 shows the overview of the framework.

3.1 Implementation

The remote attestation scheme was implemented on the Ubuntu 8.04 (Linux 32 bit) operating system using the gcc compiler; the application P and attestation code C were written in the C language. Figure 2 shows the detailed steps in performing Remote Attestation. C executes some tests on P to return an integrity measurement value M1 to Trent. Trent executes the same set of tests on a local pristine copy of P, which produces a measurement value M0. Trent compares M1 and M0; if the two values are the same then Alice is informed that P is clean. Trent has to be certain that C took its measurements on the correct copy of P residing inside MAlice. To determine this, C executes some more tests on MAlice; these checks ensure that C was not bounced to another machine, and that it was not executed in a sandbox environment inside a dummy P process within MAlice. Trent introduces the following conditions inside C to prevent Mallory from faking any portion of the results. C computes an MD5 hash of P to determine whether the code section has been tampered with. Downloading the MD5 code each time is an expensive operation as it is large, and the MD5 code cannot be randomized as it may lose its properties; hence the MD5 code permanently resides in P. To prevent Mallory from exploiting this aspect, a two-phase hash protocol is implemented. Trent places a mathematical checksum inside C which computes the checksum on the region of P containing the MD5 executable code along with some other selected regions.
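The flavor of such a nonce-keyed arithmetic checksum over overlapping regions can be sketched as follows (a toy stand-in: the window, step, and multiplier are our own choices, and the real C also randomizes the arithmetic operations themselves per instance):

```c
#include <stdint.h>
#include <stddef.h>

/* Toy arithmetic checksum over overlapping windows of a memory region.
   Choosing step < win makes consecutive windows overlap, so every byte
   influences several partial sums; the nonce seeds the accumulator so
   results produced for one instance of C are useless for another. */
uint64_t overlap_checksum(const uint8_t *buf, size_t len,
                          size_t win, size_t step, uint64_t nonce) {
    uint64_t h = nonce;
    for (size_t start = 0; start + win <= len; start += step)
        for (size_t i = 0; i < win; i++)
            h = h * 1099511628211ULL + buf[start + i];   /* odd multiplier */
    return h;
}
```

Because the accumulator is seeded with a per-instance nonce, a recorded answer to one challenge cannot be replayed against a later one even over identical sub-regions.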
Trent receives the results of the arithmetic checksum, verifies the results, and sends a message back to C, which proceeds with the rest of the protocol if Trent responds in the affirmative. The checksums are taken on overlapping sub-regions to make prediction of results more difficult for Mallory. This creates multiple levels of indeterminacy for an attack to take place. Overlapping checksums also ensure that if by accident the sub-regions are defined identically in two different versions of C, the results of the computation produced by C are still different. To determine whether C was bounced to another machine, Trent obtains the address of the machine that C is executing on. Trent received an attestation request from Alice, and hence has access to the IP address of MAlice. If C returns the IP address of the machine it is executing on, Trent can determine whether both values are the same. Although IP addresses are dynamic, there is little possibility that any machine will change its IP address in the small time window between Alice's request and the measurements being provided to Trent. C determines the IP address of MAlice using system interrupts. Mallory will find it hard to tamper with the results of an interrupt. We assume that Alice is not running MAlice behind a NAT and that the machine has only one network interface. The reason for these assumptions is that C takes measurements on MAlice to determine whether it is the same machine that contacted Trent. If MAlice is behind a NAT then Trent would see the request coming from a router and measurements from MAlice. Multiple interfaces would return multiple Internet addresses and make it difficult for C to perform checks and return results to Trent. To determine that P was not executed in a sandbox environment, C determines the number of processes having an open connection to Trent on the client machine.
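One way to obtain such a count on Linux can be sketched as follows (an assumption-laden sketch: we take the "port file structure" to be /proc/net/tcp, and `count_connections` with its hex-encoded arguments is our own illustrative helper, not the paper's code):

```c
#include <stdio.h>
#include <stdint.h>

/* Count sockets in /proc/net/tcp whose remote endpoint matches ip:port.
   ip_hex is the address as it appears in that file (hex, kernel byte
   order); returns -1 if the table cannot be read. */
int count_connections(uint32_t ip_hex, unsigned port) {
    FILE *f = fopen("/proc/net/tcp", "r");
    if (!f) return -1;
    char line[512];
    int n = 0;
    if (fgets(line, sizeof line, f)) {            /* skip the header row */
        while (fgets(line, sizeof line, f)) {
            unsigned long laddr, raddr;
            unsigned lport, rport;
            if (sscanf(line, " %*d: %lx:%x %lx:%x",
                       &laddr, &lport, &raddr, &rport) == 4 &&
                raddr == ip_hex && rport == port)
                n++;
        }
    }
    fclose(f);
    return n;
}
```

In a pristine run the count for Trent's address and port should be exactly one, and the owning process should be the one hosting C.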
This is obtained by determining the remote address and remote port combinations on each of the port descriptors in the system. C communicates with Trent using the socket descriptor provided by P. This implies that in a pristine situation there must be only one such descriptor on the entire system, and the process utilizing it must be the process inside which C is executing. If there is only one such process, C computes its own process id and compares the two values. We do not assume a compromised kernel. The verification code C relies on the kernel only to handle the system calls executed through interrupts and to read the file structure containing the open connections on the system. There are many system call routines in the Linux kernel, and monitoring and duplicating the results of each of these may be a difficult task for malware. However, reading the port file structure requires support from the operating system. We will assume that the OS provides correct results when the contents of a directory and file are read out. Without this assumption, Remote Attestation cannot be performed entirely without kernel support.

3.2 Performance

We obtained the source code for the VLC media player interface [Web Link 3]. We removed some sections of the interface code and left close to 1000 lines of C code in the program. We used two pairs of machines running Ubuntu 8.04. One pair were legacy machines with an Intel Pentium 4 processor and 1 GB of RAM, and the second pair were Intel Core 2 Quad machines with 3 GB of RAM. The quantities measured were the time taken to generate code (including compile time), the time taken by the server to do a local integrity check on a clean copy of the binary, and the time taken by the client to perform the integrity measurement and send a response back to the server.
To obtain an average measurement for code generation, we executed the components of the program in a loop 1000 times and report the average time in Table 1. We then executed the integrity measurement code C locally on the server and sent it to the client for injection and execution. The time taken on the server is the compute time the code takes to generate the integrity measurement on the server, as both machines were kept with the same configuration in each case. These times are reported in Table 2. It must be noted that the client requires a higher threshold to report results because it has to receive the code from the network stack, inject the code, execute it, and return results back through the network stack to the server. Network delays also affect the time threshold. We can see from the two tables that it takes on the order of a few hundred milliseconds for the server to generate code, while the integrity measurement is very lightweight and returns results in the order of a few milliseconds. The code generation process can therefore be viewed as a significant overhead. However, the server need not generate new code for every instance of a client connection. It can generate the measurement code periodically, every second, and ship out the same integrity measurement code to all clients connecting within that second. This can alleviate the workload on the server.

4. Secure key storage

Stealing secrets from the memory of executing programs is an effective method for circumventing security systems, especially encryption. Encryption keys have to be stored as clear-text in memory when the application that performs the encryption executes. Hence a crucial piece of information resides in memory; this information is susceptible to memory forensics based attacks. For example, the AACS encryption for high-definition DVD players uses a master key to protect other keys, and uses the unbroken AES encryption method.
It has been documented that a particular HD-DVD encryption key was stolen from memory [Web link 4]. The encryption keys can also be stolen by performing an entropy analysis, as encryption keys are known to have high entropy by nature. In this paper we present two methods of safely storing encryption keys. The first involves hypervisor support. The keys are placed in a hypervisor below the operating system. This prevents any user application, kernel module, or the operating system itself from stealing the key. The key computations are done within the hypervisor using calls that allow an application to call the hypervisor. When an application requests a decryption operation, the hypervisor performs a remote attestation to ensure that it is a valid, authorized and uncompromised application. The second method is disk striping. In this method the keys are kept on disk whenever the key is not actively in use (even if the key handling application is running). The keys are split into tiny chunks of a few bits each and placed in hidden blocks of the disk that are not part of the file system. The randomized disk layout for keys is on a per-system basis and uses a large number of random numbers that are also stored on hidden parts of the disk. Special retrieval routines are used to fetch and store keys, and diversity techniques are used to make them hard to attack. We provide an overview and implementation details of both methods below.

4.1 Hypervisor based key storage

Virtual Machines (VMs) are primarily used for executing multiple machines on a single hardware platform. Virtual Machine Monitors (VMMs) are also widely used to provide a trusted computing base to security sensitive applications, as they can provide logical isolation between different VMs. We offload the cryptographic operations of the system to the secure VMM. The guest operating system(s) interact with the user and receive the request to perform cryptographic operations.
The guest operating systems make a hypercall to the VMM to perform the actual encryption/decryption. Virtualization provides memory isolation; we leverage the secure nature of the VMM where a guest operating system cannot read the contents of the host VMM, but the VMM can read the contents of every guest OS. The guest programs interact with the outside world and request crypto operations. The routines performing encryption and decryption using the secret key are provided by the VMM. The secret key never reaches the memory sections of the guest operating systems. Any attacks launched by the attacker Mallory are restricted on the guest space. Due to this Mallory cannot get information about the key using forensic analysis on the guest memory. We also implement attestation which ensures that the application calling the cryptographic hypercall is legitimate. The attestation process ensures that the program requesting the cryptographic operations from the guest user space is the intended legitimate program. This thwarts attacks such as code injection where a malicious program may request cryptographic operations by patching onto the legitimate program. Attestation is achieved in two phases, in the first phase the guest kernel issues an encrypt request to the host kernel with the data. The host kernel responds by performing an attestation on the application making the request. Once the host kernel is satisfied that the application making the request is valid, it proceeds to the second phase where it encrypts the data. The key used to perform encryption/decryption is stored completely in the hypervisor. Figure 3 shows the sequence of operations for performing crypto operations. 1. The guest user space application issues requests for performing cryptographic operations. It passes the type of operation and the data to be operated upon as input. The request issues a software interrupt and the context switches from guest user space to guest kernel space. 2. 
The guest kernel forwards the request to the trusted VMM.
3. The secure VMM injects code into the running guest kernel. The injected code is responsible for returning the guest physical address where the user space program is loaded; it also brings the user application's pages into memory in case they are swapped out. The injected code is changed every time an attestation request is made.
4. The guest kernel executes the injected code and returns the address of the page where the user space program resides.
5. Control is transferred back to the VMM.
6. The secure VMM reads the contents of the user space program directly from memory, using the address obtained in step 4.
7. The secure VMM computes the hash of the memory contents and compares it to precomputed hash values obtained from the original binary image of the program.
8. The requested operations are performed by the VMM.
9. The results are written back to the memory location passed by the guest kernel.
10. The guest kernel copies the results to the guest user space.

4.1.1 Implementation

The system was implemented on the Linux 2.6.23 kernel, with the same guest and host operating systems. We utilized the lguest modules to implement the VMM and the DES module to perform the cryptographic routines, adding all the required hypercalls for these operations. We performed attestation by creating a nop placeholder for the attestation code inside the guest kernel. When the guest kernel issues an attestation request, the VMM provides executable code to be injected at this location. The guest kernel uses the copy_to_user() call to inject bytes at the specified placeholder. Control is passed back to the guest kernel, which executes the injected bytes. The attesting routine returns hash computations of the user space program back to the hypervisor.
The hypervisor compares these values against expected values to determine whether the calling user space application is indeed legitimate.

4.1.2 Performance

We implemented the system on the following platform: Intel Core 2 Duo 1.66 GHz processor with 1 GB RAM. We compared the DES and RSA algorithms with similar key sizes while performing cryptographic operations. Table 3 shows the performance of the two algorithms on the test platform; measurements were averaged over 200 runs. One round of encryption followed by decryption took 8 microseconds on the host OS and 31 microseconds in the guest OS, an increase by a factor of roughly 3.9. RSA involves many more complex exponentiation operations than DES, so it predictably takes longer on the system.

4.2 Disk striping based key storage

We hide the key on secondary storage by writing to the unused sectors of files on the hard disk without using the file system calls of the OS. Each file on the storage media has an end of file (EOF) marker. The OS allocates space for files in disk blocks and does not reclaim the space beyond the EOF marker when a block is only partially used. This space beyond the EOF in a sector is used in this research to store the key. We do not add key information to random files; rather, we place new files during installation of the code. Once the storage area is determined, we scatter the key throughout this area so that the attacker cannot retrieve the key even after learning the sector where it is stored. Fig. 4 shows an instance of a scattered array. We refer to the location of a bit of the key in the scattered array as its bit-address. Every bit of the key has a bit-address associated with it, and together these form the bit-address pattern, which is unique to every installation of the system. If the bit-address pattern were stored directly, it could be easily read by the attacker.
Instead of storing the bit-address pattern, the bit-address fetching block generates the address of each bit at run time. There is one logical block of code per bit-address to be generated. The bit-address fetching block is generated for the application at installation time: we first generate a bit-address pattern and then generate the bit-address fetching block to match those addresses. This is achieved by performing random arithmetic operations on the result value until we reach the desired value for that particular bit. The arithmetic operations are chosen from a pool of operations. We also obfuscate the location of the bit-address fetching block in the binary.

4.2.1 Secure storage of the key in secondary memory

The key is stored in the unused portions of the last sectors of files on the file system by writing to the raw disk interface via /dev. The area in which the key is scattered is the scattered array. The executable code for crypto operations assumes that the information about the location of the disk blocks containing the key is present at a known location in its program memory, and that the scattered key array is also present somewhere in the data section of the program memory. It contains an empty section which is filled with the key fetching code (the bit-address fetching block) during installation. As defined above, the location of a bit of the key in the scattered array is its bit-address: if bit 0 of the key is present at the 11th position of the scattered array, then the bit-address of bit 0 is 11. Instead of storing the bit-address pattern, the bit-address fetching block generates the address of each bit at run time. Executing the bit-address fetching block produces the bit-address in a predetermined register/memory address, which is then used to access the bit from the scattered array. Construction of the block begins by setting a min and a max bound for the number of operations used to generate an address.
If N operations generate a bit-address, then min ≤ N ≤ max. Each operation is one of seven basic operations (add, subtract, multiply, divide, and, or, not); N is not a fixed number. During generation of a bit-address fetching block, the registers and memory locations used are initialized and one operation is chosen and performed. The result is tallied against the required value; if the desired value has not been reached, further operations are applied. Once the desired value is reached with min ≤ N ≤ max, code generation stops. If N < min, we continue to add mutually cancelling operations until N ≥ min; if N > max, we restart the block generation.

4.2.2 Protecting the bit-address fetching blocks

The location of the bit-address fetching block could be revealed by the presence of a jump/call instruction targeting it. To prevent this, the target address for the call to the bit-address fetching block is generated during execution. The location of the block which calculates the target address is also randomized in the binary and padded with many junk calculations that do not affect the outcome. To prevent malicious code from executing the fetching blocks, we self-attest the running image of the executable: the code computes hashes over sections of its own process image and compares the results with expected results hardcoded inside it. The attestation covers the fetching blocks and the application within which it is executed. We use a simple inline hash function to avoid the hash call being observed in process tracers and tools such as objdump. It is difficult for an attacker to change the hash values stored in the binary because they are stored in different locations in each installation.

4.2.3 Analysis of system

This portion of the research was implemented on the Ubuntu 8.04 Linux OS. We used a 32 bit key with 2 self-attestation blocks. The size of each bit-address fetch block was set at 80 bytes.
We used a scatter array size of 2K bits, and the application was 14K bytes in length. This implementation uses a few simple obfuscation techniques to keep the focus on our main idea and to keep the implementation simple. Table 4 provides the critical location information along with information on the attestation blocks. It shows that even with a margin as small as 0x50 bytes, we obtained a good measure of randomness in the system. Any malware that attacks the application will therefore find it difficult to perform remote analysis and use the information gained to attack another instance of the installation of the same application.

5. Randomization of memory layout

Most attacks succeed due to the lack of genetic diversity among computer systems: an attacker can discover vulnerabilities in binaries and use them to engineer attacks on multiple machines. If the memory layout of each copy of the binary were different on every machine, it would be extremely hard for the attacker to launch such attacks. Figure 5 shows the layout of the stack in the Intel x86 architecture. The relative location of the return address on the stack is constant for every instance of a given function. An attacker can thus determine the offset in bytes from a known memory location to the return address and launch buffer overflow and stack smashing attacks; similar techniques exist for attacking the heap. Such an attack can lead to the program executing any function present in the process space, as specified by the attacker; in the case of kernel threads, every function in the operating system becomes accessible to the attacker. Most overflow attacks can be stopped if the relative locations of addresses differ in every instance of the binary. The objective of this section is to randomize the stack frame and the heap frame.
The stack frame is randomized post-compilation, without compiler support and without the availability of source code. The heap frame is randomized by changing the system library code and the kernel code.

5.1 Randomization of the stack frame

We randomize the size of the run-time stack frames to make every copy of a binary unique. The binaries are instrumented by analyzing the disassembly of the code segment; we do not inject additional bytes of code into the binary but rewrite existing bytes in the code segment, and we do not require source code access. Since the randomization is carried out on the end user's machine, the procedure must be simple, low-cost, and efficient. We are mainly concerned with the scenario where attackers remotely target users through software with a vulnerability, whether already discovered or not.

5.1.1 Analysis of disassembly for randomization

Consider the C code snippet shown in Figure 6 and its disassembly for the x86 architecture. The routine foo receives one integer argument and has two local elements that need space on the stack; the compiler allocates 1024 + 4 bytes on the stack for these two elements. The C library function gets, which is known to write beyond buffer bounds, is used to take input from the user into the character buffer, making the character array vulnerable to a buffer overflow attack. During randomization, only those instructions that are relevant to the run-time stack need to be rewritten. By shifting the vulnerable character buffer down by a random amount, the distance between the return address and the buffer becomes different for every copy of the binary, making it impossible to use the same attack string against different copies.
This increases the cost of devising an attack and reduces the propagation rate of a discovered vulnerability, providing a larger window of opportunity to develop a patch for the security loophole. Instructions of the following types need to be rewritten in the binary if we add a random pad:

a. Instructions that create space on the stack frame for local variables and buffers.

b. Instructions that deallocate the space used by the function's locals on the stack frame; these execute right before the function returns. In the case of foo, the deallocation is done implicitly by the leave instruction, which restores the stack pointer from the frame pointer, so no instruction needs to be modified explicitly for correct deallocation of the random pad.

c. Instructions that access local variables and buffers on the stack frame. All locals are accessed with the help of the frame pointer EBP and are located below it, at lower addresses, in the x86 architecture. Because of the random pad, the local buffers shift further down from the frame pointer, and all local variables shift downwards by the same amount.

5.1.2 Implementation

The prototype randomizer was developed in C and compiled with GCC, using the objdump disassembler. Figure 7 shows the flow of the randomizer. The binary file is fed to the disassembler, and the disassembler's output is parsed to identify the instruction operands that need to be modified in the binary. Before feeding the disassembly to the parser, the grep utility extracts only those instructions that are relevant to stack frames. The parser separates and analyzes each sub-routine to accomplish fine-grained randomization, so that every function is padded separately with a random pad conforming to that function's constraints.
Thereafter these instructions are rewritten directly in the binary to change the layout of the stack frames at run time. Instructions that create space on the stack frame subtract a constant value from the stack pointer; the instructions we look for are of the form "sub $0x#,%esp", where # is a constant determined by the compiler according to the requirements of the function. To restore the stack, the compiler usually adds the same constant back to the stack pointer at the end of the function. All references to local variables are made via a negative offset from the base pointer EBP. The prototype works as a 2-pass randomizer. In the first pass, each sub-routine is analyzed to determine the maximum padding that can be given to it: every instruction that accesses memory has an upper limit on the relative address it can reach, so we process every instruction, check the maximum random pad available to it, and take the least of these values as the pad for the function. The randomizer also looks for instructions that are sensitive to the alignment of memory operands and takes the conservative approach of not randomizing sub-routines containing such sensitive instructions. The random pad is then clipped to the nearest multiple of 32 to satisfy the alignment requirements of several instructions. In certain cases it is also necessary to place an upper limit on the padding given to each sub-routine, since excessive padding increases the chance of a stack overflow crashing the process. In the second pass, the randomizer goes through the instructions in the disassembly and locates them in the executable binary file, keeping track of the sub-routine in which each instruction is present.
With the help of the data structure built for every sub-routine during the first pass, the randomizer statically rewrites the corresponding instructions in the binary executable.

5.1.3 Analysis

We randomized copies of the following applications: Open Office, pidgin, pico, ls, gcc, netstat, ifconfig, route, xcalc, tail, date, nslookup, sum, head, wc, md5sum, pwd, users, cat, cksum, hostid, logname, echo, size, firefox, viewres, xwininfo, oclock, ipcs, pdfinfo, pdftotext, eject, lsmod, clear, vlc, and gij, covering both console and graphics applications. All the binaries used in testing were release quality, optimized utilities that are part of Linux distributions; a sample of successfully randomized binaries is shown in Table 5. Our experiments show that the cost of randomization itself is low. Since we only manipulate the size of the run-time stack, we did not expect this approach to have any run-time penalty; the measured differences in execution time between original and instrumented binaries confirm that the randomized binaries have the same run-time efficiency as their unrandomized counterparts. On average the randomizer modified the run-time stack of more than 75% of the sub-routines in every application. Some routines are not randomized because we take a conservative approach of not changing routines containing alignment-sensitive instructions such as FXSAVE. We are also restricted by the length of the stack allocation instruction, since we do not inject additional bytes into the program: if the operand of the stack-allocating instruction is only one byte wide, a maximum of 128 bytes of stack can be allocated with that instruction.
If a routine already allocates 128 bytes of stack with such an instruction, its stack frame cannot be randomized.

5.2 Heap frame randomization

In this section we randomize the size of the heap chunk returned for every allocation request generated by an application. We single out the library functions that play a vital role in heap memory management (the functions that perform the free, allocate, and resize operations) and hook their entry points so that we can add our randomization code to them. This is effective in defeating attacks that exploit heap buffer overflows. Access to the source code of an application is not required, since the solution patches the underlying heap memory management mechanism itself, either statically or at run time. We adopt a dual random padding strategy for every memory allocation: a random pad is appended below as well as above the pointer to the heap memory chunk returned by the allocation algorithm. Figure 8 gives a view of this. We implement our approach by identifying the memory management functions to be patched in the GNU C library. The most important of these are malloc(), free(), realloc(), and memalign(). Other related functions are calloc(), valloc(), and pvalloc(), which need not be patched as they are based entirely on malloc() in the current version of the GNU C library.

5.2.1 Implementation

malloc(): When a call to malloc() is made, it is first intercepted by its public wrapper function. We insert the following operations before the call to the internal malloc(). We generate two random integers i and j, both multiples of 8 to respect the internal memory alignment rules; the upper limit of the random numbers generated can be selected heuristically. These two integers are added to the size parameter of the original request, making an allocation call for i + j + the original request size.
A successful malloc() operation returns the pointer to a newly allocated memory chunk; the value of the pointer returned to the calling function (the user application) is shifted by i bytes. We store these two random numbers so that other memory management functions such as free() and realloc() can recover the padding values in order to calculate the actual starting address of the memory chunk, and thus the boundary tag information stored above it. Just as the boundary tag information is stored above the actual starting address of the chunk's user space, we store the random integer i just above the new, shifted starting address computed earlier. free(): Once a call to free() is made by the requestor, execution enters the free() public wrapper. Here, just as in the malloc() public wrapper, any registered hook function is called before proceeding with the actual free() operation. If no hook is set, we first extract the value i set by malloc(), which lies just above the chunk address passed as an argument to free(). Using this value, we calculate the original starting address of the chunk's user space, which is then passed to the internal free() function. realloc(): The realloc() public wrapper must perform the same pointer retrieval operation: the original chunk pointer must be recovered before the actual reallocation can proceed. The reallocation can either shorten or elongate the memory buffer, depending on the sign of the difference between the current chunk size and the newly requested size. In the case of elongation, it first tries calling the internal realloc() function. If this fails, the public wrapper allocates a new memory chunk with malloc() and, before returning the new pointer to the requestor, copies the data from the current chunk to the new one.
With randomization introduced, this is no longer a straightforward copy: the random factors (i and j) generated for the new memory chunk will most likely differ from those of the old one. By performing the appropriate calculations and adjustments, we make sure that the copy runs from the old chunk's shifted pointer location to the new chunk's shifted pointer location. memalign(): memalign() is essentially malloc() with memory alignment constraints. We must make sure that the random value i is a multiple of the alignment factor x passed as an argument to memalign(). In some cases the requested alignment is large, a page or more (512 bytes and up); padding by large multiples of such x values can be quite costly, so in these cases we keep i at x or 2x.

5.2.2 Analysis

We evaluated the heap frame randomization on Ubuntu 8.04 using the Unixbench benchmark suite, version 4.1. We successfully ran applications such as web browsers, office suites, games, and graphics processing utilities with our library patch, without any exceptions. When running applications with the randomization patch we did not notice any significant hit on run-time efficiency; the results of the run-time performance tests can be seen in Table 6.

6. Conclusion

In this paper we presented entirely software-based solutions for detecting compromised binaries, storing secret keys safely, and modifying the memory layout of binaries. These three scenarios represent most of the attacks that occur in the computing world, and mitigating them improves the security of end user platforms. The techniques can be used independently and do not depend on software vendors providing the source code of their products.
Future work would extend the techniques described above to enhance the security of the operating system itself, by randomizing the OS and/or providing remote attestation of the OS code. Such attempts would of course raise other issues, such as variance in the OS code due to hardware differences among makes and models of computers. Work is also needed on combining the approaches into a single system with multiple defense mechanisms: randomization of the address space implies changes to the code itself, so the code on different machines may have different MD5 hashes, and these have to be tracked by the remote attester.

References

Basili V.R, and Perricone B.T. Software errors and complexity: an empirical investigation, Communications of the ACM 27(1), 1984, pp. 42–52.
Bhatkar S, DuVarney D.C, and Sekar R. Address obfuscation: an efficient approach to combat a broad range of memory error exploits. 12th USENIX Security Symposium, August 2003.
Canetti R, Dodis Y, Halevi S, Kushilevitz E, and Sahai A. Exposure-resilient functions and all-or-nothing transforms, in Advances in Cryptology – EUROCRYPT, 2000, pp. 453–469.
Durden T. Bypassing PaX ASLR Protection. Phrack Inc., 2002. Available from URL http://www.phrack.org/issues.html?issue=59&id=9#article
Eichin M and Rochlis J. With microscope and tweezers: an analysis of the internet virus of November 1988. Proceedings of the IEEE Symposium on Security and Privacy, pp. 326–343, May 1989.
Forrest S, Somayaji A, and Ackley D.H. Building diverse computer systems. HOTOS '97: Proceedings of the 6th Workshop on Hot Topics in Operating Systems (HotOS-VI), pp. 67–72, May 1997.
Foster J, Osipov V, Bhalla N, and Heinen N. Buffer Overflow Attacks: Detect, Exploit, Prevent. Syngress, 2005.
Garay J.A and Huelsbergen L. Software integrity using timed executable agents, in: Proceedings of the 2006 ACM Symposium on Information, Computer and Communications Security (2006), pp. 189–200.
Garfinkel T, Pfaff B, Chow J, Rosenblum M, and Boneh D. Terra: A virtual machine-based platform for trusted computing, ACM SIGOPS Operating Systems Review, pp. 193–206, 2003.
Goldman K, Perez R, and Sailer R. Linking remote attestation to secure tunnel endpoints, STC '06: Proceedings of the First ACM Workshop on Scalable Trusted Computing, pp. 21–24, 2006.
Harrison K. Protecting Cryptographic Keys from Memory Disclosure Attacks. 37th Annual IEEE/IFIP International Conference on Dependable Systems and Networks, pp. 137–143, 2007.
MacKenzie P. Networked cryptographic devices resilient to capture, IEEE Symposium on Security and Privacy, pp. 12–25, 2001.
Sahita R, Savagaonkar U, Dewan P, and Durham D. Mitigating the Lying-Endpoint Problem in Virtualized Network Access Frameworks, Springer: Managing Virtualization of Networks and Services, pp. 135–146, 2007.
Sailer R, Zhang X, Jaeger T, and Van Doorn L. Design and implementation of a TCG-based integrity measurement architecture, Proceedings of the USENIX Security Symposium (2004), pp. 223–228.
Seacord R. Secure Coding in C and C++. Addison-Wesley, 2005.
Seshadri A, Luk M, Shi E, Perrig A, Van Doorn L, and Khosla P. Pioneer: Verifying code integrity and enforcing untampered code execution on legacy systems, ACM SIGOPS Operating Systems Review 39(5), 2005.
Shacham H, Page M, Pfaff B, Goh E, Modadugu N, and Boneh D. On the effectiveness of address-space randomization. CCS '04: Proceedings of the 11th ACM Conference on Computer and Communications Security, pp. 298–307, 2004.
Shamir A, and van Someren N. Playing "Hide and Seek" with Stored Keys, Third International Conference on Financial Cryptography, pp. 118–124, 1999.
Srinivasan R, and Dasgupta P. Towards more effective virus detectors. Communications of the Computer Society of India 31(5), pp. 21–23, August 2007.
Stumpf F, Tafreschi O, Röder P, and Eckert C.
A robust integrity reporting protocol for remote attestation, WATC '06: Second Workshop on Advances in Trusted Computing, 2006.
Viega J and McGraw G. Building Secure Software. Addison-Wesley, 2002.
Wang L and Dasgupta P. Kernel and application integrity assurance: Ensuring freedom from rootkits and malware in a computer system, in: Advanced Information Networking and Applications Workshops (2007), pp. 583–589.
Xu J, Kalbarczyk Z, and Iyer R. Transparent Runtime Randomization for Security. Proceedings of the 22nd International Symposium on Reliable Distributed Systems, pp. 260–269, 2003.
Web link 1: ASLR: Address space layout randomization. Retrieved on April 25, 2010, http://pax.grsecurity.net/docs/aslr.txt
Web link 2: nCipher Solutions. Retrieved on April 20, 2010, http://iss.thalesgroup.com/Resources/Product%20Data%20Sheets/keyAuthority.aspx
Web link 3: VLC media player source code FTP repository. Retrieved on February 24, 2010, http://download.videolan.org/pub/videolan/vlc/
Web link 4: HD-DVD Content Protection already hacked? Retrieved on April 4, 2010, http://www.techamok.com/?pid=1849

Figure 1: Overview of remote attestation (diagram showing the trusted server Trent, the client Alice, the application P, and the injected code C; the graphic is not recoverable from the extracted text).

Figure 2: Detailed steps in the remote attestation process.
1. Alice → Trent: Verification request
2. Trent → Alice: Inject code at location and execute it
3. C → Trent: Machine identifier
4. Trent → C: Proceed
5. C → Trent: Initial checksum
6. Trent → C: Proceed
7. C → Trent: MD5 of specified regions
8. Trent → C: Proceed
9. C → Trent: Test of correct PID
10. Trent → C: Proceed/Halt

Figure 3: Key storage and attestation model. Guest user space, Ring 3 (untrusted): all user services such as ftp and http run here. Guest kernel space, Ring 1 (untrusted): connected to the network, vulnerable to attacks. VMM, Ring 0 (trusted): the trusted component in the system, running at the highest privilege; performs DES with the key. (The arrows for steps 1 through 10 are not recoverable from the extracted text.)

Figure 4: Scattered array. Key bit 0 is stored at location 11, key bit 1 at location 1, key bit 24 at location n, and key bit 128 at location 5.

Figure 5: Layout of the stack in the Intel architecture (inactive frames with return links and data, the active frame, and the available stack below; the graphic is not recoverable from the extracted text).

Figure 6: Sample C routine and its disassembly.

void foo(int dummy_arg)
{
    char buffer[1024];
    int local_variable;
    gets(buffer);
    local_variable = dummy_arg;
}

080483c4 <foo>:
 80483c4: 55                      push   %ebp
 80483c5: 89 e5                   mov    %esp,%ebp
 80483c7: 81 ec 18 04 00 00       sub    $0x418,%esp
 80483cd: 8d 85 fc fb ff ff       lea    -0x404(%ebp),%eax
 80483d3: 89 04 24                mov    %eax,(%esp)
 80483d6: e8 ed fe ff ff          call   80482c8 <gets@plt>
 80483db: 8b 45 08                mov    0x8(%ebp),%eax
 80483de: 89 45 fc                mov    %eax,-0x4(%ebp)
 80483e1: c9                      leave
 80483e2: c3                      ret

Figure 7: Workflow of the randomizer. Binary file → disassembler → extraction of relevant instructions → parser (sub-routine separation and instruction analysis) → randomizer → randomized binary.

Figure 8: Allocated memory chunk with dual random padding. Without randomization, the chunk pointer points directly at the user data, preceded by the boundary tags (previous chunk size, size/status). With randomization, Random Pad 1 is placed below and Random Pad 2 above the user data, and the returned chunk pointer is shifted accordingly.

Table 1: Average code generation time in milliseconds on the server end for Intel Pentium 4 and Core 2 Quad machines for one instance of the measurement.

                       Pentium 4   Core 2 Quad
Test generation        12.3        5.2
Compilation time       320         100
Total time             332         105

Table 2: Time (milliseconds) to compute the measurements on the server and the client.

                            Pentium 4   Core 2 Quad
Server side execution time  0.6         0.4
Client side execution time  22          16

Table 3: Operating time on guest and host operating systems.

Input size   DES (ms)             RSA (ms)
(bits)       Host OS   Guest OS   Host OS   Guest OS
64           8         31         322       340
128          19.8      27.9       358       373
192          24.9      35.6       394       408

Table 4: Data from different instances of installations (positions of fetch blocks, locations of jumps to fetch blocks, locations and ranges of attestation blocks, hash values for attestation, and total sizes of all fetch blocks). The per-installation values cannot be reliably realigned from the extracted text; as one example, the first instance places its fetch blocks at 0x21B7 with the jump to them at 0xCF7.

Table 5: Performance test results for stack randomization.

Binary       Time to          % of sub-routines  Original stack  Instrumented   Overhead
             randomize (sec)  randomized         (bytes)         stack (bytes)  (bytes)
Open Office  20.22            88.33              21,256          23,170         1,914
pidgin       54.718           91.48              16,093          16,407         314
gcc          1.836            91.77              1,829           2,009          180
route        0.505            93.75              2,183           2,614          431
xcalc        0.109            100                11,372          11,592         220
echo         0.137            100                1,018           1,079          61
firefox      1.2              100                27,890          28,216         326
vlc          0.057            100                6,366           6,688          322
eject        0.435            75                 20,611          21,431         820
lsmod        0.397            100                4,414           4,473          59

Table 6: Run-time performance tests for heap randomization on a Linux Ubuntu system.

Test                                                  Without patch (s)  With patch (s)
Dhrystone 2 with register variables                   10                 10.2
Whetstone Double Precision                            29.3               30
Execl Throughput                                      10.1               10
File Copy (buffer size = 1024, max. blocks = 2000)    29.8               30
File Read (buffer size = 256, max. blocks = 500)      30                 30.1
File Write (buffer size = 256, max. blocks = 500)     30                 30
File Copy (buffer size = 256, max. blocks = 500)      30                 30
File Read (buffer size = 4096, max. blocks = 8000)    30                 30
File Write (buffer size = 4096, max. blocks = 8000)   30                 30
File Copy (buffer size = 4096, max. blocks = 8000)    30                 30
Pipe Throughput                                       10                 9.9
Pipe-based Context Switching                          10                 10
Process Creation                                      30                 30
System Call Overhead                                  10                 10
Shell Script (8 concurrent)                           60                 60.2