Title: Multi factored approach towards malware resistance

Authors: Raghunathan Srinivasan (corresponding author), Partha Dasgupta, Sujit Sanjeev, Jatin Lodhia, Vivek Iyer, Amit Kanitkar
Affiliation: School of Computing, Informatics and Decision Systems Engineering, Arizona State University, Tempe, AZ, USA
Email: raghus@asu.edu
Phone: (1) 480-965-5583
Fax: (1) 480-965-2751
Abstract:
Protecting the integrity of unmanaged consumer computing systems is a difficult
problem. Attackers may execute buffer overflow attacks to gain access to systems, patch
existing binaries to evade detection, and steal sensitive secrets. Every binary has
inherent vulnerabilities that attackers may exploit. In this paper three orthogonal
approaches are presented to improve the security of consumer platforms. Each approach
provides a level of assurance against malware attacks beyond that of virus detectors.
The approaches can be used independently of each other or combined to achieve the
desired level of protection, and can be added on top of normal defenses. This work
attempts to find alternative solutions to the problem of malware resistance. The
approaches we use are: adding diversity or randomization to data address spaces, hiding
critical data to prevent data theft, and using remote attestation to detect tampering
with executable code.
Keywords:
Computer Security, Attacks, Remote Attestation, Integrity measurement, Virtual Machine
Monitors, Secure key storage in memory, Memory Randomization.
1. Introduction
The magnitude of the threat from malware, especially on “consumer computing”
platforms, is well known and well understood. Malware today can hide from virus
detectors, steal secrets, live stealthily for extended periods of time, effectively prevent
removal efforts, and much more. The ability to run sensitive applications and store
sensitive data on consumer platforms without having to trust the platform (and without
using trusted hardware modules) is critical. The platforms hosting such applications
can be subject to a variety of attacks, leading to the danger of data leakage, information
theft, modification of functionality, and a variety of possibly damaging losses.
Attestation of client computers using hardware attestation modules, or using hypervisors
to scan computers, has not had much success because these solutions are cumbersome to
deploy.
A smartly designed malware can have more power than any other application in the
system [Srinivasan and Dasgupta, 2007]. A complete silver bullet solution to security
problems is difficult to achieve [Basili and Perricone, 1984]. However, the risks can be
practically mitigated by mechanisms that ensure verifiable execution, check the
integrity of applications, and provide isolation techniques that make information
stealing difficult. We outline three orthogonal schemes in this paper, each of which
provides a level of assurance against malware attacks beyond virus detectors. The
approaches can be added on top of normal defenses and can be combined to tailor the
level of protection desired. This work attempts to find alternative solutions to the
problem of malware resistance. When combined, these techniques provide meaningful
guarantees against tampering with applications by existing malware. We implement,
entirely in software, techniques to determine the integrity of an application on an
untrusted machine, techniques to hide secrets on end-user systems, and techniques to
randomize the memory layout of binaries. These three capabilities address the main
motivations an attacker has for compromising an end-user system; by mitigating all
three, we increase the overall security of systems.
The first approach is Remote Attestation. Remote Attestation is a set of protocols that
uses a trusted service to probe the memory of a client computer to determine whether one
(or more) application has been tampered with or not. These techniques can be extended
to determine whether the integrity of the system as a whole has been compromised. While
the idea sounds simple, given the power of the adversary (malware), the protocol must be
designed very carefully to prevent the malware from falsely declaring to the server that
the system is safe. Remote Attestation has been implemented in both hardware and
software. Hardware-based schemes have their own pros and cons; software-based solutions
involve taking a mathematical or cryptographic checksum over a section of the program.
The solution in this research provides tamper-resistant measurements entirely in
software, from user space, even if a smart malware manipulates system call responses,
masquerades, or infects multiple programs, including the kernel, on the client machine.
The second approach is to build obfuscation and shielding methodology to make stealing
secrets from client machines harder. For example, memory in client applications holds
encryption keys, passwords, sensitive data, and, in the case of PKI implementations,
private keys. Stealing secrets by copying regions of memory is particularly simple
(keys, for example, have high entropy and are easy to locate). We provide two
approaches to hide keys more
effectively. The first method involves the use of a virtual machine monitor (VMM), and
the second method scatters keys on the raw disk space.
The third approach is the ability to provide “software diversity” for legacy software.
Currently a malware designer can perform offline analysis of an application to discover
vulnerabilities in it. These vulnerabilities can be exploited to launch various kinds of
attacks on multiple systems. Every copy of an application that is shipped to consumers is
exactly the same, and contains the same weaknesses in the same binary locations.
Software diversity breaks up the uniformity making each instance of the application
different, and attacks that work on one instance do not work on another. ASLR [Web link
1] is an example, but we take that idea to a finer degree of granularity, applying it to
each stack frame and heap allocation. The first technique randomizes the structure of
the stack in each copy of an application binary to prevent stack-overflow-based attacks;
the second randomizes the structure of allocated heap memory to prevent heap-based
overflow attacks.
Our methods have been implemented completely in software. We opine that such
approaches, judiciously combined with traditional malware prevention methods, can
make computing safer without adding much overhead to the applications and operating
systems. The rest of the paper is organized as follows: Section 2 presents work related
to all three approaches, Section 3 presents the remote attestation technique, Section 4
presents key hiding, and Section 5 presents stack and heap randomization.
2. Related work
Integrity measurement involves checking if the program code executing within a process
or multiple processes is legitimate or has been tampered with. It has been implemented
using hardware, virtual machine monitors, and software-based detection schemes. Some
hardware-based schemes operate off the TPM chip specified by the Trusted Computing
Group [Goldman et al., 2006; Sailer et al., 2004; Stumpf et al., 2006]. The hardware
based schemes allow a remote agent to verify whether the integrity of all the programs on
the client machine is intact or not. The kernel executing on the client takes measurements
when a program is first executed and provides it to the TPM which signs the values with
its private key. The signed value is then sent to the remote agent that verifies the
signature and the values generated. This scheme cannot measure malicious code which
infects running programs and does not infect the file system. This scheme has another
drawback that a compromised kernel may provide incorrect values to the TPM to sign.
Hardware-based schemes also suffer from the fact that checksums are located in the
hardware and cannot be updated easily; the hardware has to be physically replaced, and
hence a secure co-processor placed in a PCI slot of the platform has been recommended
instead [Wang and Dasgupta, 2007]. Terra uses a trusted virtual
machine monitor (TVMM) and partitions the hardware platform into multiple virtual
machines that are isolated from one another [Garfinkel et al., 2003]. Hardware
dependent isolation and virtualization are used by Terra to isolate the TVMM from the
other VMs. Terra relies on the underlying TPM to take some measurements, and hence
is unsuitable for legacy systems.
In Pioneer [Seshadri et al., 2005] the integrity measurement is done without the help of
hardware modules or a VMM. The verification code for the application resides on the
client machine. The verifier (server) sends a random number (nonce) as a challenge to
the client machine. The response to the challenge determines whether the verification
code has been tampered with. The verification code then performs attestation on some
entity within the machine and transfers control to it, forming a dynamic root of trust
on the client machine. Pioneer assumes that the challenge cannot be redirected to
another machine on a network; however, in many real-world scenarios a malicious program
can attempt to redirect challenges to another machine that has a clean copy of the
attestation code. In its checksum procedure, Pioneer incorporates the values of the Program
Counter and Data Pointer, both of which hold virtual memory addresses. An adversary
can load another copy of the client code to be executed in a sandbox like environment
and provide it the challenge. This way an adversary can obtain results of the computation
that the challenge produces and return it to the verifier. Pioneer also assumes that the
server knows the exact hardware configuration of the client for performing a timing
analysis; this places a restriction on the client to not upgrade or change hardware
components. In TEAS [Garay and Huelsbergen, 2006] the authors propose a remote
attestation scheme in which the verifier generates program code to be executed by the
client machine. Random code is incorporated in the attestation code to make analysis
difficult for the attacker. Their analysis shows that it is very unlikely that an
attacker can determine the actions performed by the verification code; however, no
implementation is described in that work, and implementation details often determine the
effectiveness of a particular solution.
Many approaches have been developed to ensure secure software based key management.
Centralized key storage employs techniques where the generation, storage, distribution,
revocation and management throughout the lifetime of the key happens on a single
central machine. This offers advantages: backup and recovery are easy, and securing a
single machine secures the keys. One commercially available product of this kind is
Thales keyAuthority [Web link 2]. Secondary storage or a detachable device has also
been used to hide keys by encrypting the key with a very strong password, but this can be
attacked using a key logger that logs typed passwords on the system [Shamir and van
Someren, 1999]. In the same research another method is presented to store keys in the
system and that is to break the key into multiple parts and distribute it in different places.
Distributing the key reduces the memory entropy footprint making it harder to detect the
pieces that comprise the key. Another solution for key management is distributed key
storage using secret sharing [Canetti et al., 2000]. This could be an option for large
organizations, but it is not feasible for normal end users of cryptography. Reducing the
number of key copies present in memory is another method to protect cryptographic keys
from memory disclosure attacks; it avoids caching of the key by the operating system and
disallows swapping of the key's memory area. However, even with the number of copies
reduced, a sufficiently large memory dump is still highly likely to contain the single
remaining copy of the key, and it has been suggested that eliminating key leakage via
memory disclosure attacks requires consumers to resort to special hardware devices
[Harrison, 2007].
Networked cryptographic devices in which the cryptographic operations are performed
by a remote server have also been researched. Even if the networked device where the
cryptographic operations are performed is compromised, the attacker cannot derive the
entire key from it [MacKenzie, 2001]. The solutions proposed in this paper prevent key
theft via memory disclosure attacks irrespective of the number of copies of the key
present in memory, and aim to prevent key exposure altogether.
Buffer overflow is a very commonly used form of attack. The first known documented
buffer overflow attack dates back to November 1988, when a worm attacked the Internet
which was, at that time, a collection of 60,000 computers implementing the TCP/IP
protocol suite [Eichin and Rochlis, 1989]. It has been found that buffer overflows
constitute more than 50% of all major security bugs that are published as advisories by
CERT [Viega and McGraw, 2002]. There are several variants of buffer overflow attacks,
such as stack overflows, heap corruption, format string attacks, and integer overflows
[Foster et al., 2005]. C and C++ are very commonly used to develop applications; because
of their efficient “unmanaged” execution model these languages are not memory-safe, and
a vast majority of vulnerabilities occur in programs developed in them [Seacord, 2005].
Randomization is a technique to inject diversity into computer systems.
The first known randomization of stack frame was proposed by placing a pad of random
bytes between return address and local buffers [Forrest et al., 1997]. Random pads make
it difficult to predict the distance between buffers and the return address on the stack. An
attacker has to launch custom attacks for every copy of the randomized binary. Address
obfuscation has extended the above idea by randomizing the base address of memory
regions, permuting the order of elements in a binary, and introducing random gaps within
memory regions [Bhatkar et al., 2003]. Address obfuscation does not require any changes
to the operating system or the compiler. Unlike operating system randomization
solutions, this technique is probabilistic in nature, i.e., it gives a partial but high
degree of randomization to the code and data segments of a binary executable. The
major randomizations that are proposed in this technique are (1) randomization of the
base memory address for the stack and heap segments, dynamically loaded libraries,
routines and static data, (2) permutations on the order of variables and routines and (3)
introduction of random gaps between memory objects like stack frames, heap allocations,
static variables. Linux randomizes process address spaces through a methodology known as
address space layout randomization (ASLR). ASLR changes the start addresses of the stack
and the heap every time an application is loaded; however, the program structure and the
data layout remain the same. The ASLR approach modifies the operating system so that the
base addresses of the various segments of a program are randomly relocated when the
program is loaded. Two popular ASLR implementations, PaX and ExecShield, are available
for Linux as kernel patches. These approaches do not require modification of individual
binaries, but do require that binaries be compiled as Position Independent Executables
(PIE). It has been demonstrated that PaX ASLR only marginally slows down the time taken
to attack a system [Durden, 2002; Shacham et al., 2004]; these works explain techniques
for bypassing ASLR protection and demonstrate a de-randomization attack on a known
buffer overflow vulnerability that takes barely 216 seconds to succeed. It has been
observed that regular executables are
faster than their PIE counterparts. This is because for PIE executables, relative offset
calculations need to be done on-the-fly rather than being available in pre-computed and
pre-linked form. In Transparent Run-time Randomization (TRR) [Xu et al., 2003] the
loader relocates various executable segments, shared libraries and object modules in the
user address space of the process. TRR initially allows the system to set up sections of
the user address space like the stack, heap, data segment, and so on. TRR moves the
heap from its original base to a new base by adding a random number of addresses to the
original heap base. TRR relocates the stack by creating a new stack segment below the
current one and moving the stack pointer from the old stack to the new one. This
procedure occurs every time a process is executed, giving a different randomization on
each execution. By relocating various sections of the user address space, TRR makes
exploiting vulnerabilities considerably more challenging.
The solutions presented in this paper randomize the structure of the stack for every
routine. This is done without access to the source code: we examine the disassembly of
each binary and add a random pad to every function by increasing its stack allocation,
on a per-binary and per-function basis. We also randomize the heap layout by allocating
extra random memory on every heap allocation.
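The heap part of this idea can be illustrated with a small sketch. Note that this is only an approximation of the effect: the paper's tool rewrites binaries, whereas the hypothetical wrapper below achieves a similar layout diversity at the source level by inserting a random pad before each allocation (alignment of the returned pointer is ignored for brevity).

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Illustrative sketch only: a malloc wrapper that inserts a random pad
 * before each allocation, so the relative layout of heap objects differs
 * between runs and between instances. Alignment is ignored for brevity. */
#define MAX_PAD 64

void *randomized_malloc(size_t size)
{
    size_t pad = (size_t)rand() % MAX_PAD;            /* random gap, 0..63 bytes */
    unsigned char *base = malloc(sizeof(size_t) + pad + size);
    if (base == NULL)
        return NULL;
    unsigned char *user = base + sizeof(size_t) + pad;
    memcpy(user - sizeof(size_t), &pad, sizeof pad);  /* stash pad just below user block */
    return user;
}

void randomized_free(void *p)
{
    size_t pad;
    unsigned char *user = p;
    memcpy(&pad, user - sizeof(size_t), sizeof pad);
    free(user - sizeof(size_t) - pad);                /* undo both offsets */
}
```

An attacker who has computed the distance between two heap objects in one instance cannot rely on that distance in another, which is the property the binary-rewriting technique provides.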
3. Remote Attestation
Remote attestation is a framework for allowing a remote entity to obtain integrity
measurements on an untrusted client machine. For remote attestation to work, we need
access to a remote verifier, Trent, that is a trusted (uncompromised) host accessible
over a network. A single trusted host can serve a large number of clients, ensuring that
sensitive applications running on the clients are not tampered with by malicious code on
the client machines. In the consumer computing scenario, we envision the deployment of
“attestation servers”, where an end user contracts with a service provider to test the
safety of the applications on the end platform, similar to the way virus detector
updates are delivered today.
Remote Attestation has traditionally been implemented with the help of hardware modules,
as discussed in section 2, and the use of a VMM [Sahita et al., 2007] has also been
suggested. It involves the trusted server (Trent) communicating with the hardware device
installed on the client’s (Alice’s) machine. However, these modules are unsuitable for
legacy platforms and carry the stigma of Digital Rights Management. The use of a VMM
also requires greater hardware resources and compute power.
In our framework remote attestation is implemented entirely in software without kernel
support. Operating system support is not used in this framework as it would require a
secure OS, or a loadable kernel module that performs the attestation. The first scenario is
unlikely to occur, and the second scenario would require frequent human interaction to
load the kernel modules in the system (to prevent automatic exploits of the kernel loading
modules).
The approach taken in this paper is designed to detect changes made to the code section
of a process. Trent is a trusted entity who knows the structure of an untampered copy of
the process (P) to be verified. Trent provides executable code (C) to Alice, which Alice
injects into P. C takes overlapping MD5 hashes of sub-regions of P and returns the
results to Trent.
A software protocol means that there exists opportunity for an attacker (Mallory) to forge
results. The attacker (Mallory) can perform a replay attack in which Trent is provided
with results that are the response to a previous attestation challenge. Mallory may tamper
with the results generated by the attestation code to provide the expected results to Trent.
Mallory may re-direct the challenge to another machine which executes a clean copy of
the application P, or Mallory may execute the challenge inside a sandbox to determine its
results and send them to Trent. This paper addresses these threats by incorporating
specialized tests and generating random code to mitigate the effects of these attacks. We
obtain machine identifiers through system interrupts to determine whether the challenge
was replayed. We take measurements on the client platform that determine whether the
attestation code was executed in a sandbox. Lastly, we perform extensive randomization
of the attestation code by changing the arithmetic operations and memory locations read
by every instruction.
Remote attestation starts when P contacts Trent. Trent provides P with binary attester
code C (signed by Trent). C is generated afresh each time it is provided and is composed
of randomized and obfuscated binary code, which makes C difficult to reverse engineer.
Since C is downloaded code, Trent has to be trusted not to provide malware to Alice. P
runs C; C hashes the memory space of P in random overlapping sections and then encrypts
the hashes with a nonce that is contained in C, and the result is sent to Trent. The
nonce resides at a different location in each instance of C, making it impractical for a
compromised P to mimic C’s behavior. When Trent gets the results from C, it verifies
that P has not been tampered with and is executing correctly. Once P is known to be
correct, P can be entrusted to verify the integrity of the security-sensitive
applications that execute on Alice’s machine. Figure 1 shows an overview of the
framework.
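The nonce-bound measurement and Trent's comparison can be sketched as follows. This is a minimal stand-in, not the authors' implementation: a toy FNV-style mixing hash takes the place of the MD5-based measurement, and the nonce is folded into the digest so that a reply to one challenge cannot be replayed for another.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

/* Toy stand-in for the MD5-based measurement: the digest is bound to the
 * per-challenge nonce, so results from an old challenge cannot be replayed. */
uint64_t measure(const unsigned char *image, size_t len, uint64_t nonce)
{
    uint64_t h = nonce ^ 1469598103934665603ULL;   /* seed with the nonce */
    for (size_t i = 0; i < len; i++)
        h = (h ^ image[i]) * 1099511628211ULL;     /* FNV-1a style mixing */
    return h;
}

/* Trent's side: the value M1 reported by C must match the value computed
 * over the local pristine copy of P. Returns 1 = clean, 0 = tampered. */
int trent_verdict(const unsigned char *pristine, size_t len,
                  uint64_t nonce, uint64_t m1)
{
    return measure(pristine, len, nonce) == m1;
}
```

Because each mixing step is a bijection on the running state, any single-byte change to the image propagates to the final digest, so a tampered P cannot report the pristine value without knowing it in advance.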
3.1 Implementation
The remote attestation scheme was implemented on Ubuntu 8.04 (Linux 32 bit) operating
system using the gcc compiler; the application P and attestation code C were written in
the C language. Figure 2 shows the detailed steps in performing Remote Attestation. C
executes some tests on P to return an integrity measurement value M1 to Trent. Trent
executes the same set of tests on a local pristine copy of P, which produces a
measurement value M0. Trent compares M1 and M0; if the two values are the same, Alice is
informed that P is clean. Trent has to be certain that C took its measurements on the
correct copy of P residing inside MAlice. To determine this, C executes some additional
tests on MAlice; these checks ensure that C was not bounced to another machine and that
it was not executed in a sandbox environment inside a dummy P process within MAlice.
Trent introduces the following conditions inside C to prevent Mallory from faking any
portion of the results. C computes an MD5 hash of P to determine whether the code
section has been tampered with. Downloading the MD5 code with every challenge would be
expensive because it is large, and the MD5 code cannot be randomized without losing its
properties; hence the MD5 code resides permanently in P. To prevent Mallory from
exploiting this, a two-phase hash protocol is implemented. Trent places a mathematical
checksum inside C which is computed over the region of P containing the MD5 executable
code along with some other selected regions. Trent receives the results of the
arithmetic checksum, verifies them, and sends a message back to C, which proceeds with
the rest of the protocol if Trent responds in the affirmative. The checksums are taken
on overlapping sub-regions to make prediction of the results more difficult for Mallory,
creating multiple levels of indeterminacy for an attack. Overlapping checksums also
ensure that even if the sub-regions are accidentally defined identically in two
different versions of C, the results of the computation produced by C are still
different.
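A hypothetical sketch of the overlapping arithmetic checksums follows; the constants and window layout are illustrative, not those of the actual protocol. Adjacent windows share bytes, and the running sum is keyed by the challenge seed, so each generated instance of C yields different results over the same region.

```c
#include <assert.h>
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define WINDOWS 4

/* Compute rotate-and-add checksums over WINDOWS overlapping windows of
 * the region; window bounds and the starting sum depend on the challenge
 * seed. Each window is two strides long, so consecutive windows overlap
 * by one stride. */
void overlapping_checksums(const unsigned char *region, size_t len,
                           uint32_t seed, uint32_t out[WINDOWS])
{
    size_t step = len / (WINDOWS + 1);                /* stride = half a window */
    for (int w = 0; w < WINDOWS; w++) {
        size_t start = (size_t)w * step;
        size_t end = start + 2 * step;                /* overlaps the next window */
        uint32_t sum = seed * 2654435761u + (uint32_t)w;  /* seed-keyed start value */
        for (size_t i = start; i < end && i < len; i++)
            sum = (sum << 3 | sum >> 29) + region[i]; /* rotate-and-add */
        out[w] = sum;
    }
}
```

A byte covered by two windows perturbs both sums, which is what makes precomputing or patching around the checksum harder for Mallory.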
To determine whether C was bounced to another machine, Trent obtains the address of
the machine that C is executing on. Trent had received an attestation request from Alice,
hence has access to the IP address of MAlice. If C returns the IP address of the machine it
is executing on, Trent can determine whether the two values are the same. Although IP
addresses are dynamic, there is little possibility that a machine will change its IP
address in the small time window between Alice’s request and the measurements being
taken and provided to Trent. C determines the IP address of MAlice using system
interrupts; Mallory will find it hard to tamper with the results of an interrupt. We
assume that Alice
is not running MAlice behind a NAT and that the machine has only one network interface.
The reason to make these assumptions is that C takes measurements on MAlice to
determine if it is the same machine that contacted Trent. If MAlice is behind a NAT then
Trent would see the request coming from a router and measurements from MAlice.
Multiple interfaces would return multiple Internet addresses and make it difficult for C to
perform checks and return results to Trent.
To determine that P was not executed in a sandbox environment, C determines the
number of processes having an open connection to Trent on the client machine. This is
obtained by determining the remote address and remote port combinations on each of the
port descriptors in the system. C communicates with Trent using the socket descriptor
provided by P. This implies that in a pristine situation there must be only one such
descriptor on the entire system, and the process using it must be the process inside
which C is executing. If there is only one such process, C computes its own process id
and compares the two values. We do not assume a compromised kernel. The
verification code C relies on the kernel only to handle the system calls executed through
interrupts and to read the file structure containing the open connections on the system.
There are many system call routines in the Linux kernel and monitoring and duplicating
the results of each of these may be a difficult task for malware. However, reading the
port file structure requires support from the operating system. We will assume that the
OS provides correct results when the contents of a directory and file are read out.
Without this assumption, Remote Attestation cannot be performed entirely without kernel
support.
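As a simplified sketch of the descriptor-counting check (reading /proc/net/tcp, rather than the interrupt-driven implementation described above), the remote endpoint can be matched in the kernel's hex "ADDRESS:PORT" notation; the endpoint string in the usage note is an illustrative placeholder.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Sketch: count TCP sockets on the machine whose remote endpoint matches
 * 'rem_hex' (hex "ADDRESS:PORT" form, e.g. "0100007F:1F90" would mean
 * 127.0.0.1:8080). In a pristine run, exactly one socket should be
 * connected to Trent. Returns -1 if the file is unreadable. */
int count_connections_to(const char *rem_hex)
{
    FILE *f = fopen("/proc/net/tcp", "r");
    char line[512];
    int count = 0;
    if (f == NULL)
        return -1;
    if (fgets(line, sizeof line, f) == NULL) {    /* skip the header row */
        fclose(f);
        return -1;
    }
    while (fgets(line, sizeof line, f) != NULL) {
        char local[64], remote[64];
        /* columns: sl local_address rem_address ... */
        if (sscanf(line, "%*s %63s %63s", local, remote) == 2 &&
            strcmp(remote, rem_hex) == 0)
            count++;
    }
    fclose(f);
    return count;
}
```

If the count for Trent's endpoint is exactly one, C then checks that the owning process is the one it is executing inside, as described above.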
3.2 Performance
We obtained the source code for the VLC media player interface [Web Link 3]. We
removed some sections of the interface code and left close to 1000 lines of C code in the
program. We took two pairs of machines running Ubuntu 8.04. One pair were legacy
machines with an Intel Pentium 4 processor and 1 GB of RAM, and the second pair were
Intel Core 2 Quad machines with 3 GB of RAM. The tests
measured were the time taken to generate code including compile time, time taken by the
server to do a local integrity check on a clean copy of the binary and time taken by the
client to perform the integrity measurement and send a response back to the server.
To obtain an average measurement for code generation we executed components of the
program in a loop 1000 times and measured the average time, as reported in table 1.
We then executed the integrity measurement code C locally on the server and sent it to
the client for injection and execution. The time taken on the server is the compute time
the code will take to generate integrity measurement on the server as both machines were
kept with the same configuration in each case. These times are reported in table 2. It
must be noted that the client requires a higher time threshold to report results because
it has to receive the code from the network stack, inject it, execute it, and return
results back through the network stack to the server. Network delays also affect this
threshold.
We can see from the two tables that it takes an order of a few hundred milliseconds for
the server to generate code, while the integrity measurement is very light weight and
returns results in the order of a few milliseconds. Code generation can therefore be
viewed as a large overhead. However, the server need not generate new code for every
client connection; it can generate the measurement code once per second and ship the
same integrity measurement code to all clients connecting within that second. This
alleviates the workload on the server.
4. Secure key storage
Stealing secrets from memory of executing programs is an effective method for
circumventing security systems, especially encryption. Encryption keys have to be stored
as clear-text in memory when the application that performs the encryption executes.
Hence a crucial piece of information resides in memory; this information is susceptible to
memory-forensics-based attacks. For example, the AACS encryption for high-definition
DVD players uses a master key to protect other keys and uses the unbroken AES
encryption method; it has been documented that a particular HD-DVD encryption key was
stolen from memory [Web link 4]. Encryption keys can also be located by performing an
entropy analysis, as they are known to have high entropy by nature. In this paper we
present two methods of safely storing encryption keys. The first involves hypervisor
support: the keys are placed in a hypervisor below the operating
system. This prevents any user application, kernel module, or the operating system itself
from stealing the key. The key computations are done within the hypervisor using calls
that allow an application to call the hypervisor. When an application requests a
decryption operation, the hypervisor does a remote attestation to ensure that it is a valid,
authorized and uncompromised application. The second method is disk striping. In this
method the keys are kept on disk every time the key is not actively in use (even if the key
handling application is running). The keys are split into tiny chunks of a few bits each
and placed in hidden blocks of the disk that are not part of the file system. The
randomized disk layout for keys is on a per system basis and uses a large number of
random numbers that are also stored on hidden parts of the disk. Special retrieval
routines are used to fetch and store keys and diversity techniques are used to make them
hard to attack. We provide an overview and implementation details of both methods
below.
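The striping idea can be sketched as follows, with a byte array standing in for the hidden disk blocks and a toy seed-keyed offset schedule in place of the stored random numbers; both are illustrative assumptions, not the retrieval routines the paper uses.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

#define KEY_LEN 16        /* e.g. a 128-bit key */
#define STORE_LEN 4096    /* stand-in for the hidden raw-disk area */

/* Toy offset schedule: 257 is coprime to STORE_LEN, so the KEY_LEN
 * offsets are all distinct; the real system draws offsets from random
 * numbers stored on hidden parts of the disk. */
static size_t slot(unsigned seed, int i)
{
    return (seed + (size_t)i * 257) % STORE_LEN;
}

/* Scatter the key bytes across the store at seed-derived offsets. */
void scatter_key(const unsigned char key[KEY_LEN], unsigned char *store,
                 unsigned seed)
{
    for (int i = 0; i < KEY_LEN; i++)
        store[slot(seed, i)] = key[i];
}

/* Reassemble the key by replaying the same offset schedule. */
void gather_key(unsigned char key[KEY_LEN], const unsigned char *store,
                unsigned seed)
{
    for (int i = 0; i < KEY_LEN; i++)
        key[i] = store[slot(seed, i)];
}
```

Because the key bytes are dispersed among unrelated disk contents, a forensic scan of the store sees no contiguous high-entropy region, which defeats the entropy-analysis attack described above.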
4.1 Hypervisor based key storage
Virtual machines (VMs) are primarily used to execute multiple operating system instances
on a single hardware platform. Virtual machine monitors (VMMs) are also widely used to
provide a trusted computing base to security-sensitive applications, as they provide
logical isolation between different VMs. We offload the cryptographic operations of the system
to the secure VMM. The guest operating system(s) interact with the user and receive the
request to perform cryptographic operations. The guest operating systems make a
hypercall to the VMM to perform the actual encryption/decryption. Virtualization
provides memory isolation; we leverage the secure nature of the VMM where a guest
operating system cannot read the contents of the host VMM, but the VMM can read the
contents of every guest OS. The guest programs interact with the outside world and
request crypto operations. The routines performing encryption and decryption using the
secret key are provided by the VMM. The secret key never reaches the memory sections
of the guest operating systems. Any attacks launched by the attacker Mallory are
restricted to the guest space, so Mallory cannot obtain information about the key
through forensic analysis of guest memory. We also implement attestation, which
ensures that the application calling the cryptographic hypercall is legitimate.
The attestation process ensures that the program requesting the cryptographic operations
from the guest user space is the intended legitimate program. This thwarts attacks such
as code injection where a malicious program may request cryptographic operations by
patching onto the legitimate program. Attestation is achieved in two phases. In the first
phase, the guest kernel issues an encrypt request to the host kernel with the data. The
host kernel responds by performing an attestation on the application making the request. Once
the host kernel is satisfied that the application making the request is valid, it proceeds to
the second phase where it encrypts the data.
The key used to perform encryption/decryption is stored completely in the hypervisor.
Figure 3 shows the sequence of operations for performing crypto operations.
1. The guest user space application issues requests for performing cryptographic
operations. It passes the type of operation and the data to be operated upon as input. The
request issues a software interrupt and the context switches from guest user space to guest
kernel space.
2. The guest kernel forwards the request to the trusted VMM.
3. The secure VMM injects code into the running guest kernel. The injected code is
responsible for returning the guest physical address where the user space program is
loaded. It also brings the user application’s pages into memory in case they are swapped
out. The injected code is changed every time an attestation request is made.
4. The guest kernel executes the injected code and returns the address of the page, where
the user space program resides.
5. Control is transferred back to the VMM.
6. The secure VMM now reads the contents of the user space program directly from the
memory, using the address obtained in step 4.
7. The secure VMM computes the hash value of the memory contents and compares it to
precomputed hash values obtained from the original binary image of the program.
8. The requested operations are performed by the VMM.
9. The results are written back to the memory location passed by the guest kernel.
10. The guest kernel copies the results to the guest user space.
4.1.1 Implementation
The system was implemented on the Linux 2.6.23 kernel. We used the same guest and
host operating systems. We utilized the lguest modules to implement the VMM. We
used the DES module to perform cryptographic routines. We added all the required
hypercalls for performing these operations. We performed attestation by creating a nop
placeholder for the attestation code inside the guest kernel. When the guest kernel issues
an attestation request, the VMM provides executable code which is to be injected at this
location. The guest kernel uses the copy_to_user() call to inject bytes at the specified
placeholder. Control is passed back to the guest kernel, which executes the injected bytes.
The attesting routine returns hash computations of the user space program back to the
hypervisor. The hypervisor compares these values against expected values to determine
if the calling user space application is indeed a legitimate application.
4.1.2 Performance
We implemented the system on the following platform. Processor: Intel Core 2 Duo 1.66
GHz, RAM: 1 GB. We compared the DES and RSA algorithms with similar key sizes
while performing cryptographic operations. Table 3 shows the performance of the two
algorithms on the test platform. We averaged the measurements over 200 runs. One
round of encryption followed by decryption took 8 microseconds on the host OS and 31
microseconds in the guest OS, an increase by a factor of 3.8. RSA involves many more
complex exponentiation operations than DES; it predictably takes longer than DES on
the system.
4.2 Disk striping based key storage
We hide the key on secondary storage by writing to the unused sectors of files on the
hard disk without using the file system calls of the OS. Each file on the storage media
has an end of file (EOF) marker. The OS allocates space for files in disk blocks and does
not reclaim the space beyond the EOF marker if a particular block is partially used. The
space beyond the EOF on a sector is used in this research to store the key. We do not
add key information to randomly chosen existing files; instead, we place new files on the
disk during installation of the code.
Once the storage area is determined, we scatter the key throughout this area such that the
attacker cannot retrieve the key even after knowing the sector where it is stored. Fig. 4
shows an instance of a scattered array. We refer to the location of a bit of the key in
scattered array as bit-address. Every bit of the key has a bit-address associated with it
which forms the bit-address pattern. This bit-address pattern is unique to every
installation of the system. If the bit-address pattern is stored directly, then it can be easily
read by the attacker. Instead of storing the bit-address pattern, the bit-address fetching
block generates the address of each bit at run time. There is one logical block of code per
bit address to be generated. The bit-address fetching block is generated for the
application at installation time. We generate a bit-address and then generate the
bit-address fetching block to match it. This is achieved by performing random
arithmetic operations on the result value until we achieve the desired value for that
particular bit. The arithmetic operations are chosen from a pool of operations. We also
obfuscate the location of the bit-address fetching block in the binary.
4.2.1 Secure storage of the key in secondary memory
The key is stored in the unused portions of the last sectors of files on the file system by
writing to the raw disk interface via /dev. The area in which the key is scattered is the
scattered array. The executable code for crypto operations assumes that the information
about the location of the disk blocks containing the key is present at a known location in
its program memory and the scattered key array is also present somewhere in the data
section of the program memory. It contains an empty section which is filled with the key
fetching code (the bit-address fetching block) during installation.
If bit 0 of the key is present at the 11th position of the scattered array, then the
bit-address of bit 0 is 11. As noted above, the bit-address fetching block generates the
address of each bit at run time rather than storing the bit-address pattern. The execution
of the bit-address fetching block results in the formation of the bit-address in a
predetermined register/memory address, which is then used for accessing the bit from
the scattered array. Construction of the block is done
by first setting a min and a max bound for the number of operations to generate an
address. If N operations generate a bit address, then min ≤ N ≤ max. Each operation is
one of seven basic mathematical operations (add, subtract, multiply, divide, and, or, not).
N is not a fixed number. During bit-address fetching block generation, the registers and
memory locations used are initialized and one operation is chosen to be performed. The
result of this operation is tallied against the required value. If the desired value has not
been reached, further operations are applied. Once we reach the desired value with
min ≤ N ≤ max, we stop code generation. If N < min, we continue to add mutually
cancelling operations until we obtain the desired value; if max < N, we restart the block
generation.
4.2.2 Protecting the Bitaddress fetching blocks
The location of the bit-address fetching block can be revealed by the presence of a
jump/call instruction to this block. To prevent this, the target address for the call to the
bit-address fetching block is generated during execution. The location of the block which
calculates the target address is also randomized in the binary, and padded with many junk
calculations that do not affect the outcome. To prevent any malicious code from
executing the fetching blocks, we self-attest the running image of the executable. This
code computes hashes over sections of its process image and compares the results with
the expected results already hardcoded inside it. The attestation covers the fetching
blocks and the application within which it is executed. We use a simple inline hash
function to prevent the hash call from being observed with process tracers and tools like
objdump. It is difficult for an attacker to change the hash values stored in the binary, as
they are stored at different locations in each installation.
4.2.3 Analysis of system
This portion of the research was implemented on Ubuntu 8.04 Linux OS. We used a 32
bit key, with 2 self-attestation blocks. The size of each bit-address fetch block was set as
80 bytes. We used a scatter array size of 2K bits, and the application was 14K bytes in
length. This implementation uses a few simple obfuscation techniques to keep the focus
on our main idea and to keep the implementation simple. Table 4 provides the critical
location information along with the information of the attestation blocks. It shows that
even with a margin as small as 0x50 bytes, we obtained a good measure of
randomness in the system. This shows that any malware which tries to attack the
application will find it difficult to perform remote analysis and use the information
gained to attack another instance of the installation of the same application.
5. Randomization of memory layout
Most attacks are successful due to the lack of genetic diversity among computer systems.
An attacker can discover vulnerabilities in binaries and use them to engineer attacks on
multiple machines. If the memory layout in each copy of the binary was different on
every machine, it would make it extremely hard for the attacker to launch attacks. Figure
5 shows the layout of the stack in the Intel x86 architecture. The relative location of the
return address on the stack for each frame remains a constant for every function instance.
This way an attacker can determine the number of bytes of offset from the current
location of memory where the return address is located, and launch buffer overflow and
stack smashing attacks. Similar techniques exist for attacking the heap frame. This can
lead to the program executing any function present in the process space as specified by
the attacker. In the case of kernel threads, every function present in the operating system is
accessible to the attacker. Most overflow attacks can be stopped if the relative locations
of addresses are different in every instance of the binary. The objective of this section of
the paper is to randomize the stack frame and the heap frame. The stack frame is
randomized post-compilation, without compiler support and without the availability of
source code. The heap frame is randomized by changing the system library code and the
kernel code.
5.1 Randomization of the stack frame
We randomize the size of the run-time stack frames to make every copy of a binary
unique. The binaries are instrumented by analyzing the disassembly of the code segment
in a binary. We do not inject additional bytes of code in the binary but rewrite existing
bytes in the code segment. We do not require source code access for the binary. The
randomization process is carried out on the end-user's machine. Since the randomization
is carried out on end-user machines, the randomization procedure must be simple,
low-cost, and efficient. We are mainly concerned with the scenario where
attackers remotely target users using software that has a discovered or unknown
vulnerability.
5.1.1 Analysis of disassembly for randomization
Consider the C code snippet shown in figure 6 and its disassembly in the x86
architecture. The routine foo receives one integer argument and has two local variables
that need space on the stack. The compiler allocates 1024 + 4 bytes on the stack for
these two elements. The C library function gets, which is known to write beyond buffer
bounds, is used to take input from the user and store it in the character buffer. This
makes the character array vulnerable to a buffer overflow attack.
During randomization, only those instructions that are relevant to the run-time stack need
to be rewritten. By shifting the vulnerable character buffer down by a random amount,
the distance between the return address and the buffer becomes different for every copy
of the binary. This makes it impossible to use the same attack string against different
copies of the binary. This increases the cost of devising an attack and reduces the
propagation rate of a discovered vulnerability providing a larger window of opportunity
to develop a patch for the security loophole. Instructions of the following type will need
to be rewritten in the binary if we add a random pad:
a. Instructions that create space on the stack frame for local variables and buffers.
b. Instructions that deallocate space used by the locals of the function on the stack
frame. These instructions are executed right before the function returns. In the case of
foo, the stack deallocation is done implicitly by the leave instruction that restores the
stack pointer to the frame pointer and hence we don’t need to explicitly modify any
instruction for correct deallocation of the random pad memory.
c. Instructions that access local variables and buffers on the stack frame. All local
variables and buffers are accessed with the help of the frame pointer EBP. All stack
locals are located below the frame pointer at lower addresses in the x86 architecture.
Because of the random pad, the local buffers have shifted further down from the
frame pointer. All the local variables will shift downwards by the same amount.
5.1.2 Implementation
The prototype randomizer has been developed in C and compiled using GCC. We have
used the objdump disassembler. Figure 7 shows the flow of our randomizer. The binary
file is fed to the disassembler. The output of the disassembler is parsed for identification
of instruction operands that need to be modified in the binary. Before feeding the
disassembly output to the parser, the grep utility is used to extract only those instructions
that are relevant to stack frames. The parser separates out and analyzes each sub-routine
in order to accomplish fine-grained randomization such that every function is padded
separately with a random padding conforming to the constraints of that specific function.
Thereafter these instructions are directly rewritten in the binary to change the layout of
the stack frames at run-time.
Instructions that create space on the stack frame subtract a specific constant value from
the value of the stack pointer. The instructions that we are looking for are of the form
“sub $0x#,%esp”, where # is a constant number determined by the compiler as per the
requirements of the function. To restore the stack, the compiler usually adds the same
constant value to the stack pointer at the end of a function that was used during
allocation. All the references to the local variables are done with the help of a negative
offset from the base pointer EBP.
The prototype works as a 2-pass randomizer. In the first pass, each sub-routine is
analyzed to determine the maximum padding that can be provided to that routine. Every
instruction in the routine that accesses memory regions has an upper limit on the relative
address that can be accessed by it. We process every instruction and check the maximum
available random pad to that instruction. The least of these values becomes the pad for
the function. The randomizer also looks for instructions that are sensitive to the
alignment of memory operands and takes a conservative approach of not randomizing
sub-routines containing such sensitive instructions. The random pad is then clipped to
the nearest multiple of 32 to satisfy the alignment requirements of several instructions.
In certain cases it is also necessary to place an upper limit on the maximum padding
given to each sub-routine, as a large pad increases the chances of a stack overflow
causing the process to crash.
In the second pass, the randomizer goes through the instructions in the disassembly and
locates them in the executable binary file. While tracing every instruction the randomizer
also keeps track of the sub-routine in which the instruction is present. With the help of
the data structure built for every sub-routine during the first pass, the randomizer
statically rewrites and instruments the corresponding instruction in the binary executable.
5.1.3 Analysis
We randomized copies of the following applications: Open Office, pidgin, pico, ls, gcc,
netstat, ifconfig, route, xcalc, tail, date, nslookup, sum, head, wc, md5sum, pwd, users,
cat, cksum, hostid, logname, echo, size, firefox, viewres, xwininfo, oclock, ipcs, pdfinfo,
pdftotext, eject, lsmod, clear, vlc, and gij. Thus we cover both “console” applications and
graphics applications. Our proof-of-concept implementation handled every application
we tested. All the binaries used in testing were release-quality, optimized utilities that
are part of Linux distributions. A small list of binaries that were successfully
randomized is shown in Table 5. Our experiments show that the cost of the
randomization step itself is low. Since we only manipulate the size of the run-time stack,
we did not expect this approach to have any run-time penalty. The results of our
experiments, comparing the execution times of the original and instrumented binaries,
show that the randomized binaries have the same run-time efficiency as their original
unrandomized counterparts.
We found that, on average, the randomizer modified the run-time stack of more than
75% of the sub-routines in every application. Some of the routines are not randomized,
as we take a conservative approach of not changing routines containing
alignment-sensitive instructions such as FXSAVE. We are also restricted by the length of
the stack allocation instruction as we do not inject additional bytes into the program. If
the width of the operand on the stack allocating instruction is only one byte, we can
allocate a maximum of 128 bytes of stack with such an instruction. If the routine already
allocates 128 bytes of stack, then its stack frame cannot be randomized.
5.2 Heap frame randomization
In this section we randomize the size of the heap chunk returned for every allocation
request generated by an application. We single out the library functions that play a vital
role in heap memory management (the functions that perform the free, allocate and
resize operations) and hook the entry points to these functions so that we can add our
randomization code to them. This is quite effective at defeating attacks that target heap
buffer overflows. Access to the source code of an application is not required
for this solution since it patches the underlying heap memory management mechanism
itself, either statically or during run-time. We adopt a dual random padding strategy for
every memory allocation. This is done by appending a random pad below as well as
above the pointer to the heap memory chunk returned by the allocation algorithm. Figure
8 gives a view of this.
We implement our approach by identifying the memory management functions to be
patched in the GNU C library. The most important of the functions we identified are
malloc(), free(), realloc() and memalign(). Other related functions we identified are
calloc(), valloc() and pvalloc(), which need not be patched as they are based entirely on
malloc() in the current version of the GNU C library.
5.2.1 Implementation
malloc(): When a function call to the malloc() function is made, it is first intercepted by
its public wrapper function. We insert the following operations before the call to the
internal malloc() is made. We generate two random integers i and j, which are multiples
of 8 to respect the internal memory alignment rules. The upper limit of the random
numbers generated can be selected heuristically. These two random integers are added
to the size parameter contained in the original request, making an allocation call for
i + j + the original request size. A successful malloc() operation returns the pointer to a
newly allocated memory
chunk. The value of the pointer returned to the calling function (user application) is
shifted by i bytes. We store these two random numbers so that other memory
management functions like free() and realloc() can know this random padding value in
order to calculate the actual starting address of the memory chunk and thus the boundary
tag information stored above it. Just like the boundary tag information is stored above
the actual starting address of the memory chunk user space, we store the random integer i
just above the new shifted starting address computed earlier.
free(): Once a call to free() is made by the requestor function, execution enters the free()
public wrapper. Here, just as in the malloc() public wrapper, a hook function may be
called before the actual free() operation proceeds. If the hook is not set, we first extract
the value i set by malloc(), which lies just above the chunk address passed as an
argument to free(). Using this value, we can calculate the original starting address of the
memory chunk's user space, which is then passed to the internal free() function.
realloc(): The realloc() public wrapper must assume the responsibility of performing
pointer retrieval operation. The original chunk pointer must be available before we can
go ahead with the actual reallocation operation. The reallocation operation can either
shorten or elongate the memory buffer, depending on the sign of the difference between
the current chunk size and the new requested size. In the case of elongation, it first tries
to do so by calling the internal realloc() function. If this operation fails, the public
wrapper allocates a new memory chunk with the help of malloc() and, before returning
this pointer to the requestor, copies the data from the current memory chunk to the new
one. With randomization introduced, this is no longer a straightforward copy operation.
The random
factors (i and j) generated for the new memory chunk are most likely different from those
of the old ones. By performing appropriate calculations and adjustments, we make sure
that the copy operation starts from the old chunk’s shifted pointer location to the new
chunk’s shifted pointer location.
memalign(): The memalign() function is essentially malloc() with memory alignment
constraints. We must make sure that the random value i is a multiple of the alignment
factor x passed as an argument to the memalign() function. In some cases, the alignment
request is such that the buffer needs to be aligned by a page or more (such as 512 bytes
or more). Padding by larger multiples of such large x values can prove quite costly.
Hence, we keep the value of i at x or 2x in such cases.
5.2.2 Analysis
We evaluated the heap frame randomization on Ubuntu 8.04. We used a tool known as
Unixbench version 4.1. We successfully ran applications such as web browsers, office
suites, games, and graphics processing utilities using our library patch, without any
exceptions. When running applications with the randomization library patch, we did not
notice any significant hit in run-time efficiency. The results of
run-time performance tests can be seen in table 6.
6. Conclusion
In this paper we presented software-only solutions for detecting compromised binaries,
storing secret keys in memory, and modifying the memory layout of binaries. These
three scenarios represent a large class of the attacks that occur in the computing world.
Mitigating them allows us to improve the security of end-user platforms. These
techniques can be used independently and do not depend on software vendors providing
the source code of their products.
Future work would be to extend the techniques described above to enhance the security
of the operating system itself by randomizing the OS and/or providing remote attestation
of the OS code. Such attempts would of course raise other issues such as a variance in the
OS code due to hardware differences amongst makes and models of computers. Work is
needed to study the techniques of combining the approaches into a single system with
multiple defense mechanisms. Randomization of the address space does imply changes
to the code itself, and hence code on different machines may have different MD5 hashes;
these have to be tracked by the remote attester.
References
Basili V.R, and Perricone B.T. Software errors and complexity: an empirical
investigation, Communications of the ACM (1984), 27(1), pp. 42-52.
Bhatkar S, DuVarney D. C, and Sekar R. Address obfuscation: an efficient approach to
combat a broad range of memory error exploits. 12th USENIX Security Symposium,
August 2003.
Canetti R, Dodis Y, Halevi S, Kushilevitz E and Sahai A. Exposure-resilient functions
and all-or-nothing transforms, in Advances in Cryptology – EUROCRYPT, 2000, pp.
453-469.
Durden T. Bypassing PaX ASLR Protection. Phrack Inc., 2002 Available from URL
http://www.phrack.org/issues.html?issue=59&id=9#article
Eichin M and Rochlis J. With microscope and tweezers: an analysis of the internet virus
of november 1988. Proceedings of the IEEE Symposium on Security and Privacy, pages
326–343, May 1989.
Forrest S, Somayaji A, and Ackley D.H. Building diverse computer systems. HOTOS
’97: Proceedings of the 6th Workshop on Hot Topics in Operating Systems (HotOS-VI),
pages 67–72, May 1997.
Foster J, Osipov V, Bhalla N, and Heinen N. Buffer Overflow Attacks: Detect, Exploit,
Prevent. Syngress, 2005.
Garay J.A and Huelsbergen L. Software integrity using timed executable agents, in:
Proceedings of the 2006 ACM Symposium on Information, computer and
communications security (2006), pp. 189 – 200.
Garfinkel T, Pfaff B, Chow J, Rosenblum M, Boneh D. Terra: A virtual machine-based
platform for trusted computing, ACM SIGOPS Operating Systems Review, pages 193 –
206, 2003
Goldman K, Perez R, and Sailer R. Linking remote attestation to secure tunnel endpoints,
STC '06: Proceedings of the first ACM workshop on Scalable trusted computing, pages
21 - 24, 2006.
Harrison K. Protecting Cryptographic Keys from Memory Disclosure Attacks. 37th
Annual IEEE/IFIP International Conference on Dependable Systems and Networks, pp.
137-143, 2007.
MacKenzie P. Networked cryptographic devices resilient to capture, IEEE Symposium
on Security and Privacy, pp. 12-25, 2001.
Sahita R, Savagaonkar U, Dewan P, and Durham D. Mitigating the Lying-Endpoint
Problem in Virtualized Network Access Frameworks, Springer: Managing Virtualization
of Networks and Services, pages 135 – 146, 2007.
Sailer R, Zhang X, Jaeger T, and Van Doorn L. Design and implementation of a
TCG-based integrity measurement architecture, Proceedings of the USENIX Security
Symposium (2004), pp. 223-228.
Seacord R. Secure Coding in C and C++. Addison-Wesley, 2005.
Seshadri A, Luk M, Shi E, Perrig A, Van Doorn L, and Khosla P. Pioneer: Verifying
code integrity and enforcing untampered code execution on legacy systems, ACM
SIGOPS Operating Systems Review (2005), vol 39 -5.
Shacham H, Page M, Pfaff B, Goh E, Modadugu N, and Boneh D. On the effectiveness
of address-space randomization. CCS '04: Proceedings of the 11th ACM conference on
Computer and communications security, pages 298-307, 2004.
Shamir A, and van Someren N. Playing "Hide and Seek" with Stored Keys, Third
International Conference on Financial Cryptography, pp. 118-124, 1999.
Srinivasan R, and Dasgupta P. Towards more effective virus detectors. Communications
of the Computer Society of India, vol 31-5, pages 21-23. August 2007.
Stumpf F, Tafreschi O, Röder P, and Eckert C. A robust integrity reporting protocol for
remote attestation, WATC’06: Second Workshop on Advances in Trusted Computing,
2006.
Viega J and McGraw G. Building Secure Software. Addison-Wesley, 2002.
Wang L and Dasgupta P. Kernel and application integrity assurance: Ensuring freedom
from rootkits and malware in a computer system, in: Advanced Information Networking
and Applications Workshops (2007), pp. 583 – 589.
Xu J, Kalbarczyk Z, and Iyer R. Transparent Runtime Randomization for Security.
Proceedings of the 22nd International Symposium on Reliable Distributed Systems,
pages 260-269, 2003.
Web link 1: ASLR: Address space layout randomization, Retrieved on April 25 2010,
http://pax.grsecurity.net/docs/aslr.txt
Web link 2: nCipher Solutions, Retrieved on April 20 2010,
http://iss.thalesgroup.com/Resources/Product%20Data%20Sheets/keyAuthority.aspx
Web link 3: VLC media player source code FTP repository. Retrieved on February 24
2010, http://download.videolan.org/pub/videolan/vlc/
Web link 4: HD-DVD Content Protection already hacked? Retrieved on April 4 2010,
http://www.techamok.com/?pid=1849
[Figure 1 shows the verifier Trent interacting with Alice's machine, on which
application P runs with injected code C.]
Figure 1: Overview of Remote attestation
1. Alice -> Trent: Verification Request
2. Trent -> Alice: Inject code at location and execute it
3. C -> Trent: Machine Identifier
4. Trent -> C: Proceed
5. C -> Trent: Initial Checksum
6. Trent -> C: Proceed
7. C -> Trent: MD5 of specified regions
8. Trent -> C: Proceed
9. C -> Trent: Test of correct PID
10. Trent -> C: Proceed/Halt
Figure 2: Detailed steps in the Remote Attestation process
[Figure 3 shows three privilege layers: the guest user space (Ring 3, untrusted), where
all user services such as ftp and http run; the guest kernel space (Ring 1, untrusted),
which is connected to the network and vulnerable to attacks; and the VMM (Ring 0,
trusted), the trusted component running at the highest privilege, which holds DES with
the key. Numbered arrows 1-10 between the layers correspond to the steps listed in
Section 4.1.]
Figure 3: Key storage and attestation model
[Figure 4 shows a scattered array of n locations. Key bit 0 is stored at location 11, key
bit 1 at location 1, key bit 24 at location n, and key bit 128 at location 5.]
Figure 4: Scattered Array
[Figure 5 shows a stack of frames: inactive frames holding data and return links to
frames N-3 and N-2, the active frame holding data and a return link to frame N-1, and
the available stack below.]
Figure 5: Layout of the stack in the Intel architecture
void foo(int dummy_arg)
{
    char buffer[1024];
    int local_variable;
    gets(buffer);
    local_variable = dummy_arg;
}

080483c4 <foo>:
 80483c4:  55                       push   %ebp
 80483c5:  89 e5                    mov    %esp,%ebp
 80483c7:  81 ec 18 04 00 00        sub    $0x418,%esp
 80483cd:  8d 85 fc fb ff ff        lea    -0x404(%ebp),%eax
 80483d3:  89 04 24                 mov    %eax,(%esp)
 80483d6:  e8 ed fe ff ff           call   80482c8 <gets@plt>
 80483db:  8b 45 08                 mov    0x8(%ebp),%eax
 80483de:  89 45 fc                 mov    %eax,-0x4(%ebp)
 80483e1:  c9                       leave
 80483e2:  c3                       ret

Figure 6: Sample C routine and its disassembly
[Figure 7 shows the workflow: the binary file is fed to the disassembler; the relevant
instructions are extracted and passed to the parser, which performs sub-routine
separation and instruction analysis; the randomizer then produces the randomized
binary.]
Figure 7: Workflow of the randomizer
[Figure 8 compares the two chunk layouts. Without randomization, a chunk holds the
previous chunk size, the size/status word, the user data at the chunk pointer, the forward
and backward pointers, and unused memory. In the randomized chunk, Random Pad 1 is
inserted before the user data (shifting the chunk pointer) and Random Pad 2 after it.]
Figure 8: Allocated memory chunk with dual random padding
Machine      Test generation    Compilation time    Total time
Pentium 4    12.3               320                 332
Quad Core    5.2                100                 105
Table 1: Average code generation time in milliseconds on the server end for Intel
Pentium 4 and Core 2 Quad machines for one instance of the measurement
Machine      Server side execution time (ms)    Client side execution time (ms)
Pentium 4    0.6                                22
Quad Core    0.4                                16
Table 2: Time (milliseconds) to compute the measurements on the server and the client
Input size    Time taken for DES (ms)     Time taken for RSA (ms)
in bits       Host OS     Guest OS        Host OS     Guest OS
64            8           31              322         340
128           19.8        27.9            358         373
192           24.9        35.6            394         408
Table 3: Operating time on guest and host operating systems
[Table: for each installation instance, the positions of the fetch blocks, the locations of the jumps to the fetch blocks, the locations of the attestation blocks, the attestation range (start and end), the hash value for the attestation, and the total size of all fetch blocks (0x38e4 in every instance); the per-instance column values are interleaved beyond recovery in this copy and are not reproduced here]
Table 4: Data from different instances of installations
Binary        Time taken to      % of subroutines   Original stack   Instrumented     Overhead
              randomize (sec)    randomized         (bytes)          stack (bytes)    (bytes)
Open Office   20.22              88.33              21,256           23,170           1,914
pidgin        54.718             91.48              16,093           16,407           314
gcc           1.836              91.77              1,829            2,009            180
route         0.505              93.75              2,183            2,614            431
xcalc         0.109              100                11,372           11,592           220
echo          0.137              100                1,018            1,079            61
firefox       1.2                100                27,890           28,216           326
vlc           0.057              100                6,366            6,688            322
eject         0.435              75                 20,611           21,431           820
lsmod         0.397              100                4,414            4,473            59
Table 5: Performance Test Results for stack randomization
Test                                                     Executed without the     Executed with the
                                                         library patch (seconds)  library patch (seconds)
Dhrystone 2 with register variables                      10                       10.2
Whetstone Double Precision                               29.3                     30
Execl Throughput                                         10.1                     10
File Copy (buffer size = 1024, max. blocks = 2000)       29.8                     30
File Read (buffer size = 256, max. blocks = 500)         30                       30.1
File Write (buffer size = 256, max. blocks = 500)        30                       30
File Copy (buffer size = 256, max. blocks = 500)         30                       30
File Read (buffer size = 4096, max. blocks = 8000)       30                       30
File Write (buffer size = 4096, max. blocks = 8000)      30                       30
File Copy (buffer size = 4096, max. blocks = 8000)       30                       30
Pipe Throughput                                          10                       9.9
Pipe-based Context Switching                             10                       10
Process Creation                                         30                       30
System Call Overhead                                     10                       10
Shell Script (8 concurrent)                              60                       60.2
Table 6: Run-time performance tests for heap randomization on Linux Ubuntu system