Title: Software-based Remote Attestation: measuring integrity of user applications and kernels
Authors: Raghunathan Srinivasan (1) (corresponding author), Partha Dasgupta (1), Tushar Gohad (2)
Affiliations: 1. School of Computing, Informatics and Decision Systems Engineering, Arizona State University, Tempe, AZ, USA; 2. MontaVista Software LLC
Email: raghus@asu.edu; Phone: (1) 480-965-5583; Fax: (1) 480-965-2751

Abstract: This research describes a method known as Remote Attestation that attests the integrity of a process using a trusted remote entity. Remote attestation has mostly been implemented using hardware support. Our research focuses on implementing these techniques entirely in software, utilizing code injection inside a running process to attest its integrity. A trusted external entity issues a challenge to the client machine, and the client machine has to respond to this challenge. The result of this challenge provides the external entity with an assurance of whether or not the software executing on the client machine is compromised. This paper also shows methods to determine the integrity of the operating system on which software-based remote attestation occurs.

Keywords: Remote Attestation, Integrity Measurement, Root of Trust, Kernel Integrity, Code Injection.

1. Introduction
Many consumers run security-sensitive applications on a machine (PC) alongside other vulnerable software. Malware can patch various software on the system by exploiting these vulnerabilities. A regular commodity OS consists of millions of lines of code (LOC) [1]. Device drivers usually range in size from a few lines of code to around 100 thousand lines of code (KLOC), with an average of 1 bug per device driver [2]. Another empirical study showed that bugs in the kernel may have a lifetime of nearly 1.8 years on average [3], and that there may be as many as 1000 bugs in the 2.4.1 Linux kernel.
The cumulative effect of such studies is that it is difficult to prevent errors that can be exploited by malware. Smart malware can render antimalware detection techniques ineffective by disabling them. Hardware detection schemes are considered non-modifiable by malware. However, mass-scale deployment of hardware techniques remains a challenge, and they also have the stigma of digital rights management (DRM) attached. Another issue with hardware measurement schemes is that software updates have to be handled such that only legitimate updates get registered with the hardware. If the hardware device offers an API to update measurements, malware can attempt to use that API to place malicious measurements in the hardware. If the hardware device is not updatable from the OS, then it has to be reprogrammed to reflect updated measurements. Software-based attestation schemes offer flexibility and can be changed quickly to reflect legitimate updates. Due to their ease of use and potential for mass-scale deployment, software-based attestation schemes offer significant advantages over their hardware counterparts. However, every software-based attestation scheme is potentially vulnerable to some corner-case attack scenario. In extreme threat models, and in cases where updates are rare, network administrators can switch to hardware-based measurement schemes. For the general consumer, software-based schemes offer a lightweight protocol that can detect intrusions prior to serious data losses. Remote Attestation is a set of methods that allows an external trusted agent to measure the integrity of a system. Software-based solutions for Remote Attestation vary in their implementation techniques; Pioneer [4], SWATT [5], Genuinity [6], and TEAS [7] are well known examples.
In TEAS, the authors prove mathematically that it is highly difficult for an attacker to determine the response to every integrity challenge, provided the code for the challenge is regenerated for every instance. However, TEAS does not provide any implementation framework. In Genuinity, a trusted authority sends executable code to the kernel on the untrusted machine, and the kernel loads the attestation code to perform the integrity measurements. Genuinity has been shown to have some weaknesses by two studies [8], [5]. However, the authors of Genuinity have since claimed that these attacks work only on the specific cases mentioned in the two works; a regeneration of the challenge by the server would render the attacks insignificant [9]. This work is quite similar to Genuinity, with certain differences in technique. Like Genuinity, this work focuses on the importance of regenerating the code that performs the integrity measurement of an application on the client. We do not utilize operating system support to load the challenge; the application itself has to receive the code and execute it. In addition, this paper also deals with the problem of what we term a ‘redirect’ attack, where an attacker may direct the challenge to a different machine. The attestation mechanisms presented in this work use the system call interface of the client platform. Due to this, the problem of determining the integrity of an application on a client platform is split into two orthogonal problems. The first involves determining the integrity of the user application in question by utilizing system calls and software interrupts. The orthogonal problem is determining the integrity of the system call table, the interrupt descriptors, and the Text section of the kernel that runs on the client platform. For the first problem, it is assumed that the system calls will produce correct results. Rootkits are assumed to be absent from the system.
We assume that there may be various other user-level applications on the client platform that may attempt to tamper with the execution of the challenge. For the second problem, this paper presents a scheme where an external entity can determine the state of the OS Text section, the System Call Table, and the Interrupt Descriptor Table on the client machine. It can be noted that the external entities obtaining the integrity measure for the application and for the OS can be different. The solution in this paper is designed to detect changes made to the code section of a process. This allows the user (Alice) to determine whether one application on the system is clean. The same technique can be extended to every application on the system to determine whether all installed applications are clean. Trent is a trusted entity who has knowledge of the structure of an un-tampered copy of the process (P) to be verified. Trent may be the application vendor, or Trent may be an entity that offers attestation services for various applications. It should be noted that Trent only needs to know the contents and behavior of the clean program image of P to generate challenges. Trent provides executable code (C) to Alice (the client/end user), which Alice injects into P. C takes overlapping MD5 hashes of sub-regions of P and returns the results to Trent. Trent has to be a trusted agent, as the client downloads program code and performs certain operations based on Trent’s instructions. If Trent is not trusted, then Alice cannot run the required code with certainty that it will not compromise Alice’s machine (MAlice). C is newly generated randomized code that executes on the user end to determine the integrity of an application on an x86 based platform. This ensures that an attacker cannot determine the results of the integrity measurement without executing C. Trent places some programming constructs in C that ensure that C is difficult to execute in a sandbox or a controlled environment.
A software protocol means that there exists an opportunity for an attacker (Mallory) to forge results. The solution provided in this paper protects itself from the following attacks.
Replay attack: Mallory may provide Trent forged results by replaying the response to a previous attestation challenge. To prevent this scenario, Trent changes the operations performed in every instance of C. This is done by placing some lines in the source code of C that depend on various constants; C is recompiled for every attestation request. These constants are generated prior to code compilation using random numbers. Consequently, the outputs of these measurements change with every change of constants. The code produced by Trent requires that Mallory monitor and adapt the attack to suit the challenge. We utilize the concept that program analysis of obfuscated code is complex enough to prevent attacks [7].
Tampering: Mallory may analyze the operations performed by the challenge to return forged values. Trent places dummy instructions, randomizes the locations of variables, and places some self-modifying instructions to prevent static analysis of the application. It must be noted that self-modifying code is normally not permitted on the Intel x86 architecture, as the code section is protected against writes. However, we use the Linux call ‘mprotect’ to change the protections on the code section of the process in which C executes to allow this feature. Furthermore, Trent also maintains a time threshold by which the results are expected to be received; this reduces the window of opportunity for Mallory to launch a successful attack.
Redirect: Mallory may redirect the challenge from Trent to a clean machine, or execute it in a sandbox which will provide correct integrity values as the response to Trent. The executable code sent by Trent obtains machine identifiers to determine whether it executed at the correct machine.
C also executes certain tests to determine if it was executed inside a sandbox, and it communicates multiple times with Trent while executing tests on P. This makes it harder for Mallory to prevent C from executing. These techniques are discussed in detail in section 5. For obtaining the integrity measurement of the OS Text section, the attestation service provider Trent′ provides executable code (Ckernel) to the client OS (OSAlice). OSAlice receives the code into a kernel module and executes it. It is assumed that OSAlice has means such as digital signatures to verify that Ckernel did originate from Trent′. The details of the implementation of this scheme are in section 7. The rest of the paper is organized as follows. Section 2 contains a review of related work. Section 3 describes the problem statement, threat model, and assumptions made in this solution. Section 4 describes the overall design of the system; section 5 describes the obfuscation techniques used in creating C. Section 6 describes the implementation of the application attestation system, section 7 describes the implementation of kernel runtime measurements, and section 8 concludes the paper.
2. Related Work
Code attestation involves checking whether the program code executing within a process is legitimate or has been tampered with. It has been implemented using hardware, virtual machine, and software based detection schemes. In this section we discuss these schemes as well as methods to perform program analysis and obfuscation techniques available in the literature.
2.1 Hardware based integrity checking
Some hardware based schemes operate off the TPM chip provided by the Trusted Computing Group [10], [11], [12], while others use a hardware coprocessor which can be placed into the PCI slot of the platform [13], [14].
The schemes using the TPM chip involve the kernel or an application executing on the client obtaining integrity measurements and providing them to the TPM; the TPM signs the values with its private key and may forward them to an external agent for verification. The coprocessor based schemes read measurements on the machine without any assistance from the OS or the CPU on the platform, and compare the measurements to previously stored values. A hardware based scheme can allow a remote (or external) agent to verify whether the integrity of all the programs on the client machine is intact. Hardware based schemes have a stigma of DRM attached to them, may be difficult to reprogram, and are not ideally suited for mass deployment. The TPM based schemes have little backward compatibility in that they do not work on legacy systems which do not have a TPM chip. Integrity Measurement Architecture (IMA) [15] is a software based integrity measurement scheme that utilizes the underlying TPM on the platform. The verification mechanism does not rely on the trustworthiness of the software on the system. IMA maintains a list of hash values of all possible executable content that is loaded on the system. When an executable, library, or kernel module is loaded, IMA performs an integrity check prior to executing it. IMA measures values while the system is being loaded; however, it does not provide means to determine whether any program that is in execution got tampered with in memory. IMA also relies on being called by the OS when any application is loaded; it relies on the kernel functions for reading the file system, and relies on the underlying TPM to maintain an integrity value over the measurement list residing in the kernel. Due to this, each new measurement added to the kernel-held measurement list results in a change to the values stored in a Platform Configuration Register (PCR) of the TPM security chip on the system.
2.2 Virtualization based Integrity checking
Virtualization implemented without hardware support has been used for security applications. This form of virtualization was implemented prior to the large scale deployment of platforms containing built-in hardware support for virtualization. Terra uses a trusted virtual machine monitor (TVMM) and partitions the hardware platform into multiple virtual machines that are isolated from one another [16]. Hardware dependent isolation and virtualization are used by Terra to isolate the TVMM from the other VMs. Terra implements a scheme where potentially every class of operation is performed on a separate virtual machine (VM) on the client platform. Terra is installed in one of the VMs and is not exposed to external applications like mail, gaming, and so on. The TVMM is given the role of a host OS. The root of trust in Terra resides in the hardware TPM; the TPM takes measurements on the boot loader, which in turn takes measurements on the TVMM. The TVMM takes measurements on the VMs prior to loading them. Terra relies on the underlying TPM to take some measurements. Most traditional VMM based schemes are bulky and need significant resources on the platform to appear transparent to the end user; this holds true for Terra, where the authors advocate multiple virtual machines.
2.3 Integrity checking using hardware assisted virtualization
Hardware support for virtualization has recently been deployed in the widely used x86 consumer platforms. Intel and AMD have come out with Intel VT-x and AMD-V, which provide processor extensions where a system administrator can load certain values in the hardware to set up a VMM and execute the operating system in a guest environment. The VMM runs in a mode that has higher privileges than the guest OS and can therefore enforce access control between multiple guest operating systems and also between application programs inside an OS.
The system administrator can also set up events in the hardware which cause control to exit from the guest OS to the VMM in a trap-and-emulate model. The VMM can decide, based on local policy, whether to emulate or ignore the instruction. VIS [17] is a hardware virtualization based scheme which determines the integrity of client programs that connect to a remote server. VIS contains an Integrity Measurement Module (IMM) which reads the cryptographically signed reference measurement (manifest) of a client process. VIS verifies the signature in a scheme similar to X.509 certificate verification and then takes the exact same measurements on the running client process to determine whether it has been tampered with. The OS loader may perform relocation of certain sections of the client program, in which case the IMM reverses these relocations using information provided in the manifest and then obtains the measurement values. VIS requires that the pages of the client programs are pinned in memory (not paged out). VIS restricts network access during the verification phase to prevent any malicious program from bypassing registration, and does not allow the client programs unrestricted access to the network before they have been verified.
2.4 Software based integrity measurement schemes
Genuinity [6] implements a remote attestation system in which the client kernel initializes the attestation for a program. It receives executable code and maps it into the execution environment as directed by the trusted authority. The system maps each page of physical memory into multiple pages of virtual memory, creating a one-to-many relationship between the physical and virtual pages. The trusted external agent sends a pseudorandom sequence of addresses, and the Genuinity system then takes a checksum over the specified memory regions.
Genuinity also incorporates various other values such as the Instruction and Data TLB miss counts, and counters which record the number of branches and instructions executed. The executable code performs various checks on the client kernel and returns the results to a verified location in the kernel on the remote machine, which returns the results back to the server. The server verifies whether the results are in accordance with the checks performed; if so, the client is verified. This protocol requires OS support on the remote machine for many operations, including loading the attestation code into the correct area in memory and obtaining hardware values such as TLB miss counts. A commodity OS hosts many applications; requiring OS support or a kernel module for each specific application can be considered a major overhead. In Pioneer [4] the verification code resides on the client machine. The verifier (server) sends a random number (nonce) as a challenge to the client machine. The result returned as the response determines whether the verification code has been tampered with. The verification code then performs attestation on some entity within the machine and transfers control to it. This forms a dynamic root of trust in the client machine. Pioneer assumes that the challenge cannot be redirected to another machine on a network; however, in many real world scenarios a malicious program can attempt to redirect challenges to another machine which has a clean copy of the attestation code. In its checksum procedure, Pioneer incorporates the values of the Program Counter and the Data Pointer, both of which hold virtual memory addresses. An adversary can load another copy of the client code to be executed in a sandbox-like environment and provide it the challenge. This way an adversary can obtain the results of the computation that the challenge produces and return them to the verifier.
Pioneer also assumes that the server knows the exact hardware configuration of the client for performing a timing analysis; this places a restriction on the client not to upgrade or change hardware components. In TEAS [7] the authors propose a remote attestation scheme in which the verifier generates program code to be executed by the client machine. Random code is incorporated in the attestation code to make analysis difficult for the attacker. The analysis provided by the authors proves that it is very unlikely that an attacker can clearly determine the actions performed by the verification code; however, an implementation is not described in the research. A Java Virtual Machine (JVM) based root of trust method has also been implemented to attest code [18]. The authors implement programs in Java and modify the JVM to attest the runtime environment. However, the JVM has known vulnerabilities and is itself software that operates within the operating system, and hence is not a suitable candidate for checking integrity. SWATT [5] implements a remote attestation scheme for embedded devices. The attestation code resides on the node to be attested. The code contains a pseudorandom number generator (PRG) which receives a seed from the verifier. The attestation code includes the memory areas which correspond to the random numbers generated by the PRG as part of the measurement to be returned to the verifier. The obtained measurements are passed through a keyed MAC function; the key for each instance of the MAC operation is provided by the verifier. The problem with this scheme is that if an adversary obtains the seed and the key to the MAC function, the integrity measurements can be spoofed, as the attacker would have access to the MAC function and the PRG code.
2.5 Attacks against software based attestation schemes
Genuinity has been shown to have weaknesses by two works [8], [5]. In [8] it is described that Genuinity would fail against a range of attacks known as substitution attacks.
The paper suggests placing attack code on the same physical page as the checksum code. The attack code leaves the checksum code unmodified and writes itself to the zero-filled locations in the page. If the pseudorandom traversal maps into the page on which the imposter code is present, the attack code redirects the challenge to return byte values from the original code page. The authors of Genuinity countered these findings by stating that the attack scenario does not take into account the time required to extract test cases from the network, analyze them, find appropriate places to hide code, and finally produce code to forge the checksum operations [9]. The attacks were specifically constructed against one instance of the checksum generation, and would require complex re-engineering to succeed against all possible test cases. This would require a large code base to perform the attack, and such a large code base would not be easy to hide. In [5] it is suggested that Genuinity has a mobile code problem, where an attacker can exploit the vulnerabilities of mobile code since code is sent over the network to be executed on the client platform. In addition, the paper also states that Genuinity reads 32 bit words for performing a checksum and hence will be vulnerable if the attack is constructed to avoid the lower 32 bits of memory regions. These two claims are countered by the authors of Genuinity [9]. The first is countered by stating that Genuinity incorporates public key signing, which will prevent mobile code modifications by an attacker, while the second is countered by stating that Genuinity reads 32 bits at a time, and not the lower 32 bits of an address. A generic attack on software checksum based operations has been proposed [19]. This attack is based on installing a kernel patch that redirects data accesses of integrity measurement code to a different page in memory containing a clean copy of the code.
This attack involves the installation of a rootkit that changes the page table address translation routine in the OS. Although this scheme potentially defeats many software based techniques, the authors have themselves noted that it is difficult for this attack to work on an x86 based 64 bit machine which does not use segmentation; this is because the architecture does not provide the ability to use offsets for code and data segments. Moreover, an attack like this requires the installation of a kernel level rootkit that continuously redirects all read accesses to different pages in memory. The attestation scheme presented in this paper for the user application cannot defend itself against this attack; however, the scheme presented in this work to determine the integrity of the kernel is capable of detecting such modifications. In addition, Pioneer [4] suggests a workaround for this class of attacks: multiple virtual address aliases create extra entries in the page table, which leads to the OS eventually flushing out the spurious pages.
2.6 Program analysis and code obfuscation
Program analysis requires disassembly of code and control flow graph (CFG) generation. The Linux tool ‘objdump’ is one of the simplest linear sweep tools that perform disassembly. It moves through the entire code once, disassembling each instruction as and when encountered. This method suffers from the weakness that it misinterprets data embedded inside instructions, hence carefully constructed branch statements induce errors [20]. Linear sweep is also susceptible to the insertion of dummy instructions and self modifying code. Recursive traversal involves decoding executable code at the target of a branch before analyzing the next executable code in the current location. This technique can be defeated by opaque predicates [21], where one target of a branch contains complex instructions which never execute [22].
CFG generation involves identifying blocks of code that have a single entry point and end in a single branch instruction with known target addresses. Once blocks are identified, branch targets are resolved to create a CFG. Compiler optimization techniques such as executing instructions in the delay slot of a branch cause issues for the CFG and require iterative procedures to generate an accurate CFG. The execution time of these algorithms is non-linear (n^2) [23].
2.7 Kernel integrity measurement schemes
An attacker can compromise any measurements taken by a user level program by installing a kernel level rootkit. The kernel provides the file system, memory management, and system calls for user applications. The remote attestation scheme as implemented in this work requires kernel support. This section describes prior work done in implementing kernel integrity measurement. Coprocessor schemes that are installed on the PCI slot of the PC have been used to measure the integrity of the kernel, as mentioned in section 2.1. One scheme [13] computes the integrity of the kernel at installation time and stores this value for future comparisons. The core of the system lies in a coprocessor (SecCore) that performs integrity measurement of a kernel module during system boot. The kernel interrupt service routine (SecISR) performs integrity checks on a kernel checker and a user application checker. The kernel checker proceeds with attesting the entire kernel .TEXT section and modules. The system determined that, during installation on the machine used for building the prototype, the .TEXT section began at virtual address 0xC0100000, which corresponded to the physical address 0x00100000, and began measurements at this address. Another work focuses on developing a framework for classifying rootkits [24].
The authors state that there are three classes of rootkits: those that modify the system call table, those that modify the targets of system calls, and those that redirect references to the system call table to a different location. A kernel level rootkit may perform these actions by using the /dev/kmem device file; an example of such a rootkit is the knark rootkit [25]. The rootkit detector keeps a copy of the original System.map file and compares the current system call table’s addresses with the original values. A difference between the two tables indicates system call table modification. This system of detecting changes to the system call table detected the presence of the knark rootkit, which modifies 8 system calls. The framework also detects rootkits like SucKIT [26] which overwrite kernel memory to create a fake system call table; any user access to the system calls redirects to the new table. The rootkit checker determines whether the current system call table starts at a location different from the original address, in which case a compromise is detected. LKIM [27] obtains hashes and contextual measurements to determine the integrity of the platform. In addition to taking hash measurements on the kernel Text section and system call table, LKIM also takes measurements on some other descriptors such as inodes, executable file format handlers, Linux security model hooks, and so on. The measurements taken are defined by a set of measurement instructions. The paper states that there is no silver bullet to prevent the Linux OS from forging results; hence the authors propose a hypervisor based scheme instead of a native OS scheme. The hypervisor scheme involves changing Xen’s domain U to host the LKIM infrastructure. The domain hosting LKIM is provided Domain 0 privileges.
3. Threat model and Assumptions
We assume that Mallory, an attacker, has complete control over the software residing on Alice’s machine, and that Mallory possesses the power to start a clean copy of Alice’s installed program P and execute it in a controlled environment to return results to Trent. Mallory can also attempt to redirect the challenge to another machine which runs a clean copy of P. We assume that Mallory will not perform transient attacks such as patching P with malicious code at a given time t and then, at time t + ∆, restoring the old instructions and removing any modifications. This behavior can be classified as rootkit-like behavior, which will not be detected by the application level remote attestation. However, a rootkit like this would get detected in the kernel level remote attestation as described in section 7. We assume that Alice will trust the code provided by Trent and allow it to execute on the machine to be verified, and that Alice has means such as certificates and digital signatures to verify that the verification code (C) has been generated by Trent. We also assume that Alice is not running MAlice behind a NAT and that the machine has only one network interface. The reason to make these assumptions is that C takes measurements on MAlice to determine if it is the same machine that contacted Trent. If MAlice is behind a NAT then Trent would see the request coming from a router but measurements from MAlice. This work focuses on the general client platform where only one network interface is installed and each network interface has only one IP address associated with it. In the case that there are many addresses configured on the same network interface, the code can be altered to populate all possible IP addresses that it reads from the interface and send them to Trent; Trent can parse through the result to find the matching IP address. For the user application attestation part, this work does not assume a compromised kernel.
The verification code C relies on the kernel to handle the system calls executed through interrupts, and to read the file structure containing the open connections on the system. There are many system call routines in the Linux kernel, and monitoring and duplicating the results of each of these may be a difficult task for malware. Reading the port file structure also requires support from the operating system. We will assume that the OS provides correct results when the contents of a directory and file are read out. Without this assumption, remote attestation cannot be performed entirely without kernel support. For the kernel attestation part, we assume that the kernel is compromised: system call tables may be corrupted, and malware may have changed the interrupt descriptors. Runtime code injection is performed on a kernel module to measure the integrity of the kernel. It is assumed that Alice has means such as digital certificates to determine that the code being injected is generated by a trusted server. It is also assumed that the trusted server is the OS vendor or a corporate network administrator who has knowledge of the OS mappings for the client.
4. Overview of operations to be performed on the client end
If Alice could download an entire fresh copy of P every time the program had to be executed, then remote attestation would not be required. However, since P is an installed application, Alice may have customized certain profile options and saved data which would be cumbersome to recreate every time. Alice uses P to contact Trent for a service, and Trent returns to P a challenge, which is executable code (C). P must inject C into its virtual memory and execute it at a location specified by Trent. C computes certain measurements and communicates an integrity measurement value M1 directly to Trent. This process is depicted in Fig. 1. Trent has a local copy of P on which the same sets of tests are executed as above to produce a value M0.
Trent compares M1 and M0; if the two values are the same then Alice is informed that P has not been tampered with. This raises the issue of verifiable code execution, in which Trent wants to be certain that C took its measurements on P residing inside MAlice. To provide this guarantee, C executes some additional tests on MAlice and returns their results to Trent. These checks ensure that C was not bounced to another machine, and that it was not executed in a sandbox environment inside a dummy P process within MAlice. There are many ways in which Mallory may tamper with the execution of C. Mallory may substitute the values of M1 being sent to Trent so that there is no evidence that any modification to P has taken place. It is also possible that Mallory has loaded another, untampered copy of P inside a sandbox, executed C within it, and provided the results back to Trent. Mallory may also have redirected the challenge to another machine on the network, making it compute and send the responses back to Trent. Without addressing these issues, it is not possible for Trent to correctly determine whether the measurements accurately reflect the state of P on MAlice. If Trent can determine that C executed on MAlice, and that C was not executed in a sandbox, then Trent can produce code whose results are difficult to guess and whose results indicate the correct state of P. Achieving these guarantees requires that C provide Trent with a machine identifier and a process identifier. Trent retains a degree of certainty that the results are genuine by producing code that makes it difficult for Mallory to pre-compute results. Once these factors are satisfied, Trent can determine whether P on MAlice has been tampered with. The entire process of Remote Attestation is shown in Fig. 2. 4.1 Determining checksum and MD5 on P C computes an MD5 hash of P to determine whether the code section has been tampered with. 
Downloading the MD5 code is an expensive operation as the code size is fairly large, and MD5 code cannot be randomized as it would lose its properties. For these reasons, the MD5 code permanently resides in P. To prevent Mallory from exploiting this, a two-phase hash protocol is implemented. Trent places a mathematical checksum inside C which is computed over the region of P containing the MD5 executable code along with some other selected regions. Trent computes the result of the checksum locally and verifies whether C is returning the expected value. C proceeds with the rest of the protocol if Trent responds in the affirmative. Trent changes the operations of the checksum in every instance so that Mallory cannot use prior knowledge to predict the results of the mathematical operations. C does not take the checksums over fixed-size regions; instead, Trent divides the entire area over which the checksum is taken into multiple overlapping sub-regions. The boundaries of the sub-regions are defined inside C by Trent by moving the data pointer back by a random number generated during compilation of the C source code. For the prototype implementation, random numbers were generated with the 'rand' call; since 'rand' may not be truly random, we seeded it via 'srand' with the current stack pointer of the code-generating program. The stack of every process is randomized by Address Space Layout Randomization (ASLR) [28]. It can be noted that this is not as secure as using a cryptographically secure random number generator; in real-world applications, Trent can read random numbers from the Linux '/dev/random' device [29]. The individual checksums are then combined and sent to Trent. This is depicted in Fig. 3. C performs the MD5 hash on overlapping sub-regions of P defined in a similar fashion as above. A degree of obfuscation is added by following the procedure in Fig. 4. 
C initially takes the MD5 hash of the first sub-region (H1). It then obtains the MD5 hash of the next sub-region (H2). It concatenates the two values to produce H1H2, and an MD5 hash of H1H2 is taken to produce H12. H12 is then concatenated with H3 to produce H12H3, which is hashed again to produce H123, and so on. This process is followed for all the sub-regions and the result is sent to Trent. Drawing inferences from executable code is considered difficult, as discussed in section 2. Randomizing the boundary overlaps between the sub-regions makes it difficult to predict the hash values being generated; Mallory has to execute the code to observe the computation being performed. The checksums are taken over overlapping sub-regions to make the prediction of results more difficult for Mallory. This creates multiple levels of indeterminacy for an attack to take place: Mallory has to not only predict the boundaries of the sub-regions, but also deal with the overlap among them. Overlapping checksums also ensure that if by accident the sub-regions are defined identically in two different versions of C, the results of the computation produced by C are still different. This also ensures that some random sections of P are present more than once in the checksum, making it more difficult for Mallory to hide any modifications to such regions. MD5 has been used in this prototype even though it is known to have collisions. However, MD5 can be substituted easily with a different hashing algorithm in a software-based attestation scheme; the same cannot be done easily in a TPM or other hardware-based attestation scheme. 4.2 Determining process identifiers C determines whether it was executed inside a fake process or the correct P process by obtaining some identifiers. C determines the number of processes having an open connection to Trent on MAlice. This is obtained by determining the remote address and remote port combination on each of the port descriptors in the system. 
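A minimal sketch of the iterated-hash procedure of Fig. 4 (section 4.1) is shown below. A stand-in 32-bit FNV-1a hash is used in place of MD5, and the region boundaries and helper names are illustrative, not the paper's actual generated code:

```c
#include <stdint.h>
#include <string.h>

/* Stand-in hash: the prototype uses MD5; FNV-1a keeps this sketch
 * self-contained while preserving the chaining structure. */
static uint32_t hash32(const uint8_t *buf, size_t len) {
    uint32_t h = 2166136261u;
    for (size_t i = 0; i < len; i++) { h ^= buf[i]; h *= 16777619u; }
    return h;
}

/* Iterated hash over overlapping sub-regions (Fig. 4):
 * H1 = H(R1); chain = H(H1 || H2); chain = H(chain || H3); ... */
static uint32_t chained_hash(const uint8_t *base,
                             const size_t *starts, const size_t *ends,
                             int nregions) {
    uint32_t chain = hash32(base + starts[0], ends[0] - starts[0]);
    for (int i = 1; i < nregions; i++) {
        uint32_t hi = hash32(base + starts[i], ends[i] - starts[i]);
        uint8_t cat[8];
        memcpy(cat, &chain, 4);      /* running chain value    */
        memcpy(cat + 4, &hi, 4);     /* digest of next region  */
        chain = hash32(cat, 8);
    }
    return chain;
}
```

Because every region's digest is folded into the running chain, a one-byte change inside any sub-region (including bytes covered twice by an overlap) alters the final value sent to Trent.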
C communicates with Trent using the descriptor provided by P and does not create a new connection. This implies that in an ideal situation there must be only one such descriptor on the entire system, and the process utilizing it must be the process under which C is executing. The passing of the socket descriptor from P to C also partially addresses the issue of redirection of the challenge to another machine. The only way for such a connection to exist on a machine is if Trent accepts the incoming request; otherwise the machine will not have a socket descriptor with the ability to communicate with Trent. If there is more than one process having such a connection, an error message is sent to Trent. If there is only one such process, C obtains its own process id and compares the two values. If they match, an affirmative message is sent to Trent; if not, C reports an error with an appropriate message to Trent. 4.3 Determining the Identifier for MAlice C has to provide Trent the guarantee that it was not redirected to another machine and that it was not executed in a sandbox environment or pasted onto another clean copy of P within MAlice. The first is achieved by obtaining a unique machine identifier; in this case the IP address of the machine can serve as the identifier. Trent has received a request from Alice and has access to the IP address of MAlice. If C returns the IP address of the machine it is executing on, Trent can determine whether both are the same machine. It can be argued that IP addresses are dynamic; however, there is little possibility that a machine will change its IP address in the small time window between Alice's request and the measurements being taken and provided to Trent. C determines the IP address of MAlice using system interrupts. Mallory will find it hard to tamper with the results of an interrupt; the interrupt ensures that the address present on the network interface is correctly reported to Trent. 
It can again be noted that Mallory may have changed the address of a network interface to match that of MAlice, but as these machines are not behind a NAT, it would be quite difficult for Mallory to assign the identical address to another machine on an external network and still communicate with that machine. On receiving the results of the four tests, Trent knows that P has not been tampered with from the time of installation to the time the verification request was sent from MAlice. 5. Design of Checksum code produced by Trent Trent has to prevent Mallory from analyzing the operations performed by C. Trent places a series of obfuscations inside the generated code along with a time threshold (T) by which the response from MAlice is expected. If C does not respond within the stipulated period of time (allowing for network delays), Trent knows that something went wrong at MAlice. This includes denial-of-service based attacks, where Trent will inform Alice that C is not communicating back. Fig. 5 shows a sample snippet of the C mathematical checksum code. The send function used in the checksum snippet is implemented using inline ASM. It is evident that in order to forge any results, Mallory must determine the value of checksum2 being returned to Trent. This requires that Mallory identify all the instructions modifying checksum2 and the locations on the stack that it uses for computation. To prevent Mallory from analyzing the injected code, certain obfuscations are placed in C, as discussed below: 5.1 Changing execution flow and locations of variables on stack To prevent Mallory from utilizing knowledge about a previous instance of C in the current test, Trent changes the checksum computation by selecting mathematical operations on memory blocks from a pool of possible operations and also changes the order of the instructions. The results of these operations are stored temporarily on the stack. 
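The flavor of such generated checksum code can be sketched as follows. The operation pool, the seed, and the scratch-slot choice are illustrative stand-ins for what Trent's generator actually emits, not a reproduction of Fig. 5:

```c
#include <stdint.h>
#include <stddef.h>

/* Sketch of a generated checksum routine: one operation per memory word,
 * drawn from a small pool (XOR / multiply-add / rotate-add), with the
 * running value kept in a pseudo-randomly chosen scratch slot.  In the
 * real C, Trent hard-codes a fresh operation sequence and fresh stack
 * offsets for every verification instance. */
static uint32_t checksum_region(const uint32_t *words, size_t nwords,
                                uint32_t seed) {
    uint32_t scratch[16];                 /* stand-in for stack locations */
    uint32_t *acc = &scratch[seed % 16];  /* slot varies per instance     */
    *acc = seed;
    for (size_t i = 0; i < nwords; i++) {
        switch ((seed + i) % 3) {         /* "pool" of operations         */
        case 0:  *acc ^= words[i]; break;
        case 1:  *acc += words[i] * 2654435761u; break;
        default: *acc = (*acc << 7) | (*acc >> 25); *acc += words[i];
        }
    }
    return *acc;
}
```

Each operation in the pool is a bijection of the running value, so any change to a measured word propagates through the remaining operations and alters the final checksum.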
Trent changes the pointers on the stack for all the local variables inside C for every instance. These steps prevent Mallory from successfully launching an attack similar to those used for HD-DVD key stealing [30, 31]. 5.2 Inserting Dummy Instructions Program analysis is a non-linear operation, as discussed in section 2. An increase in the number of instructions that Mallory has to analyze decreases the time window available to forge the results of these operations. Trent inserts instructions that never execute and also inserts operations that are performed on MAlice but not included in the results sent back to Trent. These additions to the code make it difficult for Mallory to correctly analyze C within a reasonable period of time. 5.3 Changing instructions during execution Mallory may perform static analysis on the executable code C sent by Trent. A good disassembler can provide significant information on the instructions being executed, and allow Mallory to determine when system calls and function calls are made. In addition, it may also allow Mallory to see the area of code which reads memory recursively. If these tools do not have access to the code to be executed before it actually executes, then Mallory cannot determine the operations performed by C. Trent removes some instructions from C while sending the code to MAlice and places code inside C with data offsets such that during execution, this section of C changes the modified instructions back to the correct values. This way, without executing C it is difficult for Mallory to determine the exact contents of C. 6. Implementation of user application attestation In this section the implementation of the techniques proposed in this paper is described. All coding was done in the C language on Intel x86 machines running the Linux kernel, using the gcc compiler. 6.1 Generation of C by Trent Trent generates C for every instance of a verification request. 
If Trent sent out the same copy of the verification code every time, Mallory could gain significant knowledge of the individual checks performed by C; by generating new code for every instance of verification, Trent mitigates this possibility. Trent also places obfuscations inside the code to prevent static analysis of the executable. The operations performed by Trent to obfuscate the verification are discussed below. 6.1.1 Changing execution flow and locations of variables on stack Changing the execution flow and stack locations serves to hinder program analysis of C. The source code of C was divided into four blocks which are independent of each other. Trent assigns randomly generated sequence numbers to the four blocks and places them accordingly inside the C source code. The checksum block is randomized by creating a pool of mathematical operations that can be performed on every memory location and selecting from that pool. The pool of operations is created by substituting alternative mathematical operations that act on the exact same location. Once the mathematical operations are selected in the C source code, Trent changes the sub-regions for the checksum code and the MD5 calling procedure. This is done by replacing the numbers defining the sub-regions. C has sub-regions defined in its un-compiled code. To randomize the sub-regions, a pre-processor is executed on the un-compiled C such that it changes the numbers defining the sub-regions. The numbers are generated such that the sub-regions overlap by a random value. C allocates space on the local stack to store computational values. Instead of utilizing fixed locations on the stack, Trent replaces all variables inside C with pointers to locations on the stack. To allocate space on the stack, Trent declares a large array of type 'char' of size N, which has enough space to hold the contents of all the other variables simultaneously. 
Trent executes a pre-processor which assigns locations to the pointers. The pre-processor maintains a counter which starts at 0 and ends at N-1. It randomly picks a pointer to be assigned a location, assigns it the value of the counter, and increments the counter by the size of the corresponding variable. This continues until all the pointers are assigned a location on the stack. Trent compiles the C source code to produce the executable after placing these obfuscations. 6.1.2 Obfuscating instructions executed Mallory cannot obtain a control flow graph (CFG) of, or perform program analysis on, the executable code of C provided the instructions being executed by C cannot be determined. Trent changes the instructions inside the executable code such that they cause analysis tools to produce incorrect results. C contains a section (Crestore) which changes these modified instructions back to their original contents when it executes. Crestore contains the offset from the current location and the value to be placed at that offset. Trent places the information to correct the modified instructions inside Crestore. Crestore is executed prior to the other instructions inside C and corrects the values inside the modified instructions. 6.2 Execution of C on Client's Machine The executable code is received by the client's (Alice's) machine. The received information contains the length of the code and the location where it should be placed and executed. Normally it is not possible to introduce new code into a process during run time. However, Alice's software (P) can use a Linux library call to place C at the required location and execute the code. C communicates the results of the verification back to Trent without relying on P. The details of its execution are discussed below. 6.2.1 Injection of code by P on itself P makes a connection request to Trent. 
Trent grants the request, provides the number of bytes of challenge to be received, and follows by providing the executable code of C. Trent also sends the information on the location inside P where C should be placed. P receives the code and prepares the area for injection by executing the library utility mprotect on the area. The code section of a process on the Intel x86 architecture is write-protected; this utility changes the protection on the specified area of the code section and allows it to be overwritten with new values. Once the injection is complete, P creates a function pointer which points to the address of the location where the code was injected and calls the function through the pointer, transferring control to C. 6.2.2 Obtaining measurements on the target machine C obtains certain identifiers on MAlice that allow Trent to identify whether it indeed executed on the correct machine and process. These identifiers have to be located outside the process space of P; therefore C computes the following values in order to send them to Trent: the IP address of MAlice, a mathematical checksum on the MD5 code residing inside P, MD5 hash values of overlapping sub-regions inside P, and the process state that allows C to determine whether it was executed inside a sandbox. The first involves identifying the machine on which it is executing. Trent received an incoming connection from Alice, hence it is possible to keep track of the IP address of MAlice. Although most IP addresses are dynamic, there is little probability of an IP address changing in the small time window between a request being sent and C taking its measurements. C does not utilize the system call libraries to obtain values; it utilizes interrupts to execute system calls. This involves loading the stack with the correct operands for the system call, placing the system call number in the A register and the remaining operands in the other registers, and executing the interrupt instruction. 
The sample code for creating a socket is shown in Fig. 6. Reading the IP address involves creating a socket on the network interface and obtaining the address from the socket by means of another system call, ioctl. The obtained address is an integer, which is converted to the standard A.B.C.D format. After this, the address is sent to Trent using the send routine inside the socketcall system call. It must be noted that the send is done using the socket provided by P and not a new socket. This is done so that Mallory cannot bounce C to another machine: if Mallory did that, Mallory would have to provide an existing connection to Trent, and as connections to Trent can exist only with Trent's knowledge, this situation cannot arise. Trent verifies the address of the machine and sends a response to C, which then proceeds to take a checksum on some portions of the code, followed by an MD5 hash of the entire code section. As discussed in sections 4.2 and 6.3, the sub-regions are defined randomly and such that they overlap. C sends the checksum and MD5 results to Trent utilizing the system interrupt method for send as discussed above. C obtains the pid of the process (P0) under which it is executing using the system interrupt for getpid. It then locates all the remote connections established to Trent from MAlice. This is done by reading the contents of the '/proc/net/tcp' file. The file has the structure shown in Fig. 7. As seen in the figure, there is remote address and port information for every connection, which allows C to identify any open connection to Trent. Once all the connections are identified, C utilizes the inode of each socket descriptor to locate any process utilizing it. This is done by scanning the '/proc/<pid>/fd' folder for all the running processes on MAlice. In the ideal situation there should be only one process id (P1) utilizing the identified inode. 
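The relevant fields of a '/proc/net/tcp' entry (Fig. 7) can be pulled out with a single sscanf, as sketched below; the sample line used in the usage note is illustrative. Note that rem_address is the raw big-endian 32-bit address, so it appears byte-swapped when read on a little-endian host, while the port prints in host order:

```c
#include <stdio.h>

/* Parse one /proc/net/tcp line:
 *   sl local_address rem_address st tx:rx tr:when retrnsmt uid timeout inode
 * Returns 1 on success, filling the remote address/port and socket inode,
 * which C matches against Trent's address and the /proc/<pid>/fd scan. */
static int parse_tcp_line(const char *line, unsigned *raddr,
                          unsigned *rport, unsigned long *inode) {
    return sscanf(line,
                  " %*d: %*x:%*x %x:%x %*x %*x:%*x %*x:%*x %*x %*d %*d %lu",
                  raddr, rport, inode) == 3;
}
```

For example, an entry with rem_address `9D38C80A:1F90` denotes the remote peer 10.200.56.157 (byte-swapped on a little-endian host) on port 0x1F90 = 8080, followed later on the line by the socket inode.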
If it encounters more than one such process, it sends an error message back to Trent. Once the process id P1 is obtained, C checks whether P0 and P1 are the same; if so, C sends an affirmative to Trent. These measurements allow Trent to be certain that C executed on P residing on MAlice. 7. Remote kernel attestation To measure the integrity of the kernel we implement a scheme similar to the user application attestation scheme. Trent′ is a trusted server who provides code (Ckernel) to MAlice. It is assumed that Alice has means, such as a digital signature verification scheme, to determine whether Ckernel was sent by Trent′. Alice receives Ckernel using a user-level application Puser, verifies that it was sent by Trent′, and places it in the kernel of the OS executing on MAlice. Ckernel is then executed and obtains integrity measurements (Hkernel) on the OS text section, the system call table, and the interrupt descriptor table. Ckernel passes these results to Puser, which returns them to Trent′. If required, Ckernel can encrypt the integrity measurement results using a one-time pad or a simple substitution cipher; however, as the test case generated is different in every instance, this is not a required operation. Figure 8 depicts this process. Trent′ also provides a kernel module Pkernel that provides ioctl calls to Puser. As seen in figure 8a, Puser receives Ckernel from Trent′. In figure 8b, Puser forwards the code to Pkernel. It is assumed that Pkernel has the ability to verify that the code was sent by Trent′. Pkernel places the received code in its code section at a location specified by Trent′ and executes it. Ckernel obtains an arithmetic and an MD5 checksum on the specified regions of the kernel on MAlice and returns the results to Puser, as seen in figure 8c. Puser then forwards the results to Trent′, who determines whether the measurements obtained from the OS on MAlice match existing computations (figure 8d). 
Since Trent′ is an OS vendor or a corporate network administrator, it can be assumed that Trent′ has local access to a pristine copy of the kernel executing on MAlice to obtain the expected integrity measurement values generated by Ckernel. Although it may seem that Trent′ would need unbounded storage to keep track of every client, most OS installations are identical off-the-shelf builds. In addition, if Trent′ is the system administrator of a number of machines on a corporate network, Trent′ would have knowledge of the OS on every client machine. 7.1 Implementation The kernel attestation was implemented on an x86-based 32-bit Ubuntu 8.04 machine running the 2.6.24-28-generic kernel. In Linux, an identical copy of the kernel is mapped into every process in the system. Since we use system calls and software interrupts for the application attestation part, this section describes the integrity measurement of the text section (which contains the code for system calls and other kernel routines), the system call table, and the interrupt descriptor table. The /boot/System.map-2.6.24-28-generic file on the client platform was used to locate the symbols to be used for kernel measurement. The kernel text section was located at virtual address 0xC0100000; the end of the kernel text section was at 0xC03219CA, which corresponded to the symbol '_etext'. The system call table was located at 0xC0326520; the next symbol in the map file was located at 0xC0326B3C, a difference of 1564 bytes. The 'arch/x86/include/asm/unistd_32.h' file for the kernel build showed the number of system calls to be 337. Since MAlice was a 32-bit system, the space required for the address mappings would be 1348 bytes. We took integrity measurements from 0xC0326520 to 0xC0326B3B. The interrupt descriptor table was located at 0xC0410000 and the next symbol at 0xC0410800, which gives the IDT a size of 2048 bytes. 
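The symbol arithmetic above can be cross-checked mechanically; the constants below are simply the System.map values quoted in the text:

```c
/* Gap between two adjacent System.map symbols = space available there. */
static unsigned long symbol_gap(unsigned long sym, unsigned long next) {
    return next - sym;
}

/* Values quoted above for the 2.6.24-28-generic client kernel. */
static const unsigned long SYS_CALL_TABLE = 0xc0326520ul;
static const unsigned long AFTER_TABLE    = 0xc0326b3cul; /* next symbol */
static const unsigned long IDT_BASE       = 0xc0410000ul;
static const unsigned long AFTER_IDT      = 0xc0410800ul; /* next symbol */
```

With 337 system calls and 4-byte pointers, the table needs 1348 of the 1564 available bytes; the 2048-byte IDT gap matches 256 gates of 8 bytes each.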
A fully populated IDT should have 256 entries of 8 bytes each, giving a 2 KB IDT; this is consistent with the System.map file on the client machine. Trent′ also provides a kernel module (Pkernel) to the client platform which is installed as a device driver for a character device. Pkernel offers its functionality through the ioctl call. Puser receives the code from the trusted authority and opens the char device. Puser then executes an ioctl which allows the kernel module to receive the executable code. As in the user application attestation case, Trent′ does not send the MD5 code for every attestation instance. Instead, the trusted authority sends driver code which populates a data array and provides it to the MD5 code that stays resident in Pkernel. To prevent Mallory from exploiting this, the trusted authority also provides an arithmetic checksum computation routine which is downloaded for every attestation instance. This provides a degree of extra unpredictability to the results generated by the integrity measurement code. Kernel modules are relocatable, and their final addresses are fixed when the module is installed. This means that Trent′ would not know where the MD5 code was relocated to during installation of the module. In order to execute the MD5 code, Trent′ requests the location of the MD5 function in the kernel module from the client end. After obtaining the address, Trent′ generates the executable code Ckernel, which has numerous calls to the MD5 code. At generation time, the call address may not match the actual function address at the client end. Once Ckernel is generated, the call instructions are identified in the code and the correct target address is patched onto each call instruction. Once this patching is done, Trent′ sends the code to the client end. 
The call address calculation is done as follows:

call_target = -((address_injected_driver - address_mdstring) + call_locations[0] + length_of_call);
code_in_file[call_locations[0] + 1] = call_target;

Ckernel is loaded into a char array code_in_file. The location where Ckernel is to be injected is determined by Trent′ by selecting one of a number of 'nop' locations in the module; this address is termed address_injected_driver in the above snippet. The call location in the generated executable code is determined by scanning the code for the presence of the call instruction. The length of the call instruction is a constant which depends on the architecture. Finally, the address of mdstring (the location of the MD5 code) is obtained from the client machine as described above. The second statement changes the code array by placing the correct target address. This procedure is repeated for all the call instructions in the generated code. It must be noted that Ckernel calls only the MD5 code and no other function. If obfuscation is required, Trent′ can place some junk function calls which get executed by evaluating an 'if' statement; Trent′ can construct several if statements such that they never evaluate to true. It can be noted that even if the client does not communicate the address of the MD5 code, Pkernel can be designed such that the MD5 driver provided by the trusted authority and the MD5 code reside on the same page. This means that the higher 20 bits of the address of the MD5 code and the downloaded code will be the same and only the lower 12 bits will differ. This allows Trent′ to determine where Ckernel will reside on the client machine and automatically calculate the target address of the MD5 code. This is possible because the C compiler produces the lower 12 bits of function addresses while creating a kernel module and allows the higher 20 bits to be populated during module insertion. 
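The patching step can be sketched as below for the common case of a 5-byte x86 near call (opcode 0xE8 followed by a little-endian rel32); the function and variable names are illustrative, not the paper's actual code:

```c
#include <stdint.h>
#include <stddef.h>

enum { CALL_LEN = 5 };   /* 0xE8 + 4-byte displacement on x86 */

/* Patch one call site inside the generated code so that, once the code
 * is placed at injected_base, the call lands on the resident MD5 routine
 * at md5_addr.  The rel32 is measured from the end of the call. */
static void patch_call(uint8_t *code, size_t call_off,
                       uint32_t injected_base, uint32_t md5_addr) {
    uint32_t disp = md5_addr - (injected_base + (uint32_t)call_off + CALL_LEN);
    for (int i = 0; i < 4; i++)                 /* little-endian rel32 */
        code[call_off + 1 + i] = (uint8_t)(disp >> (8 * i));
}
```

For instance, a call at offset 1 in code injected at 0xC0880000, targeting an MD5 routine at 0xC0880100, gets the displacement 0x100 - 6 = 0xFA written after the 0xE8 opcode.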
Once the code is injected, Trent′ issues a message to the user application requesting the kernel integrity measurements. Puser executes another ioctl which causes Pkernel to execute the injected code. Ckernel reads various memory locations in the kernel and passes the data to the MD5 code. The MD5 code returns the MD5 checksum value to Ckernel, which in turn returns the value to the ioctl handler in Pkernel. Pkernel then passes the MD5 and arithmetic checksum computations back to Puser, which forwards the results to Trent′. If required, the disable-interrupt instruction can be issued by Ckernel to prevent any other process from taking hold of the processor. It must be noted that on multiprocessor systems the disable-interrupt instruction may not prevent a second processor from smashing kernel integrity measurement values. However, as the test cases are different for every attestation instance, Mallory may not gain anything by smashing integrity measurement values. 8. Results The time threshold (T) is an important parameter in this implementation. We aim to prevent an attacker Mallory from intercepting C and providing fake results to Trent. If T is too large, Mallory may be able to obtain some information about the execution of C. The value of T must take into account network delays. Network delays between cities in IP networks are of the order of a few milliseconds [32]. Hence, measuring the overall time required for one instance of Remote Attestation and adding a few seconds to the execution time can suffice for the value of T. We obtained the source code for the VLC media player interface [33], removed some sections of the interface code, and left close to 1000 lines of C code in the program. We measured various stages of the integrity measurement process. We used 2 pairs of machines running Ubuntu 8.04. 
One pair were legacy machines with an Intel Pentium 4 processor and 1 GB of RAM; the second pair were Intel Core 2 Quad machines with 3 GB of RAM. The quantities measured were the time taken to generate code (including compile time), the time taken by the server to do a local integrity check on a clean copy of the application, and the time taken by the client to perform the integrity measurement and send a response back to the server. To obtain an average measurement for code generation, we executed the program in a loop 1000 times and measured the time taken using a watch. We also measured the time reported by the system clock and found only a slight variation (on the order of 1 second) between the time perceived by the human eye using the watch and that reported by the system clock at the end of the loop. The time taken for compiling the freshly generated code was measured similarly. These two times are reported in table 1. We then executed the integrity measurement code C locally on the server and sent it to the client for injection and execution. The time taken on the server is the compute time the code takes to generate the integrity measurement on the server, as both machines had the same configuration in each case. These times are reported in table 2. It must be noted that the client requires a higher threshold to report results because it has to receive the code from the network stack, inject the code, execute it, and return results back through the network stack to the server. Network delays also affect the time threshold. We can see from the two tables that it takes on the order of a few hundred milliseconds for the server to generate code, while the integrity measurement itself is very lightweight and returns results on the order of a few milliseconds. The code generation process can therefore be viewed as a significant overhead. However, the server need not generate new code for every instance of a client connection. 
It can generate the measurement code periodically, say every second, and ship the same integrity measurement code to all clients connecting within that second. This can alleviate the workload on the server. A value for T can be suitably computed from the tables, taking into consideration the network hops required, and be set to a value less than 5 seconds. 9. Conclusion and Future work This paper presents a method for implementing Remote Attestation entirely in software. We also surveyed a number of other schemes in the literature that address the problem of program integrity checking. We reduced the window of opportunity for the attacker Mallory to provide fake results to the trusted authority Trent by implementing various forms of obfuscation and providing new executable code for every run. We implemented this scheme on the Intel x86 architecture and set a time threshold for the response. As future work we plan to implement this scheme using virtualization extensions. We also plan to extend this work to determine whether the client process continued executing after the Remote Attestation succeeded. References [1] Web link. In brief and statistics: The H open source. Retrieved on October 4, 2010, http://www.h-online.com/open/features/What-s-new-in-Linux-2-635-1047707.html?page=5 [2] T. Ball, E. Bounimova, B. Cook, V. Levin, J. Lichtenberg, C. McGarvey, B. Ondrusek, S. K. Rajamani and A. Ustuner, "Thorough static analysis of device drivers," ACM SIGOPS Operating Systems Review, vol. 40, pp. 73-85, 2006. [3] A. Chou, J. Yang, B. Chelf, S. Hallem and D. Engler, "An empirical study of operating systems errors," in Proceedings of the Eighteenth ACM Symposium on Operating Systems Principles, 2001, pp. 73-88. [4] A. Seshadri, M. Luk, E. Shi, A. Perrig, L. Van Doorn and P. Khosla, "Pioneer: Verifying code integrity and enforcing untampered code execution on legacy systems," in ACM SIGOPS Operating Systems Review, 2005, pp. 1-16. [5] A. Seshadri, A. Perrig, L. van Doorn and P. 
Khosla, "SWATT: SoftWare-based ATTestation for embedded devices," in 2004 IEEE Symposium on Security and Privacy, pp. 272-282.
[6] R. Kennell and L. H. Jamieson, "Establishing the genuinity of remote computer systems," in Proceedings of the 12th USENIX Security Symposium, 2003, pp. 295-308.
[7] J. A. Garay and L. Huelsbergen, "Software integrity using timed executable agents," in Proceedings of the 2006 ACM Symposium on Information, Computer and Communications Security, 2006, pp. 189-200.
[8] U. Shankar, M. Chew and J. D. Tygar, "Side effects are not sufficient to authenticate software," in Proceedings of the 13th USENIX Security Symposium, 2004, pp. 89-102.
[9] R. Kennell and L. H. Jamieson, "An analysis of proposed attacks against GENUINITY tests," CERIAS Technical Report, Purdue University, 2004.
[10] F. Stumpf, O. Tafreschi, P. Röder and C. Eckert, "A robust integrity reporting protocol for remote attestation," in Second Workshop on Advances in Trusted Computing (WATC'06 Fall), 2006.
[11] R. Sailer, X. Zhang, T. Jaeger and L. van Doorn, "Design and implementation of a TCG-based integrity measurement architecture," in SSYM'04: Proceedings of the 13th Conference on USENIX Security Symposium, 2004, pp. 223-228.
[12] K. Goldman, R. Perez and R. Sailer, "Linking remote attestation to secure tunnel endpoints," in STC '06: Proceedings of the First ACM Workshop on Scalable Trusted Computing, 2006, pp. 21-24.
[13] L. Wang and P. Dasgupta, "Coprocessor-based hierarchical trust management for software integrity and digital identity protection," Journal of Computer Security, vol. 16, pp. 311-339, 2008.
[14] N. L. Petroni Jr, T. Fraser, J. Molina and W. A. Arbaugh, "Copilot - a coprocessor-based kernel runtime integrity monitor," in Proceedings of the 13th Conference on USENIX Security Symposium, 2004.
[15] R. Sailer, "IBM Research - Integrity Measurement Architecture,"
Retrieved on November 3, 2010. http://domino.research.ibm.com/comm/research_people.nsf/pages/sailer.ima.html
[16] T. Garfinkel, B. Pfaff, J. Chow, M. Rosenblum and D. Boneh, "Terra: A virtual machine-based platform for trusted computing," ACM SIGOPS Operating Systems Review, vol. 37, pp. 193-206, 2003.
[17] R. Sahita, U. Savagaonkar, P. Dewan and D. Durham, "Mitigating the lying-endpoint problem in virtualized network access frameworks," in 18th IFIP/IEEE International Conference on Managing Virtualization of Networks and Services, 2007, pp. 135-146.
[18] V. Haldar, D. Chandra and M. Franz, "Semantic remote attestation: A virtual machine directed approach to trusted computing," in USENIX Virtual Machine Research and Technology Symposium, 2004, pp. 29-41.
[19] G. Wurster, P. C. van Oorschot and A. Somayaji, "A generic attack on checksumming-based software tamper resistance," in 2005 IEEE Symposium on Security and Privacy, 2005, pp. 127-138.
[20] B. Schwarz, S. Debray and G. Andrews, "Disassembly of executable code revisited," in Proceedings of Working Conference on Reverse Engineering, 2002, pp. 45-54.
[21] C. Collberg, C. Thomborson and D. Low, "Manufacturing cheap, resilient, and stealthy opaque constructs," in Proceedings of Working Conference on Reverse Engineering, 1998, pp. 184-196.
[22] C. Linn and S. Debray, "Obfuscation of executable code to improve resistance to static disassembly," in Proceedings of the 10th ACM Conference on Computer and Communications Security, 2003, pp. 290-299.
[23] K. D. Cooper, T. J. Harvey and T. Waterman, "Building a control flow graph from scheduled assembly code."
[24] J. F. Levine, J. B. Grizzard and H. L. Owen, "Detecting and categorizing kernel-level rootkits to aid future detection," IEEE Security & Privacy, 2006, pp. 24-32.
[25] Web link, "Information about the knark rootkit," Retrieved on November 9, 2010. http://www.ossec.net/rootkits/knark.php
[26] D. Sd, "Linux on-the-fly kernel patching without LKM," 2001.
[27] P. A. Loscocco, P.
W. Wilson, J. A. Pendergrass and C. D. McDonell, "Linux kernel integrity measurement using contextual inspection," in 2007 ACM Workshop on Scalable Trusted Computing, 2007, pp. 21-29.
[28] Web link, "Address space layout randomization," Retrieved on April 25, 2010. http://pax.grsecurity.net/docs/aslr.txt
[29] Web link, "Linux man pages online - kernel random number generator," Retrieved on August 30, 2010. http://linux.die.net/man/4/random
[30] Web link, "Hackers discover HD DVD and Blu-ray processing key - all HD titles now exposed," Retrieved on November 3, 2009. http://www.engadget.com/2007/02/13/hackers-discover-hd-dvd-and-blu-rayprocessing-key-all-hd-t/
[31] Web link, "Hi-Def DVD security is bypassed," Retrieved on November 3, 2009. http://news.bbc.co.uk/2/hi/technology/6301301.stm
[32] Web link, "Global IP Network Latency," Retrieved on January 17, 2010. http://ipnetwork.bgtmo.ip.att.net/pws/network_delay.html
[33] Web link, "VLC media player source code FTP repository," Retrieved on February 24, 2010. http://download.videolan.org/pub/videolan/vlc/

Table 1: Average code generation time in milliseconds on the server end for Intel Pentium 4 and Core 2 Quad machines for one instance of the measurement

    Machine             Pentium 4   Quad Core
    Test generation     12.3        5.2
    Compilation time    320         100
    Total time          332         105

Table 2: Time taken in milliseconds to compute the measurements on the server and on the remote client

    Machine                      Pentium 4   Quad Core
    Server-side execution time   0.6         0.4
    Client-side execution time   22          16

Figure Captions

Figure 1: Challenge response overview
Figure 2: Protocol overview
Figure 3: Hash obtained on overlapping sub-regions; the two instances have different sub-regions
Figure 4: Procedure for obtaining the MD5 hash of the entire code section
Figure 5: Snippet from the checksum code
Figure 6: ASM code for creating a socket
Figure 7: Contents of the /proc/net/tcp file
Figure 8: Kernel remote attestation scheme: (a) user application initiates attestation request; (b) user application sends attestation code to kernel; (c)
kernel returns integrity values to user application; (d) verification of kernel integrity by trusted server

Figures

Fig. 1: [Diagram: Trent sends a request to the client machine; the injected measurement code C computes measurements on process P and returns the results to Trent.]

Fig. 2:
1. Alice → Trent: verification request
2. Trent → Alice: inject code C at location, execute it
3. C → Trent: machine identifier
4. Trent: proceed
5. C → Trent: initial checksum
6. Trent: proceed
7. C → Trent: MD5 hash of specified regions
8. Trent: proceed
9. C → Trent: test of correct process ID
10. Trent: proceed/halt

Fig. 3: [Diagram: checksums 1-4 computed over overlapping sub-regions of the range 0-200; the two instances shown use different sub-region boundaries.]

Fig. 4: [Diagram: each region of the code section is hashed with MD5 (H1, H2, H3, ...); each region hash is concatenated with the running hash and re-hashed (H1H2 → H12, H12H3 → ..., through Region N) to produce the final MD5 result.]

Fig. 5:

    {
        ......
        x = <random value>;
        a = 0;
        while (a < 400) {
            checksum1 += Mem[a];
            if ((a % 55) == 0) { checksum2 += checksum1 / x; }
            a++;
        }
        send checksum2;
        .....
    }

Fig. 6:

    __asm__("sub $12, %%esp\n"
            "movl $2, (%%esp)\n"
            "movl $1, 4(%%esp)\n"
            "movl $0, 8(%%esp)\n"
            "movl $102, %%eax\n"
            "movl $1, %%ebx\n"
            "movl %%esp, %%ecx\n"
            "int $0x80\n"
            "add $12, %%esp\n"
            : "=a" (new_socket));

Fig. 7: [Listing: contents of /proc/net/tcp, with columns sl, local_address, rem_address, st, tx_queue, rx_queue, tr, tm->when, retrnsmt, uid, timeout and inode; not reproduced here.]

Fig. 8: [Diagrams: (a) the user application Puser sends an attestation request to Trent′; (b) Puser sends the attestation code Ckernel to the kernel Pkernel; (c) the kernel returns integrity measurements to Puser; (d) Trent′ verifies the kernel integrity measurements.]