CDR MIKE BILZOR
16 AUGUST 2010

I Title and Introduction

Using 3D Circuit Integration to Detect Malicious Inclusions in General Purpose Processors

Proposed Dissertation Statement

Hardware malicious inclusions in microprocessors present an increasing threat to U.S. high-assurance computing systems, particularly those of the Department of Defense, due to vulnerabilities at several stages in the acquisition chain. Existing testing techniques are limited in their ability to detect these maliciously modified integrated circuits. We propose a novel method, based on the evolution of three-dimensional (3D) integrated circuit fabrication techniques and on execution monitor theory, by which malicious inclusions, including those not detectable by existing means, may be detected and potentially mitigated both in the lab and in fielded, real-time operation. We propose to develop and implement techniques for detecting and mitigating hardware malicious inclusions by utilizing 3D connections to monitor the control and data flows in an untrusted target commodity processor from a trusted attached processor called the "control plane".

Research Goals

There are a number of potential new security-related applications of circuit-level three-dimensional (3D) architecture fabrication methods, which provide certain novel capabilities. Our research will focus on developing new techniques for identifying hardware malicious inclusions in general purpose processors, specifically those not detectable by existing methods. To date, no other work has leveraged the capabilities of 3D architectural techniques to identify malicious inclusions in processor hardware; existing approaches are either destructive, or operate externally, and only during the test phase, not during deployed use. We will conduct experiments in support of an assessment of the feasibility of the 3D approach to detecting malicious inclusions, specifically commenting on:

- Which types of malicious inclusion a 3D system is best, and least, able to detect and mitigate
- Which types of malicious inclusion a 3D system can detect and mitigate that a 2D system cannot
- How to most effectively mitigate the likeliest and most dangerous malicious inclusions using the 3D analysis approach

II Problem Description

Modern Weapons and High-Assurance Military Systems Rely on Microprocessors

Today's Defense Department relies on advanced microprocessors for its high-assurance needs. Those applications include everything from advanced weaponry, fighter jets, ships, and tanks, to satellites and desktop computers for classified systems. Considerable attention and resources have been devoted to securing the software that runs these devices and the networks on which they communicate. However, two significant trends make it increasingly important that we also focus on securing the underlying hardware that runs these high-assurance devices. The first is the U.S.'s greater reliance on processors produced overseas. The second is the growing complexity of hardware, along with the ease of making malicious changes to it.

Trusting the Supply Chain

Every year, more microprocessors destined for U.S. Department of Defense (DoD) systems are manufactured overseas, and fewer are made inside the U.S. As a result, there is a greater risk of processors being manufactured with malicious inclusions, or "hardware Trojans," which could compromise high-assurance systems.
This concern was highlighted in a 2005 report by the Defense Science Board, which noted a continued exodus of high-technology fabrication facilities from the U.S. [1]. Since that report, "more U.S. companies have shifted production overseas, have sold or licensed high-end capabilities to foreign entities, or have exited the business." [2] One of the Defense Science Board report's key findings reads:

"Throughout the past ten years, the need for classified devices has been satisfied primarily through the use of government owned, government- or contractor-operated or dedicated facilities such as those operated by the NSA and Sandia. The rapid evolution of technology has made the NSA facility obsolete or otherwise inadequate to perform this mission; the cost of continuously keeping it near to the state of the art is regarded as prohibitive. Sandia is not well suited to supply the variety and volume of DoD special circuits. There is no longer a diverse base of U.S. integrated circuit fabricators capable of meeting trusted and classified chip needs." [1]

Moving Fabrication Overseas

Today, most semiconductor design still occurs in the U.S., but some design centers have recently developed in Taiwan and China [7]. In addition, major U.S. corporations are moving more of their front-line fabrication operations overseas for economic reasons:

"Press reports indicate that Intel received up to $1 billion in incentives from the Chinese government to build its new front-end fab in Dalian, which is scheduled to begin production in 2010." [8]

"Cisco Systems has pronounced that it is a 'Chinese company,' and that virtually all of its products are produced under contract in factories overseas." [2]

"Raising even greater alarm in the defense electronics community was the announcement by IBM to transfer its 45-nanometer bulk process integrated circuit (IC) technology to Semiconductor Manufacturing International Corp. (SMIC), which is headquartered in Shanghai, China. There is a concern within the defense community that it is IBM's first step to becoming a 'fab-less' semiconductor company. IBM is the only state-of-the-art IC manufacturer that has a 'trusted' take-or-pay contract with the Defense Department and the National Security Agency at its plant in Vermont. Intel, the other cutting-edge U.S. integrated circuit maker, does not want to do dedicated work for the U.S. government." [2]

The author of [9] notes, "almost all field-programmable gate arrays (FPGAs) are now made at foundries outside the United States, about 80 percent of them in Taiwan. Defense contractors have no good way of guaranteeing that these economical chips haven't been tampered with. Building a kill switch into an FPGA could mean embedding as few as 1,000 transistors within its many hundreds of millions."

In general, the large percentage of U.S. semiconductors manufactured in Taiwan is also a longer-term concern, because of the political uncertainty of future China-Taiwan relations. In the case of political unification, which the U.S. may not be in a position to prevent, China could hypothetically gain access to the manufacture of millions more U.S.-bound processors in relatively short order, exacerbating supply chain concerns.

Processors - More Complex, Designed in Software, Modifiable After Manufacture

The Defense Science Board report observes, "Defense system electronic hardware ... has undergone a radical transformation.
Whereas custom circuits, unique to specific applications, were once widely used, most information processing today is performed by combinations of memory chips and programmable microchips ... Of the two classes of parts, the latter have more intricate designs, which make them difficult to validate (especially after manufacturing) and thus more subject to undetected compromise." [1]

Since modern processors are designed in software, the processor design plans become a potential target of attack. John Randall, a semiconductor expert at Zyvex Corp., notes that "any malefactor who can penetrate government security can find out what chips are being ordered by the Defense Department and then target them for sabotage. If they can access the chip designs and add the modifications, then the chips could be manufactured correctly anywhere and still contain the unwanted circuitry." [9]

In addition to the overseas fabrication threat, malicious design modifications could theoretically occur either outside or inside the United States. According to IEEE Associate Editor Sally Adee, "The Defense Department's assumption that onshore assembly is more secure than offshore reveals a blind spot." Adds Samsung's Victoria Coleman, "Why can't people put something bad into the chips made right here?" [9]

Such undetected logic can be inserted during the design phase, if malicious code is inserted into the design template, or even after a chip has been manufactured. "Chip alteration can even be done after the device has been manufactured and packaged, provided the design data are available, notes Chad Rue, an engineer with FEI ... Skilled circuit editing requires electrical engineering know-how, the blueprints of the chip, and an etching machine (which) shoots a stream of ions at precise areas on the chip, mechanically milling away tiny amounts of material ... You can remove material, cut a metal line, and make new connections ... The results can be astonishing: a knowledgeable technician can edit the chip's design just as easily as if he were taking 'an eraser and a pencil to it.'" [9]

The "Kill Switch"

Though reports of actual malicious inclusions are often classified or kept quiet for other reasons, some reports do surface, like this unverified account: "According to a U.S. defense contractor who spoke on condition of anonymity, a 'European chip maker' recently built into its microprocessors a kill switch that could be accessed remotely. French defense contractors have used the chips in military equipment, the contractor told IEEE Spectrum. If in the future the equipment fell into hostile hands, 'the French wanted a way to disable that circuit,' he said." [9]

According to the New York Times, such a "kill switch" may have been used in the 2007 Israeli raid on a suspected Syrian nuclear facility under construction. The Times report cites an unnamed American semiconductor industry executive, claiming direct knowledge of the operation. [52]

Summary

High-performance general purpose processors used in Department of Defense high-assurance systems are increasingly being manufactured and assembled overseas. An adversary with sufficient resources could maliciously modify a general purpose processor at several different stages of the acquisition chain, from design and fabrication to assembly and transport. As discussed in the following sections, our current ability to detect and mitigate such malicious modifications in processors is limited, and therefore new methods need to be developed.
III Description of 3D Integration Techniques

General Overview

In the last few years, hardware manufacturers and scientific researchers have been studying methods of connecting silicon-based computational circuits in non-traditional ways. Until now, integrated circuit manufacturing has been limited to designs that are essentially two-dimensional. Increasing the number of circuits per unit area has required decreasing the size of the features in the circuit. However, techniques for decreasing feature size are approaching their theoretical physical limits. New circuit interconnection methods under development allow two or more computational planes, each of them an essentially 2D structure, to be interconnected, allowing them to form a composite, three-dimensional computing structure.

The most immediate benefits of this technology relate to speed, time, and distance. At current computing speeds, electrons can only move a limited distance in one clock cycle. Admiral Grace Hopper was famous for demonstrating the distance that electromagnetic energy can travel in a nanosecond by showing off pieces of wire ("nanoseconds") just under a foot in length [19]. The farther away an external memory cache sits from the processor, for example, the more clock cycles it will take to conduct a memory transaction between them. In [20], the authors demonstrate reductions in the average wire lengths within a circuit, when implementing it with 3D technology, as compared to traditional 2D technology only.

There are several different technologies under consideration for 3D integration. One promising method involves the creation of "vias", which are direct metal connections, much like ordinary wires. Since they normally travel through a silicon plane, they are often referred to as "through-silicon vias", or TSVs [20], and informally as "posts". Other possible 3D connection technologies include so-called "wireless superconnect", "wire bonding," and "multi-chip modules" [6], as well as connection techniques relying on electrical inductance. A survey of some of the techniques for 3D interconnects under development is presented in [21].

[Figure: A survey of various 3D fabrication techniques, from [21].]

Terminology

In describing our approach, we will often use the following terms:

3D interconnect - a connection between one integrated circuit and another integrated circuit, each manufactured separately but attached during a later process. Sometimes we will informally call these "posts", independent of the attachment technique.

3D fabrication technique - any technique, from the above descriptions or otherwise, for joining two or more integrated circuits together at points within their computation circuits (not just along their edges).

3D security (or 3Dsec) - a security-oriented application of the 3D interconnect methods above, involving two integrated circuits: one untrusted target integrated circuit, sometimes referred to as the "computation plane," and one trusted integrated circuit, sometimes called the "control plane," which monitors and/or modifies the behavior of the former.

Malicious Inclusion¹ (MI) - an unauthorized modification to an integrated circuit that can cause the circuit's behavior to deviate from its specified functionality. Deviations may include, but are not limited to, unauthorized shutdown or impairment of the circuit, subversion of the circuit's functions to facilitate an attack on its running software, or corruption or compromise (leakage) of data passing through the integrated circuit.

¹ Most of the hardware-oriented literature refers to malicious modifications as "hardware Trojans". The term "Trojan" is normally associated in the Computer Science literature with an attack that requires a naive action of acceptance by the victim, such as opening a link in an unsolicited e-mail. Since hardware modifications are covert and the victim is usually unaware, we will use the term "malicious inclusions" rather than "hardware Trojans" to preserve this distinction.

Using 3D Technology for Security

Though a great deal of research has been done on the potential performance benefits of 3D integration, such as connecting an external memory cache, relatively little attention in the industry has been focused on the potential for using 3D technology to enhance security for high-assurance users. The main ideas for using 3D technology in the security context are identified and outlined in [6]:

"By fabricating the control plane with functions that are complementary to (but separate from) the main processor, stacked interconnect offers the potential to add security mechanisms on just a small subset of devices without impacting the overall cost of the main processor. Just to be clear, we are advocating the fabrication of a processor which is always fabricated with connections built in for security (via an optional control plane chip). The difference between the system sold to the cost-sensitive consumer and the one that is sold to the high-assurance customer is only whether a specialized security device is actually stacked on top of the standard IC or not ... A security overlay also provides the freedom to place specific security mechanisms directly above where they are needed ... For a given device type, reconfiguration of the security policy mechanisms can be implemented, thus efficiently supporting different user requirements. An overlay also provides several clear theoretical benefits. As always, it is critical to protect security mechanisms, but in this case they may be much less prone to tampering than they are when they are entangled with the monitored design."

The computation and control planes would be constructed separately (with the interconnect locations specified in the design), then connected later, in a separate process, as in the following diagrams:

[Figure: Two possible arrangements of the computation and control plane integrated circuits.]

Assumptions and Viability

For the purpose of our investigation, we will assume that economically feasible techniques for connecting two or more integrated circuits will continue to develop. The particular method of interconnection that wins industry favor is not relevant to our approach, as long as it meets several criteria:

- The time it takes for an electrical signal to propagate and stabilize across an interconnect is sufficiently short. For example, in [24], Mysore, et al., perform a detailed analysis of a 3D interconnect system that requires only a single clock buffer, and hence only one cycle of latency, to facilitate 3D monitoring.
- Heat dissipation technology is sufficient to allow the passthrough of both data and power signals across the interconnects.
- The number of total interconnects that could be produced to facilitate control-plane monitoring is sufficient (we will examine the approximate number of required interconnects as part of our research).
A simulation in [20] illustrates the practicality of using on the order of 1,000-10,000 through-silicon vias (TSVs), for example. The viability of many of the physical assumptions underlying the 3D security approach was demonstrated in [24], in which the authors modeled a Pentium 4 computation plane being monitored, using 1,024 3D interconnects, by an XScale ARM processor. The authors used a variety of modeling techniques to demonstrate:

- An increase to the (computation plane) commodity processor of only 0.021 mm^2 in area and 1.4% in power, as a result of adding the 3D connection points.
- An increase in power to drive data across the interconnects, with the monitoring plane attached, of 23%, with the potential for reducing that to around 8%.
- That 3D hardware monitoring can be performed with significantly less power and shorter wires than a comparable 2D monitoring scheme, because the monitoring plane can be placed much closer. In the simulation, the 3D approach consumes half the increased power of the 2D approach, and a twentieth of the increase in the area imposed on the computation plane.
- That even using the worst-case thermal assumptions, tiling eight analysis chips on top of the computation chip only led to a temperature increase of about 2.5 °C.

Relationship of 3D to Other Monitoring Approaches

Why not implement the monitoring logic right in the computation chip itself? We are operating under the assumption that the commodity computation chip may be from an untrusted source, and anyone with sufficient access to modify the processor could also modify the monitoring logic. By adding the monitoring logic separately, via 3D integration, the monitoring logic can come from a more trusted source, be reconfigurable, and be isolated from threats to the computation plane during most of the development cycle.

Why not put the monitoring logic in a coprocessor? A security coprocessor could use some of the same techniques we will explore, but it is limited by the bandwidth and fidelity of the target processor's main connection to the printed circuit board. The 3D approach permits finer-grained access to the key architectural nodes within the target processor. Also, the 3D approach has the potential for performing the same security functions with shorter wires and lower power requirements [24].

Why not just run some comparable monitoring logic in a hypervisor? Any malicious inclusion in hardware has the potential to override the controls in the software above it, no matter how secure that software is. The emerging threat to the security of hardware must be dealt with either by isolating the threat entirely (for example, by full control over the supply chain), or by detecting and mitigating the threat in hardware.

Why not just obfuscate the hardware design? Obfuscation techniques have proved valuable in protecting intellectual property in software, and more recently in reconfigurable hardware and system-on-chip (SoC) designs [40]. However, in commodity microprocessors, such as those we consider in the 3D scenario, the performance cost of obfuscation techniques, in speed and power, will limit their applicability. That said, a microprocessor whose design has been obfuscated, but whose 3D posts make the same logical connections as the unobfuscated design, could be monitored from a control plane in the same manner as the unobfuscated processor.
In this sense, obfuscation and 3D monitoring are complementary in deterring malicious inclusions: obfuscation makes malicious inclusions more difficult to implement, and 3D monitoring techniques could still be used to detect them.

Couldn't the adversary just design a malicious inclusion that circumvents the 3D posts? It is true that an adversary with sufficient knowledge of the processor design, the 3D post locations, and the monitoring scheme could design a malicious inclusion which performs its malicious function and either bypasses the posts or modifies the signals being monitored. We know of no software or hardware technique for precluding an adversary with all this information from doing so. However, the monitoring would necessarily make the adversary's task much more challenging. Combining 3D security with obfuscation of the target processor's design adds another layer of difficulty for the adversary.

IV Security Policies, Level of Abstraction, and Threat Model

Bits are Bits

In any computing system, different concepts live at different levels of abstraction. High-level software constructs, such as those found in object-oriented programming, may or may not be meaningful to the operating system kernel. Low-level software constructs, such as a small loop performing an iterative computation over an array, will not be aware of object-oriented concepts, or of the subjects and objects defined in an operating system security policy, for example. Similarly, concepts defined in software, like a dynamically linked code library, or even functions or types [34], may not have meaning at the processor level, in hardware. A passage from Dr. Richard Hamming's book "The Art of Doing Science and Engineering: Learning to Learn" emphasizes this point:

"We see that the machine does not know where it has been, nor where it is going to go; it has at best only a myopic view of simply repeating the same cycle endlessly. Below this level, the individual gates and two-way storage devices do not know any meaning - they simply react to what they are supposed to do. They too have no global knowledge of what is going on, nor any meaning to attach to any bit, whether storage or gating ... it is we who attach meaning to the bits (emphasis in original). The machine is a machine in the classical sense; it does what it does and nothing else." [33]

Security Policies and Principles - A Higher Level of Abstraction

In light of the previous discussion, we mention a few popular security models and concepts, and some of the characteristic constructs which comprise them, in order to show that they necessarily exist above the hardware level of abstraction:

- Basic access control safety policies: Subjects, Objects, Authorizations
- Lattice-Based Information Flow Policy: Objects, Subjects, Security Classes [29]
- Noninterference Information Flow Policy: Users, States, Commands, Outputs [31]
- Integrity Policy: Users, Constrained Data Items, Transformation Procedures [32]
- Reference Monitor Concept: Subjects and Objects [27]

In each case, at least one of the constructs on which the policy is defined, such as subject or object, is defined at the software level of abstraction. Though the constructs in a processor, such as a memory word, an interrupt, or an executing instruction, may be supporting one of these higher-level constructs, the processor has no built-in awareness of what is represented by them at the higher level.
Though some work has been done on using hardware support mechanisms to help facilitate the enforcement of security policies at the software level [44,45], those methods are orthogonal to the ones explored here. Our work focuses on using one processor to detect malicious changes made to another processor, using the target processor's design specification as a baseline for what constitutes non-malicious behavior.

Covert Channels

One way a safety policy or an information flow policy can be violated is through exploitation of a covert channel. A covert channel is a conduit through which information can be conveyed from a process operating at a higher sensitivity level to a process operating at a lower sensitivity level. There are two basic types of covert channels: storage channels and timing channels. In [35], Kemmerer defines conditions necessary for covert channels and gives a structured methodology for identifying where they could potentially occur. The minimum requirements for a storage channel are identified as:

- The sending and receiving processes must have access to the same attribute of a shared resource.
- There must be some means by which the sending process can force the shared attribute to change.
- There must be some means by which the receiving process can detect the attribute change.
- There must be some mechanism for initiating the communication between the sending and receiving processes and for sequencing the events correctly. This mechanism could be another channel with a smaller bandwidth.

And the minimum requirements for a timing channel are identified as:

- The sending and receiving processes must have access to the same attribute of a shared resource.
- The sending and receiving processes must have access to a time reference such as a real-time clock.
- The sender must be capable of modulating the receiver's response time for detecting a change in the shared attribute.
- There must be some mechanism for initiating the processes and for sequencing the events.

We note here that covert channel analysis, while not a formally specified security policy, is an important security technique. It is also important to observe that, while the processes which communicate with each other via a covert channel exist in software, the shared resource attribute can exist at either the software or the hardware level. Semantically, we might find it useful to distinguish between a covert channel, which is an implementation of a timing or storage channel as described by the Kemmerer criteria above, and a covert channel mechanism, which is the shared resource attribute through which the communication occurs. In this sense, the covert channel is the end-to-end communication, including the processes, while the covert channel mechanism is solely the medium through which communication occurs. Because hardware is primarily process-unaware², 3D monitoring techniques, which only observe hardware functionality, are therefore constrained to looking for a shared resource attribute, or covert channel mechanism, only. One note about the interface between a hardware covert channel mechanism (or "shared resource attribute") and a software process which tries to exploit it: the covert channel mechanism must at some point pass through a "process-visible", or "software readable", part of the architecture, in order to meet the first criterion in each of Kemmerer's lists, above.
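To make the storage-channel criteria concrete, the following is a minimal software analogy (illustrative only; all names are hypothetical, and in our setting the shared attribute would be a process-visible hardware element rather than a software object):

```python
# A toy software analogy of Kemmerer's storage-channel criteria (hypothetical
# names; in hardware the shared attribute might be a flag, a cache line's
# presence, or any other process-visible element).

class SharedResource:
    """Criterion 1: both processes can access the same attribute."""
    def __init__(self):
        self.attribute = 0

res = SharedResource()
secret = [1, 0, 1, 1]
observed = []
for bit in secret:                  # alternation stands in for criterion 4 (sequencing)
    res.attribute = bit             # criterion 2: sender forces the attribute to change
    observed.append(res.attribute)  # criterion 3: receiver detects the change
print("leaked bits:", observed)     # -> [1, 0, 1, 1]
```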
Therefore, intuitively, covert channel prevention by a 3D system would involve monitoring all the process-visible elements of the architecture, and ensuring some type of flushing of all process-visible content during context switches, if the environment assumes that multi-level processes will be sharing execution time on the same processor, and that the sensitivity level of a process may be conveyed to the processor by the OS. Since the modified attribute must at some point be process-visible in order to be exploited by software, it also stands to reason that monitoring non-process-visible processor elements will be unnecessary; i.e., monitoring all the process-visible elements is both necessary and sufficient, in terms of addressing this type of attack. However, it is important to note that the things that are observable at the process level include not only registers and flags which can be read directly using the instruction set interface, but also internal features whose values can be indirectly deduced. For example, a process with access to the system clock might be able to infer that a cache miss has occurred during a memory reference, or that a branch was not successfully predicted by a branch predictor [53,54], thereby gaining knowledge of the state of the cache or the branch-prediction buffer.

² Some processors, like those of the Intel IA-32 architecture, provide hardware support for context switching, using features like the Task State Segment (TSS). These hardware features can facilitate resource (e.g., memory and I/O) assignment and bounds checking for a process, but they do not constitute the type of process awareness we associate with the operating system, which can explicitly access the value of any resource being used by a software process.

A 3D approach to covert channel mitigation was described for memory caches in [6], but this type of experiment as applied to an entire general purpose processor is beyond the scope of the proposed research, and is left to future work.

Side Channels

In this context, it is important to differentiate covert channel attacks from side channel attacks. Side channel attacks use some property, often a physical property such as heat or electricity, external to the system itself, in order to gain information about the system. An example is externally evaluating the electromagnetic characteristics of a circuit while the circuit performs cryptographic computations, in order to deduce some properties of the unencrypted data or the encryption key. Side channel attacks and analysis are important in hardware security, but orthogonal to the proposed investigation.

The Processor Design Life Cycle

A general-purpose processor's life cycle spans many phases, as summarized in the following chart:

[Figure: Phases of the general-purpose processor life cycle.]

The potential for malicious modification varies from stage to stage. For example, some processors are designed and verified in facilities certified to be "trusted". This does not make their designs invulnerable, but it gives us relatively greater confidence in their fidelity at this stage. However, the physical fabrication process, especially for high-performance processors using the latest-generation technology, is largely beyond the control of DoD. At the other end of the design cycle, in installation and operation, DoD and other high-assurance users will again normally have tight control of a processor's fielded environment.
In 2007, DARPA provided industry with a subjective assessment snapshot of the relative risk in each of the phases [53]:

[Figure: DARPA assessment of relative risk across processor life-cycle phases, from [53].]

As discussed in a later section, existing methods for detecting malicious inclusions are primarily based on detecting physical changes in the power and timing characteristics of a processor, as observed from its input and output ports. These techniques rely on possession of a known-good, or "golden", sample processor, which acts as a baseline against which other processors are judged. This approach focuses on detecting changes made to a processor in the design or fabrication stage, but it will not detect an early design change that makes its way into all the processors in a production run, since the "golden" model would also be affected.

More recently, Hicks, et al., outlined a technique for detecting some malicious design-stage modifications [14]. In their approach, called BlueChip, the high-level design is analyzed for potential malicious inclusions, and a combination of hardware and software uses interrupts to handle the potential threats. In this approach, the malicious change must already be present in the high-level design; if it is introduced afterward, BlueChip will not detect it.

Our research will attempt to identify ways of extracting sufficient information from both the architectural design specification and the high-level processor design to serve as a basis for parallel construction of a 3D monitor - both its interconnects and its operation. In this way, malicious inclusions introduced in the low-level design phase, as well as in the fabrication, assembly, and distribution phases, may be detected.

The Threat Model

In theory, a malicious modification to a processor's design could be located almost anywhere, but in practice it will be governed by several limiting factors:

- The larger the modification, the easier it will be to detect. Existing work has successfully demonstrated nondestructive detection of some malicious inclusions which occupy as little as roughly 0.1% of the total processor area [12,13,14].
- When using post-design modification techniques like focused ion beam (FIB) milling, it is very difficult and time consuming to make a large number of edited connections over widely dispersed elements of a processor without being detected [41]. FIB milling is also very challenging at state-of-the-art feature sizes.

Therefore, we will focus our efforts on detecting malicious inclusions that are relatively small in size, local in scope, and that target those key circuits related to the expected subversive intent. We assume that the adversary's primary goals will depend on the hardware-hosted application, but are likely to center on either extraction of information or denial of service. We also note that, due to the presence of triggers and delayed activation [16], we can make no a priori assumptions about when a malicious inclusion will become active or inactive. For the purpose of this research, we will not consider attacks which reveal themselves only physically, such as thermal attacks, but rather those which reveal themselves logically, at the architectural level.

Several researchers have described malicious inclusions, or hardware Trojans, in taxonomy form. Tehranipoor and Koushanfar summarized these efforts in [16].
Our preliminary analysis suggests that malicious hardware can fit into more than one action category:

[Figure: Malicious inclusion taxonomy, showing overlapping action categories.]

In the next section, we discuss several methods for employing 3D monitoring techniques to counter the specific types of attacks that could be employed by a malicious inclusion. The focus of our experiments is on the first monitoring method discussed, a novel type of execution monitor governing the correctness of instruction-set execution.

V 3D Security for General Purpose Processors

The components of a simple general-purpose processor are generally classifiable according to their function. For example, a circuit in a microprocessor may participate in control-flow execution (fetch-decode-execute-retire), be part of a data path (like a bus), execute storage and retrieval (like a cache controller), assist with chip control, test, and debug (as in a debug circuit), or perform arithmetic and logic computation (like an arithmetic-logic unit, or ALU). This list may not be exhaustive, and some circuits' functions may overlap, but broadly speaking we can subdivide the component circuits in a processor using these classifications:

- Control Flow Circuits
- Data Paths
- Memory Storage and Retrieval Circuits
- Arithmetic and Logic Computation Circuits
- Chip Control, Test, and Debug Circuits

The main focus of our research will be the detection of malicious inclusions in the first category, control flow circuits. However, in considering processor malicious inclusions, it is worth noting that in some cases a detection strategy is warranted, and in others a mitigation strategy may be preferable. Of course, if a malicious inclusion is detected, we will normally want to follow the detection with some type of mitigation. The following table lists each of the circuit functional types from above, paired with a potential 3D detection and/or mitigation strategy:

Circuit Type                     | Detection/Mitigation Technique
Control Flow                     | Control Flow Execution Monitor (planned subject of experiments)
Chip Control, Test, and Debug    | Keep-Alive Protections
Data Paths                       | Datapath Integrity Verification
Memory Storage and Retrieval     | Load/Store Verification
Arithmetic and Logic Computation | Arithmetic/Logic Verification

We can associate these techniques with the malicious inclusion taxonomy from the previous section:

[Figure: Mapping of the 3D detection/mitigation techniques to the malicious inclusion taxonomy.]

Our research will describe all five techniques, with emphasis on the execution monitor and keep-alive protections (relative to the chip's control, test, and debug circuits), which we view as addressing many of the more serious potential threats. Our experiments will demonstrate an implementation of the execution monitor, which governs the operation of the instruction set of a general-purpose processor.

An Execution Monitor for Instruction Execution

In a general-purpose processor, we ask the question: what does it mean for the execution flow to be "correct"? One way of characterizing the execution flow in a general-purpose processor is by the action of the control circuits. In general, we characterize the execution control circuits as the ones which carry out the implementation of the instruction sequence. In other words, some circuits in a processor perform their functions independently of the sequence of instructions loaded into the instruction register, while other circuits are activated or deactivated in unique combinations which depend on the sequence of instruction codes coming into the processor; our focus is on the latter type.
In the proposed research, we plan to explore this dependency characterization more thoroughly, to see whether the process of automatically identifying execution control circuits can be generalized to apply to all general-purpose processors, starting from their high-level designs. Next, we consider a small example, to see whether the identification of such circuits lends itself to a particular monitoring and enforcement strategy.

Example Architecture

The following example architecture is a bus-based MIPS architecture, given in MIT's OpenCourseWare materials, based on its architecture course [36]:

[Figure: Bus-based MIPS example architecture, from [36].]

In this example architecture, we have an instruction register, an arithmetic-logic unit (ALU), a set of 32 registers, and some on-chip memory. They share access to a common bus, and only one unit at a time may transmit a signal to the bus (more than one may read data from the bus). Each unit has an enable signal which gives it access to and from the bus. The memory and registers also have write signals, which permit the memory or registers to be written to. If the enable signal is high but the write signal is low, a read operation is assumed. For this architecture, all words are 32 bits, and there are 32 registers. The basic MIPS load-store instruction set is assumed. Also, this architecture maintains the program counter (PC) in a special register, number 31. The ALU performs basic operations like add, subtract, shift left/right, etc. In this simple example, we do not consider the additional complexities of pipelined or superscalar execution, register renaming, out-of-order commit, speculative execution, or interrupts. Also, for simplicity, we assume in the example that the results of ALU and memory operations are immediately available (loaded on one cycle, and read the following cycle).

The microcode representation in the chart is a way for us to analyze the inner workings of the expected flow, based on what instruction is issued. In the microcode representation, we set a control signal to 1 if it is enabled, or high, and to 0 if it is disabled, or low. We set it to * to indicate "don't care," which in this context means that the signal could be either 0 or 1 (or floating) without affecting the correctness of the commanded operation. All of the instructions begin with the microcode for the "fetch" operation, and then proceed to different flows, such as add, shift, etc., based on the instruction named. In a more complex architecture we could add to the "fetch" state other states representing what happens during "interrupt," "retire," "commit," or "write back," etc., but for now we consider only "fetch" and an example ALU operation.

Here we differ from the source diagram by coloring the execution control circuits blue. Why blue? Because their operation, as illustrated in the microcode diagram, depends on the instruction being executed. Stated informally, data wires and other black (non-execution-control) wires perform their functions regardless of the instruction and the state of the fetch-decode-execute-retire cycle, whereas the blue (execution-control) wires function in a manner dependent on the micro-state. It is this fact that we seek to leverage from a monitoring standpoint, relative to a given class of MIs that target elements of the execution control flow. In the example, we have selected the control wires, and colored them blue, based on observation, in the absence of any formal methodology.
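To make the microcode representation concrete, the following sketch encodes a few rows of such a table in software. The signal and state names are hypothetical stand-ins, not the actual signals from the MIT materials:

```python
# Sketch of the microcode representation (hypothetical signal/state names).
# Each row fixes the execution-control signals for one micro-state:
# 1 = enabled (high), 0 = disabled (low), "*" = don't care.

MICROCODE = {
    "FETCH_0": {"en_Mem": 1, "Mem_Wrt": 0, "load_IR": 1, "en_ALU": "*", "Reg_Wrt": 0},
    "ADD_0":   {"en_Mem": 0, "Mem_Wrt": 0, "load_IR": 0, "en_ALU": 1,   "Reg_Wrt": 1},
}

def matches(spec_row, observed):
    """True iff every observed signal agrees with the specification row,
    ignoring don't-care entries."""
    return all(expected == "*" or observed[sig] == expected
               for sig, expected in spec_row.items())

# During a fetch, en_ALU may take either value without affecting correctness:
sampled = {"en_Mem": 1, "Mem_Wrt": 0, "load_IR": 1, "en_ALU": 0, "Reg_Wrt": 0}
assert matches(MICROCODE["FETCH_0"], sampled)
```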
One of our research tasks is to examine whether an automatic, general procedure can be developed to unambiguously distinguish the execution-control circuits from the non-execution-control circuits, in any general purpose processor, in support of 3D monitoring.

Suppose that we want to use this bus-based MIPS architecture to add the contents of two registers, and place the result in a third register. In this architecture, register-register operations always use registers rs and rt as the sources, and register rd as the destination. In the following diagrams, we illustrate how the operation is carried out, as the microcode representation spells out the action of the execution-control circuits.

[Figure: Microcode steps for a register-register add operation.]

Leak Attack

Now consider how a malicious inclusion could potentially modify the control flow of this register-register operation. In this case, we design a small malicious inclusion that is able to read ALU inputs A and B; when it observes special trigger values on those circuits, it causes the arithmetic result to "leak" to a secret, predetermined memory address by accessing the load_MA, Mem_Wrt, and en_Mem signals. Conceptually, it might look like this:

[Figure: Conceptual placement of the leak-attack malicious inclusion.]

The register-register operation now executes the same way in steps one and two, but in the third step, the malicious inclusion is activated. Note in the chart the highlighted deviation from normal execution in the load_MA, Mem_Wrt, and en_Mem signals. Also note that when we considered only the correctness of the specified operation itself, we were often able to put a * (don't care) label on what happens to ancillary control signals during a micro-operation. However, in the context of detecting malicious inclusions, we are also concerned with preventing unspecified additional functionality from occurring. Therefore, it appears that we would want to eliminate the use of * and instead specify a definite 0 or 1 (usually 0) for ancillary control signals.

But how do we monitor and/or enforce the correct and complete operation, once it has been specified? Suppose that, using 3D interconnects, we are able to monitor the value of the control signals from the 3D plane, and detect when execution deviates from the specification, as described by the microcode state. In the following diagrams, we denote with a green lightning bolt the signals that we might monitor. Now the execution proceeds as before, but with the monitoring in place:

[Figure: Monitored execution of the register-register operation, with the monitored signals marked.]

Now the incorrect control signals in load_MA, Mem_Wrt, and en_Mem are detected in the third step. This type of attack is unlikely to be detected through normal verification methods (which might, for example, try all possible ALU operand combinations and check for the correct processor state after each operation), because of the computational infeasibility of trying all 2^64 potential combinations of the two 32-bit ALU operands in the test set. In this simple example, the result of a register-register arithmetic operation was leaked to memory. Given a more complex circuit which includes privilege levels, privileged instructions, and a more powerful instruction set, more nefarious attacks are naturally possible. [25]

Timing Requirements

Note that the signals to be monitored must be measurable during some synchronous interval. Circuits operating at independent speeds or on asynchronous clock cycles will probably not be able to participate in the same monitoring group without some additional complexity.
Therefore, during our research we will examine signals grouped in such a way that each group may be sampled synchronously:

[Figure: Grouping of monitored signals for synchronous sampling.]

In addition, at the physical level the 3D monitoring circuit must sample a signal after it is stable and any gate delay has completed:

[Figure: Sampling window relative to signal stabilization and gate delay.]

How Execution Monitoring Might Look from the 3D Control Plane

In the control plane, given the availability of a general-purpose processor, we could write a simple program to look for unspecified relationships among control signals. But the control plane logic, if it is complex, might not be fast enough to keep up with the computation plane microcode, or a general-purpose processor might not be available. Instead, we can construct a representation of the transition logic for the microcode as a finite automaton, which executes on a circuit in the control plane:

[Figure: Finite-state representation of the microcode transition logic.]

In this example, execution begins in the state labeled FETCH_0, and the first step proceeds according to the instruction in IR. For subsequent transitions, the signals observed in the execution control circuits must match the ones expected, as indicated by each individual row in the microcode diagram. If any non-matching set of signals is observed, the automaton enters a non-accepting state and remains there until it is reset by the control plane monitor, or the processor is reset. The control plane, on observing the FAULT state, would conclude that a violation of the expected execution flow has occurred, then take some appropriate corrective action. Depending on the implementation, that action could include disabling the processor, invoking a failsafe mode of operation, or simply notifying the operating system in some way.

Relationship to Execution Monitor Theory

The DFA in this example has a defined start state, FETCH_0, but it differs from an ordinary DFA in one important respect, namely the need to consider infinite-length inputs, since processor execution is unbounded. Büchi automata are a special class of automata that allow for infinite-length inputs. For a Büchi automaton to accept an infinite-length input, it must revisit at least one accepting state an infinite number of times. By observation, we can see that if we construct the automaton as in the example above, the FETCH_0 state meets this criterion, since FETCH_0 will be visited an infinite number of times if and only if the automaton accepts an infinite-length input sequence. As long as the automaton remains in an accepting state, the execution-flow predicate is satisfied; if it ever enters a non-accepting state, it will never accept the input sequence (there are no defined transitions out of a non-accepting state like FAULT to an accepting state), and the execution-flow predicate is violated.

We note that automata like the one described in this example meet the criteria of a class of Büchi automata called security automata, enforcing safety properties, as outlined by Schneider in [26]. Therefore, if we define our security policy P as "the execution control circuits assume sequential values only in accordance with the transitions permitted by the microcode specification," then P is enforceable by an execution monitor, using the security automaton defined by the microcode representation. We believe that the proposed work will be the first to describe the use of a security automaton, as defined in [26], in a 3D execution monitor, and these notions will be formalized in our report.
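As a concrete illustration, the following is a minimal software sketch of such a monitor. The signal and state names are hypothetical, and a fixed two-state cycle stands in for the real instruction-dependent transitions, which would branch on the instruction in IR:

```python
# Sketch of the control-plane security automaton (hypothetical names; a real
# monitor would be control-plane hardware sampling 3D-post signals once per
# synchronous interval, with next-state selection dependent on IR).

MICROCODE = {
    "FETCH_0": {"load_MA": 0, "Mem_Wrt": 0, "en_Mem": 1},
    "ADD_2":   {"load_MA": 0, "Mem_Wrt": 0, "en_Mem": 0},
}
NEXT_STATE = {"FETCH_0": "ADD_2", "ADD_2": "FETCH_0"}  # toy cycle

class ExecutionMonitor:
    def __init__(self):
        self.state = "FETCH_0"

    def step(self, sampled):
        """Advance on a matching sample; any mismatch is absorbed by FAULT."""
        if self.state == "FAULT":
            return self.state                     # absorbing non-accepting state
        expected = MICROCODE[self.state]
        ok = all(sampled[sig] == val for sig, val in expected.items())
        self.state = NEXT_STATE[self.state] if ok else "FAULT"
        return self.state

monitor = ExecutionMonitor()
monitor.step({"load_MA": 0, "Mem_Wrt": 0, "en_Mem": 1})          # FETCH_0 -> ADD_2
state = monitor.step({"load_MA": 1, "Mem_Wrt": 1, "en_Mem": 1})  # leak-attack signature
assert state == "FAULT"   # control plane would now take corrective action
```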
In a recent industry presentation, Abramovici and Bradley describe in general terms a somewhat similar approach for a 2D system-on-chip (SoC) design [47]. They propose a system using signal probe networks (analogous to the 3D posts described here) and security monitors, local on-chip reprogrammable security logic units which operate based on the signal probe network inputs. They mention that such a system could employ customer-specified finite-state machines in the security monitors, but they offer no further details on how the probe networks or security monitors should be constructed or should function.

It remains to be demonstrated in our research that the concepts we have applied in this small example may be applied in the general case, and under what conditions, along with establishing criteria for completely and unambiguously identifying the execution control circuits, as well as the automaton definition. We plan a demonstration of this approach in our processor simulation, showing that malicious inclusions that cause deviations from instruction-specified microarchitectural flow can be detected using 3D interconnects and a stateful execution monitoring system, such as the one described in the example.

Keep-Alive Protections

An adversary could also use MIs to disable an integrated circuit very directly. In modern processors, there are many opportunities for what could be described as "zero cycle" attacks, meaning there would essentially be no advance warning.

Control, Test, and Debug Circuits

A malicious inclusion could, for example, shut down a processor altogether. An example is the use of a test port, like those specified by IEEE Standard 1149.1, Standard Test Access Port and Boundary-Scan Architecture [39]. Boundary scan functionality is found on most general-purpose microprocessors today. It can include both standard functionality, which is common across all devices, and proprietary functionality, which a manufacturer can build in, often without official documentation. During test and development, the boundary scan input/output pins can be connected to external test equipment. When the processor family is fabricated, the circuitry in the microprocessor that supports the test functionality is still present. So, even if there is no external test equipment connected to the boundary scan input pins, there are still corresponding circuits on the chip that could carry out that functionality, such as halting or resetting the processor, if triggered - an easy denial-of-service avenue for a knowledgeable adversary to exploit through the use of malicious inclusions.

Other Disabling Attacks

Denial-of-service attacks need not derive only from control-test-debug circuits. For example, a malicious inclusion, once active, could access a processor's interrupt mechanism, repeatedly invoking false interrupts that were not caused in the manner specified, such as by an actual input/output (I/O) transaction. Another method might be for a malicious inclusion with access to the input of the instruction register (IR) to override the intended instruction opcode with repeated no_op instructions, causing meaningful execution to be overwritten. Many other potential attacks may be imagined.

Liveness, Availability, Execution Monitors, and Keep-Alive Protections

Keeping a circuit functioning normally, so that it remains available for its intended use, is often referred to by the term "liveness".
As introduced by Lamport in [49] and formally defined by Alpern and Schneider in [48], liveness informally means that "something good eventually happens." Alpern and Schneider elaborate:

"Examples of liveness properties include starvation freedom, termination, and guaranteed service. In starvation freedom, which states that a process makes progress infinitely often, the 'good thing' is making progress. In termination, which asserts that a program does not run forever, the 'good thing' is completion of the final instruction. Finally, in guaranteed service, which states that every request for service is satisfied eventually, the 'good thing' is receiving service."

However, they go on to point out that, while EM enforcement mechanisms enforce security policies that are safety properties, "availability, if taken to mean that no principal is forever denied use of some given resource, is not a safety property - any partial execution can be extended in a way that allows a principal to access the resource, so the defining set of proscribed partial executions that every safety property must have is absent." This is certainly true in the case of a general-purpose processor. For example, the liveness policy statement "all memory read requests are eventually fulfilled" is never violated, since the possibility always exists that a pending read request will be fulfilled at some future time. As pointed out by Ryan, even defining liveness, or "availability," in terms of a limited time window does not make availability policies EM-enforceable: "Note that even though time-limited availability can be characterized as a trace property, it still cannot be enforced by an execution monitor. An execution monitor, by definition, can only block actions and cannot force the target system to perform an action." [55]

In light of the previous discussion, rather than considering an execution monitor in this category as well, we plan to consider how the 3D control layer may be used to employ simple "keep-alive" circuitry, which either disables or overrides the potentially offending control-test-debug circuits in the computation plane, which are susceptible to malicious inclusions. We plan to discuss several of these potential denial-of-service attacks in more detail, and to describe ways that 3D control plane keep-alive circuits could be used to mitigate them. This section of the proposed work does not include an experimental demonstration, and is independent of the execution control circuit experiments discussed earlier.

Memory Storage and Retrieval

A typical general-purpose processor will have some small amount of on-chip memory, such as an L1 cache. Informally, we can adopt a description of correct memory operation from Suh, et al.: "Memory behaves correctly if the value the processor loads from a particular address is the most recent value that it has stored to that address" [50]. A service in the 3D control plane might be used to validate the process of storage and retrieval.

3D Memory Verification Services

One-way functions, like hash functions, are easy to compute in the forward direction but computationally complex to reverse. A great deal of research has established the value of hash functions for verifying the integrity of memory transactions [45,50]. When storing a small block of memory, we can separately store the hash of the memory block's value. Later, when the block is retrieved, we can compute the hash value of the retrieved block, and compare it with the stored hash.
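A minimal software sketch of this store/load check follows (hypothetical interface; an actual service would be control-plane hardware sampling the address, data, and enable signals over the 3D interconnects, and SHA-256 stands in for whatever hash the service implements):

```python
import hashlib

# Sketch of a control-plane memory verification service (hypothetical names).

class HashVerifier:
    def __init__(self):
        self.stored_hashes = {}                     # address -> hash of last store

    def on_store(self, addr, word):
        self.stored_hashes[addr] = hashlib.sha256(word).digest()

    def on_load(self, addr, word):
        """True iff the loaded word hashes to the value recorded at store time
        (trivially true if no valid hash exists yet for this location)."""
        expected = self.stored_hashes.get(addr)
        return expected is None or expected == hashlib.sha256(word).digest()

v = HashVerifier()
v.on_store(0x1000, b"\xde\xad\xbe\xef")
assert v.on_load(0x1000, b"\xde\xad\xbe\xef")       # unmodified block: hashes match
assert not v.on_load(0x1000, b"\xde\xad\xbe\xee")   # modified block: mismatch
```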
If the values do not match, then either the memory block, the hash function, or the stored hash value has been modified. In one potential scenario, the hash function and the stored hash values are kept in the 3D control plane, as a service. We assume, when later reading a memory block that has been stored, that a hash mismatch indicates the memory block was modified since it was last stored. A diagram of this type of service might look like this:

[Figure: A 3D control-plane memory verification service.]

In this simple example, the 3D interconnects include the memory address, the read and write enable signals, and the memory word, both into and out of the storage unit. When a store operation is commanded, the control plane unit computes the hash of the incoming memory word and stores it at the associated address. When a load operation is commanded, the control plane unit computes the hash of the outgoing memory word and compares it to the hash stored at the associated address, if a valid hash exists for that location.

Note that the amount of storage in the control plane can be reduced by using a hash function that maps to a smaller domain; however, we should avoid making the domain too small, to reduce the chance that a memory block modification goes undetected due to a hash collision.

Error Correction in Memory

Another important tool in considering the correctness of memory operation is the use of error correction codes (ECCs). A malicious memory modification, affecting only the value of some on-chip memory, may be detectable and even correctable using existing ECCs, which are fielded in many dedicated memory chips today. There are many ECC techniques in use in memory [37]; in our untrusted processor scenario, some or all of the error-correction circuitry might be relocated to the control plane, to keep it from being disabled. Our research will describe the general application of these notions, illustrating how the 3D control plane might be used to provide verification services for untrusted processor memory, at a reasonable cost in terms of interconnects and monitoring logic.

Arithmetic and Logic

Arithmetic and logic in a general-purpose processor takes several forms. It may include compositions of logic circuits for decision making, or perhaps a full floating-point computation unit (FPU) or an integer arithmetic-logic unit (ALU).

Logic Circuits

In general, the problem of ensuring correct operation of logic circuits is comparatively challenging, as noted by Dutta and Jas:

"Detecting and correcting errors in logic circuits is much more difficult than in memories. While concurrent error detection and correction mechanisms can be efficiently incorporated in memories due to their regular structure, logic circuits present a much greater challenge because of their irregular structure ... The easiest concurrent error detection (CED) scheme for logic circuits is to use duplication where the circuit is duplicated and the outputs are compared with an equality checker. While this is a simple method to implement, it requires more than 100% overhead and also it can not correct any of the errors." [38]

In the 3D scenario, we can duplicate selected logic circuits in the control plane, simply accepting the overhead of duplication, and compare the duplicated results with the originals. However, if the overhead of doing so is unacceptable but some reduced error detection rate is acceptable, a technique called non-intrusive concurrent error detection (CED) may be useful.
Error Correction in Memory

Another important tool for ensuring the correctness of memory operation is the use of error-correcting codes (ECCs). A malicious memory modification, affecting only the value of some on-chip memory, may be detectable and even correctable using existing ECCs, which are fielded in many dedicated memory chips today. Many ECC techniques are in use in memory [37]; in our untrusted processor scenario, some or all of the error-correction circuitry might be relocated to the control plane, to keep it from being disabled. Our research will describe the general application of these notions, illustrating how the 3D control plane might be used to provide verification services for untrusted processor memory, at a reasonable cost in terms of interconnects and monitoring logic.

Arithmetic and Logic

Arithmetic and logic in a general-purpose processor take several forms. They may include compositions of logic circuits for decision making, or perhaps a full floating-point unit (FPU) or an integer arithmetic-logic unit (ALU).

Logic Circuits

In general, the problem of ensuring correct operation of logic circuits is comparatively challenging, as noted by Dutta and Jas:

Detecting and correcting errors in logic circuits is much more difficult than in memories. While concurrent error detection and correction mechanisms can be efficiently incorporated in memories due to their regular structure, logic circuits present a much greater challenge because of their irregular structure ... The easiest concurrent error detection (CED) scheme for logic circuits is to use duplication where the circuit is duplicated and the outputs are compared with an equality checker. While this is a simple method to implement, it requires more than 100% overhead and also it can not correct any of the errors. [38]

In the 3D scenario, we can duplicate selected logic circuits in the control plane, simply accept the overhead of duplication, and compare the duplicated results with the originals. However, if that overhead is unacceptable but some reduced error detection rate is acceptable, a technique called non-intrusive concurrent error detection (CED) may be useful. Dutta and Jas [38] summarize how this class of techniques is implemented, and we conjecture a comparable arrangement in the 3D scenario. The structure lends itself well to 3D, since the inputs, outputs, and functional logic are unchanged in the computation plane, while the prediction, compaction, and comparison circuits, in the control plane, could provide an error indication without duplicating every gate of the original logic. Though it incurs less overhead than strict duplication, non-intrusive CED will not detect all errors or malicious changes in logic; also, construction of the prediction circuit may require extensive input-space sampling [38]. Ultimately, the control-plane logic verification technique will likely involve some tradeoff between completeness and duplication overhead.
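As one concrete instance of this tradeoff, the following Python sketch models a parity-prediction checker: the control plane computes the predicted parity of a functional unit's output directly from its inputs and compares it against the parity of the output observed on the 3D posts. The 4-bit adder is a stand-in of our choosing, and the single parity bit is the simplest possible code, used for illustration rather than drawn from [38]; it detects any odd number of flipped output bits but misses even-sized changes, which is exactly the reduced-coverage behavior described above.

# Behavioral model of non-intrusive CED via output-parity prediction.
# The 4-bit adder is an illustrative stand-in for a computation-plane unit.

def adder4(a: int, b: int) -> int:
    """Computation-plane functional logic (possibly tampered with)."""
    return (a + b) & 0xF

def predicted_parity(a: int, b: int) -> int:
    """Control-plane prediction circuit: parity of the correct output,
    computed independently from the unit's inputs."""
    return bin((a + b) & 0xF).count("1") & 1

def observed_parity(result: int) -> int:
    """Compaction of the output observed on the 3D posts."""
    return bin(result).count("1") & 1

def ced_check(a: int, b: int, observed: int) -> bool:
    """Comparison circuit: True means no (odd-bit) error detected."""
    return predicted_parity(a, b) == observed_parity(observed)

# A single-bit malicious flip in the output is detected ...
assert not ced_check(3, 5, adder4(3, 5) ^ 0b0001)
# ... but a two-bit flip escapes a one-bit parity check (reduced coverage).
assert ced_check(3, 5, adder4(3, 5) ^ 0b0011)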
Arithmetic Circuits

Here again, some form of duplication is the most complete method of verifying a circuit's correct operation. There may be other options for detecting a subset of errors or malicious modifications, such as applying some type of checksum or bounds-checking to the outputs, but if there were significant concerns about malicious modifications to an arithmetic unit, we would probably elect to duplicate the entire ALU or FPU in the control plane. In our research, we will informally explore the design considerations involved in implementing this type of monitoring in a 3D scenario.

Data Path Integrity

Our threat model grants an adversary the ability to implement a malicious inclusion which can modify the logical value of a circuit while the data is in transit, propagating from one part of the circuit to another. How is the correctness of data propagation along a circuit path defined, and how can incorrect operation be detected? We propose to investigate these questions informally.

First, we must investigate whether there can exist a complete, consistent, and unambiguous method for defining which circuits are simply "data paths" in any general-purpose processor. If we are able to define the control circuits, storage and retrieval circuits, and arithmetic and logic circuits constructively, then it may be the case that the data path circuits are simply "everything else."

The next question that must be answered in the security context, with regard to data path integrity, is its scope. Informally, correct operation of the specified functionality asks only whether a data path that transmits a value along a circuit, from the output of one element to the input of another, always presents the same logical value at both ends. We might call this notion transit invariance. However, transit invariance does not capture the problem of the presence of additional, possibly malicious, circuits that affect the confidentiality, integrity, and availability of the data. These additional circuits may, for example, tap off the data path and create a copy of the signal that was not in the processor specification. Consider a signal that propagates along a circuit through several buffers: its value does not change during propagation, but an unauthorized (i.e., not in the specification) copy of the circuit value is made and possibly used elsewhere. Detecting the unauthorized signal in this case would require a new definition, one which captures the notion that no additional functionality (in this case, no new circuits) beyond that in the processor specification is permitted.

The problem is, how does one (nondestructively) examine a complex processor and determine that it has no "extra" circuits? The 3D monitoring techniques we have outlined seem better suited for detecting deviations from specified circuit behavior than for detecting additional circuit behavior. It may be the case that, in terms of data path integrity, additional circuits are tolerable only if they have no effect on the other essential functional categories: execution flow, control-test-debug, arithmetic and logic, and storage and retrieval. We plan to informally explore the question of data path integrity in a 3D monitoring context, but it is not a planned subject of our experiments.

VI Experimental Verification Plan

To verify our research, we will conduct experiments on processor simulators, focusing on detecting malicious inclusions by monitoring execution flow. The demonstration will focus on the following tasks:

Review and analyze existing malicious inclusion detection techniques
Review existing malicious inclusion taxonomies, types, and examples
Map the common microprocessor architectural features (such as load-store unit, reorder buffer, etc.) to the types of malicious inclusions to which they are vulnerable
Identify the key computation-plane architectural nodes which must be monitored by interconnects for successful 3D detection of malicious inclusions, and explain their importance
Detect known, predefined MIs in a 3D security simulation, using a simple monitor
Detect unknown, non-predefined threats, using a generalized monitor
Analyze the limits of what can and cannot be reasonably detected using these methods
Identify the tradeoffs in monitor size and the number of monitor points against performance, and describe an optimum balance of performance and security
Provide evidence that the malicious inclusions detected in our demonstration are not detectable using previously described techniques

During the experimentation phase, we will consider the following questions:

Malicious Inclusions

At the architectural level, what features are the easiest and most difficult to monitor using 3D methods, in terms of the size of logic and number of connections required?
At the architectural level, what are the most and least important features, in terms of security threat potential, that must be monitored to detect MIs?
What features of the computation plane architecture are most suitable for the application of 3D monitoring, in terms of ease of fabrication?
What are the unique characteristics of MIs that are not detectable using existing testing methods?
What are the characteristics of MIs that are detectable, or not detectable, using a 3D approach?
What MIs are simplest to design, yet have the most serious potential security impact?

Control Plane Monitoring Techniques

Once the computation-plane data to be monitored is available, what are the lowest running-time control-plane algorithms for concluding the presence of MIs?
What is their time and space complexity, and how does the complexity of the 3D approach compare with the complexity of other detection approaches?
Can we implement a state-based approach using automata? If so, how many states are required? (A behavioral sketch of such a monitor appears at the end of this section.)
What is the fastest way to verify processor microcode flow?
Within how many clock cycles can a monitor detect the activation of an MI?
When the CPU in the computation plane is idle, can the control plane use the available cycles to conduct background checks on the computation plane, instead of only monitoring the computation plane during commanded execution?
What should those tests look like, to maximize the detection rate per number of tests conducted?

We propose to use general-purpose architectural designs as baselines for the computation plane and for the control plane. The computation plane model will then be modified with malicious inclusions that must be detected. The control plane model will be modified to connect to the computation plane with circuits that mirror where the 3D "posts" would be on a silicon chip. The control plane will be configured to continuously monitor the various posts, so that it can follow the operation of the computation plane and identify the malicious behavior associated with MIs. The specifications for the computation plane, control plane, interconnections, and malicious inclusions, and the test results, will all be included in our report.
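To illustrate the state-based approach raised in the questions above, the following Python sketch models a security automaton in the control plane that watches a stream of control-signal events arriving over hypothetical 3D posts and flags any transition not permitted by the architectural specification. The event names and the tiny transition table are invented for illustration; they stand in for the far larger automata an actual processor specification would induce.

# Behavioral sketch of a control-plane execution monitor as a security
# automaton. States, events, and transitions are illustrative placeholders.

# Permitted control-flow transitions: state -> {event: next_state}
SPEC = {
    "FETCH":     {"decode_en": "DECODE"},
    "DECODE":    {"execute_en": "EXECUTE"},
    "EXECUTE":   {"writeback_en": "WRITEBACK", "mem_en": "MEMORY"},
    "MEMORY":    {"writeback_en": "WRITEBACK"},
    "WRITEBACK": {"fetch_en": "FETCH"},
}

def monitor(events, start="FETCH"):
    """Consume per-cycle events from the 3D posts; return the cycle index
    of the first unspecified transition, or None if all are permitted."""
    state = start
    for cycle, event in enumerate(events):
        allowed = SPEC.get(state, {})
        if event not in allowed:
            return cycle            # MI-like deviation detected this cycle
        state = allowed[event]
    return None

# A well-formed instruction cycle passes ...
ok = ["decode_en", "execute_en", "writeback_en", "fetch_en"]
assert monitor(ok) is None
# ... while, e.g., a writeback asserted directly from DECODE (as a
# malicious inclusion might do to skip a stage) is flagged at once.
bad = ["decode_en", "writeback_en"]
assert monitor(bad) == 1

In this toy model, a deviation is flagged in the same cycle it appears on the posts; quantifying that detection latency for realistic designs is what the timing questions above seek to establish.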
VII Comparison With Other Work

Naval Postgraduate School Technical Report - 3D Security Vision and Outline

In [6], the authors introduce the fundamental approach to 3D security and list several possible applications. Our work will expand on this to provide an architecture-level demonstration, emphasizing the detection of malicious inclusions in hardware.

Some Existing Malicious Examples

In [5], the authors develop eight specific MIs (referred to by the authors as "hardware trojans") for an Alpha architecture, in the context of protecting a cryptographic key. They categorize MIs based on function: broadcast information to the attacker, compromise the host circuit, or disable the host circuit. They demonstrate how MIs can be designed in conjunction with circuit optimization, whereby the removal of some existing circuits helps mask the addition of malicious circuits, making the MIs more difficult to detect by existing power-analysis techniques. Their MIs operate at the register transfer level (RTL), but they note that some MIs could be developed to operate at the gate or transistor level. Our technique is not designed to detect these kinds of MIs, since they are built for a special-purpose processing chip, and our method is designed for general-purpose processor architectures. That work focuses on MI design, whereas our work will focus on MI detection and response, which is not addressed in [5].

In [25], King et al. develop two malicious inclusion attacks, one allowing an escalation-of-privilege backdoor, and another implementing a secret "shadow memory" system on the Leon3 processor. Our 3D execution monitor technique should detect the types of MIs created for this work, and we plan to test our system against either these particular attacks, on the Leon3 processor design, or comparable attacks on a smaller processor design.

Other MI Tests - Power and Timing

The currently evolving category of test methods for detecting MIs may generally be described as "fingerprinting" of either power or timing signals. A processor may have measurable variations at its power ports under various inputs, or it may have measurable variance in output signal delay for certain inputs. By studying how these power levels or timing delays vary over a large sample of normal inputs on trusted "gold" chips, researchers are able to establish a "fingerprint" of the baseline behavior. They can then identify deviations from the "fingerprint" patterns, indicating possible malicious inclusions, when testing uncertified chips. These promising techniques are described in [5], [12], [13], [14], and [15]. One common shortfall of nondestructive testing techniques like these, though, is that they do not scale well to very large, complex chips; they tend to perform better when the malicious anomaly is located closer to an observation point on the exterior and occupies a reasonable portion (greater than approximately 0.1 percent in tests so far; larger is easier to detect) of the overall number of circuits in the chip. In [5], the authors illustrate techniques for inserting malicious inclusions while masking the effect of the added malicious circuits through balanced reductions in existing circuits nearby, using optimization or by eliminating unused legacy circuits. The existence of such disguise techniques, and the difficulty of detecting path-delay and power changes over large-scale, complex chips, will limit the application of these detection methodologies, especially against a capable adversary.

Our approach is complementary to the "fingerprinting" methods in that 3D security techniques have the potential to find a malicious inclusion which does not produce large changes in the power or timing characteristics of the target processor. Malicious inclusions could be as small as a few dozen gates, and masked as in [5], and would therefore be very difficult to detect using these existing techniques. These MI detection methods are also very difficult to scale up, and can be performed only prior to installation; our techniques will be easier to scale, and can run continually during operations in the field.

Detecting Network Intrusions, Viruses, and Malicious Code

Our proposal differs from network intrusion detection, software virus detection, and other similar systems in two ways. First, viruses and intruder network packets exist in software, at a higher level of abstraction than the level we propose to monitor. Though the term "hardware virus" has been used in some literature, a virus differs from a hardware malicious inclusion in that a virus seeks to replicate and propagate itself to other systems, as is possible in software, whereas a malicious inclusion seeks to perform its function in place, in hardware, on one system only. Second, the concept of "correct flow" of control may not be well defined in software, due to software's ability to change configurations and evolve over time. However, correct flow is well defined in a microprocessor architecture (here we propose to monitor non-programmable, general-purpose chips; the problem of monitoring reprogrammable circuits like FPGAs is not considered). In general, the processor design permits us to identify a priori certain sequences or combinations of control circuit activations which will never be authorized; using the 3D approach, it should be possible to detect and mitigate some such sequences that are not detectable using existing methods. The principal advantage of 3D that facilitates this is the ability to place the 3D interconnects precisely at any key circuits which need monitoring, not just the circuits accessible via external pins. Part of our research is to examine which types of internal control signals and which internal components of the integrated circuit need to be monitored, and which do not, for each type of malicious inclusion.
As noted in [25], hardware malicious inclusions may facilitate subsequent software attacks; we do not propose to consider the software side of such an attack within the scope of this research, but we do propose to consider how to detect hardware changes that might facilitate it.

Reference Monitors

The reference monitor concept, first outlined in [27], has been well defined and explored in the realm of high-assurance computing. The general properties and limitations of security enforcement mechanisms, especially those using automata-like structures, are discussed in [26]. Sterne et al. assert that "the security properties required by most organizations cannot be enforced unilaterally by a reference monitor, even when combined with the other components of a conventional Trusted Computing Base (TCB)." [28] We are not proposing a new or improved reference monitor or enforcement mechanism in the sense of [26] or [27]. A reference monitor is a system which tries to prevent subjects from gaining unauthorized access to objects, according to some definition, usually in the operating system, of subject, object, and authorized access. Subjects and objects are defined, in that context, at a coarser granularity and a higher level of abstraction than the level at which we propose to operate. We propose to monitor and verify the correctness of microprocessor operation with respect to its definition in an architectural-level specification, not with respect to subjects and objects. A malicious inclusion does not know what a subject or an object is; rather, it is preprogrammed to modify the operation of the microprocessor in some way, such as disabling the protection logic of a memory management unit [25], leaking a secret [5], or surreptitiously disabling the processor entirely [9]. Our method examines the mechanisms by which malicious inclusions perform these acts, and proposes new ways to detect malicious inclusions that are not detectable through existing means.

Related Work

As illustrated in [6], a 3D control plane could be used as one way of enforcing a noninterference policy within a single shared cache, among two or more processes operating at differing sensitivity levels or security compartments. In particular, the control plane could prevent the shared resource from being used as an overt storage channel or a covert timing channel between the processors. Other potential 3D security applications, such as enforcing a security policy over interprocessor connections, network interfaces, or other I/O devices, are also described there. To our knowledge, this cache controller is the only demonstration to date of the use of 3D integration techniques in a security-oriented application. 3D techniques have been demonstrated for a variety of other uses, including imaging, performance monitoring, and adding cache memory [24,42,43].
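One way such a cache policy might be enforced can be sketched behaviorally. The following Python fragment is our extrapolation from the application described in [6], not a description of that demonstration: the control plane tags each cache line with the security level that filled it, and a cross-level hit is disallowed (treated as a miss), closing the storage channel.

# Behavioral sketch (ours, extrapolating from [6]): a control-plane check
# preventing a shared cache from acting as a channel between levels.

class CachePartitionMonitor:
    def __init__(self):
        self.line_owner = {}            # cache line index -> security level

    def on_fill(self, line: int, level: str) -> None:
        """Record the level of the process that filled the line."""
        self.line_owner[line] = level

    def on_access(self, line: int, level: str) -> bool:
        """True if the access is permitted: a process may only hit lines
        filled at its own level; anything else must be treated as a miss."""
        owner = self.line_owner.get(line)
        return owner is None or owner == level

mon = CachePartitionMonitor()
mon.on_fill(7, "SECRET")
assert not mon.on_access(7, "UNCLASSIFIED")   # cross-level hit forbidden
assert mon.on_access(7, "SECRET")

VIII Tentative Chapter Outline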
Chapter 1: Title and Introduction
  Dissertation Statement
  Research Goals

Chapter 2: Problem Statement
  Modern Weapons and High-Assurance Military Systems Rely on Microprocessors
  Trusting the Supply Chain
  Moving Fabrication Overseas
  Processors - More Complex, Designed in Software, Modifiable After Manufacture
  The "Kill Switch"
  Summary

Chapter 3: Description of 3D Integration Techniques
  General Overview
  Terminology
  Using 3D Technology for Security
  Assumptions and Viability
  Limiting Factors and Alternatives
  Relationship of 3D to Other Monitoring Approaches

Chapter 4: Security Policies, Level of Abstraction, Processor Design, and Threat Model
  Bits are Bits
  Security Policies and Principles - A Higher Level of Abstraction
  Covert Channels
  Side Channels
  The Processor Design Life Cycle
  The Threat Model

Chapter 5: 3D Security for General Purpose Processors
  An Execution Monitor
    o An Example Architecture
    o A Leak Attack
    o Timing Requirements
    o How Execution Monitoring Looks from the Control Plane
    o Relationship to Execution Monitor Theory
  Keep-Alive Protections
    o Control, Test, and Debug circuits
    o Other Disabling Attacks
    o Liveness, Availability, Execution Monitors, and Keep-Alive Protections
  Storage and Retrieval
    o 3D Memory Verification Services
    o Error Correction in Memory
  Arithmetic and Logic
    o Exploration of verification alternatives
  Data Paths
    o Examples and Definitions

Chapter 6: Experimental Verification and Results
  Architectural Preliminaries
  Review and comparative analysis of existing MI detection techniques
  Review of MI taxonomies, types, and examples
  Mapping the common microprocessor architectural features to the types of malicious inclusions to which they are vulnerable*
  Identifying the key computation-plane architectural components which must be monitored by posts for successful 3D detection of malicious inclusions, and explaining their importance*
  Detection of known, predefined MIs using a 3D security simulation, using a simple monitor*
  Detection of unknown, non-predefined threats, using a generalized monitor*
  Discussion of the limits of what can and cannot be reasonably detected using these methods
  Identification of the tradeoffs of monitor size and number of monitor points against performance, and description of an optimum balance of performance and security*
  Evidence that the MIs detected in our demonstration are not detectable using previously described techniques

Chapter 7: Related Work
  NPS Technical Report - 3D Security Vision and Outline
  Some Existing Malicious Examples
  Other Malicious Inclusion Tests - Power and Timing
  Detecting Network Intrusions, Viruses, and Malicious Code
  Reference Monitors
  Related Work - a 3D Cache Controller

Chapter 8: Conclusions, Related Work, and Recommendations for Future Work
  Viability and Limitations of the 3D security approach for MI detection*
  Relationship between 3D monitoring and related chip functions like performance monitoring, error detection, debugging, and developmental test*
  Future Work
    o Application to the multiprocessor environment and advanced processors
    o Target system background checks (idle cycle verification)
    o Using 3D techniques to identify and correct non-malicious errors

*Proposed Contribution

Appendix A: Alternatives to 3D Monitoring - Other Mitigation Strategies
  U.S. Government Initiatives, Existing Mitigation Strategies
  The Trusted Foundry Approach
  The Domestic Commercial Approach to Processor Trust
  DARPA Trust in Integrated Circuits Program
  Existing Chip Testing and Its Limitations

Appendix B: Architectural Layout, Key Security Nodes, and Interconnection Diagrams

Appendix C: 3D Monitoring System Code for the Control Plane

Appendix D: Output of Detection Results from All Simulations

IX Research Plan and Proposed Schedule

Preliminaries

Review existing examples of hardware malicious inclusions. Identify and map out the strategic computation and control monitoring points in general-purpose microprocessor architectures. (Aug-Oct 2010)
Develop expertise with integrated circuit software simulation tools. Select the software simulation tools for our demonstrations. (Jul-Sep 2010)
Study the interrelationship between 3D security monitoring and related mechanisms, such as chip debugging, developmental testing, and performance monitors, and identify where any synergistic overlap may exist. (Aug-Oct 2010)
Map the common architectural features of a microprocessor to existing taxonomies of malicious inclusions, like the one in [16]. In other words, for each common element, such as a memory cache, arithmetic-logic unit, floating-point computation unit, execution pipeline, etc., identify the malicious inclusions that could attack that particular unit, describe how, and provide example sketches of how they would work. From these, select a few exemplary candidates for detection. (Aug-Oct 2010)
Identify the type of monitoring system most suitable for detecting the example MIs. (Aug-Oct 2010)
Select a target demonstration architecture and demonstrate its basic operating functions in software simulations. Compile known (unclassified) examples of general MIs (of any architecture) from the literature and analyze their characteristics. (Aug-Oct 2010)

Validation Experiments

Design and implement a 3D security simulation with the elements of the computation plane, control plane, interconnections, and a monitoring system modeled after the one described earlier. (Oct-Dec 2010)
Implement a subset of the proposed monitoring system in the control plane, and demonstrate the use of 3D security against several known (predefined) MIs. (Oct-Dec 2010)
Implement the monitoring system in the control plane more completely, and demonstrate detection and response against unknown MIs. (Jan-Mar 2011)
Technical report or conference submission. (Spring 2011)
Write-up and presentation. (Dissertation drafted by May-June 2011. Graduation target September 2011)

X References

[1] Report of the 2005 Defense Science Board Task Force on High Performance Microchip Supply. Office of the Under Secretary of Defense for Acquisition, Technology, and Logistics. February 2005.
[2] McCormack, Richard. DoD Broadens "Trusted" Foundry Program to Include Microelectronics Supply Chain. Manufacturing & Technology News. February 28, 2008.
[3] Pearson, Chris. The Importance of "Trusted Suppliers". Wordpress.com. March 26, 2010.
[4] National Security Council of the United States. Comprehensive National Cybersecurity Initiative (partially declassified 2 March 2010). http://www.whitehouse.gov/cybersecurity/comprehensive-national-cybersecurity-initiative.
[5] Yier Jin, Nathan Kupp, and Yiorgos Makris. Experiences in Hardware Trojan Design and Implementation. Proceedings of the IEEE International Workshop on Hardware-Oriented Security and Trust (HOST), San Francisco, CA, July 2009.
[6] Ted Huffmire, Tim Levin, Cynthia Irvine, Thuy Nguyen, Jonathan Valamehr, Ryan Kastner, and Tim Sherwood. High-Assurance System Support Through 3D Integration. Naval Postgraduate School, 9 November 2007.
[7] Yinung, Falan. Challenges to Foreign Investment in High-Tech Semiconductor Production in China. Web version. May 2009.
[8] Nystedt, Dan. Intel Got its New China Fab for a Bargain, Analyst Says. CIO.com, 2007.
[9] Adee, Sally. The Hunt for the Kill Switch. IEEE Spectrum, May 2008. http://spectrum.ieee.org/semiconductors/design/the-hunt-for-the-kill-switch
[10] Defense Advanced Research Projects Agency, Arlington, VA. Contract Number HR0011-08-C0005. http://www.defenseprocurementnews.com/2010/02/#ixzz0kAvJaBtH
[11] Adee, Sally. Trust in Integrated Circuits. IEEE Spectrum. Posted 1 May 2008. http://spectrum.ieee.org/tech-talk/semiconductors/devices/trust_in_integrated_circuits.
[12] Yier Jin and Yiorgos Makris. Hardware Trojan Detection Using Path Delay Fingerprint. Proceedings of the IEEE International Workshop on Hardware-Oriented Security and Trust (HOST), Anaheim, CA, June 2008.
[13] Dakshi Agrawal, Selcuk Baktir, Deniz Karakoyunlu, Pankaj Rohatgi, and Berk Sunar. Trojan Detection Using IC Fingerprinting. 2007 IEEE Symposium on Security and Privacy.
[14] Reza Rad, Jim Plusquellic, and Mohammad Tehranipoor. Sensitivity Analysis to Hardware Trojans Using Power Supply Transient Signals. 2008 IEEE International Workshop on Hardware-Oriented Security and Trust (HOST '08), pages 3-7.
[15] Francis Wolff, Chris Papachristou, Swarup Bhunia, and Rajat Chakraborty. Towards Trojan-Free Trusted ICs: Problem Analysis and Detection Scheme.
[16] Mohammad Tehranipoor and Farinaz Koushanfar. A Survey of Hardware Trojan Taxonomy and Detection. IEEE Design and Test of Computers, vol. 27, issue 1, January/February 2010.
[17] Vishnu Patankar, Alok Jain, and Randal Bryant. Formal Verification of an ARM Processor. Twelfth International Conference on VLSI Design. Goa, India. January 1999.
[18] Eric Beyne, Piet De Moor, Wouter Ruythooren, Riet Labie, Anne Jourdain, Harrie Tilmans, Deniz Sabuncuoglu Tezcan, Philippe Soussan, Bart Swinnen, and Rudi Cartuyvels. Through-Silicon Via and Die Stacking Technologies for Microsystems-Integration. International Electron Devices Meeting, 15-17 December 2008.
[19] Naval History and Heritage Command. Biographies in Naval History: Rear Admiral Grace Murray Hopper. http://www.history.navy.mil/bios/hopper_grace.htm
[20] Dae Hyun Kim, Krit Athikulwongse, and Sung Kyu Lim. A Study of Through-Silicon-Via Impact on the 3D Stacked IC Layout. ICCAD '09, November 2-5, 2009, San Jose, California, USA.
[21] W. Rhett Davis, John Wilson, Stephen Mick, Jian Xu, Hao Hua, Christopher Mineo, Ambarish M. Sule, Michael Steer, and Paul D. Franzon. Demystifying 3D ICs: The Pros and Cons of Going Vertical. IEEE Design & Test of Computers, November-December 2005.
[22] Kocher, Paul. Keynote Address at Hardware-Oriented Security and Trust Symposium. July 27, 2009.
[23] CPU-Tech company brochure for Acalis-872.
[24] Mysore, Agrawal, Srivastava, Lin, Banerjee, and Sherwood. Introspective 3D Chips. International Conference on Architectural Support for Programming Languages and Operating Systems (ASPLOS). October 21-25, 2006. San Jose, CA.
[25] King, Tucek, Cozzie, Grier, Jiang, and Zhou. Designing and Implementing Malicious Hardware. IEEE International Workshop on Hardware-Oriented Security and Trust (HOST). 27 July 2009, San Francisco, CA.
[26] Schneider, Fred. Enforceable Security Policies.
ACM Transactions on Information and System Security, Vol. 3, No. 1, February 2000, Pages 30-50.
[27] Anderson, James P. Computer Security Technology Planning Study. Technical Report ESD-TR-73-51, Air Force Electronic Systems Division, Hanscom AFB, Bedford, MA, 1972.
[28] Sterne, D., Benson, G., Landwehr, C., LaPadula, L., and Sandhu, R. Reconsidering the Role of the Reference Monitor. Proceedings of the Computer Security Foundations Workshop VII, 1994. Pages 175-176.
[29] Denning, D. A Lattice Model of Information Flow. Fifth ACM Symposium on Operating Systems Principles, Austin, Texas, 19-21 November 1975.
[30] Harrison, Ruzzo, and Ullman. Protection in Operating Systems. Communications of the ACM, August 1976. Volume 19, Number 8.
[31] J.A. Goguen and J. Meseguer. Security Policies and Security Models. SRI International, 1982.
[32] David D. Clark and David R. Wilson. A Comparison of Commercial and Military Computer Security Policies. 1987 IEEE Symposium on Security and Privacy, May 1987.
[33] Hamming, Richard W. Learning to Learn: The Art of Doing Science and Engineering. CRC Press, 1997.
[34] Song, Dawn, et al. BitBlaze: A New Approach to Computer Security via Binary Analysis. ICISS 2008, pages 1-25.
[35] Kemmerer, Richard. Shared Resource Matrix Methodology: An Approach to Identifying Storage and Timing Channels. ACM Transactions on Computer Systems. Volume 1, Number 3. August 1983, pages 256-277.
[36] Arvind, K. Asanovic, and J. Emer. Massachusetts Institute of Technology OpenCourseWare 6.823: Computer System Architecture, Fall 2005. <http://ocw.mit.edu/OcwWeb/Electrical-Engineering-and-Computer-Science/6-823Fall2005/CourseHome/index.htm> Accessed 10 April 2010. License: Creative Commons BY-NC-SA.
[37] Chen, C., and Hsiao, M. Error-Correcting Codes for Semiconductor Memory Applications: A State-of-the-Art Review. IBM Journal of Research and Development. Volume 28, Number 2. March 1984.
[38] Avijit Dutta and Abhijit Jas. Combinational Logic Circuit Protection Using Customized Error Detecting and Correcting Codes. Ninth International Symposium on Quality Electronic Design. November 2009.
[39] IEEE Standard Test Access Port and Boundary-Scan Architecture. IEEE Std 1149.1-1990. Approved June 17, 1993. The Institute of Electrical and Electronics Engineers, Inc. ISBN 1-55937-350-4.
[40] Chakraborty, Wolff, Paul, Papachristou, and Bhunia. MERO: A Statistical Approach for Hardware Trojan Detection. Lecture Notes in Computer Science: Cryptographic Hardware and Embedded Systems 2009, pages 396-410.
[41] Sgro, Joseph. Focused Ion Beam Milling of Semiconductors: An Imaging Challenge. Alacron, Inc. Accessed 12 May 2010. <http://www.alacron.com/downloads/Applications/FocusedIonBeam-062308.pdf>
[42] Y. Kaiho, Y. Ohara, H. Takeshita, K. Kiyoyama, K.W. Lee, T. Tanaka, and M. Koyanagi. 3D Integration Technology for 3D Stacked Retinal Chip. Proceedings of the IEEE International Conference on 3D System Integration, San Francisco, CA, September 2009.
[43] D.L. Lewis and H.-H.S. Lee. Architectural Evaluation of 3D Stacked RRAM Caches. Proceedings of the IEEE International Conference on 3D System Integration, San Francisco, CA, September 2009.
[44] Yuanyuan Zhou, Pin Zhou, Feng Qin, Wei Liu, and Josep Torrellas. Efficient and Flexible Architectural Support for Dynamic Monitoring. ACM Transactions on Architecture and Code Optimization, Vol. 2, No. 1, March 2005, Pages 3-33.
[45] G. Edward Suh, Dwaine Clarke, Blaise Gassend, Marten van Dijk, and Srinivas Devadas.
Efficient Memory Integrity Verification and Encryption for Secure Processors. Proceedings of the 36th International Symposium on Microarchitecture, 2003.
[46] B. Alpern and F. Schneider. Defining Liveness. Technical Report TR 85-650, Dept. of Computer Science, Cornell University, October 1984.
[47] Miron Abramovici and Paul Bradley. Integrated Circuit Security - New Threats and Solutions. CSIIRW '09, April 13-15, 2009, Oak Ridge, TN, USA.
[48] Bowen Alpern and Fred B. Schneider. Defining Liveness. Information Processing Letters. Volume 21, Number 4. October 1985. Pages 181-185.
[49] Leslie Lamport. Proving the Correctness of Multiprocess Programs. IEEE Transactions on Software Engineering, Volume SE-3, Number 2, March 1977.
[50] David Lie, John Mitchell, Chandramohan A. Thekkath, and Mark Horowitz. Specifying and Verifying Hardware for Tamper-Resistant Software. Proceedings of the 2003 IEEE Symposium on Security and Privacy.
[51] X. Wang, M. Tehranipoor, and J. Plusquellic. Detecting Malicious Inclusions in Secure Hardware: Challenges and Solutions. Proc. IEEE Int'l Workshop Hardware-Oriented Security and Trust (HOST '08), IEEE CS Press, 2008, pp. 15-19.
[52] Markoff, John. Old Trick Threatens Newest Weapons. New York Times, 27 October 2009. <http://www.nytimes.com/2009/10/27/science/27trojan.html?_r=2>. Accessed 6 June 2010.
[53] Sharkey, Brian. DARPA TRUST in Integrated Circuits Program, Industry Day Brief. 26 March 2007.
[54] Hicks, Matthew, Finnicum, Murph, King, Samuel, Martin, Milo, and Smith, Jonathan. Overcoming an Untrusted Computing Base: Detecting and Removing Malicious Hardware Automatically. Proceedings of the 31st IEEE Symposium on Security & Privacy (Oakland), May 2010.
[55] Ryan, Peter. Enforcing the Unenforceable. Security Protocols 2003, LNCS 3364, pp. 178-182, 2005.

Appendix A. Alternatives to 3D: Other Mitigation Strategies

Other strategies exist to mitigate the threat of malicious inclusions. They are complementary to the proposed research, and no one strategy alone will solve the problem entirely. A few of these are summarized below.

U.S. Government Initiatives, Existing Mitigation Strategies

On 2 March 2010, the U.S. Government declassified portions of its Comprehensive National Cybersecurity Initiative (CNCI) [4]. Item number 11 of this initiative addresses the "Global Supply Chain." According to [3], the supply chain "has become so globalized that we can quickly lose track of where our technology is coming from. This creates opportunities for malicious actors to create backdoors, malware, and faulty hardware that make its way into our weapon systems, internet infrastructure, banking systems, and personal computing devices. The CNCI will require careful tracking of participants in the supply chain, which will steer more and more buyers towards Trusted Suppliers."

The Trusted Foundry Approach

In 2003, DoD established its Trusted Foundry program to help ensure that the U.S. maintains the capability to manufacture important processors for high-assurance DoD applications. Managed by the National Security Agency (NSA) and the Defense Microelectronics Activity (DMEA), the program had 29 accredited Trusted Suppliers as of March 2010. DMEA is expanding its foundry certification program to include trusted evaluation of the design, aggregation/brokering, mask-making, assembly, and test phases of processor supply [3]. However, the Trusted Foundry program does not solve all of DoD's microchip supply requirements.
The certified foundries will not be able to manufacture the full breadth of advanced processors desired by managers of DoD programs, and use of the Foundry system is not mandatory [2]. There are certified designers, like the one at Sandia, but Sandia's facility, for example, can currently produce only to the 0.35-micron level, well behind the state of the art [11]. Some DoD systems requiring the latest in circuit technology may not be able to obtain it all directly through the Trusted Foundry program.

The Domestic Commercial Approach to Processor Trust

Though it has been generally acknowledged that the U.S. lacks the capacity to domestically manufacture current-generation processors and secure the entire supply chain for the integrated circuits required by DoD and other high-assurance customers, there are some alternatives. For example, CPU-Tech recently introduced the Acalis family of secure processors. It features a tamper-resistant dual-PowerPC chip, fabricated within the Trusted Foundry program, with an accompanying development kit [23]. For the needs of some simpler applications, this type of system may suffice. As noted, though, the computational power of products manufactured domestically has not kept pace with that of systems made overseas, and our ability to produce integrated circuits in large volume has also diminished. In addition, this particular commercial system requires the use of proprietary hardware and closely held proprietary software, limiting its utility.

DARPA Trust in Integrated Circuits Program

The Defense Advanced Research Projects Agency (DARPA) is sponsoring an ongoing competition to spur innovation in methods for detecting malicious inclusions in hardware. DARPA has contracted with MIT's Lincoln Laboratory to design the malicious elements, USC's Information Sciences Institute to manufacture the chips, and Johns Hopkins University's Applied Physics Lab to assess the competitors' results. Teams from Raytheon, Luna Innovations, and Xradia participated in the initial round of the competition; publicly available contract data suggests that at least Raytheon has progressed to the third phase of the trial program [10]. As discussed above, however, external testing methods, though useful, are limited in the types of malicious inclusions they will be able to detect.

Existing Chip Testing and Its Limitations

Many of the general challenges of traditional semiconductor verification are summarized in [9]: "Although commercial chip makers routinely and exhaustively test chips with hundreds of millions of logic gates, they can't afford to inspect everything... 'You don't check for the infinite possible things that are not specified,' says electrical engineering professor Ruby Lee, a cryptography expert at Princeton. 'You could check the obvious possibilities, but can you test for every unspecified function?' Nor can chip makers afford to test every chip. From a batch of thousands, technicians select a single chip for physical inspection, assuming that the manufacturing process has yielded essentially identical devices. They then laboriously grind away a thin layer of the chip, put the chip into a scanning electron microscope, and then take a picture of it, repeating the process until every layer of the chip has been imaged. Even here, spotting a tiny discrepancy amid a chip's many layers and millions or billions of transistors is a fantastically difficult task, and the chip is destroyed in the process. But the military can't really work that way.
For ICs destined for mission-critical systems, you'd ideally want to test every chip without destroying it."

Naturally, an exhaustive test of all possible sequences of chip execution is not possible. Even a test sequence for just a 64-bit integer arithmetic logic unit, such as multiplying together all possible pairs of 64-bit unsigned integers, leads to (2^64)^2, or 2^128, possible combinations, and would not be feasible: even at 2^10 operations per second, such a test would take 2^118 seconds or, at around 2^25 seconds per year, around 2^93 years. And that is just for integer multiplication; testing all possible combinations of CPU opcodes and data fields, program counters, floating-point operations, register values, cache movements, and interrupt flows, to name a few, would continue essentially forever, and that is before even considering multi-core chips. As a result, experts from industry and academia have worked to develop intelligent, non-exhaustive methods for testing microprocessors for malicious inclusions; a few of these were described in an earlier section.

Industry has focused a great deal of resources on proving the correctness of chip designs with respect to some specification. These proofs often use formal methods, taking advantage of theorem provers and model checkers. Although, as pointed out above, it is impossible to test every possible computational sequence in a chip in any reasonable amount of time, the formal methods approach allows us to prove certain properties about subsets of a processor's operation. For example, in [17], the authors use the PVS theorem prover to discover several bugs - differences between the higher-level specification and the actual circuit design - in an ARM processor. These methods, though important and useful, cover only the correctness of an integrated circuit's design; post-design phases in the supply chain are not covered. It is important to note, however, that some malicious changes inserted during the design phase could potentially be detected during formal verification of the circuit design.
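The scale contrast behind this argument is easy to make concrete: exhaustively checking a tiny functional unit against its specification is trivial, while the same enumeration is hopeless at 64 bits, which is why the formal methods above reason about properties rather than enumerate cases. The following minimal Python sketch is ours, not drawn from [17]; the ripple-carry adder stands in for a circuit under test.

# Exhaustive equivalence check of a 4-bit adder against its specification:
# only (2^4)^2 = 256 cases. The same check for a 64-bit unit would require
# 2^128 cases, which is infeasible, as the arithmetic above shows.

def spec_add4(a: int, b: int) -> int:
    """The specification: 4-bit modular addition."""
    return (a + b) & 0xF

def impl_add4(a: int, b: int) -> int:
    """Ripple-carry implementation standing in for the circuit under test."""
    result, carry = 0, 0
    for i in range(4):
        x, y = (a >> i) & 1, (b >> i) & 1
        s = x ^ y ^ carry                      # sum bit of the full adder
        carry = (x & y) | (carry & (x ^ y))    # carry out
        result |= s << i
    return result

# Feasible only because the input space is tiny.
assert all(impl_add4(a, b) == spec_add4(a, b)
           for a in range(16) for b in range(16))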