16 AUGUST 2010
I Title and Introduction
Using 3D Circuit Integration to Detect Malicious
Inclusions in General Purpose Processors
Proposed Dissertation Statement
Hardware malicious inclusions in microprocessors present an increasing threat to U.S.
high-assurance computing systems, particularly those of the Department of Defense, due
to vulnerabilities at several stages in the acquisition chain. Existing testing techniques
are limited in their ability to detect these maliciously modified integrated circuits.
We propose a novel method, based on the evolution of three-dimensional (3D)
integrated circuit fabrication techniques and on execution monitor theory, by which
malicious inclusions, including those not detectable by existing means, may be detected
and potentially mitigated in the lab and in fielded, real-time operation.
We propose to develop and implement techniques for detecting and mitigating hardware
malicious inclusions by utilizing 3D connections to monitor the control and data flows in
an untrusted, target commodity processor from a trusted attached processor called the
"control plane".
Research Goals
There are a number of potential new security-related applications of circuit-level
three-dimensional (3D) architecture fabrication methods which provide certain novel
capabilities. Our research will focus on developing new techniques for identifying
hardware malicious inclusions, specifically those not detectable by existing methods, in
general purpose processors. To date, no other work has leveraged the capabilities of 3D
architectural techniques to identify malicious inclusions in processor hardware; existing
approaches are either destructive, or operate externally, and only during the test phase,
not during deployed use.
We will conduct experiments in support of an assessment of the feasibility of the 3D
approach to detecting malicious inclusions, specifically commenting on:
Which types of malicious inclusion a 3D system is best, and least, able to detect
and mitigate
Which types of malicious inclusion a 3D system can detect and mitigate that a
2D system cannot
How to most effectively mitigate the likeliest and most dangerous malicious
inclusions using the 3D analysis approach
II Problem Description
Modern Weapons and High-Assurance Military Systems Rely on Microprocessors
Today's Defense Department relies on advanced microprocessors for its high-assurance
needs. Those applications include everything from advanced weaponry, fighter jets,
ships, and tanks, to satellites and desktop computers for classified systems. Much
attention and resources have been devoted to securing the software that runs these
devices and the networks on which they communicate. However, two significant trends
make it increasingly important that we also focus on securing the underlying hardware
that runs these high-assurance devices. The first is the growing U.S. reliance on
processors produced overseas. The second is the evolution in the complexity of
hardware, along with the ease of making malicious changes to it.
Trusting the Supply Chain
Every year, more microprocessors destined for U.S. Department of Defense (DoD)
systems are manufactured overseas, and fewer are made inside the U.S. As a result, there
is a greater risk of processors being manufactured with malicious inclusions, or
"hardware Trojans," which could compromise high-assurance systems. This concern was
highlighted in a 2005 report by the Defense Science Board, which noted a continued
exodus of high-technology fabrication facilities from the U.S. [1]. Since this report,
"more U.S. companies have shifted production overseas, have sold or licensed high-end
capabilities to foreign entities, or have exited the business." [2]
One of the Defense Science Board report's key findings reads, "Throughout the past ten
years, the need for classified devices has been satisfied primarily through the use of
government owned, government- or contractor-operated or dedicated facilities such as
those operated by the NSA and Sandia. The rapid evolution of technology has made the
NSA facility obsolete or otherwise inadequate to perform this mission; the cost of
continuously keeping it near to the state of the art is regarded as prohibitive. Sandia is
not well suited to supply the variety and volume of DoD special circuits. There is no
longer a diverse base of U.S. integrated circuit fabricators capable of meeting trusted and
classified chip needs." [1]
Moving Fabrication Overseas
Today, most semiconductor design still occurs in the U.S., but some design centers have
recently developed in Taiwan and China [7]. In addition, major U.S. corporations are
moving more of their front-line fabrication operations overseas for economic reasons:
"Press reports indicate that Intel received up to $1 billion in incentives from the
Chinese government to build its new front-end fab in Dalian, which is scheduled to
begin production in 2010." [8]
"Cisco Systems has pronounced that it is a 'Chinese company,' and that virtually all of
its products are produced under contract in factories overseas." [2]
"Raising even greater alarm in the defense electronics community was the
announcement by IBM to transfer its 45-nanometer bulk process integrated circuit
(IC) technology to Semiconductor Manufacturing International Corp. (SMIC), which
is headquartered in Shanghai, China. There is a concern within the defense
community that it is IBM's first step to becoming a 'fab-less' semiconductor company.
IBM is the only state-of-the-art IC manufacturer that has a 'trusted' take-or-pay
contract with the Defense Department and the National Security Agency at its plant in
Vermont. Intel, the other cutting-edge U.S. integrated circuit maker, does not want to
do dedicated work for the U.S. government." [2]
The author of [9] notes, "almost all field-programmable gate arrays (FPGAs) are now
made at foundries outside the United States, about 80 percent of them in Taiwan.
Defense contractors have no good way of guaranteeing that these economical chips
haven't been tampered with. Building a kill switch into an FPGA could mean embedding
as few as 1,000 transistors within its many hundreds of millions."
In general, the large percentage of U.S. semiconductors manufactured in Taiwan is also
a longer-term concern because of the political uncertainty of future China-Taiwan
relations. In the case of political unification, which the U.S. may not be in a position to
prevent, China could hypothetically gain access to the manufacture of millions more
U.S.-bound processors in relatively short order, exacerbating supply chain concerns.
Processors - More Complex, Designed in Software, Modifiable After Manufacture
The Defense Science Board report observes, "Defense system electronic hardware ... has
undergone a radical transformation. Whereas custom circuits, unique to specific
applications, were once widely used, most information processing today is performed by
combinations of memory chips and programmable microchips ... Of the two classes of
parts, the latter have more intricate designs, which make them difficult to validate
(especially after manufacturing) and thus more subject to undetected compromise." [1]
Since modern processors are designed in software, the processor design plans become a
potential target of attack. John Randall, a semiconductor expert at Zyvex Corp., notes
that "any malefactor who can penetrate government security can find out what chips are
being ordered by the Defense Department and then target them for sabotage. If they can
access the chip designs and add the modifications, then the chips could be manufactured
correctly anywhere and still contain the unwanted circuitry." [9]
In addition to the overseas fabrication threat, malicious design modifications could
theoretically occur either outside or inside the United States. According to IEEE
Associate Editor Sally Adee, "The Defense Department's assumption that onshore
assembly is more secure than offshore reveals a blind spot." Adds Samsung's Victoria
Coleman, "Why can't people put something bad into the chips made right here?" [9]
Such undetected logic can be inserted during the design phase, if malicious code is
inserted into the design template, or even after a chip has been manufactured. "Chip
alteration can even be done after the device has been manufactured and packaged,
provided the design data are available, notes Chad Rue, an engineer with FEI ... Skilled
circuit editing requires electrical engineering know-how, the blueprints of the chip, and
an etching machine (which) shoots a stream of ions at precise areas on the chip,
mechanically milling away tiny amounts of material ... You can remove material, cut a
metal line, and make new connections ... The results can be astonishing: a knowledgeable
technician can edit the chip's design just as easily as if he were taking 'an eraser and a
pencil to it.'" [9]
The "Kill Switch"
Though reports of actual malicious inclusions are often classified or kept quiet for other
reasons, some reports do surface, like this unverified account: "According to a U.S.
defense contractor who spoke on condition of anonymity, a 'European chip maker'
recently built into its microprocessors a kill switch that could be accessed remotely.
French defense contractors have used the chips in military equipment, the contractor told
IEEE Spectrum. If in the future the equipment fell into hostile hands, 'the French wanted
a way to disable that circuit,' he said." [9]
According to the New York Times, such a "kill switch" may have been used in the 2007
Israeli raid on a suspected Syrian nuclear facility under construction. The Times report
cites an unnamed American semiconductor industry executive, claiming direct
knowledge of the operation. [52]
High performance general purpose processors used in Department of Defense high-assurance systems are increasingly being manufactured and assembled overseas. An
adversary with sufficient resources could maliciously modify a general purpose processor
at several different stages of the acquisition chain, from design and fabrication to
assembly and transport. As discussed in the following sections, our current ability to
detect and mitigate such malicious modifications in processors is limited, and therefore
new methods need to be developed.
III Description of 3D Integration Techniques
General Overview
In the last few years, hardware manufacturers and scientific researchers have been
studying methods of connecting silicon-based computational circuits in non-traditional
ways. Up until now, integrated circuit manufacturing has been limited to designs that are
essentially two-dimensional. Increasing the number of circuits per unit area has required
decreasing the size of the features in the circuit. However, techniques for decreasing
feature size are approaching their theoretical physical limits. New circuit interconnection
methods under development allow two or more computational planes, each of them an
essentially 2D structure, to be interconnected, allowing them to form a composite, three-dimensional computing structure.
The most immediate benefits of this technology relate to speed, time, and distance. At
current computing speeds, an electrical signal can only travel a limited distance in one clock cycle.
Admiral Grace Hopper was famous for demonstrating the distance that electromagnetic
energy can travel in a nanosecond by showing off pieces of wire ("nanoseconds") just
under a foot in length [19]. The farther away an external memory cache sits from the
processor, for example, the more clock cycles it will take to conduct a memory
transaction between them. In [20], the authors demonstrate reductions in the average
wire lengths within a circuit, when implementing it with 3D technology, as compared to
traditional 2D technology only.
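As a rough worked example of the distance argument above, the following sketch computes an upper bound on how far a signal can travel in one clock period. The 3 GHz clock and the velocity_factor parameter are illustrative assumptions, not figures taken from [19] or [20]; signals on real on-chip wires propagate well below the vacuum speed of light, so actual distances are shorter than this bound.

```python
# Illustrative arithmetic only: upper bound on signal travel per clock cycle.
SPEED_OF_LIGHT_M_PER_S = 299_792_458  # exact, by definition of the metre
METERS_PER_INCH = 0.0254


def max_distance_per_cycle_m(clock_hz: float, velocity_factor: float = 1.0) -> float:
    """Distance a signal covers in one clock period, in metres.

    velocity_factor is the assumed fraction of c at which the signal
    propagates; 1.0 gives the vacuum upper bound.
    """
    if clock_hz <= 0:
        raise ValueError("clock_hz must be positive")
    if not 0 < velocity_factor <= 1:
        raise ValueError("velocity_factor must be in (0, 1]")
    return velocity_factor * SPEED_OF_LIGHT_M_PER_S / clock_hz


# One nanosecond corresponds to a 1 GHz clock period.
hopper_inches = max_distance_per_cycle_m(1e9) / METERS_PER_INCH
# Hypothetical 3 GHz processor clock.
three_ghz_cm = max_distance_per_cycle_m(3e9) * 100

print(f"1 ns at c: {hopper_inches:.1f} inches")
print(f"one cycle at 3 GHz: {three_ghz_cm:.1f} cm")
```

The first figure is consistent with the "just under a foot" wire length cited above. The second suggests why shortening wires matters: at a multi-gigahertz clock, even the ideal bound is on the order of centimeters per cycle.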
There are several different technologies under consideration for 3D integration. One
promising method involves the creation of "vias", which are direct metal connections,
much like ordinary wires. Since they normally pass through a silicon plane,
they are often referred to as "through-silicon vias", or TSVs [20], and
informally as "posts".
Other possible 3D connection technologies include so-called "wireless superconnect",
"wire bonding," and "multi-chip modules" [6], as well as connection techniques relying
on electrical inductance. A survey of some of the techniques for 3D interconnects under
development is presented in [21].
A survey of various 3D fabrication techniques from [21].
Terminology. In describing our approach, we will often use the following terms:
3D interconnect - a connection between one integrated circuit and another integrated
circuit, each manufactured separately but attached during a later process. Sometimes we
will informally call these "posts", independent of the attachment technique.
3D fabrication technique - any technique, from the above descriptions or otherwise, for
joining two or more integrated circuits together at points within their computation circuits
(not just along their edges).
3D security (or 3Dsec) - a security-oriented application of the 3D interconnect methods
above, involving two integrated circuits: one untrusted target integrated circuit,
sometimes referred to as the "computation plane," and one trusted integrated circuit,
sometimes called the "control plane," which monitors and/or modifies the behavior of the
computation plane.
Malicious Inclusion¹ (MI) - an unauthorized modification to an integrated circuit that can
cause the circuit's behavior to deviate from its specified functionality. Deviations may
include, but are not limited to, unauthorized shutdown or impairment of the circuit,
subversion of the circuit's functions to facilitate an attack on its running software, or
corruption or compromise (leakage) of data passing through the integrated circuit.
¹ Most of the hardware-oriented literature refers to malicious modifications as "hardware Trojans". The
term "Trojan" is normally associated in the Computer Science literature with an attack that requires a naive
action of acceptance by the victim, such as opening a link in an unsolicited e-mail. Since hardware
modifications are covert and the victim is usually unaware, we will use the term "malicious inclusions"
rather than "hardware Trojans" to preserve this distinction.
Using 3D Technology for Security
Though a great deal of research has been done on the potential performance benefits of
3D integration, such as connecting an external memory cache, relatively little attention in
the industry has been focused on the potential for using 3D technology to enhance
security for high assurance users.
The main ideas for using 3D technology in the security context are identified and
outlined in [6]:
"By fabricating the control plane with functions that are complementary to
(but separate from) the main processor, stacked interconnect offers the
potential to add security mechanisms on just a small subset of devices
without impacting the overall cost of the main processor. Just to be clear,
we are advocating the fabrication of a processor which is always
fabricated with connections built in for security (via an optional control
plane chip). The difference between the system sold to the cost-sensitive
consumer and the one that is sold to the high-assurance customer is only
whether a specialized security device is actually stacked on top of the
standard IC or not...
A security overlay also provides the freedom to place specific security
mechanisms directly above where they are needed ... For a given device
type, reconfiguration of the security policy mechanisms can be
implemented, thus efficiently supporting different user requirements. An
overlay also provides several clear theoretical benefits. As always, it is
critical to protect security mechanisms, but in this case they may be much
less prone to tampering as they are when they are entangled with the
monitored design."
The computation and control planes would be constructed separately (with the
interconnect locations specified in the design), then connected later, in a separate process,
as in the following diagrams:
Two possible arrangements of the computation and control plane integrated circuits.
Assumptions and Viability
For the purpose of our investigation, we will assume that economically feasible
techniques for connecting two or more integrated circuits will continue to develop. The
particular method of interconnection that wins industry favor is not relevant to our
approach, as long as it meets several criteria:
The time it takes for an electrical signal to propagate and stabilize across an
interconnect is sufficiently short. For example, in [24], Mysore, et al., perform a
detailed analysis of a 3D interconnect system that requires only a single clock buffer,
and hence only one cycle of latency, to facilitate 3D monitoring.
Heat dissipation technology is sufficient to allow the passthrough of both data and
power signals across the interconnects.
The number of total interconnects that could be produced to facilitate control-plane
monitoring will be sufficient (we will examine the approximate number of required
interconnects as part of our research). A simulation in [20] illustrates the practicality
of using on the order of 1,000-10,000 through-silicon vias (TSVs), for example.
The viability of many of the physical assumptions underlying the 3D security approach
was demonstrated in [24], in which the authors modeled a Pentium 4 computation plane
being monitored, using 1,024 3D interconnects, by an XScale ARM processor. The
authors used a variety of modeling techniques to demonstrate:
An increase to the (computation plane) commodity processor of only 0.021 mm² in area
and 1.4% in power, as a result of adding the 3D connection points.
An increase in power to drive data across the interconnects, with the monitoring plane
attached, of 23%, with the potential for reducing that to around 8%.
3D hardware monitoring can be performed with significantly less power and shorter
wires, compared to a comparable 2D monitoring scheme, because the monitoring
plane can be placed much closer. In the simulation, the 3D approach consumes half
the increased power compared to the 2D approach, and a twentieth of the increase in
the area imposed on the computation plane.
Even using the worst case thermal assumptions, tiling eight analysis chips on top of
the computation chip only led to a temperature increase of about 2.5 °C.
Relationship of 3D to Other Monitoring Approaches
Why not implement the monitoring logic right in the computation chip itself?
We are operating under the assumption that the commodity computation chip may be
from an untrusted source, and anyone with sufficient access to modify the processor
could also modify the monitoring logic. By adding the monitoring logic separately, via
3D integration, the monitoring logic can come from a more trusted source, be
reconfigurable, and be isolated from threats to the computation plane during most of the
development cycle.
Why not put the monitoring logic in a coprocessor?
A security coprocessor could use some of the same techniques we will explore, but is
limited by the bandwidth and fidelity of the target processor's main connection to the
printed circuit board. The 3D approach permits finer-grained access to the key
architectural nodes within the target processor. Also, the 3D approach has the potential
for performing the same security functions with shorter wires and lower power
requirements [24].
Why not just run some comparable monitoring logic in a hypervisor?
Any malicious inclusion in hardware has the potential to override the controls in the
software above it, no matter how secure that software is. The emerging threat to the
security of hardware must be dealt with either by isolating the threat entirely (for
example, by full control over the supply chain), or by detecting and mitigating the threat
in hardware.
Why not just obfuscate the hardware design?
Obfuscation techniques have proved valuable in protecting intellectual property in
software, and more recently in reconfigurable hardware and system-on-chip (SoC)
designs [40]. However, in commodity microprocessors, such as those we consider in the
3D scenario, the performance cost of obfuscation techniques, in speed and power, will
limit their applicability in commodity designs. That said, a microprocessor whose design
has been obfuscated, but whose 3D posts make the same logical connections as the
unobfuscated design, could be monitored from a control plane in the same manner as the
unobfuscated processor. In this sense, obfuscation and 3D monitoring are
complementary, as far as deterring malicious inclusions - obfuscation makes malicious
inclusions more difficult to implement, and 3D monitoring techniques could still be used
to detect them.
Couldn't the adversary just design a malicious inclusion that circumvents the 3D posts?
It is true that an adversary, with sufficient knowledge of the processor design, the 3D
post locations, and the monitoring scheme, could design a malicious inclusion which
performs its malicious function and either bypasses the posts or modifies the signals
being monitored. We know of no software or hardware technique for precluding an
adversary with all this information from doing so. However, it would necessarily make
the adversary's task much more challenging. Combining 3D security with obfuscation of
the target processor's design adds another layer of difficulty for the adversary.
IV Security Policies, Level of Abstraction, and Threat Model
Bits are Bits
In any computing system, different concepts live at different levels of abstraction.
High-level software constructs, such as those found in object-oriented programming, may
or may not be meaningful to the operating system kernel. Low-level software constructs,
such as a small loop performing an iterative computation over an array, will not be aware
of object-oriented concepts, or the subjects and objects defined in an operating system
security policy, for example. Similarly, concepts defined in software, like a dynamically
linked code library, or even functions or types [34], may not have meaning at the
processor level, in hardware. A passage from Dr. Richard Hamming's book "Learning to
Learn: The Art of Doing Science and Engineering" emphasizes this point:
We see that the machine does not know where it has been, nor where it is going to go;
it has at best only a myopic view of simply repeating the same cycle endlessly.
Below this level, the individual gates and two-way storage devices do not know any
meaning - they simply react to what they are supposed to do. They too have no
global knowledge of what is going on, nor any meaning to attach to any bit, whether
storage or gating ... it is we who attach meaning to the bits (emphasis in original).
The machine is a machine in the classical sense; it does what it does and nothing else
Security Policies and Principles - A Higher Level of Abstraction
In light of the previous discussion, we mention a few popular security models and
concepts, and some of the characteristic constructs which comprise them, in order to
show that they necessarily exist above the hardware level of abstraction.
Basic access control safety policies: Subjects, Objects, Authorizations
Lattice-Based Information Flow Policy: Objects, Subjects, Security Classes [29]
Noninterference Information Flow Policy: Users, States, Commands, Outputs [31]
Integrity Policy: Users, Constrained Data Items, Transformation Procedures [32]
Reference Monitor Concept: Subjects and Objects [27]
In each case, at least one of the constructs on which the policy is defined, such as
subject or object, is defined at the software level of abstraction. Though the constructs in
a processor, such as a memory word, an interrupt, or an executing instruction, may be
supporting one of these higher-level constructs, the processor has no built-in awareness
of what is represented by them at the higher level. Though some work has been done on
using hardware support mechanisms to help facilitate the enforcement of security policies
at the software level [44,45], those methods are orthogonal to the ones explored here.
Our work focuses on using one processor to detect malicious changes made to another
processor, using the target processor's design specification as a baseline for what
constitutes non-malicious behavior.
Covert Channels
One way a safety policy or an information flow policy can be violated is through
exploitation of a covert channel. A covert channel is a conduit through which
information can be conveyed from a process operating at a higher sensitivity level to a
process operating at a lower sensitivity level. There are two basic types of covert
channels, storage channels and timing channels. In [35], Kemmerer defines conditions
necessary for covert channels and gives a structured methodology for identifying where
they could potentially occur. The minimum requirements for a storage channel are
identified as:
The sending and receiving processes must have access to the same attribute of a
shared resource.
There must be some means by which the sending process can force the shared
attribute to change.
There must be some means by which the receiving process can detect the attribute
change.
There must be some mechanism for initiating the communication between the
sending and receiving processes and for sequencing the events correctly. This
mechanism could be another channel with a smaller bandwidth.
And the minimum requirements for a timing channel are identified as:
The sending and receiving processes must have access to the same attribute of a
shared resource.
The sending and receiving processes must have access to a time reference such as a
real-time clock.
The sender must be capable of modulating the receiver's response time for detecting
a change in the shared attribute.
There must be some mechanism for initiating the processes and for sequencing the
events.
We note here that covert channel analysis, while not a formally specified security
policy, is an important security technique. It is also important to observe that, while the
processes which communicate with each other via covert channel exist in software, the
shared resource attribute can exist at either the software or the hardware level.
Semantically, we might find it useful to distinguish between a covert channel, which is an
implementation of a timing or storage channel as described by the Kemmerer criteria
above, and a covert channel mechanism, which is the shared resource attribute through
which the communication occurs. In this sense, the covert channel is the end-to-end
communication, including the processes, while the covert channel mechanism is solely
the medium through which communication occurs. Because hardware is primarily
process-unaware², 3D monitoring techniques, which only observe hardware functionality,
are therefore constrained to looking for a shared resource attribute or covert channel
mechanism only.
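The first three storage channel conditions listed above can be sketched as a simple check over a table of shared resource attributes. This is a minimal illustration, not the methodology of [35]: the attribute names, process names, and sensitivity levels are hypothetical, and the fourth condition (initiation and sequencing) is not modeled.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical toy system. For each shared resource attribute, record which
# processes can modify it and which can reference (read or detect) it.
AccessTable = Dict[str, Dict[str, Set[str]]]


def storage_channel_candidates(
    table: AccessTable, level: Dict[str, int]
) -> List[Tuple[str, str, str]]:
    """Return (attribute, sender, receiver) triples where a higher-level
    process can change an attribute that a lower-level process can observe."""
    found = []
    for attribute, access in sorted(table.items()):
        for sender in sorted(access["modify"]):
            for receiver in sorted(access["reference"]):
                # Only flows from higher to lower sensitivity are of interest.
                if level[sender] > level[receiver]:
                    found.append((attribute, sender, receiver))
    return found


table: AccessTable = {
    "file_lock_flag": {"modify": {"high_proc"}, "reference": {"low_proc"}},
    "private_buffer": {"modify": {"high_proc"}, "reference": {"high_proc"}},
    "cache_line_state": {
        "modify": {"high_proc", "low_proc"},
        "reference": {"low_proc"},
    },
}
level = {"high_proc": 2, "low_proc": 1}

candidates = storage_channel_candidates(table, level)
for attribute, sender, receiver in candidates:
    print(f"{attribute}: {sender} -> {receiver}")
```

In this toy table, private_buffer is not reported because no lower-level process can observe it. The attributes that are reported correspond to what we call covert channel mechanisms; whether a complete covert channel exists still depends on the software processes involved.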
At the interface between a hardware covert channel mechanism (or "shared
resource attribute") and a software process which tries to exploit it, the covert
channel mechanism must at some point pass through a "process-visible", or "software
readable", part of the architecture, in order to meet the first criterion in each of Kemmerer's
lists, above. Therefore, intuitively, covert channel prevention by a 3D system would
involve monitoring all the process-visible elements of the architecture, and ensuring some
type of flushing of all process-visible content during context switches, if the environment
assumes that multi-level processes will be sharing execution time on the same processor,
and that the sensitivity level of a process may be conveyed to the processor by the OS.
Since the modified attribute must at some point be process-visible in order to be exploited
by software, it also stands to reason that monitoring non-process-visible processor
elements will be unnecessary; i.e., monitoring all the process-visible elements is both
necessary and sufficient, in terms of addressing this type of attack.
However, it is important to note that the things that are observable at the process level
include not only registers and flags which can be read directly using the instruction set
interface, but also internal features whose values can be indirectly deduced. For example,
a process with access to the system clock might be able to infer that a cache miss has
occurred during a memory reference, or that a branch was not successfully predicted by a
branch predictor [53,54], thereby gaining knowledge of the state of the cache or the
branch-prediction buffer.
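The indirect observation described above can be illustrated with a toy model in which a receiving process learns one bit per access purely from latency. This is a simplified sketch under stated assumptions: the direct-mapped cache, the hit and miss latencies, and the sender and receiver functions are all hypothetical, and no real processor is modeled.

```python
from typing import List, Optional

HIT_CYCLES = 1      # assumed latency of a cache hit
MISS_CYCLES = 100   # assumed latency of a cache miss


class ToyCache:
    """Direct-mapped cache model that exposes only the access latency."""

    def __init__(self, num_lines: int = 8) -> None:
        self.num_lines = num_lines
        self.tags: List[Optional[int]] = [None] * num_lines

    def access(self, address: int) -> int:
        index = address % self.num_lines
        tag = address // self.num_lines
        if self.tags[index] == tag:
            return HIT_CYCLES
        self.tags[index] = tag  # fill the line on a miss
        return MISS_CYCLES


def send_bit(cache: ToyCache, bit: int, probe_address: int) -> None:
    """Sender signals 1 by evicting the receiver's line, 0 by doing nothing."""
    if bit:
        # Same cache index as probe_address, different tag.
        cache.access(probe_address + cache.num_lines)


def receive_bit(cache: ToyCache, probe_address: int) -> int:
    """Receiver times its own access; a slow access implies an eviction."""
    latency = cache.access(probe_address)
    return 1 if latency > HIT_CYCLES else 0


def transmit(bits: List[int]) -> List[int]:
    cache = ToyCache()
    probe_address = 3
    cache.access(probe_address)  # receiver primes its line
    received = []
    for bit in bits:
        send_bit(cache, bit, probe_address)
        received.append(receive_bit(cache, probe_address))
    return received


message = [1, 0, 1, 1, 0]
print(transmit(message) == message)
```

Note that the receiver never reads the cache tags directly. It only needs its own memory access and a time reference, which is why such internal state counts as process-visible for the purposes of this discussion.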
² Some processors, like those of the Intel IA-32 architecture, provide hardware support for context
switching, using features like the Task State Segment (TSS). These hardware features can facilitate
resource (e.g., memory and I/O) assignment and bounds checking for a process, but they do not constitute
the type of process awareness we associate with the operating system, which can explicitly access the value of
any resource being used by a software process.
A 3D approach to covert channel mitigation was described for memory
caches in [6], but applying this type of experiment to an entire general purpose
processor is beyond the scope of the proposed research, and is left to future work.
Side Channels
In this context, it is important to differentiate covert channel attacks from side channel
attacks. Side channel attacks use some property, often a physical property such as heat or
electricity, external to the system itself, in order to gain information about the system.
An example is externally evaluating the electromagnetic characteristics of a circuit while
the circuit performs cryptographic computations, in order to deduce some properties of
the unencrypted data or the encryption key. Side channel attacks and analysis are
important in hardware security, but orthogonal to the proposed investigation.
The Processor Design Life Cycle
A general-purpose processor's life cycle spans many phases, as summarized in the
following chart:
The potential for malicious modification varies from stage to stage. For example, some
processors are designed and verified in facilities certified to be "trusted". This does not
make their designs invulnerable, but gives us relatively greater confidence in their fidelity
at this stage. However, the physical fabrication process, especially for high-performance
processors, is largely beyond the control of DoD, for the latest-generation processor
technology. At the other end of the design cycle, installation and operation, DoD and
other high-assurance users will again normally have tight control of a processor's fielded
environment. In 2007, DARPA provided industry with a subjective assessment snapshot
of the relative risk in each of the phases [53]:
As discussed in a later section, existing methods for detecting malicious inclusions are
primarily based on detecting physical changes in the power and timing characteristics of
a processor, as observed from its input and output ports. These techniques rely on
possession of a known good, or "golden", sample processor, which acts as a baseline,
against which other processors are judged. This approach focuses on detecting changes
made to a processor in the design or fabrication stage, but it will not detect an early
design change that makes its way into all the processors in a production run, since the
"golden" model would also be affected.
More recently, Hicks, et al., outlined a technique for detecting some malicious design-stage modifications [14]. In their approach, called Blue Chip, the high-level design is
analyzed for potential malicious inclusions, and a combination of hardware and software
uses interrupts to handle the potential threats. In this approach, the malicious change
must already be present in the high-level design; if it is introduced afterward, Blue Chip
will not detect it.
Our research will attempt to identify ways of extracting sufficient information from both
the architectural design specification and the high-level processor design as a basis for
parallel construction of a 3D monitor - both its interconnects and its operation. In this
way, malicious inclusions introduced in the low-level design phase, as well as the
fabrication, assembly, and distribution phases, may be detected.
The Threat Model
In theory, a malicious modification to a processor's design could be located almost
anywhere, but in practice it will be governed by several limiting factors:
The larger the modification, the easier it will be to detect. Existing work has
successfully demonstrated nondestructive detection of some malicious inclusions
which occupy as little as roughly 0.1% of the total processor area. [12,13,14]
When using post-design modification techniques like focused ion beam (FIB) milling,
it is very difficult and time consuming to make a large number of edited connections
over widely dispersed elements of a processor without being detected. [41] FIB
milling is also very challenging at state-of-the-art feature sizes.
Therefore, we will focus our efforts on detecting malicious inclusions that are relatively
small in size, local in scope, and that target those key circuits related to the expected
subversive intent. We assume that the adversary's primary goals will depend on the
hardware-hosted application, but are likely to center on either extraction of information
or denial of service. We also note that, due to the presence of triggers and delayed
activation [16], we can make no a priori assumptions about when a malicious inclusion
will become active or inactive.
For the purpose of this research, we will not consider attacks which reveal themselves
only physically, such as thermal attacks, but rather those which reveal themselves
logically, at the architectural level.
Several researchers have described malicious inclusions, or hardware Trojans, in
taxonomy form. Tehranipoor and Koushanfar summarized these efforts in [16]. Our
preliminary analysis suggests that malicious hardware can fit into more than one action
In the next section, we discuss several methods for employing 3D monitoring
techniques to counter the specific types of attacks that could be employed by a malicious
inclusion. The focus of our experiments is on the first monitoring method discussed, a
novel type of execution monitor governing the correctness of instruction-set execution.
V 3D Security for General Purpose Processors
The components of a simple general-purpose processor are generally classifiable
according to their function. For example, a circuit in a microprocessor may participate in
control-flow execution (participate in fetch-decode-execute-retire), be part of a data path
(like a bus), execute storage and retrieval (like a cache controller), assist with control, test
and debug (as in a debug circuit), or perform arithmetic and logic computation (like an
arithmetic-logic circuit, or ALU). This list may not be exhaustive, and some circuits'
functions may overlap, but broadly speaking we can subdivide the component circuits in
a processor using these classifications:
Control Flow Circuits
Data Paths
Memory Storage and Retrieval Circuits
Arithmetic and Logic Computation Circuits
Chip Control, Test, and Debug Circuits
The main focus of our research will be the detection of malicious inclusions in the first
category, control flow circuits. However, in considering processor malicious inclusions,
it is worth noting that in some cases a detection strategy is warranted, and in others a
mitigation strategy may be preferable. Of course, once a malicious inclusion is detected, we
will normally want to follow the detection with some type of
mitigation. The following table lists each of the circuit functional types from above, and
pairs it with a potential 3D detection and/or mitigation strategy:
Circuit Type                        Detection/Mitigation Technique
Control Flow                        Control Flow Execution Monitor (planned subject of experiments)
Chip Control, Test, and Debug       Keep-Alive Protections
Data Paths                          Datapath Integrity Verification
Memory Storage and Retrieval        Load/Store Verification
Arithmetic and Logic Computation    Arithmetic/Logic Verification
We can associate these techniques with the malicious inclusion taxonomy from the
previous section:
Our research will describe all five techniques, with emphasis on the execution monitor
and keep-alive protections (relative to the chip's control, test, and debug circuits), which
we view as addressing many of the more serious potential threats. Our experiments will
demonstrate an implementation of the execution monitor, which governs the operation of
the instruction set of a general-purpose processor.
An Execution Monitor for Instruction Execution
In a general-purpose processor, we ask the question: what does it mean for the execution
flow to be "correct"? One way of characterizing the execution flow in a general-purpose
processor is by the action of the control circuits. In general, we characterize the
execution control circuits as the ones which carry out the implementation of the
instruction sequence. In other words, some circuits in a processor perform their functions
independently of which instructions are loaded into the instruction register,
and other circuits are activated or deactivated in unique combinations which depend on
the sequence of instruction codes coming into the processor; our focus is on the latter.
In the proposed research, we plan to explore this dependency characterization more
thoroughly, to see if the process of automatically identifying execution control circuits
can be generalized to apply to all general-purpose processors, from their high-level
design. Next, we consider a small example, to see whether the identification of such
circuits lends itself to a particular monitoring and enforcement strategy.
Example Architecture
The following example architecture is a bus-based MIPS architecture, given in MIT's
Open Courseware materials, based on its architecture course [36]:
In this example architecture, we have an instruction register, an arithmetic-logic unit
(ALU), a set of 32 registers, and some on-chip memory. They share access to a common
bus, and only one unit at a time may transmit a signal to the bus (more than one may read
data from the bus). Each unit has an enable signal which gives it access to and from the
bus. The memory and registers also have write signals, which permit the memory or
registers to be written to. If the enable signal is high but the write signal is low, a read
operation is assumed. For this architecture, all words are 32 bits, and there are 32
registers. The basic MIPS load-store instruction set is assumed. Also, this architecture
maintains the program counter (PC) in a special register, number 31. The ALU performs
basic operations like add, subtract, shift left/right, etc.
In this simple example, we do not consider the additional complexities of a pipelined or
superscalar execution, register renaming, out-of-order commit, speculative execution, or
interrupts. Also, for simplicity, we assume in the example that the results of ALU and
memory operations are immediately available (loaded on one cycle, and read the
following cycle).
The microcode representation in the chart is a way for us to analyze the inner workings
of the expected flow, based on what instruction is issued. In the microcode
representation, we set a control signal to 1 if it is enabled, or high, and set it to 0 if it is
disabled, or low. We set it to * to indicate "don't care," which in this context means that
the signal could be either 0 or 1 (or floating), without affecting the correctness of the
commanded operation.
All of the instructions begin with the microcode for the "fetch" operation, and then
proceed to different flows, such as add, shift, etc., based on the instruction named. In a
more complex architecture we could add to the "fetch" state by putting in other states
representing what happens during "interrupt," "retire," "commit," or "write back," etc.,
but for now we consider only "fetch" and an example ALU operation.
Here we differ from the source diagram by coloring the execution control circuits blue.
Why are they colored blue? Their operation, as illustrated in the microcode diagram, is
dependent on the instruction being executed. Stated informally, data wires and other
black (non-execution-control) wires perform their functions regardless of the instruction
and state of the fetch-decode-execute-retire cycle, whereas the blue (execution-control)
wires function in a manner dependent on the micro-state. It is this fact that we seek to
leverage from a monitoring standpoint, relative to a given class of MIs that target
elements of the execution control flow. In the example, we have selected the control
wires, and colored them blue, based on observation, in the absence of any formal
methodology. One of our research tasks is to examine whether an automatic general
procedure can be developed to, in all general purpose processors, unambiguously
distinguish the execution-control circuits from the non-execution-control circuits, in
support of 3D monitoring.
Suppose that we want to use this bus-based MIPS architecture to add the contents of two
registers, and place the result value in a third register. In this architecture, register-register operations always use registers rs and rt as the sources, and register rd as the
destination. In the following diagrams, we illustrate how the operation is carried out, as
the microcode representation spells out the action of the execution-control circuits.
Leak Attack
Now consider how a malicious inclusion could potentially modify the control flow of
this register-register operation. In this case, we design a small malicious inclusion that is
able to read ALU inputs A and B; when it observes special trigger values on those
circuits, it causes the arithmetic result to "leak" to a secret predetermined memory
address by accessing the load_MA, Mem_Wrt, and En_Mem signals. It might look
conceptually like this:
The register-register operation now executes the same way in steps one and two, but in
the third step, the malicious inclusion is activated:
Note in the chart the highlighted deviation from normal execution in the load_MA,
Mem_Write, and en_Mem signals. Also note that when we considered only the
correctness of the specified operation itself, we were often able to put a * (don't care)
label for what happens to ancillary control signals during a micro-operation. However, in
the context of detecting malicious inclusions, we are also concerned with preventing
unspecified additional functionality from occurring. Therefore, it appears that we would
want to eliminate the use of * and instead specify a definite 0 or 1 (usually 0) for
ancillary control signals.
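The effect of replacing don't-care entries with definite values can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the signal names and the values in each row are hypothetical and are not copied from the microcode chart.

```python
# Hypothetical control-signal names and values, for illustration only;
# they are not taken from the microcode chart.
SIGNALS = ("load_IR", "en_ALU", "load_MA", "Mem_Wrt", "En_Mem")

def matches(spec_row, observed):
    """Return True if every observed signal agrees with the spec row.
    A '*' entry in the spec row is a don't-care and matches 0 or 1."""
    return all(spec_row[s] == "*" or spec_row[s] == observed[s] for s in SIGNALS)

def tighten(spec_row, default=0):
    """Replace each don't-care with a definite value (0 unless told otherwise)."""
    return {s: (default if v == "*" else v) for s, v in spec_row.items()}

# A micro-step written only for functional correctness of the operation.
loose_row = {"load_IR": 0, "en_ALU": 1, "load_MA": "*", "Mem_Wrt": "*", "En_Mem": "*"}
# The same micro-step with every ancillary signal pinned to 0.
strict_row = tighten(loose_row)

normal = {"load_IR": 0, "en_ALU": 1, "load_MA": 0, "Mem_Wrt": 0, "En_Mem": 0}
leak = {"load_IR": 0, "en_ALU": 1, "load_MA": 1, "Mem_Wrt": 1, "En_Mem": 1}

print(matches(loose_row, leak))     # True: the don't-cares hide the leak
print(matches(strict_row, leak))    # False: the pinned row exposes it
print(matches(strict_row, normal))  # True: normal execution still passes
```

In this sketch the loose row accepts the leaking step because the memory signals are unconstrained, while the tightened row rejects it without rejecting normal execution.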
But how do we monitor and/or enforce the correct and complete operation, once it's
been specified? Suppose that, using 3D interconnects, we are able to monitor the value
of the control signals, from the 3D plane, and detect when execution deviates from the
specification, as described by the microcode state. In the following diagrams, we denote
with a green lightning bolt the signals that we might monitor.
Now the execution proceeds as before, but with the monitoring in place:
Now the incorrect control signals in load_MA, Mem_Write, and en_Mem are detected in
the third step. This type of attack is unlikely to be detected through normal verification
methods (which might, for example, try all possible ALU operand combinations and
check for the correct processor state after each operation) because of the computational
challenge of trying all 2^64 potential combinations of (32-bit) ALU operands in the test set.
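To give a rough sense of scale for exhaustive operand testing, the following sketch counts the combinations of two 32-bit operands and converts the count to wall-clock time. The test rate of one billion operand pairs per second is purely an assumption chosen for illustration, not a measured figure.

```python
OPERAND_BITS = 32
# Two independent 32-bit operands give 2^(32+32) combinations.
combinations = 2 ** (2 * OPERAND_BITS)

TESTS_PER_SECOND = 1_000_000_000        # assumed rate, for illustration only
SECONDS_PER_YEAR = 365.25 * 24 * 3600

years = combinations / TESTS_PER_SECOND / SECONDS_PER_YEAR
print(combinations)   # 18446744073709551616
print(round(years))   # 585
```

Even at this optimistic assumed rate, a single ALU operation would take centuries to test exhaustively, before considering other operations or processor state.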
In this simple example, the result of a register-register arithmetic operation was leaked
to memory. Given a more complex circuit which includes privilege levels, privileged
instructions, and a more powerful instruction set, more nefarious attacks are naturally
possible. [25]
Timing Requirements
Note that it must be possible to measure the monitored signals during some
synchronous interval. Circuits operating at independent speeds or asynchronous clock
cycles will probably not be able to participate in the same monitoring group without some
additional complexity. Therefore, during our research we will examine signals grouped
in such a way that each group may be sampled synchronously:
In addition, at the physical level the 3D monitoring circuit must sample a signal after it
is stable and any gate delay has completed:
How Execution Monitoring Might Look from the 3D Control Plane
In the control plane, given the availability of a general-purpose processor, we could
write a simple program to look for unspecified relationships among control signals. But
the control plane logic, if it is complex, might not be fast enough to keep up with the
computation plane microcode, or a general-purpose processor might not be available. We
can construct a representation of the transition logic for the microcode using a finite
automaton, which executes on a circuit in the control plane:
In this example, the execution begins in the state labeled FETCH_0, and for the first
step proceeds according to the instruction in IR. For subsequent transitions, the signals
observed in the execution control circuits must match the ones expected, as indicated by
each individual row in the microcode diagram. If any non-matching set of signals is
observed, the automaton enters a non-accepting state and remains there until it is reset by
the control plane monitor, or the processor is reset. The control plane, on observing the
FAULT state, would conclude that a violation of expected execution flow has occurred,
then take some appropriate corrective action. Depending on the implementation, that
action could include disabling the processor, invoking a failsafe mode of operation, or
simply notifying the operating system in some way.
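The monitoring behavior described above can be sketched as a small state machine. This is a minimal sketch, not the proposed implementation: FETCH_0 and FAULT come from the discussion above, while the other state names (ADD_0, ADD_1), the signal ordering, and the signal values are hypothetical placeholders.

```python
FAULT = "FAULT"

class ExecutionMonitor:
    """Sketch of a control-plane automaton over sampled control-signal tuples."""

    def __init__(self, transitions, start="FETCH_0"):
        # transitions maps (state, observed_signals) -> next_state
        self.transitions = transitions
        self.start = start
        self.state = start

    def step(self, observed):
        """Consume one sampled tuple of control signals and return the new state."""
        if self.state != FAULT:
            # Any observation without a defined transition is a violation.
            self.state = self.transitions.get((self.state, observed), FAULT)
        return self.state  # FAULT is absorbing until reset() is called

    def reset(self):
        self.state = self.start

# Hypothetical signal order: (en_ALU, load_MA, Mem_Wrt, En_Mem)
TRANSITIONS = {
    ("FETCH_0", (0, 0, 0, 1)): "ADD_0",
    ("ADD_0", (0, 0, 0, 0)): "ADD_1",
    ("ADD_1", (1, 0, 0, 0)): "FETCH_0",
}

monitor = ExecutionMonitor(TRANSITIONS)
for sample in [(0, 0, 0, 1), (0, 0, 0, 0), (1, 0, 0, 0)]:
    monitor.step(sample)
print(monitor.state)  # FETCH_0

# Third step asserts the memory signals, as in the leak attack.
for sample in [(0, 0, 0, 1), (0, 0, 0, 0), (1, 1, 1, 1), (0, 0, 0, 1)]:
    monitor.step(sample)
print(monitor.state)  # FAULT
```

A hardware realization would more likely be a fixed transition table in control-plane logic than a dictionary lookup; the dictionary is used here only to keep the sketch short.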
Relationship to Execution Monitor Theory
The DFA in this example has a defined start state, FETCH_0, but it differs from an
ordinary DFA in one important aspect, namely the need to consider infinite-length inputs,
since processor execution is unbounded. Büchi Automata are a special class of automata
that allow for infinite-length inputs. For a Büchi automaton to accept an infinite-length
input, it must revisit at least one accepting state an infinite number of times. By
observation, we can see that if we construct the automaton as in the example above, the
FETCH_0 state meets this criterion, since the FETCH_0 state will be visited an infinite
number of times if and only if the automaton accepts an infinite-length input sequence. As
long as the automaton remains in an accepting state, the execution-flow predicate is
satisfied; if it ever enters a non-accepting state, it will never accept the input sequence
(there will be no defined transitions out of a non-accepting state like FAULT to an
accepting state), and the execution-flow predicate is violated.
We note that automata like the one described in this example meet the criteria of a class
of Büchi Automata called security automata, enforcing safety properties, as outlined by
Schneider in [26]. Therefore, if we define our security policy P as "the execution control
circuits assume sequential values only in accordance with the transitions permitted by the
microcode specification," then P is enforceable by an execution monitor, using the
security automaton defined by the microcode representation. We believe that the proposed
work will be the first to describe the use of a security automaton, as defined in [26], in a
3D execution monitor, and these notions will be formalized in our report.
In a recent industry presentation, Abramovici and Bradley describe in general terms a
somewhat similar approach for a 2D system-on-chip (SoC) design [47]. They propose a
system using signal probe networks (analogous to the 3D posts described here) and
security monitors, local on-chip reprogrammable security logic units which operate based
on the signal probe network inputs. They mention that such a system could employ
customer-specified finite-state machines in the security monitors, but they offer no
further details on how the probe networks or security monitors should be constructed or
should function.
It remains to be demonstrated in our research that the concepts we have applied in this
small example may be applied in the general case, and under what conditions, along with
establishing criteria for completely and unambiguously identifying the execution control
circuits, as well as the automata definition. We plan a demonstration of this approach in
our processor simulation, showing that malicious inclusions that cause deviations from
instruction-specified microarchitectural flow can be detected using 3D interconnects and
a stateful execution monitoring system, such as the one described in the example.
Keep-Alive Protections
An adversary could also use MIs to disable an integrated circuit very directly. In
modern processors, there are many opportunities for what could be described as "zero
cycle" attacks, meaning there would essentially be no advance warning.
Control, Test, and Debug Circuits
A malicious inclusion could, for example, shut down a processor altogether. An
example is the use of a test port, like those specified by the IEEE Standard 1149.1,
Standard Test Access Port and Boundary-Scan Architecture [39]. Boundary scan
functionality is found on most general-purpose microprocessors today. It can include
both standard functionality, which is common across all devices, and proprietary
functionality, which a manufacturer can build in, often without official documentation.
During test and development, the boundary scan input/output pins can be connected to
external test equipment. When the processor family is fabricated, the circuitry in the
microprocessor that supports the test functionality is still present. So, even if there is no
external test equipment connected to the boundary scan input pins, there are still
corresponding circuits on the chip that could carry out that functionality, such as halting
or resetting the processor, if triggered - an easy denial of service attack avenue for a
knowledgeable adversary to exploit through the use of malicious inclusions.
Other Disabling Attacks
Denial of service attacks need not derive only from control-test-debug circuits. For
example, a malicious inclusion, once active, could access a processor's interrupt
mechanism, repeatedly invoking false interrupts that were not caused in the manner
specified, such as an actual input/output (I/O) transaction interrupt. Another method
might be for a malicious inclusion with access to the input to the instruction register (IR)
to override the intended instruction opcode with repeated no_op instructions, causing
meaningful execution to be overwritten. Many other potential attacks may be imagined.
Liveness, Availability, Execution Monitors, and Keep-Alive Protections
Keeping a circuit functioning normally, so that it is kept available for the intended use,
is often referred to by the term "liveness". As introduced by Lamport in [49] and
formally defined by Alpern and Schneider in [48], liveness informally means that
"something good eventually happens." Alpern and Schneider elaborate:
Examples of liveness properties include starvation freedom, termination, and
guaranteed service. In starvation freedom, which states that a process makes progress
infinitely often, the "good thing" is making progress. In termination, which asserts
that a program does not run forever, the "good thing" is completion of the final
instruction. Finally, in guaranteed service, which states that every request for service
is satisfied eventually, the "good thing" is receiving service.
However, they go on to point out that, while EM enforcement mechanisms enforce
security policies that are safety properties, "availability, if taken to mean that no principal
is forever denied use of some given resource, is not a safety property - any partial
execution can be extended in a way that allows a principal to access the resource, so the
defining set of proscribed partial executions that every safety property must have is absent."
This is certainly true in the case of a general-purpose processor. For example, the
liveness policy statement "all memory read requests are eventually fulfilled" is never
violated, since the possibility always exists that a pending read request will be fulfilled at
some future time.
As pointed out by Ryan, even defining liveness, or "availability," in terms of a limited
time window does not make availability policies EM-enforceable: "Note that even
though time-limited availability can be characterized as a trace property, it still cannot be
enforced by an execution monitor. An execution monitor, by definition, can only block
actions and cannot force the target system to perform an action." [55]
In light of the previous discussion, rather than considering an execution monitor in this
category too, we plan to consider how the 3D control layer may be used to employ simple
"keep alive" circuitry, which either disables or overrides the potentially offending
control-test-debug circuits, which are susceptible to malicious inclusions, in the
computation plane.
We plan to discuss several of these potential denial of service attacks in more detail, and
describe ways that 3D control plane keep-alive circuits could be used to mitigate them.
This section of the proposed work does not include an experimental demonstration, and is
independent of the execution control circuit experiments discussed earlier.
Memory Storage and Retrieval
A typical general-purpose processor will have some small amount of on-chip memory,
such as an L1 cache. Informally, we can adopt a description of correct memory operation
from Suh, et al.: "Memory behaves correctly if the value the processor loads from a
particular address is the most recent value that it has stored to that address" [50]. A
service in the 3D control plane might be used to validate the process of storage and retrieval.
3D Memory Verification Services
One-way functions, like hash functions, are easy to compute in the forward direction but
computationally complex to reverse. A great deal of research has established the value of
hash functions for verifying the integrity of memory transactions [45,50]. When storing a
small block of memory, we can separately store the hash of the memory block's value.
Later, when the block is retrieved, we can compute the hash value of the retrieved block,
and compare it with the stored hash. If the values do not match, then either the memory
block, the hash function, or the stored hash value has been modified. In one potential
scenario, the hash function and the stored hash value are kept in the 3D control plane, as
a service. We assume, when later reading a memory block that has been stored, that a
hash mismatch indicates the memory block was modified since it was last stored. A
diagram of this type of service might look like this:
In this simple example, the 3D interconnects include the memory address, the read and write enable signals,
and the memory word, both into and out of the storage unit. When a store operation is commanded, the
control plane unit computes the hash of the incoming memory word and stores it at the associated address.
When a load operation is commanded, the control plane unit computes the hash of the outgoing memory
word and compares it to the hash stored at the associated address, if a valid hash exists for that location.
Note that the amount of storage in the control plane can be reduced by using a hash
function that maps to a smaller output size; however, we should avoid making it too
small, to reduce the chance that a memory block modification goes undetected due to a
hash collision.
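The store/load checking described above can be sketched as follows. This is an illustration under assumptions: SHA-256 from Python's standard hashlib stands in for whatever hash function the control plane would actually implement, the class and method names are hypothetical, and the `tag_bytes` parameter models the choice of a smaller hash output to save control-plane storage.

```python
import hashlib

class MemoryVerifier:
    """Sketch of a control-plane service that checks loads against stored hashes."""

    def __init__(self, tag_bytes=8):
        self.tag_bytes = tag_bytes   # smaller tags save storage but collide more often
        self.tags = {}               # address -> truncated hash of the last stored word

    def _tag(self, word):
        # Hash the 32-bit memory word and keep only the first tag_bytes bytes.
        return hashlib.sha256(word.to_bytes(4, "big")).digest()[: self.tag_bytes]

    def on_store(self, address, word):
        """Called when a store is observed on the 3D interconnects."""
        self.tags[address] = self._tag(word)

    def on_load(self, address, word):
        """Return False if the loaded word does not match the stored hash."""
        expected = self.tags.get(address)
        if expected is None:
            return True  # no valid hash exists for this location yet
        return expected == self._tag(word)

verifier = MemoryVerifier()
verifier.on_store(0x100, 0xDEADBEEF)
print(verifier.on_load(0x100, 0xDEADBEEF))  # True: word unchanged since the store
print(verifier.on_load(0x100, 0xDEADBEEE))  # False: word modified after the store
print(verifier.on_load(0x200, 0))           # True: never stored, nothing to compare
```

If the hash behaves like a random function, a modified word escapes detection with probability of roughly 2^(-8 * tag_bytes), which is the storage versus collision tradeoff noted above.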
Error Correction in Memory
Another important tool in considering the correctness of memory operation is the use of
error correction codes (ECCs). A malicious memory modification, affecting only the
value of some on-chip memory, may be detectable and even correctable using existing
ECCs, which are fielded in many dedicated memory chips today. There are many ECC
techniques in use [37] in memory; in our untrusted processor scenario, some or all of the
error-correction circuitry might be relocated to the control plane, to keep it from being
Our research will describe the general application of these notions, illustrating how the
3D control plane might be used to provide verification services for untrusted processor
memory, at a reasonable cost in terms of interconnects and monitoring logic.
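As a concrete reminder of how a simple ECC detects and corrects a single modified bit, the sketch below implements the textbook Hamming(7,4) code. This particular code is chosen only for illustration; the proposal does not specify which ECC would be used, and the function names are hypothetical.

```python
DATA_POSITIONS = (3, 5, 6, 7)   # 1-indexed codeword positions holding data bits
PARITY_POSITIONS = (1, 2, 4)    # positions that are powers of two hold parity bits

def syndrome(code):
    """XOR together the positions of all set bits; 0 means no error is detected."""
    s = 0
    for pos in range(1, 8):
        if code[pos]:
            s ^= pos
    return s

def encode(data_bits):
    """Encode 4 data bits into a 7-bit codeword (list index 0 is unused)."""
    code = [0] * 8
    for pos, bit in zip(DATA_POSITIONS, data_bits):
        code[pos] = bit
    s = syndrome(code)
    # Set each parity bit so that the syndrome of the full codeword is 0.
    for pos in PARITY_POSITIONS:
        code[pos] = 1 if s & pos else 0
    return code

def decode(code):
    """Correct at most one flipped bit; return (data_bits, error_position)."""
    code = list(code)
    s = syndrome(code)
    if s:
        code[s] ^= 1   # a nonzero syndrome is the position of the flipped bit
    return [code[pos] for pos in DATA_POSITIONS], s

stored = encode([1, 0, 1, 1])
stored[6] ^= 1                      # a single-bit modification in memory
recovered, position = decode(stored)
print(recovered, position)          # [1, 0, 1, 1] 6
```

A code of this kind corrects only single-bit changes; a modification that flips two or more bits of one codeword would be miscorrected, so stronger codes or the hash-based checks discussed earlier would be needed against a deliberate adversary.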
Arithmetic and Logic
Arithmetic and logic in a general-purpose processor takes several forms. It may include
compositions of logic circuits for decision making, or perhaps a full floating-point
computation unit (FPU) or an integer arithmetic-logic unit (ALU).
Logic Circuits
In general, the problem of ensuring correct operation of logic circuits is comparatively
challenging, as noted by Dutta and Jas:
Detecting and correcting errors in logic circuits is much more difficult than in
memories. While concurrent error detection and correction mechanisms can be
efficiently incorporated in memories due to their regular structure, logic circuits
present a much greater challenge because of their irregular structure ... The easiest
concurrent error detection (CED) scheme for logic circuits is to use duplication where
the circuit is duplicated and the outputs are compared with an equality checker.
While this is a simple method to implement, it requires more than 100% overhead and
also it can not correct any of the errors. [38]
In the 3D scenario, we can duplicate selected logic circuits in the control plane, and
simply accept the overhead of duplication, comparing the duplicated results with the
original. However, if the overhead of doing so is unacceptable but some reduced error
detection rate is acceptable, a technique called non-intrusive concurrent error detection
(CED) may be useful. Dutta and Jas [38] provide a diagram summarizing how this class of
techniques is implemented (left), and we conjecture a comparable arrangement in a 3D
scenario (right):
This structure lends itself well to the 3D scenario, since the inputs, outputs, and
functional logic are unchanged in the computation plane, while the prediction circuit,
compaction, and comparison circuits, in the control plane, could provide an error
indication without duplicating every gate of the original logic. Though it consumes less
overhead than strict duplication, non-intrusive CED will not detect all errors or malicious
changes in logic; also, construction of the prediction circuit may require extensive input-space sampling [38]. Ultimately, the control plane logic verification technique will likely
involve some tradeoff between completeness and duplication overhead.
Arithmetic Circuits
Here again, some form of duplication is the most complete method of verifying a
circuit's correct operation. There may be other options for detecting a subset of errors or
malicious modifications, such as using some type of checksum or bounds-checking on
the outputs, but we would probably elect to duplicate an entire ALU or FPU in the
control plane, if there are significant concerns about malicious modifications to an
arithmetic unit. In our research, we will informally explore the design considerations
involved in implementing this type of monitoring in a 3D scenario.
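The tradeoff between full duplication and a cheaper checksum-style check can be illustrated for a 32-bit add. This is a sketch under assumptions: the function names are hypothetical, and the mod-3 residue check is one well-known example of a lightweight arithmetic check, not a technique the proposal commits to.

```python
MASK = 0xFFFFFFFF

def reference_add(a, b):
    """Trusted control-plane copy of a 32-bit add; returns (result, carry_out)."""
    total = a + b
    return total & MASK, total >> 32

def duplication_check(a, b, observed_result):
    """Full duplication: recompute the add and compare the results exactly."""
    return reference_add(a, b)[0] == observed_result

def residue_check(a, b, observed_result, observed_carry):
    """Cheaper mod-3 check. Because 2**32 % 3 == 1, the identity
    a + b == result + carry * 2**32 implies
    (a + b) % 3 == (result + carry) % 3."""
    return (a % 3 + b % 3) % 3 == (observed_result % 3 + observed_carry) % 3

a, b = 0xFFFFFFFF, 2
good, carry = reference_add(a, b)   # (1, 1)

print(duplication_check(a, b, good), residue_check(a, b, good, carry))          # True True
print(residue_check(a, b, good ^ 1, carry))                                     # False
print(duplication_check(a, b, good + 3), residue_check(a, b, good + 3, carry))  # False True
```

The last line shows the limitation: an error that changes the result by a multiple of 3 passes the residue check but is caught by duplication, which is why duplicating the entire unit remains the most complete option.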
Data Path Integrity
Our threat model grants an adversary the ability to implement a malicious inclusion
which can modify the logical value of a circuit while the data is in transit, propagating
from one part of the circuit to another. How is the correctness of data propagation along
a circuit path defined, and how can incorrect operation be detected? We propose to
informally investigate these questions.
First, we must investigate whether there can exist a complete, consistent, and
unambiguous method for defining what circuits are simply "data paths" in any general-purpose processor. If we are able to define the control circuits, storage and retrieval
circuits, and arithmetic and logic circuits constructively, then it may be the case that the
data path circuits are simply "everything else."
The next question that must be answered in the security context, in regard to data path
integrity, is its scope. Informally, correct operation of the specified functionality asks
only whether: a data path that transmits a value along a circuit, from the output of one
element to the input of another element, always has the same logical value at both ends.
We might call this notion transit invariance.
However, transit invariance does not capture the problem of the presence of additional,
possibly malicious, circuits that affect confidentiality, integrity, and availability of the
data. These additional circuits may, for example, tap off the data path and create a copy
of the signal that was not in the processor specification. In the following diagram, a
signal propagates along a circuit through several buffers; its value does not change during
propagation, but an unauthorized (i.e., not in the specification) copy of the circuit value is
made and possibly used elsewhere:
In this case, also detecting the unauthorized signal would require a new definition,
which captures the notion that no additional functionality (in this case, no new circuits),
beyond those in the processor specification, are permitted. The problem is, how does one
(nondestructively) examine a complex processor, and determine that it has no "extra"
circuits? The 3D monitoring techniques we have outlined seem better suited for
detecting deviations from specified circuit behavior than for detecting additional circuit
behavior. It may be the case that, in terms of data path integrity, additional circuits are
tolerable only if they have no effect on the other essential functional categories: execution flow, control-test-debug, arithmetic and logic, and storage and retrieval. We
plan to informally explore the question of data path integrity in a 3D monitoring context,
but it is not a planned subject of our experiments.
VI Experimental Verification Plan
To verify our research, we will conduct experiments on processor simulators, focusing
on detecting malicious inclusions by monitoring execution flow. The demonstration will
focus on the following tasks:
• Review and analyze existing malicious inclusion detection techniques
• Review existing malicious inclusion taxonomies, types, and examples
• Map the common microprocessor architectural features (such as load-store unit,
reorder buffer, etc.) to the types of malicious inclusions to which they are vulnerable
• Identify the key computation-plane architectural nodes which must be monitored by
interconnects for successful 3D detection of malicious inclusions, and explain their
• Detect known, predefined MIs using a 3D security in a simulation, using a simple
• Detect unknown, non-predefined threats, using a generalized monitor
• Analyze the limits of what can and cannot be reasonably detected using these
methods. Identify the tradeoffs in monitor size and the number of monitor points
against performance, and describe an optimum balance of performance and security
• Provide evidence that the malicious inclusions detected in our demonstration are not
detectable using previously described techniques
During the experimentation phase, we will consider the following questions:
Malicious Inclusions
• At the architectural level, what features are the easiest and most difficult to monitor
using 3D methods, in terms of the size of logic and number of connections required?
• At the architectural level, what are the most and least important features, in terms of
security threat potential, that must be monitored to detect MIs?
• What features of the computation plane architecture are most suitable for the
application of 3D monitoring, in terms of ease of fabrication?
• What are the unique characteristics of MIs that are not detectable using existing
testing methods?
• What are the characteristics of MIs that are detectable, or not detectable, using a 3D
• What MIs are simplest to design, yet have the most serious potential security
Control Plane Monitoring Techniques
 Once the computation-plane data to be monitored is available, what are the lowest
running-time control-plane algorithms for concluding the presence of MIs? What is
their time and space complexity, and how does the complexity of the 3D approach
compare with the complexity of other detection approaches? Can we implement a
state-based approach using automata? If so, how many states are required? How
can processor microcode flow be verified the fastest? Within how many clock
cycles can a monitor detect the activation of an MI?
When the CPU in the computation plane is idle, can the control plane use the
available cycles to conduct background checks on the computation plane, instead of
only monitoring the computation plane during commanded execution? What should
those tests look like, to maximize the detection rate per number of tests conducted?
We propose to use general-purpose architectural designs as baselines for the
computation plane and for the control plane. The computation plane model will then be
modified with malicious inclusions that must be detected. The control plane model will
be modified to connect to the computation plane with circuits that mirror where the 3D
"posts" would be on a silicon chip. The control plane will be configured to continuously
monitor the various posts, so it can follow the operation of the computation plane and
identify the malicious behavior associated with MIs.
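As a minimal sketch of how a state-based control-plane monitor might follow the computation plane, the Python fragment below models permitted execution flow as a small finite automaton and reports the first observed event that has no permitted transition. The state names, event names, and four-stage flow are hypothetical placeholders chosen for illustration; they are not taken from any particular processor design or from our planned simulator.

```python
from typing import Dict, Iterable, Optional, Tuple

# Hypothetical events standing in for signals sampled at the 3D posts.
# Each entry maps (current state, observed event) to the next state.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("IDLE", "fetch"): "FETCHED",
    ("FETCHED", "decode"): "DECODED",
    ("DECODED", "execute"): "EXECUTED",
    ("EXECUTED", "writeback"): "IDLE",
}


def first_violation(events: Iterable[str], start: str = "IDLE") -> Optional[int]:
    """Return the index of the first event with no permitted transition, or None."""
    state = start
    for index, event in enumerate(events):
        next_state = TRANSITIONS.get((state, event))
        if next_state is None:
            return index  # the monitor would raise an alarm at this event
        state = next_state
    return None


normal = ["fetch", "decode", "execute", "writeback"] * 2
tampered = ["fetch", "decode", "writeback"]  # write-back with no execute step

print(first_violation(normal))    # None
print(first_violation(tampered))  # 2
```

In this toy model the monitor needs one table lookup per observed event, and the number of states grows with the amount of processor behavior the monitor is asked to track, which is the tradeoff the questions above are meant to quantify.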
The specifications for the computation plane, control plane, interconnections, and
malicious inclusions, and the test results will all be included in our report.
VII Comparison With Other Work
Naval Postgraduate School Technical Report - 3D Security Vision and Outline
In [6], the authors introduce the fundamental approach to 3D security, and list several
possible applications.
Our work will expand on this, to provide an architecture-level demonstration,
emphasizing the detection of malicious inclusions in hardware.
Some Existing Malicious Inclusion Examples
In [5], the authors develop eight specific MIs (referred to by the authors as "hardware
trojans") for an Alpha architecture, in the context of protecting a cryptographic key.
They categorize MIs based on function: broadcast information to the attacker,
compromise the host circuit, or disable the host circuit. They demonstrate how MIs can
be designed in conjunction with circuit optimization, whereby the removal of some
existing circuits helps mask the addition of malicious circuits, making the MIs more
difficult to detect by existing power-analysis techniques. Their MIs operate at the
register transfer level (RTL), but they note that some MIs could be developed to operate
at the gate or transistor level.
Our technique is not designed to detect these kinds of MIs, since they are built for a
special-purpose processing chip, and our method is designed for general-purpose
processor architectures. That work focuses on MI design, whereas ours will focus on
MI detection and response, which [5] does not address.
In [25], King et al. develop two malicious inclusion attacks on the Leon3 processor: one
providing an escalation-of-privilege backdoor, and another implementing a secret
"shadow memory" system.
Our 3D execution monitor technique should detect the types of MIs created for this
work, and we plan to test our system against either these particular attacks on the Leon3
processor design, or comparable attacks on a smaller processor design.
Other MI Tests - Power and Timing
The currently evolving category of test methods for detecting MIs may generally be
described as "fingerprinting" of either power or timing signals. A processor may have
measurable variations at its power ports under various inputs, or it may have measurable
variance in output signal delay for certain inputs. By studying how these power levels or
timing delays vary over a large sample of normal inputs on trusted "gold" chips,
researchers are able to establish a "fingerprint" of the baseline behavior. They then can
identify deviations from the "fingerprint" patterns if malicious inclusions are present,
when testing uncertified chips. These promising techniques are described in [5], [12],
[13], [14], and [15]. One common shortfall of nondestructive testing techniques like
these, though, is that they do not scale well to very large, complex chips. They tend to
perform better when the malicious anomaly is located close to an observation point on the
exterior and occupies an appreciable portion of the chip's circuits (greater than
approximately 0.1% in tests so far, with larger fractions easier to detect). In [5], the authors illustrate
techniques for inserting malicious inclusions while masking the effect of the added
malicious circuits through balanced reductions in existing circuits nearby, using
optimization or by eliminating unused legacy circuits. The existence of such disguise
techniques, and the difficulty of detecting path delay and power changes over large-scale,
complex chips will limit the application of these detection methodologies, especially
against a capable adversary.
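The fingerprinting idea can be illustrated with a short Python sketch, under simplifying assumptions: a single scalar measurement (for example, a power or delay reading) per input stimulus, a baseline built from a handful of trusted chips, and a fixed threshold on the number of standard deviations. The function names, the sample readings, and the threshold of 3.0 are all hypothetical; the cited works use considerably more sophisticated statistics.

```python
from statistics import mean, stdev
from typing import Dict, List, Tuple

Fingerprint = Dict[str, Tuple[float, float]]


def build_fingerprint(gold: Dict[str, List[float]]) -> Fingerprint:
    """Per-stimulus (mean, sample standard deviation) across trusted chips."""
    return {stimulus: (mean(vals), stdev(vals)) for stimulus, vals in gold.items()}


def deviating_inputs(fingerprint: Fingerprint,
                     measured: Dict[str, float],
                     threshold: float = 3.0) -> List[str]:
    """List the stimuli whose reading lies outside the baseline band."""
    flagged = []
    for stimulus, value in measured.items():
        mu, sigma = fingerprint[stimulus]
        if sigma == 0:
            deviates = value != mu
        else:
            deviates = abs(value - mu) / sigma > threshold
        if deviates:
            flagged.append(stimulus)
    return flagged


# Made-up readings from five trusted chips for two stimuli.
gold_readings = {
    "input_a": [10.0, 10.2, 9.8, 10.1, 9.9],
    "input_b": [20.0, 20.3, 19.7, 20.1, 19.9],
}
fingerprint = build_fingerprint(gold_readings)

large_change = {"input_a": 10.1, "input_b": 21.5}
small_change = {"input_a": 10.05, "input_b": 20.1}

print(deviating_inputs(fingerprint, large_change))  # ['input_b']
print(deviating_inputs(fingerprint, small_change))  # []
```

The second case shows the limitation discussed above: a change that stays within normal chip-to-chip variation is not flagged, which is the situation a small or deliberately masked inclusion is designed to create.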
Our approach is complementary to the "fingerprinting" methods in that 3D Security
techniques have the potential for finding a malicious inclusion which does not produce
large changes in the power or timing characteristics of the target processor. Malicious
inclusions could be as small as a few dozen gates, and masked as in [5], and would
therefore be very difficult to detect using these existing techniques. The fingerprinting
methods are also very difficult to scale up, and can be performed only prior to installation;
our techniques will be easier to scale, and can run continually during operations in the
field.
Detecting Network Intrusions, Viruses, and Malicious Code
Our proposal differs from network intrusion detection, software virus detection, and
other similar systems in two ways:
First, viruses and intruder network packets exist in software, at a higher level of
abstraction than the level we propose to monitor. Though the term "hardware virus" has
been used in some literature, a virus differs from a hardware malicious inclusion in the
sense that a virus seeks to replicate and propagate itself to other systems, as is possible
in software, whereas a malicious inclusion seeks to perform its function in place, in
hardware, on one system only.
Second, the concept of "correct flow" of control may not be well defined in software,
due to software's ability to change configurations and evolve over time. However,
correct flow is well defined in a microprocessor architecture (here we propose to monitor
non-programmable, general purpose chips; the problem of monitoring reprogrammable
circuits like FPGAs is not considered). In general, the processor design permits us to a
priori identify certain sequences or combinations of control circuit activations which will
never be authorized; using the 3D approach, it should be possible to detect and mitigate
some such sequences that are not detectable using existing methods. The principal
advantage of 3D that facilitates this is the ability to place the 3D interconnects precisely
at any key circuits which need monitoring, not just the circuits accessible via external
pins. Part of our research is to examine which type of internal control signal and which
internal components of the integrated circuit need to be monitored, and which do not, for
each type of malicious inclusion.
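To make the notion of a priori unauthorized combinations concrete, the following Python sketch checks one sampled set of internal control signals against a table of forbidden combinations. The signal names and the two rules are hypothetical examples invented for illustration; identifying the real signals and rules for a given processor design is part of the proposed research.

```python
from typing import Callable, Dict, List

Signals = Dict[str, bool]

# Hypothetical rules. Each predicate returns True when the sampled signals
# form a combination that the (assumed) design never authorizes.
FORBIDDEN: Dict[str, Callable[[Signals], bool]] = {
    "privilege raised without trap":
        lambda s: s["privilege_raise"] and not s["trap_taken"],
    "write enabled despite MMU fault":
        lambda s: s["mem_write_enable"] and s["mmu_fault"],
}


def violations(sample: Signals) -> List[str]:
    """Return the names of all forbidden combinations present in one sample."""
    return [name for name, is_forbidden in FORBIDDEN.items() if is_forbidden(sample)]


legitimate = {"privilege_raise": True, "trap_taken": True,
              "mem_write_enable": True, "mmu_fault": False}
suspicious = {"privilege_raise": True, "trap_taken": False,
              "mem_write_enable": True, "mmu_fault": True}

print(violations(legitimate))  # []
print(violations(suspicious))
# ['privilege raised without trap', 'write enabled despite MMU fault']
```

None of the signals in this sketch would normally be visible at external pins, which is why access through internal 3D interconnects matters for this kind of check.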
As noted in [25], hardware malicious inclusions may facilitate subsequent software
attacks; we do not propose to consider the software side of such an attack in the scope of
this research, but do propose to consider how to detect hardware changes that might
facilitate it.
Reference Monitors
The reference monitor concept, first outlined in [27], has been well defined and
explored in the realm of high-assurance computing. The general properties and
limitations of security enforcement mechanisms, especially using automata-like
structures, are discussed in [26]. Sterne, et al., assert that "the security properties
required by most organizations cannot be enforced unilaterally by a reference monitor,
even when combined with the other components of a conventional Trusted Computing
Base (TCB)." [28]
We are not proposing a new or improved reference monitor or enforcement mechanism
in the sense of [26] or [27]. A reference monitor is a system which tries to prevent
subjects from gaining unauthorized access to objects, according to some definition,
usually in the operating system, of subject, object, and authorized access. Subjects and
objects are defined, in that context, at a coarser granularity and a higher level of
abstraction than the level at which we propose to operate.
We propose to monitor and verify the correctness of the microprocessor operation with
respect to its definition in an architectural-level specification, not with respect to subjects
and objects. A malicious inclusion does not know what a subject or an object is, but
rather is preprogrammed to modify the operation of the microprocessor in some way,
such as disabling the protection logic of a memory management unit [25], leaking a
secret [5], or surreptitiously disabling the processor entirely [9]. Our method examines
the mechanisms by which malicious inclusions perform these acts, and proposes new
ways to detect malicious inclusions that are not detectable through existing means.
Related Work
As illustrated in [6], a 3D control plane could be used as one way of enforcing a
noninterference policy within a single shared cache, among two or more processes
operating at differing sensitivity levels or security compartments. In particular, the
control plane could prevent the shared resource from being used as an overt storage
channel or covert timing channel between the processors. Other potential 3D security
applications, such as enforcing a security policy over interprocessor connections, network
interfaces or other I/O devices, are also described there.
To our knowledge, the cache controller is the only demonstration to date of the use of
3D integration techniques in a security-oriented application. 3D techniques have been
demonstrated for a variety of other uses, including imaging, performance monitoring,
and adding cache memory [24,42,43].
Appendix A. Alternatives to 3D: Other Mitigation Strategies
Other strategies exist to mitigate the threat of malicious inclusions. They are
complementary to the proposed research, and no one strategy alone will solve the
problem entirely. A few of these are summarized below.
U.S. Government Initiatives, Existing Mitigation Strategies
On 2 March 2010, the U.S. Government declassified portions of its Comprehensive
National Cyber Security Initiative (CNCI) [4]. Item number 11 of this initiative
addresses the "Global Supply Chain." According to [3], the supply chain "has become so
globalized that we can quickly lose track of where our technology is coming from. This
creates opportunities for malicious actors to create backdoors, malware, and faulty
hardware that make its way into our weapon systems, internet infrastructure, banking
systems, and personal computing devices. The CNCI will require careful tracking of
participants in the supply chain, which will steer more and more buyers towards Trusted
The Trusted Foundry Approach
In 2003, DoD established its Trusted Foundry program to help ensure that the U.S.
maintains the capability to manufacture important processors for high-assurance DoD
applications. Managed by the National Security Agency (NSA) and the Defense
Microelectronics Activity (DMEA), this program had 29 accredited Trusted Suppliers as
of March, 2010. DMEA is expanding its foundry certification program to include trusted
evaluation of the design, aggregation/brokering, mask-making, assembly, and test phases
of processor supply. [3]
However, the Trusted Foundry program does not solve all of DoD's microchip supply
requirements. The certified foundries will not be able to manufacture the full breadth of
advanced processors desired by managers of DoD programs, and use of the Foundry
system is not mandatory [2]. There are certified designers, like the one at Sandia, but
Sandia's facility can currently produce only to the 0.35 micron level, well
behind the state of the art [11]. Some DoD systems requiring the latest in circuit
technology may not be able to obtain it all directly through the Trusted Foundry program.
The Domestic Commercial Approach to Processor Trust
Though it has been generally acknowledged that the U.S. lacks the capacity to
domestically manufacture current-generation processors and secure the entire supply
chain for the integrated circuits required by DoD and other high-assurance customers,
there are some alternatives.
For example, CPU-Tech recently introduced the Acalis family of secure processors. It
features a tamper-resistant dual-PowerPC chip, fabricated within the Trusted Foundry
program, with an accompanying development kit [23]. For the needs of some simpler
applications, this type of system may suffice. As noted, though, the computational power
of products manufactured domestically has not kept pace, relative to systems made
overseas, and our ability to produce integrated circuits in large volume has also
diminished. In addition, this particular commercial system requires the use of proprietary
hardware and closely-held proprietary software, limiting its utility.
DARPA Trust in Integrated Circuits Program
The Defense Advanced Research Projects Agency (DARPA) is sponsoring an ongoing
competition to spur innovation in methods for detecting malicious inclusions in hardware.
DARPA has contracted with MIT's Lincoln Laboratory to design the malicious elements, USC's
Information Sciences Institute to manufacture the chips, and Johns Hopkins University's
Applied Physics Laboratory to assess the competitors' results. Teams from Raytheon, Luna
Innovations, and Xradia participated in the initial round of the competition; publicly
available contract data suggests that at least Raytheon has progressed to the third phase of
the trial program [10]. As discussed above, however, external testing methods, though
useful, are limited in terms of the type of malicious inclusions they will be able to detect.
Existing Chip Testing and Its Limitations
Many of the general challenges of traditional semiconductor verification are
summarized in [9]:
"Although commercial chip makers routinely and exhaustively test chips
with hundreds of millions of logic gates, they can't afford to inspect
everything... 'You don't check for the infinite possible things that are not
specified,' says electrical engineering professor Ruby Lee, a
cryptography expert at Princeton. 'You could check the obvious
possibilities, but can you test for every unspecified function?'
Nor can chip makers afford to test every chip. From a batch of
thousands, technicians select a single chip for physical inspection,
assuming that the manufacturing process has yielded essentially
identical devices. They then laboriously grind away a thin layer of the
chip, put the chip into a scanning electron microscope, and then take a
picture of it, repeating the process until every layer of the chip has been
imaged. Even here, spotting a tiny discrepancy amid a chip's many
layers and millions or billions of transistors is a fantastically difficult
task, and the chip is destroyed in the process.
But the military can't really work that way. For ICs destined for
mission-critical systems, you'd ideally want to test every chip without
destroying it."
Naturally, an exhaustive test of all possible sequences of chip execution is not possible.
Even a test sequence for just a 64-bit integer arithmetic logic unit, such as multiplying
together all possible pairs of 64-bit unsigned integers, leads to $(2^{64})^2 = 2^{128}$
possible combinations, and would not be feasible. Even at $2^{10}$ operations per second,
such a test would take $2^{118}$ seconds or, at around $2^{25}$ seconds per year, around $2^{93}$ years.
And that is just for integer multiplication; add all possible combinations of CPU
opcodes and data fields, program counters, floating point operations, register values,
cache movements, and interrupt flow, to name a few, and you would be testing
essentially forever, and that is before even considering multi-core chips.
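The arithmetic behind this estimate can be checked by working with base-2 exponents, as in the short Python sketch below. The rate of 2^10 operations per second is the figure used in the text; the variable names are ours, and the year length of 365.25 days is an assumption used only to confirm the power-of-two approximation.

```python
import math

OPERAND_BITS = 64

# All ordered pairs of 64-bit operands: (2^64)^2 = 2^128 test cases.
combinations_log2 = 2 * OPERAND_BITS

# Test rate from the text: 2^10 operations per second.
rate_log2 = 10
seconds_log2 = combinations_log2 - rate_log2

# A year of 365.25 days is about 3.16e7 seconds, close to 2^25.
seconds_per_year = 365.25 * 24 * 3600
year_log2 = round(math.log2(seconds_per_year))

years_log2 = seconds_log2 - year_log2

print(f"test cases: 2^{combinations_log2}")  # test cases: 2^128
print(f"seconds:    2^{seconds_log2}")       # seconds:    2^118
print(f"years:      2^{years_log2}")         # years:      2^93
```

Dividing by the rate and by the seconds in a year corresponds to subtracting exponents, so the result is 2^(128 - 10 - 25) = 2^93 years.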
As a result, experts from industry and academia have worked to develop intelligent,
non-exhaustive methods for testing microprocessors for malicious inclusions; a few of
these were described in an earlier section.
Industry has focused a great deal of resources on proving the correctness of chip design,
with respect to some specification. These proofs often use formal methods, taking
advantage of theorem provers and model checkers. Although, as pointed out above, it is
impossible to test every possible computational sequence in a chip in any reasonable
amount of time, the formal methods approach allows us to prove certain properties about
subsets of a processor's operation. For example, in [17], the authors use the PVS theorem
prover to discover several bugs - differences between higher-level specification and
actual circuit design - in an ARM processor.
These methods, though important and useful, cover only the correctness of an integrated
circuit's design; post-design phases in the supply chain are not covered. It is important to
note, however, that some malicious changes inserted during the design phase could
potentially be detected during formal verification of the circuit design.