Design and Implementation of Three-Dimensional
Logic Structures
by
Shamik Das
Submitted to the Department of Electrical Engineering and Computer
Science
in Partial Fulfillment of the Requirements for the Degrees of
Bachelor of Science in Electrical Science and Engineering
and
Master of Engineering in Electrical Engineering and Computer Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2000
© Shamik Das, MM. All rights reserved.
The author hereby grants to MIT permission to reproduce and
distribute publicly paper and electronic copies of this thesis document
in whole or in part, and to grant others the right to do so.
Author ..........................
Department of Electrical Engineering and Computer Science
May 22, 2000
Certified by ..........................
Joseph Jacobson
Associate Professor, Media Arts and Sciences
Thesis Supervisor
Accepted by ..........................
Arthur C. Smith
Chairman, Department Committee on Graduate Students
Design and Implementation of Three-Dimensional Logic
Structures
by
Shamik Das
Submitted to the Department of Electrical Engineering and Computer Science
on May 22, 2000, in Partial Fulfillment of the
Requirements for the Degrees of
Bachelor of Science in Electrical Science and Engineering
and
Master of Engineering in Electrical Engineering and Computer Science
Abstract
In this thesis, a computer-aided-design (CAD) system is developed that assists in the
design of novel three-dimensional integrated circuits. The software tools allow for the
specification of a multilayer transistor circuit by means that are readily accessible
to those familiar with two-dimensional CMOS VLSI design. This software system
provides desirable features such as SPICE circuit extraction and the ability to produce
the design formats necessary for automated fabrication (e.g. mask specifications for
lithography or Gerber data for inkjet printing). Finally, in this thesis, the software
tools are used to design a ring oscillator, a 3-D static RAM, and a 3-D cellular
automata machine.
Thesis Supervisor: Joseph Jacobson
Title: Associate Professor, Media Arts and Sciences
Acknowledgments
I am grateful to many people for their support in the development of this thesis.
My thesis advisor, Joe Jacobson, deserves thanks for his guidance and motivation, as
well as for many helpful discussions about the research. Babak Nivi, Colin Bulthaup,
and Eric Wilhelm were instrumental in fabricating test structures from the software-produced specifications. I also appreciate the many TFT discussions with Babak,
Colin, and Brent Ridley, as these were important for shaping the form the circuit-design process was to take. Saul Griffith and Sawyer Fuller deserve thanks for their
input on laser and inkjet patterning of functional materials.
In addition, this thesis would not have been completed without the support of
many friends, brothers, and loved ones. I would especially like to thank my family -
my parents, Dilip and Mala, and my sister, Alina - for their inspiration, direction,
and support.
Contents
1 Introduction
1.1 Design of the Layout Software
1.2 Implementation of Test Circuits
1.2.1 Ring Oscillator
1.2.2 Static Random-Access Memory
1.2.3 Cellular Automata Machine
2 FluidLayout - The Layout Software
2.1 Overall Considerations
2.2 Implementation
2.2.1 2-D Slice Manipulation
2.2.2 Circuit Partitioning
2.3 Circuit Verification
2.4 Circuit Fabrication
2.5 Design Walk-Through
3 Some Basic Transistor Circuits
3.1 Minimum Criteria for the Technology
3.2 Design of Basic Circuits
4 The Static Random-Access Memory
4.1 Background and Motivation
4.2 SRAM operation
4.3 Extensions to 3-D
4.4 Layout of a 3-D SRAM
5 The Cellular-Automata Machine
5.1 Background
5.1.1 Finite-State Machines
5.1.2 Cellular-Automata Machines
5.1.3 The Game of Life
5.2 Layout of a 3-D Game of Life
5.2.1 Game of Life Cell
5.2.2 Game of Life CAM Architecture
6 Conclusion
A FluidLayout User's Guide
A.1 Overview
A.2 Basic Layout
A.3 Higher-Level Functions
A.3.1 Node Labeling
A.3.2 Translation, Rotation, and Reflection
A.4 Circuit-Level Tools
A.4.1 Circuit Traversal
A.4.2 Cell Hierarchy Management
A.4.3 Magic Importation
A.4.4 Circuit Netlist Extraction
A.4.5 VLSI/MEMS Fabrication
A.5 Step-By-Step Design Walk-Through
A.6 Summary of Useful Commands
List of Figures
2-1 CLayer object with embedded CRectangle objects.
2-2 Corner-stitched CRectangle object.
2-3 Area enumeration of CRectangle objects within a bounding rectangle.
2-4 Canonical technology used in FluidLayout.
2-5 Box-outlining is used to place materials in FluidLayout.
2-6 Complete NMOS pulldown path.
2-7 Placement of a metal2->gate via results in a gate->metal2 hint on the second layer.
2-8 The complete inverter.
3-1 NMOS inverter.
3-2 NMOS inverter small-signal model about VM.
3-3 Layout of test devices.
3-4 3-D layout of a ring oscillator. The three inverters shown are stacked to form the 3-D layout.
3-5 Stamp pattern with spatially-separated material patterns.
3-6 Patterned gate for a NOR gate, SRAM cell, and ring oscillator.
3-7 Patterned source/drain for a NOR gate, SRAM cell, and ring oscillator.
4-1 Six-transistor circuit for individual bit storage.
4-2 Eight-transistor circuit for use in 3-D SRAM.
4-3 Proper cell distribution improves aspect ratio and decreases bit-line length.
4-4 3-D partitioning of rows allows for simple tri-stating of word lines.
4-5 First and second layers of an eight-layer 3-D SRAM.
4-6 6-T SRAM cell layout.
4-7 Word-line tri-stating.
4-8 Bit-line decoding using the word-line tri-state control signal.
5-1 Finite state machine.
5-2 Turing machine.
5-3 Four cells of a cellular-automata machine.
5-4 Insertion sort bit-slice.
5-5 Game of Life cell.
5-6 Clock distribution in the 3-D Game of Life architecture.
5-7 Layer of a 3-D Game of Life architecture comprising a 4 x 4 array of cells.
A-1 FluidLayout screenshot.
A-2 The main FluidLayout toolbar.
A-3 Partial view showing node selection.
A-4 Label dialog.
A-5 Subcircuit rotation.
A-6 Cell hierarchy management toolbar.
A-7 Edit->Properties->Laser Setup.
A-8 4 x 2 box used for the source of an NMOS transistor.
A-9 Completed source node of the NMOS transistor.
A-10 NMOS source and drain nodes.
A-11 Inverter source and drain nodes.
A-12 Complete inverter.
A-13 Window menu.
A-14 First-layer inverter with contact pads.
A-15 First-layer inverter with via stacks to the second layer.
A-16 Labeled first-layer inverter.
A-17 File->Export menu.
Chapter 1
Introduction
CMOS integrated circuits are traditionally fabricated on crystalline silicon wafers.
Transistor structures are created on the surface of these wafers by implantation into
the wafer and growth and deposition of material over the surface of the wafer. This
fundamentally results in a two-dimensional circuit layout, as the transistors are confined to inhabit the boundary of the silicon substrate.
However, a number of advances in solid-state technology have made possible
the development of more complicated three-dimensional transistor structures. All
of these advances rely on creating two-dimensional circuit "layers" by standard means and then interconnecting these layers into a multi-layer structure. For
example, the development of wafer-scale integration (WSI) allows for the creation of
three-dimensional circuits by stacking wafers using a lift-off and bonding process [24].
Also, the advent of silicon-on-insulator (SOI) technology allows for a third dimension
of circuitry by encapsulating an existing two-dimensional circuit with insulating material, planarizing this material, and placing the next layer of silicon on this insulator.
A very promising path to multi-layer transistor circuits involves the use of solution-processed metals, semiconductors, and insulators. Transistors can be laid out by
depositing the appropriate solutions onto an insulating surface, followed by curing to
produce the desired materials [17, 16]. This approach has the advantage that it does
not require integration on the wafer scale (which itself requires novel means of wafer
verification and packaging) and is theoretically extensible to thousands of layers.
Having multiple layers in which to fabricate transistors gives the circuit designer
the potential to improve the efficiency of circuits in terms of area, power, and speed.
The savings in area are two-fold: first, by utilizing the third dimension as space for
additional circuitry, integrated circuits can be made more dense without expanding
the "footprint" of the circuit and without having to improve the process technology.
This form of area improvement is best for memory devices such as SRAM, and also
good for DRAM and EEPROM, where the goal is to fit as many bits of memory as
possible into a given chip. Specifically, the use of n active layers allows for an n-fold
improvement in the storage capacity of a memory chip, with little area overhead in
the control circuitry.
The other approach to area savings lies in the retargeting of two-dimensional
circuit layouts for a three-dimensional process technology. Theoretical results indicate that for many interesting 2-D circuit layouts, there exist corresponding 3-D
layouts that are more efficient in terms of area (by which, in the three-dimensional
context, we mean the aggregate area of all layers of the circuit) and maximum wire-run (the longest length of wire between any two active nodes). For example, the
$n$-point Fast Fourier Transform (FFT) network can be implemented in area $O(n^{3/2})$
with maximum wire-run $O(n^{1/2})$ with three dimensions of active circuitry, while in
a standard 2-D process, the same circuit would occupy area $\Omega(n^2)$ with maximum
wire-run $\Omega(n/\log n)$ [18]. A hypercube network of $n$ nodes, used in many parallel-processing schemes, can also be implemented in area $O(n^{3/2})$ using three dimensions,
but requires area $\Omega(n^2)$ using the standard two dimensions [11]. Finally, results in
[10] indicate that any $n$-device circuit that can be laid out in area $A$ in two dimensions
can be laid out in area approximately $(nA)^{1/2}$ using three.
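As a rough illustration of what the FFT bounds imply, the 2-D lower bounds can be divided by the 3-D upper bounds. This is only a statement about growth rates, since the constant factors hidden by the asymptotic notation are not given here:

\[
\frac{\Omega(n^{2})}{O(n^{3/2})} = \Omega\!\left(n^{1/2}\right),
\qquad
\frac{\Omega(n/\log n)}{O(n^{1/2})} = \Omega\!\left(\frac{n^{1/2}}{\log n}\right)
\]

For instance, at $n = 2^{20}$ points the area advantage grows like $n^{1/2} = 2^{10}$, up to those unknown constants.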
The savings in power may be realized by reducing the switched capacitance in
circuits. By reducing the lengths of interconnect, the capacitance of internal nodes
can be reduced, thus reducing the dynamic power dissipation of the circuit. For
example, a potentially important savings can be realized in the layout of H-trees,
which as indicated by [18] can be laid out with maximum wire-run $O(n^{1/3})$ in three
dimensions but require wire runs of $\Omega(n^{1/2}/\log n)$ in two dimensions. Since clock
distribution nets are often realized as H-trees, the potential exists to save power by
utilizing the more efficient distribution architectures available with three dimensions.
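The link between interconnect length and power can be made explicit with the standard first-order expression for CMOS switching power. This is a textbook model stated here as an assumption, not a result derived in this thesis:

\[
P_{\mathrm{dyn}} = \alpha \, C_{\mathrm{sw}} \, V_{DD}^{2} \, f
\]

Here $\alpha$ is the switching activity factor, $C_{\mathrm{sw}}$ the total switched capacitance, $V_{DD}$ the supply voltage, and $f$ the clock frequency. To the extent that wire capacitance scales with wire length, shortening a net reduces its contribution to $C_{\mathrm{sw}}$, and hence to $P_{\mathrm{dyn}}$, roughly in proportion. A clock net is a natural target because its activity factor is high.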
Finally, as circuits get more and more complicated, lengths of interconnect will
affect timing characteristics. Reducing the wire run of circuits will reduce the charge
and discharge times on these wires and enable faster operation of circuits.
There are many reasons to develop the technology to fabricate three-dimensional
CMOS devices. While developing such technology is beyond the scope of this thesis,
it is important to realize that the ability to design logic with this technology must be
developed simultaneously. Therefore, in this thesis, software tools are developed that
are used to target circuit designs for a three-dimensional MOS process that has been
developed contemporaneously. These tools are then used to synthesize some circuits
that demonstrate the viability of the layout tools and some of the benefits of the new
medium.
1.1
Design of the Layout Software
Digital system design is usually done at three levels: behavioral, structural, and physical [23]. At the behavioral level, a digital system is specified by what it is intended
to do; at the structural level, by what functional building blocks (e.g. gates, adders,
registers, IP cores) are to be used; and at the physical level, by what construction
materials are to be used and in what geometry they are to be configured. Corresponding to this division are several layers of abstraction - architectural, register-transfer-language (RTL), logical, and circuit layers - at which the designer can work
[23].
In a typical design flow, the designer will often work through both of these chains in
parallel; for example, he or she may start by developing a behavioral and architectural
specification for a system and proceed to flesh out the implementation details down
to the circuit layer and at the physical level. Much of the task of fleshing out the
details of a system is done with computer-aided design (CAD) tools. The goal of any
suite of CAD tools for digital system design is to produce a working circuit, that is,
to specify fully a working design at the physical level and at the lowest abstraction
layer.
Typically, the design flow can be broken up into two phases - technology-independent
design and technology-dependent design. For example, the process of behavioral specification is ideally technology-independent, whereas physical design at the circuit level
is clearly technology-dependent. Each phase has an associated set of algorithms that
are generally implemented in separate CAD tools. The process of targeting an abstract system design for a particular technology is called technology mapping, and can
be done by a third set of CAD tools or as a final- or initial-stage operation of the two
phases of design.
In order to maximize the usefulness of any new technology, design tools must be
developed that allow both for the use of new features of the technology and for the
seamless integration of the technology with existing means of technology-independent
design. In this thesis, the focus is on the technology-dependent phase of design. CAD
tools are developed that allow the designer to work at the physical and structural
levels at arbitrary levels of abstraction. The emphasis is twofold: first, familiar
graphical user interfaces (GUIs) are adapted for use in working in a three-dimensional
environment; second, the CAD tool allows for maximal use of the new features of
this environment.
Specifically, a CAD tool for 3-D circuit layouts is designed using the open-source
Magic VLSI layout system as a basis [15, 12]. The primary features in Magic that
will be relied upon are the speed of the central algorithms and the familiarity of the
GUI. Magic uses a geometric representation of the physical layout of the system that
is based upon a scheme devised by Mead and Conway [13].
At the physical level, the design approach used for this thesis is to design each
layer of a multi-layer circuit as a distinct two-dimensional circuit. What this means
is that from a physical perspective, each individual layer of a multi-layer circuit
looks like a traditional two-dimensional circuit; therefore, the layout of each layer can
theoretically be done using available CAD tools. In fact, 3-D circuit design using
this approach is the subject of ongoing research, where a small number of layers is
considered [25]. However, the use of existing CAD tools becomes infeasible as the
number of layers becomes large. Thus, in this thesis, a CAD system is developed that
integrates the familiarity of two-dimensional circuit design with a means of managing
large numbers of layers and a direct means of wiring between the layers.
1.2
Implementation of Test Circuits
The viability of the layout software is best tested by using it to implement various test
circuits. These circuits should be chosen both to exhibit features of the medium and
to exhibit useful properties of the software. To this end, the layout of three circuits
has been carried out using the new software: a ring oscillator, a static random-access
memory (SRAM), and a simple cellular automata machine (CAM).
1.2.1
Ring Oscillator
In the development of any new technology in which circuits are to be fabricated, the
ring oscillator is a fundamental circuit in that it is the simplest circuit to demonstrate
the ability to cascade logic gates. That is, in any such technology, while the first goal
is always to fabricate individual transistors, it does not necessarily follow that these
transistors can be fashioned into a suitable multi-transistor logic gate. An individual
logic gate must provide gain from the input to the output, or else when the gates are
cascaded, the signals eventually decay to an ambiguous logic level [5].
A ring oscillator consists of an odd number of inverters cascaded in series into
a ring. Once the ring oscillator is powered, any latent signal is able to propagate
through the ring; this signal is inverted as it passes through each gate. When the
signal returns to its starting point, it returns as the inverse of the original signal since
there is an odd number of gates in the ring. So if the voltage at any particular node
of the circuit is observed as a function of time, the result is an oscillation with period
equal to twice the transit time of the signal through the ring.
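The period relationship described above can be checked with a small behavioral model. The following is a minimal sketch, assuming every inverter has the same unit delay and ideal logic levels; the function names are hypothetical and are not part of the software developed in this thesis. With N inverters and per-gate delay t_p, the transit time is N t_p, so the expected period is 2 N t_p.

```python
def simulate_ring(n_inverters, n_steps):
    """Return the waveform seen at node 0 of a unit-delay inverter ring."""
    if n_inverters < 1 or n_inverters % 2 == 0:
        raise ValueError("a ring oscillator needs an odd number of inverters")
    # Alternating start state: every inverter but one agrees with its input,
    # so a single logic transition circulates around the ring.
    state = [i % 2 for i in range(n_inverters)]
    trace = [state[0]]
    for _ in range(n_steps):
        # Each inverter outputs the complement of its predecessor's previous
        # output; index -1 wraps around, which closes the ring.
        state = [1 - state[i - 1] for i in range(n_inverters)]
        trace.append(state[0])
    return trace


def period(trace):
    """Smallest shift p > 0 under which the recorded waveform repeats."""
    for p in range(1, len(trace) // 2 + 1):
        if all(trace[t] == trace[t + p] for t in range(len(trace) - p)):
            return p
    return None


def ring_period(n_inverters):
    """Oscillation period, in gate delays, of an n-inverter ring."""
    return period(simulate_ring(n_inverters, 8 * n_inverters))
```

In this model a three-inverter ring repeats every six gate delays, which is twice the three-gate transit time. A real ring also needs sufficient gain per stage, which this purely logical model does not capture.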
Since there is an odd number of inverters in the ring, the circuit acts to provide
negative feedback on the signal. Oscillations are produced if the feedback loop is unstable.
Thus, if the inverters that make up the ring do not provide sufficient gain (i.e. if the
loop gain never exceeds 1), the signal stabilizes to an ambiguous logic level midway
between the supply voltage and ground. In particular, it is desirable for the individual
transistors to have as large a transconductance, $g_m$, as possible, where $g_m$ is measured
at the voltage corresponding to this ambiguous logic level. Having a sufficient $g_m$
produces the necessary gain to drive the output signals away from ambiguous logic
levels and towards the voltage extremes (i.e. low or high voltage). Since $g_m$ is directly
proportional to the mobility, $\mu$, of carriers in the transistor channel, having a large
mobility is desired. However, it is possible to overcome the absence of large mobilities
to some extent, because of other factors on which $g_m$ is dependent. For example,
one may increase the width-to-length ratio of the transistor channel, decrease
the thickness of the gate oxide, or increase the supply voltage (thereby increasing the
midpoint voltage where $g_m$ is measured). Further, in any technology, the proper choice
of pullup (for an n-channel technology) or pulldown (for a p-channel technology)
can reduce the dependency on device parameters. Nonetheless, the efficacy of these
maneuvers is limited, due to circuit-area constraints, device-breakdown limits, and
second-order effects. Also, in a complementary technology, which is most desirable
due to power considerations, having high-quality basic device parameters is essential.
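The levers listed in this paragraph can be read off the first-order long-channel (square-law) expression for the saturation transconductance of a MOSFET. This is a textbook model offered here as an assumption; the thin-film devices targeted in this thesis may deviate from it:

\[
g_m = \mu \, C_{ox} \, \frac{W}{L} \, (V_{GS} - V_T),
\qquad
C_{ox} = \frac{\varepsilon_{ox}}{t_{ox}}
\]

Here $W/L$ is the width-to-length ratio of the channel, $t_{ox}$ the gate-oxide thickness, $V_{GS}$ the gate bias at the midpoint, and $V_T$ the threshold voltage. Under this model $g_m$ scales linearly with $\mu$, with $W/L$, with $1/t_{ox}$, and with the gate overdrive, which is why a low mobility can be partly offset by the other three quantities.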
Since the ring oscillator is a planar circuit, there exists a "natural" two-dimensional
layout for the circuit at the transistor level.
However, having a third dimension
presents the opportunity to examine some new layout strategies.
1.2.2
Static Random-Access Memory
One immediate application of a viable three-dimensional integration technology is in
memories. The density of memory arises directly from the ability to pack as many
homogeneous cells into a given chip as possible. So the availability of multiple layers
in a chip allows for a direct approach to increasing density - simply stacking 2-D
memory circuits into a 3-D chip gives the desired increase in density.
This approach has been implemented at the system level by physically stacking
chips and using off-chip circuitry to control the chip-enable signals and I/O [4]. This
is, of course, not extensible to arbitrarily many layers. However, the same approach
can be taken at the chip level by internally wiring control signals to each layer of the
circuit and wiring the data lines together in the same way that the data pins have
been soldered together.
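The chip-level stacking idea can be sketched as a behavioral model in which the upper part of the address selects a layer and all layers share one data path. This is a minimal illustration only: the class name, the address split, and the word-per-layer organization are assumptions made for the example and do not describe the SRAM designed later in this thesis.

```python
class StackedSRAM:
    """Behavioral model of identical 2-D memory layers sharing one data bus."""

    def __init__(self, n_layers, words_per_layer):
        self.n_layers = n_layers
        self.words_per_layer = words_per_layer
        # One independent 2-D array of words per layer.
        self.layers = [[0] * words_per_layer for _ in range(n_layers)]

    @property
    def capacity(self):
        """Total number of words: n layers give an n-fold capacity increase."""
        return self.n_layers * self.words_per_layer

    def decode(self, address):
        """Split a flat address into (layer select, word within the layer)."""
        if not 0 <= address < self.capacity:
            raise IndexError("address out of range")
        return divmod(address, self.words_per_layer)

    def write(self, address, value):
        layer, word = self.decode(address)
        # Only the selected layer is enabled; the others ignore the data bus.
        self.layers[layer][word] = value

    def read(self, address):
        layer, word = self.decode(address)
        return self.layers[layer][word]
```

From the outside the model looks like a single flat memory, which mirrors the goal of keeping a standard two-dimensional pin-out while the storage itself is spread over several layers.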
In this thesis, a simple 3-D static random-access memory (SRAM) is designed and
laid out using the CAD software. SRAM is chosen for several reasons. First, it can be
fabricated using MOS technology and does not require special transistor structures.
While other memories such as EEPROM and DRAM may see a greater push for increased density, these memories require dedicated fabrication technologies. SRAM,
on the other hand, can be fabricated in a standard logic technology. Secondly, while
read-only memories (ROMs) can also be fabricated using only standard MOS transistors, the need for high density is more prevalent in systems with writable storage. So
for this thesis, an SRAM is designed that exhibits writable and retrievable storage and
uses multiple layers of active material while using a standard two-dimensional pinout. Such an SRAM can therefore be made into a drop-in replacement for currently
available SRAM.
1.2.3
Cellular Automata Machine
While digital system design at the architecture level ideally is done without the fabrication technology in mind, the limitations of the technology inevitably play a role
in the selection of a computational architecture. For example, in designing a multiprocessing architecture, physical constraints to two dimensions lead to architectural
constraints in terms of the number of processors that can be embedded in a given area
[11].
In particular, design choices are often driven by the problem to be solved by the
system. There are many computational problems that can be described efficiently
using certain physical architectures and thus solved by modelling the architecture
by a digital system. For example, single-instruction multiple-data (SIMD) architectures, and in particular cellular automata, have been shown to effectively model the
(inherently three-dimensional) dynamics of many problems in physics [22, 8].
Additionally, it has been shown that there exist cellular automata machine (CAM)
architectures that can do general-purpose computations, i.e. that are equivalent to a
Universal Turing Machine. The CAM architecture therefore can serve as a potential
alternative to the architectures found in traditional computer processors [20].
For CAMs that are designed to model 3-D physical processes, it becomes infeasible
to map the 3-D CAM into a two-dimensional circuit as the number of cells becomes
large - the cost of interconnect becomes prohibitive. However, with a true three-dimensional technology, the mapping of the CAM architecture to an integrated circuit
is direct, thereby allowing the physical construction of machines that are impossible
to integrate onto a single 2-D chip.
Therefore, in this thesis, the software tools are used to design a simple 3-D CAM.
One of the simplest cellular automata machines to exhibit interesting global behaviors
is the Game of Life, devised by John Conway in 1970 [20].
The Game of Life is
specified for a two-dimensional architecture, but can be extended to three dimensions
[1, 2, 3]. It describes the cells as having one of two states (either "alive" or "dead"),
with the state of any given cell on the next cycle of the game being determined by
the states of its neighbors on the current cycle. The behavior of the machine as a
whole can thus be observed by visually inspecting the cells ("alive" being indicated
by a color or dot). Since a circuit that simulates the Game of Life can be readily
verified, such a circuit is designed in this thesis.
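For reference, one update cycle of the standard two-dimensional Game of Life can be sketched in a few lines. The sketch assumes Conway's usual rule (a dead cell with exactly three live neighbors becomes alive; a live cell with two or three live neighbors survives) on a bounded grid whose outside is treated as dead. The function name and the set-based representation are illustrative choices, and three-dimensional variants generally use different neighbor thresholds.

```python
def life_step(alive, rows, cols):
    """Compute the next generation from a set of live (row, col) cells."""
    offsets = [(dr, dc) for dr in (-1, 0, 1) for dc in (-1, 0, 1)
               if (dr, dc) != (0, 0)]
    next_alive = set()
    for r in range(rows):
        for c in range(cols):
            # Count live cells among the eight nearest neighbors; positions
            # outside the grid are never in the set, so they count as dead.
            n = sum((r + dr, c + dc) in alive for dr, dc in offsets)
            if n == 3 or (n == 2 and (r, c) in alive):
                next_alive.add((r, c))
    return next_alive
```

Every cell applies the same local rule to its neighbors' current states, which is the property that makes the architecture map naturally onto an array of identical circuit cells.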
Chapter 2
FluidLayout - The Layout Software
There is a clear potential for circuit design innovation above and beyond what is
possible with a two-dimensional fabrication technology. Since all known routes to
three-dimensional circuit fabrication involve the construction of multiple layers of
two-dimensional circuits, with inter-layer interconnect done by vias, it is tempting
to propose that the design of three-dimensional circuits be carried out using existing
software tools for each two-dimensional layer of the circuit. This approach provides
the fastest route to working 3-D prototype circuits.
However, there are several drawbacks to this approach that will limit severely its
usability for designing complex 3-D circuits. First, as each layer of the circuit must be
managed as a separate design, the management overhead increases with the number
of layers. The designer is responsible for keeping track of interconnect between each
pair of layers. While this may be feasible for a fixed number of layers, it becomes
intractable for an arbitrary number of layers. Second, the automation of system-level tasks such as circuit netlist extraction and mask generation becomes difficult or
impossible without additional software scripts or programs.
A better approach is to design CAD software with integrated support for designing
three-dimensional circuits. From a system perspective, software with this capability
can provide the designer both with needed assistance in 3-D design management
and with useful system-level design tools. Simultaneously, the software can utilize
algorithms written for two-dimensional circuit design, as the individual operations
that will be carried out by the designer are the same in both 2-D and 3-D design.
In this thesis, such a software tool, named FluidLayout owing to the solution-processing fabrication technology being used, has been developed. FluidLayout provides designers of 3-D circuits an integrated environment for laying out all layers of
a circuit and for verification and fabrication of three-dimensional layouts.
2.1
Overall Considerations
Much as circuit fabrication has been limited to constructing two-dimensional devices,
circuit design has been fraught with limitations imposed by two-dimensional design
methodologies. Traditional pen-and-paper circuit design, for example, requires partitioning a 3-D circuit into 2-D layers, either by using separate sheets of paper or by
spatially separating the layers on a single sheet. In the case of circuit fabrication, the
gains introduced with a third dimension justify the expense of developing the fabrication technology. However, in the case of circuit design, it is better to make optimal
use of familiar design techniques rather than impose new design methodologies with
associated learning curves. In addition, the costs of implementing a truly 3-D user
interface are prohibitive.
In FluidLayout, therefore, three-dimensional circuit layout is done by managing
an arbitrarily large set of individual two-dimensional layers. The layout of each layer
is performed in the same manner as a two-dimensional layout would be performed
in many existing software packages.
The consequences of this design decision are
twofold. First, efficient algorithms for 2-D layout have already been developed and
the corresponding source code may be reused. Thus, the core layout manipulation
routines do not have to be redeveloped. Second, it is desirable to implement many
whole-circuit algorithms such as netlist extraction and mask generation. These algorithms can be extended from 2-D to 3-D while maintaining their efficiency in terms
of order-of-growth as a function of the number of transistors in the design.
There are two stated goals to be achieved with FluidLayout. First, the user should
be able to manage a true three-dimensional circuit with an arbitrary number of layers.
Second, the design process for individual layers should be familiar to designers of
two-dimensional VLSI circuits. These goals have been taken into consideration at all
stages of the software design process of FluidLayout.
For example, in typical two-dimensional design formats, the representation of a
VLSI layout is encoded as a list of material regions. Each region contains data that
identifies the type of material (e.g. polysilicon) and the boundary coordinates of the
material (e.g. the corners of a rectangle). The VLSI layout may then be stored as a
file containing a list of regions.
Thus, if an existing software tool uses this format as its native format, extension
of this software for use in designing 3-D circuits becomes difficult; there is no means
for differentiating the regions in the data file with respect to their locations along
the third dimension if the coordinates used are 2-D coordinates. However, there are
several remedies of varying efficacy.
The first is to differentiate the material types by layer. For example, polysilicon on
the first layer of transistors might be assigned material type poly-1 while polysilicon
on the second layer might be assigned type poly-2. Since many software packages
have support for adding or changing material types, this approach is straightforward
to implement. However, the approach also has several drawbacks. First, the
user must define material types for each layer of transistors, a process that becomes
tedious as the number of layers grows. Second, since the user interface is not 3-D-aware (i.e. not cognizant of the fact that poly-2 corresponds to a different layer than
poly-1), all transistor layers will be displayed simultaneously. While this may be
acceptable for two layers of transistors, it becomes unmanageable for more.
Another approach to handling 3-D circuits in existing software packages is to
manage each third-dimension layer as a separate circuit.
Small helper programs
may be written to perform the inter-layer registration, circuit netlist extraction, and
preparation of fabrication-ready output. While this approach is more sophisticated
than the previous approach, it has the drawback that the user has to run multiple
programs in order to obtain a working circuit; each program will have its associated
learning curve.
By contrast, FluidLayout organizes the material regions by layer. Rather than
store a VLSI layout as a collection of rectangles, FluidLayout stores a collection of
layers, where each layer is stored as a collection of rectangles. This is equivalent to
packing a collection of 2-D layout files into a single meta-file, and in fact, FluidLayout
has the means to import two-dimensional circuits (created in a traditional software
package) as individual layers of a three-dimensional circuit.
This approach has advantages over the others. First, a circuit in FluidLayout
may contain arbitrarily many layers. FluidLayout provides easy means to add layers
to a circuit while at the same time obviating the task of managing as many files
as there are layers. Second, since FluidLayout is aware of the three-dimensionality
of its circuits, the user interface can display the individual layers separately while
simultaneously being able to indicate inter-layer interconnections.
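As a rough illustration of the layer-organized representation described above, the following Python sketch stores a circuit as an ordered list of layers, each holding its own list of rectangles. The names Rect, Layout3D, add_layer, import_2d_layout, and regions_on are hypothetical and are not taken from the FluidLayout source, which was written in C++.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Rect:
    """One material region: a material type plus corner coordinates."""
    material: str
    x0: int
    y0: int
    x1: int
    y1: int


@dataclass
class Layout3D:
    """A circuit stored as an ordered list of 2-D layers (hypothetical sketch)."""
    layers: List[List[Rect]] = field(default_factory=list)

    def add_layer(self) -> int:
        """Append an empty layer and return its index along the third dimension."""
        self.layers.append([])
        return len(self.layers) - 1

    def import_2d_layout(self, regions: List[Rect]) -> int:
        """Import a flat 2-D region list as a new layer of the 3-D circuit."""
        index = self.add_layer()
        self.layers[index].extend(regions)
        return index

    def regions_on(self, layer: int, material: str) -> List[Rect]:
        """Return the regions of one material on one layer.

        The layer index carries the third coordinate, so no per-layer
        material names such as poly-1 or poly-2 are needed.
        """
        return [r for r in self.layers[layer] if r.material == material]
```

Because the layer index is kept outside the rectangle data, the same 2-D region format can be reused unchanged for every layer.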
This display interface is another area of FluidLayout where careful consideration
was made of the 3-D nature of the circuits. The user must be able to manage all
the layers of a circuit without having to view them simultaneously. Similarly, when
laying out circuits, it must be clear both to the user and to FluidLayout as to which
layer is the target of the user's instructions.
Two approaches to solving these problems were considered. One way is to allow
the user to set a "visible range" of materials, where the materials are ordered according
to their physical order in the technology. For example, the user might wish to view
all materials between the gate on layer 7 and metal 2 on layer 9. This allows the
user to manage the entire circuit within a single document window, and also permits
the user to view as little as one material or as much as the entire circuit all at once.
However, there are several issues with this design. For example, if the user elects to
view a range of materials that spans more than one layer, there is potential ambiguity
when the user decides to place certain materials. For example, if the visible range
encompasses gate material on layers 2 and 3, and the user wants to place a new gate,
it is not clear to the interface whether this is a 2nd-layer gate or a 3rd-layer gate. A
similar manifestation of this problem is that in the same situation, it is difficult to
distinguish visually the two gate layers that are being viewed simultaneously. The
only corrective means is to restrict the visible range to at most one material of any
given type. However, this then prohibits the user from viewing and editing different
layers of the circuit simultaneously.
The second approach is to maintain a separate view window for each layer of the
circuit. Each view may then be treated exactly as a two-dimensional circuit. The only
additional function to be performed is to manage the inter-layer interconnect, which
can be done by message-passing between the views. This approach has the drawback
that the view windows number as many as the layers, meaning that simultaneous
editing of more than a few layers is prohibitively complicated. However, each layer
may be edited without ambiguity, and with the assumption that the user will not
want to edit more than three or so layers at a time, this option becomes the more
desirable choice for implementation.
2.2 Implementation
FluidLayout was written in Microsoft Visual C++ (version 5.0) for the Microsoft
Windows operating systems.
In the graphical user interface (GUI), a three-dimensional integrated circuit is
represented as an ordered set of 2-D slices. Each slice may be manipulated as an
individual 2-D circuit. The representation of a slice in the GUI is the traditional
mask representation, i.e. a top-down viewpoint with metals and semiconductor represented as colored rectangular paths on a Manhattan grid. Thus, the manipulation
of individual slices should be familiar to those experienced in 2-D integrated-circuit
design.
A circuit layout is stored internally in FluidLayout as a CCellDef object.
A
CCellDef object contains several 2-D circuit slices implemented as sets of CLayer
objects. Each CLayer object consists of a set of CRectangle objects that represent
the 2-D circuit slice materials. Additionally, a circuit layout may contain discrete
layouts within it as subcells of the layout; these are referenced via CRectangle objects
of type CELL. Thus, the designer can maintain a hierarchy of CCellDef objects that
represents designs at various levels of integration, and any given CCellDef object
may be used as a subcell of another CCellDef object.
The CCellDef methods are mainly used for editing the circuit layout. CCellDef
has methods for adding rectangles and subcells to the layout and erasing rectangles and subcells from the layout. There are also methods for producing copies of
individual slices with or without the subcell contents flattened into the slice.
Most of the layout manipulation is done within the CLayer object.
2.2.1 2-D Slice Manipulation
The design interface for an individual layer is modelled after that of the Magic VLSI
CAD system [15, 12]. Magic is a geometric box-painting tool that has algorithms for
interpreting box paintings as integrated-circuit layouts. A circuit layout is represented
as a set of colored rectangles in a 2-D coordinate system; the core algorithm in Magic
is thus an efficient means of rectangle manipulation [14].
Within FluidLayout, each 2-D slice of a 3-D circuit is represented as a collection
of CLayer objects. The different mask layers for each slice of a circuit (e.g. gate,
source/drain, metals, vias) are partitioned among different CLayer objects, with one
CLayer for all metals and semiconductor, one CLayer for each type of via, and one
CLayer for each type of subcell. This partitioning is done to maximize the efficiency
of top-level algorithms such as rendering the layout in the GUI and extracting the
circuit netlist, while not consuming excessive amounts of memory in overhead.
The CLayer object, depicted in Figure 2-1, contains a pointer to a CRectangle
object. Each CRectangle object contains its integer coordinates, its material type
(e.g. gate, source/drain, semiconductor, or via) and pointers to CRectangle objects
adjacent at the upper-right and lower-left corners. Thus, a CLayer object may be
thought of as a collection of disjoint rectangles that tiles a plane. Further, there are
efficient algorithms to traverse the plane from any particular starting point to a given
finish point and to iterate through all rectangles within a bounding rectangle [14].
These algorithms have been implemented in the Magic source code [15, 12] and are
readily implemented in FluidLayout.
Figure 2-1: CLayer object with embedded CRectangle objects.
Figure 2-2: Corner-stitched CRectangle object. (The figure labels the four stitch pointers: top, right, left, and bottom.)
Specifically, FluidLayout represents a VLSI layout as a set of corner-stitched rectangles, as discussed in [14]. Figure 2-2 illustrates the corner-stitching of a rectangle.
This corner-stitching allows for linear-time searching of a CLayer object and linear-time area enumeration.
To do this, each CRectangle object has a GotoPoint method. Given a CRectangle
R and a destination point in the CLayer containing R, GotoPoint follows the top
and bottom pointers to reach the desired ordinate and then follows the left and
right pointers to reach the desired abscissa. Since following left and right may
cause deviation from the desired ordinate, this procedure must be iterated until the
rectangle at the destination point is found. However, since the stitched objects are
convex, the algorithm is guaranteed to terminate [14].
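The point search just described can be sketched in Python as follows. This is a minimal illustration, not the FluidLayout code: the Tile class, the goto_point function, and the four-tile example plane are hypothetical, and the sketch assumes that the destination point lies inside the tiled plane.

```python
class Tile:
    """Rectangle covering [x0, x1) x [y0, y1) with four corner stitches."""

    def __init__(self, name, x0, y0, x1, y1):
        self.name = name
        self.x0, self.y0, self.x1, self.y1 = x0, y0, x1, y1
        self.top = None     # neighbor above, at the upper-right corner
        self.right = None   # neighbor to the right, at the upper-right corner
        self.bottom = None  # neighbor below, at the lower-left corner
        self.left = None    # neighbor to the left, at the lower-left corner

    def contains(self, x, y):
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


def goto_point(tile, x, y):
    """Walk the stitches from `tile` to the tile containing (x, y)."""
    while not tile.contains(x, y):
        # Move vertically until the ordinate is inside the tile.
        while y < tile.y0:
            tile = tile.bottom
        while y >= tile.y1:
            tile = tile.top
        # Move horizontally; this may leave the desired ordinate,
        # in which case the outer loop repeats.
        while x < tile.x0:
            tile = tile.left
        while x >= tile.x1:
            tile = tile.right
    return tile


def build_example_plane():
    """Hand-stitched 10 x 10 plane made of four tiles (illustrative data)."""
    a = Tile("A", 0, 6, 10, 10)  # strip across the top
    b = Tile("B", 0, 0, 4, 6)    # lower-left block
    c = Tile("C", 4, 3, 10, 6)   # middle-right block
    d = Tile("D", 4, 0, 10, 3)   # bottom-right block
    a.bottom = b
    b.top, b.right = a, c
    c.top, c.left, c.bottom = a, b, d
    d.top, d.left = c, b
    return {"A": a, "B": b, "C": c, "D": d}
```

In this example, searching for the point (7, 1) from tile A first descends to B, moves right into C, and then has to descend again into D, which shows why the vertical and horizontal steps are iterated.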
Similarly, each CLayer object has a Paint method derived from [14]. Paint allows
the caller to paint a rectangular region of the layer. All CRectangle objects that
intersect this rectangular region are clipped against it, and the material types of the
resulting pieces are adjusted to perform the painting. The enumeration of rectangles
within the clipping region is done in linear time by the following procedure: by following bottom and right pointers from the upper-leftmost rectangle, the rectangles along
the left edge of the clipping rectangle may be identified. For each of these, horizontal
swaths of the clipping region may be enumerated by following right pointers. A
sample area enumeration is shown in Figure 2-3.
Figure 2-3: Area enumeration of CRectangle objects within a bounding rectangle.
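The clipping step of a paint operation can be illustrated with a short Python sketch. For simplicity it keeps the rectangles in a plain list and scans all of them, rather than using corner stitches and area enumeration; the names Rect, clip_outside, paint, and area are hypothetical.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    x0: int
    y0: int
    x1: int
    y1: int
    material: str


def area(r: Rect) -> int:
    return (r.x1 - r.x0) * (r.y1 - r.y0)


def clip_outside(r: Rect, region: Rect) -> List[Rect]:
    """Return the parts of r that lie outside region (at most four pieces)."""
    ix0, iy0 = max(r.x0, region.x0), max(r.y0, region.y0)
    ix1, iy1 = min(r.x1, region.x1), min(r.y1, region.y1)
    if ix0 >= ix1 or iy0 >= iy1:
        return [r]  # no overlap, so r is unchanged
    pieces = []
    if r.y1 > iy1:  # full-width strip above the painted region
        pieces.append(Rect(r.x0, iy1, r.x1, r.y1, r.material))
    if r.y0 < iy0:  # full-width strip below the painted region
        pieces.append(Rect(r.x0, r.y0, r.x1, iy0, r.material))
    if r.x0 < ix0:  # piece to the left, limited to the overlap height
        pieces.append(Rect(r.x0, iy0, ix0, iy1, r.material))
    if r.x1 > ix1:  # piece to the right, limited to the overlap height
        pieces.append(Rect(ix1, iy0, r.x1, iy1, r.material))
    return pieces


def paint(rects: List[Rect], region: Rect) -> List[Rect]:
    """Clip every rectangle against region, then add region itself."""
    result = []
    for r in rects:
        result.extend(clip_outside(r, region))
    result.append(region)
    return result
```

Splitting into full-width strips above and below the painted region keeps the pieces as horizontal strips, which is the orientation that the swath enumeration described above relies on.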
The relevant VLSI algorithms can be expressed in terms of searches and area enumerations and are thus carried out efficiently in FluidLayout. For example, placement
of a wire is done by selecting the rectangular areas where metal is desired and calling
the Paint method on those areas. Viewing the circuit in the GUI is done by enumerating the rectangles within the view rectangle. For each enumeration, Windows
drawing methods are called to render the rectangle.
All that remains for implementation is the interconnection of distinct 2-D layers
into a 3-D circuit.
2.2.2 Circuit Partitioning
There are many possible approaches to the problem of partitioning 3-D circuits among
2-D slices. From a design tools standpoint, the crux of the issue is that in 2-D design,
new fabrication technologies require extensive modification to the technology support
in the software. For example, suppose that a 2-D design package has support for a two-poly, three-metal CMOS process. If the technology for four metal layers is developed,
the design package must be modified to accommodate the new metal layer, both in
the internal representation and in the GUI. While the internal representation may
be readily modifiable or may already support arbitrary technologies, modifying or
extending the GUI is nontrivial, and in fact, 2-D design packages generally do not
have support in the GUI for arbitrarily many material layers.
On the other hand, FluidLayout can support arbitrarily many material layers
without modification of the 2-D slice object structure and without extensions to the
GUI. This is accomplished by assigning a finite number of material layers to each 2-D
slice and allowing the number of slices to vary arbitrarily. Each 2-D slice is assigned
gate, source/drain, and semiconductor material as well as two metal layers and all
inter-layer vias. The slice is also provided with vias from the top metal layer to the
gate layer of the next slice, thus forming the inter-slice interconnect. This material
set is necessary and sufficient to create arbitrary circuits within an individual 2-D
slice.
Arbitrary technology mappings can then be implemented by ignoring portions of
slices as necessary. For example, a 3-D process with six interconnect layers per slice
may be implemented by pairing adjacent CLayer objects and ignoring the semiconductor material on the second CLayer of each pair. Also, a 2-D process with arbitrarily
many metal layers can be implemented by ignoring all semiconductor material except
on the lowest slice.
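The 2-D mapping in the last example can be sketched as follows. This is only an illustration under stated assumptions: it counts metal1 and metal2 as the two metal layers of a slice and numbers the metals of the 2-D process from the bottom up. The names SLICE_MATERIALS, slices_needed, metal_location, and used_materials are hypothetical.

```python
import math

# Materials assigned to every 2-D slice, as listed in the text.
SLICE_MATERIALS = ("gate", "source/drain", "semiconductor", "metal1", "metal2")
METALS_PER_SLICE = 2


def slices_needed(num_metal_layers: int) -> int:
    """Number of slices required to hold the metals of a 2-D process."""
    return max(1, math.ceil(num_metal_layers / METALS_PER_SLICE))


def metal_location(k: int):
    """Map metal layer k (1-based) of the 2-D process to (slice index, slice metal)."""
    if k < 1:
        raise ValueError("metal layers are numbered from 1")
    slice_index = (k - 1) // METALS_PER_SLICE
    name = "metal1" if (k - 1) % METALS_PER_SLICE == 0 else "metal2"
    return slice_index, name


def used_materials(slice_index: int):
    """Materials kept on a slice; semiconductor is ignored above the lowest slice."""
    if slice_index == 0:
        return SLICE_MATERIALS
    return tuple(m for m in SLICE_MATERIALS if m != "semiconductor")
```

Under these assumptions, a five-metal 2-D process would occupy three slices, with the fifth metal stored as metal1 on the third slice.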
The canonical technology in FluidLayout is centered around a bottom-gate thin-film transistor structure [21]. Each layer of transistors is thus represented as a set of
TFTs along with two metal layers. This results in the technology shown in Figure 2-4.
With the implementation framework thus described, FluidLayout is able to perform various system-level procedures, such as circuit verification through netlist extraction and circuit fabrication.
Figure 2-4: Canonical technology used in FluidLayout. Layer stack, from top to bottom: gate (next layer), metal2, metal1, semiconductor, source/drain, gate.
2.3 Circuit Verification
One of the main advantages of having CAD software with integrated 3-D capabilities
is that the software can perform tasks such as circuit netlist extraction. FluidLayout
is able to extract connectivity information from 3-D integrated-circuit layouts.
In order to perform this netlist extraction, FluidLayout separates a given layout
into planes, where each plane contains a given material (gate, source/drain, metal1,
or metal2) and the vias that connect that material to the next higher material along
the third dimension. Then, FluidLayout enumerates all the rectangles in each plane,
starting with the lowest. Each rectangle is checked to see if it belongs to a previously defined electrical node.
FluidLayout then checks adjacencies to determine if two
nodes have been assigned to a single wire, and if so, merges the nodes.
Finally, if
the rectangle is not assigned to a node, and is not adjacent to a node, a new node is
created. Further, if the rectangle is part of a via, the corresponding rectangle on the
next plane up is marked and added to the node. This allows the extraction procedure
to maintain electrical connectivity along the third dimension.
In FluidLayout, a CWire object is used to identify a node. Each CWire object
contains a list of pointers to the CRectangle objects associated with the node. Each
CRectangle object has a generic pointer that is used in netlist extraction to point to
the CWire object to which the rectangle belongs. Thus, when the area enumeration
is complete, all electrical nodes in the circuit have been identified, and each rectangle
is able to identify the node to which it belongs.
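The node-merging idea can be sketched in Python with a union-find structure. This is a swapped-in technique for illustration, not the CWire implementation: it compares all pairs of rectangles instead of using area enumeration, and the names Rect, touches, overlaps, and extract_nodes are hypothetical. A via is modelled as a rectangle on its own plane that connects to any overlapping rectangle on the next plane up.

```python
from typing import List, NamedTuple


class Rect(NamedTuple):
    plane: int            # position along the third dimension, lowest plane = 0
    x0: int
    y0: int
    x1: int
    y1: int
    is_via: bool = False  # True if this rectangle connects to the next plane up


def _extents(a: Rect, b: Rect):
    """Signed overlap lengths of a and b along x and y."""
    return (min(a.x1, b.x1) - max(a.x0, b.x0),
            min(a.y1, b.y1) - max(a.y0, b.y0))


def overlaps(a: Rect, b: Rect) -> bool:
    xo, yo = _extents(a, b)
    return xo > 0 and yo > 0


def touches(a: Rect, b: Rect) -> bool:
    """True if a and b overlap or share part of an edge (not just a corner)."""
    xo, yo = _extents(a, b)
    return (xo > 0 and yo >= 0) or (xo >= 0 and yo > 0)


def extract_nodes(rects: List[Rect]) -> List[int]:
    """Return one node label per rectangle; equal labels mean the same node."""
    parent = list(range(len(rects)))

    def find(i: int) -> int:
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i, a in enumerate(rects):
        for j in range(i + 1, len(rects)):
            b = rects[j]
            if a.plane == b.plane:
                connected = touches(a, b)
            elif abs(a.plane - b.plane) == 1:
                lower, upper = (a, b) if a.plane < b.plane else (b, a)
                connected = lower.is_via and overlaps(lower, upper)
            else:
                connected = False
            if connected:
                parent[find(i)] = find(j)  # merge the two nodes
    return [find(i) for i in range(len(rects))]
```

The pairwise scan is quadratic in the number of rectangles, so it is suitable only for small examples; it is meant to show the merging logic, not the efficiency of the enumeration-based procedure.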
Once this is complete, FluidLayout extracts the semiconductor planes from the
layout.
The semiconductor rectangles are enumerated, and surrounding gate and
source/drain rectangles are identified. Each valid combination of gate, source, and
drain is used to construct a CTransistor object that identifies the electrical nodes for
the gate, source, and drain and the width and length of the transistor channel. The
list of CTransistor objects is then written to a text file using the standard SPICE
format for MOSFETs.
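As an illustration of this last step, the following Python sketch formats one transistor as a SPICE MOSFET line, with the nodes in the usual drain, gate, source, bulk order. The function name spice_mosfet and its parameters are hypothetical; channel dimensions are assumed to be held in grid units (lambda) and scaled by the feature size on output.

```python
def spice_mosfet(index, drain, gate, source, bulk, model,
                 width_lambda, length_lambda, microns_per_lambda):
    """Format one transistor as a SPICE MOSFET element line.

    Width and length are given in grid units and converted to microns.
    """
    width_um = width_lambda * microns_per_lambda
    length_um = length_lambda * microns_per_lambda
    return (f"M{index} {drain} {gate} {source} {bulk} {model} "
            f"W={width_um:g}u L={length_um:g}u")
```

For example, a transistor 3 grid units wide and 2 grid units long, at 5 microns per grid unit, is written with W=15u and L=10u.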
2.4 Circuit Fabrication
One motivation for writing FluidLayout is to have the ability to support any fabrication technologies that emerge in the laboratory. For example, it is desirable to be
able to target designs for an inkjet nanoparticle MEMS process [7] or a VLSI/MEMS
liquid embossing process [16], without having to modify the layout.
Support in FluidLayout for these facilities is provided by integrating methods that
handle the circuit extraction into the document class. For each fabrication process,
different file formats must be exported to support the fabrication. For example, the
target inkjet process is a 3-D gantry system that is computer-controlled and requires
G-code. The process prints materials in the same way that an inkjet printer prints
rectangles, so in FluidLayout, there are methods to extract the separate material
layers and raster the individual rectangles. Similarly, the liquid embossing process
uses an elastomeric stamp to separate a film of solution into desired and undesired
regions. A rectangular wire is thus created using a stamp whose raised surface is
the outline of the rectangle. When pressed onto a uniform film, the stamp then
drives away liquid corresponding to the wire outline. These stamps are created from
wafers, which are created using lithographic masks. These masks are specified using
the GDSII binary file format, so FluidLayout has methods for converting a set of
rectangles to their outlines and writing these outline rectangles to GDSII binary
output.
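The conversion of a rectangle to its outline can be sketched as below. This is an illustration only: it assumes that the outline is a frame of fixed width lying just outside the wire rectangle, which the text does not specify, and the function name outline and the width parameter are hypothetical. Writing the result to GDSII is not shown.

```python
def outline(rect, width):
    """Return four rectangles forming a frame of `width` just outside rect.

    rect is (x0, y0, x1, y1); the frame pieces do not overlap each other
    or the original rectangle.
    """
    x0, y0, x1, y1 = rect
    return [
        (x0 - width, y1, x1 + width, y1 + width),  # top edge, including corners
        (x0 - width, y0 - width, x1 + width, y0),  # bottom edge, including corners
        (x0 - width, y0, x0, y1),                  # left edge
        (x1, y0, x1 + width, y1),                  # right edge
    ]
```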
Figure 2-5: Box-outlining is used to place materials in FluidLayout.
2.5 Design Walk-Through
To demonstrate the use and capabilities of the FluidLayout software, FluidLayout
is used here to lay out an inverter. A more complete walk-through is available in
Appendix A.
Figure 2-5 shows the use of box outlines, drawn by left-clicking at the lower left
and right-clicking at the upper right, to place materials in the layout.
By using the materials toolbar, layout of an n-channel MOSFET is completed on
the first layer, as shown in Figure 2-6.
Interconnect to layer 2 is done through a via from the top metal on layer 1 (i.e.,
metal2) to the gate metal on layer 2. As shown in Figure 2-7, the via is placed on
layer 1. FluidLayout marks a corresponding via on the second layer, as can be seen
in the view window for the second layer.
A p-channel MOSFET is laid out on the second layer, and the complete inverter
is shown in Figure 2-8.
Figure 2-6: Complete NMOS pulldown path.
Figure 2-7: Placement of a metal2→gate via results in a gate→metal2 hint on the second layer.
Figure 2-8: The complete inverter.
To verify that the layout corresponds to an inverter, the SPICE deck extraction
feature is used. The feature size is set to 5 microns per unit grid length (lambda).
The following is the circuit netlist produced:
C:\WINNT\Profiles\shamikd\Desktop\inverter.sp
***** Created by FluidLayout
***** Created on 5/1/2000
*****
M1 2 3 GND! 0 NTFT W=15u L=10u
M2 2 3 Vdd! Vdd! PTFT W=15u L=10u
This SPICE deck may then be used for verification of the layout.
A more comprehensive guide to using FluidLayout can be found in Appendix A.
As shown here, FluidLayout is a useful software CAD system for laying out and
fabricating 3-D circuits. These capabilities will now be demonstrated with several
test circuits.
Chapter 3
Some Basic Transistor Circuits
The immediate application of FluidLayout is in targeting simple, commonly-known
circuits for an emerging three-dimensional fabrication technology. This allows both
for testing the functionality of FluidLayout and for exploring the viability of the
technology.
3.1 Minimum Criteria for the Technology
A new transistor technology is viable for computation only if suitable multi-transistor
devices can be fabricated. In particular, it is possible for transistors to provide nonlinear input-output behavior, yet still be unsuitable for multi-transistor circuitry.
There are several criteria that need to be met. These criteria are evaluated within
the context of the metal-insulator-semiconductor field-effect technology discussed in
[17, 16].
Consider, for example, the NMOS inverter in Figure 3-1. The desired function of
this inverter is to take the signal represented by in and produce the logical negation
of that signal at out. Represented using voltages, therefore, if Vin is below VM,
then Vout should be above VM, and vice versa, where VM is some midpoint voltage
between Vdd and ground. To verify that this circuit produces the desired behavior,
the characteristics of the individual transistors must be examined.
Figure 3-1: NMOS inverter, with pullup transistor 1 of ratio (W/L)1 and pulldown transistor 2 of ratio (W/L)2.
The governing I-V relationships for the n-channel FET are given as follows:
\[ I_D = \mu_n C_{ins} \left(\frac{W}{L}\right) \left(V_{GS} - V_{Tn} - \frac{V_{DS}}{2}\right) \left(1 + \lambda_n V_{DS}\right) V_{DS} \]
for $V_{GS} - V_{Tn} > V_{DS}$ (the linear regime) and
\[ I_{D,SAT} = \frac{1}{2} \mu_n C_{ins} \left(\frac{W}{L}\right) \left(V_{GS} - V_{Tn}\right)^2 \left(1 + \lambda_n V_{DS}\right) \]
for $V_{GS} - V_{Tn} < V_{DS}$ (the saturation regime), where $\mu_n$ is the field-effect mobility, $C_{ins}$
is the gate insulator capacitance, $W/L$ is the transistor channel width-to-length ratio,
$V_{GS}$ is the gate-source voltage, $V_{Tn}$ is the threshold voltage, $V_{DS}$ is the drain-source
voltage, and $\lambda_n$ is the channel-length modulation parameter [9].
It is clear, then, that this type of circuit produces the desired operation: for a
low input voltage, transistor 2 is turned off and transistor 1 pulls the output voltage
high, and for a high input voltage, transistor 2 is turned on, thus pulling the output
voltage low, provided transistor 2 is stronger than transistor 1. Transistor 2 is thus
called a pulldown, while transistor 1 is called a pullup.
As important as functionality, however, is the ability of the inverter to restore
logic levels. That is, while a 0 is represented ideally by 0 volts and a 1 is represented
ideally by Vdd, in actuality, this is not necessarily the case. However, a functioning
logic gate should to some extent recognize and accommodate these deviations from
ideality. Consider, for example, a series of cascaded inverters, each of which outputs
0 volts for an input of Vdd and vice versa. Suppose the input to the first is slightly less
than Vdd, say Vdd - ΔVin. The output of this inverter will then be greater than 0 volts
by some amount, say ΔVout. If the inverter restores logic levels, then ΔVout < ΔVin.
If this is not the case, then as the signal passes through each inverter, the deviation
from the ideal will increase until the signal stabilizes at VM.
Figure 3-2: NMOS inverter small-signal model about VM.
Level restoration follows if the gain of the inverter at VM is greater than one in
magnitude.
Consider an inverter whose output is Vdd minus the input. Then the
gain at VM is identically -1,
and deviations in the input are reflected exactly in the
output.
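The effect of the gain on level restoration can be illustrated numerically. The Python sketch below uses an idealized piecewise-linear inverter, linear about VM with a fixed gain magnitude and clipped at the supply rails; this model, the choice VM = Vdd/2 with Vdd = 20 volts, and the names inverter and cascade are assumptions made for illustration only.

```python
def inverter(v_in, gain, vdd=20.0):
    """Idealized inverter: linear about VM = vdd / 2, clipped to [0, vdd]."""
    vm = vdd / 2
    v_out = vm - gain * (v_in - vm)
    return min(vdd, max(0.0, v_out))


def cascade(v_in, gain, stages, vdd=20.0):
    """Return the voltage at the input and after each inverter stage."""
    levels = [v_in]
    for _ in range(stages):
        levels.append(inverter(levels[-1], gain, vdd))
    return levels
```

With a gain magnitude of 1.5, an input of 18 volts is driven to 0 volts and then to 20 volts, so the logic levels are restored. With a gain magnitude of 0.5, the deviation from VM halves at every stage and the signal settles toward VM. With a gain magnitude of exactly 1, the deviation is passed along unchanged.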
The gain of the inverter in Figure 3-1 at the midpoint voltage VM can be determined by examining the small-signal model of this inverter, shown in Figure 3-2.
From the model, it follows that the gain of the inverter is
\[ \frac{v_{out}}{v_{in}} = -\frac{g_{m2} r_o}{2 + g_{m1} r_o} \]
where $g_m$ is the small-signal transconductance, defined as $g_m = \sqrt{2 \mu C_{ins} (W/L) I_{D,SAT}}$,
and $r_o$ is the small-signal output resistance, defined as $r_o = (\lambda I_{D,SAT})^{-1}$ [9]. Thus,
having a large transconductance and a large output resistance is critical to the performance of the inverter: consider the cases where $g_{m1} r_o$ is small or is large. If the
product $g_{m1} r_o$ is small compared to 1, then the gain is essentially $-g_{m2} r_o / 2$, which is
also small. If, on the other hand, $g_{m1} r_o$ is sufficiently large, then the gain is approximately
$-g_{m2}/g_{m1} = -\sqrt{(W/L)_2 / (W/L)_1}$, since both transistors carry the same current. The transistor sizing is thus dictated by the need for
high gain.
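The two limiting cases can be checked numerically with a short Python sketch of the gain expression, -gm2 ro / (2 + gm1 ro). The function name inverter_gain is hypothetical, and the parameter values used with it are illustrative rather than measured.

```python
def inverter_gain(gm1, gm2, ro):
    """Small-signal gain of the NMOS inverter: -gm2*ro / (2 + gm1*ro).

    gm1 and gm2 are the pullup and pulldown transconductances in siemens,
    and ro is the output resistance in ohms (taken equal for both devices).
    """
    return -gm2 * ro / (2.0 + gm1 * ro)
```

When gm1 ro is much smaller than 1, the result approaches -gm2 ro / 2; when gm1 ro is much larger than 2, it approaches -gm2 / gm1. For instance, with equal transconductances of 1e-5 S and an output resistance of 1e6 ohms, the product gm ro is 10 and the gain is about -0.83, which is below unity in magnitude, so the pulldown must be made stronger than the pullup.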
The technology described in [17, 16] features transconductances on the order of
$10^{-5}$ S and output resistances on the order of $10^{6}$ Ω for a device with W = 292.5 μm
and L = 2 μm at a VM of about 10 volts. This indicates that a gain of greater than
unity is achievable with device sizes that currently can be fabricated.
3.2 Design of Basic Circuits
In order to test both the fabrication technology and the capabilities of FluidLayout,
some simple circuits are laid out using FluidLayout, and fabrication-ready output is
produced. The layout used here consists of an inverter, a NOR gate, a basic static
memory cell, and two ring oscillators.
The inverter implementation used in this layout is that in Figure 3-1. The device
sizes used are a channel length of 10 μm, a channel width of 200 μm for the pullup
FET, and a channel width of 1200 μm for the pulldown FET. Using the above device
parameters, this should provide a gain of approximately -1.5.
The NOR gate uses two pulldown transistors wired in parallel. Each is identical
in size to the inverter pulldown.
The memory cell, discussed in detail in Chapter 4, uses a pair of coupled inverters
together with two access transistors.
Figure 3-3: Layout of test devices.
Finally, the two ring oscillators are different layouts of the same circuit. This
circuit comprises three inverters wired in a series loop. Provided that the inverters
have sufficient gain, a latent signal on an input to one of the inverters is amplified to
Vdd or to ground as it passes through the inverters. Further, when the signal returns
to its starting point, it does so as its inverse, so that the signal oscillates when viewed
at any fixed point. However, if the inverters do not have sufficient gain, the signal
will decay to the midpoint voltage, VM. Thus, a ring oscillator is an ideal circuit to
test intrinsic device parameters.
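The oscillation argument can be illustrated at the logic level with a small Python sketch. It assumes ideal inverters with full logic swing that switch one at a time in loop order, so it says nothing about gain or timing in the real devices; the names step and simulate are hypothetical.

```python
def step(nodes, i):
    """Let inverter i switch: node i becomes the inverse of the previous node.

    Node i is the output of inverter i; its input is node i - 1, and the
    input of inverter 0 is the last node, which closes the loop.
    """
    nodes = list(nodes)
    nodes[i] = 1 - nodes[i - 1]
    return nodes


def simulate(nodes, steps):
    """Return the node values before and after each single-gate switching step."""
    history = [tuple(nodes)]
    for t in range(steps):
        nodes = step(nodes, t % len(nodes))
        history.append(tuple(nodes))
    return history
```

Starting a three-inverter loop from the node values (0, 1, 0), every node has taken its inverse value after three steps and the loop returns to its starting state after six, so each node oscillates with a period of six gate delays. A loop of two inverters started from (0, 1) never changes, which is the stable behavior exploited by the coupled inverters of a memory cell.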
Further, it is possible to examine different layout strategies with this multi-gate
circuit. In particular, the second ring oscillator, though electrically identical to the
first, is partitioned along the third dimension into three layers, with one inverter on
each layer. Power and ground signals are distributed through vias to the upper layers,
and the return signal from the output of the last inverter to the input of the first
inverter travels through a via stack located near the outputs.
Figure 3-4: 3-D layout of a ring oscillator. The three inverters shown are stacked to
form the 3-D layout.
The first layer of the layout of the entire structure is shown in Figure 3-3. Figure 3-4 shows the 3-D layout of the ring oscillator, including the 2nd and 3rd layers of the
layout.
In order to verify the functionality of the devices in this structure, the SPICE
netlist is extracted using FluidLayout.¹
***** C:\WINNT\Profiles\shamikd\Desktop\Maskl.sp
***** Created by FluidLayout
***** Created on 5/1/2000
*** inverter
M1 invout inv_in GND! 0 NTFT W=1200u L=10u
M34 invout Vdd! Vdd! 0 NTFT W=200u L=10u
*** NOR gate
M2 norout norA GND! 0 NTFT W=1200u L=10u
M3 norout norB GND! 0 NTFT W=1200u L=10u
M35 norout Vdd! Vdd! 0 NTFT W=200u L=10u
*** SRAM cell (4,7 are internal bit storage nodes)
M4 7 sramWL sramBL 0 NTFT W=200u L=10u
M13 sramBLBAR sramWL 4 0 NTFT W=200u L=10u
M5 7 4 GND! 0 NTFT W=1150u L=10u
M6 GND! 7 4 0 NTFT W=1150u L=10u
M36 7 Vdd! Vdd! 0 NTFT W=150u L=10u
M37 Vdd! Vdd! 4 0 NTFT W=150u L=10u
*** 2-D ring oscillator (three inverters)
M14 6 ring GND! 0 NTFT W=1200u L=10u
M38 6 Vdd! Vdd! 0 NTFT W=200u L=10u
M15 5 6 GND! 0 NTFT W=1200u L=10u
M39 5 Vdd! Vdd! 0 NTFT W=200u L=10u
M16 ring 5 GND! 0 NTFT W=1200u L=10u
M40 ring Vdd! Vdd! 0 NTFT W=200u L=10u
¹ This SPICE deck has been edited for clarity. For example, with typical circuit layouts, the
area enumeration algorithm results in all the n-channel devices grouped together and all the p-channel devices grouped together. The SPICE deck shown here has the transistors grouped by
function. Also, since one of the features of the technology is the ability to route gate across source
or drain, the netlist extraction will sometimes output a transistor as a parallel combination of two or
more smaller transistors. This will allow for more accurate capacitance modelling once the relevant
parameters have been obtained from the technology. However, in the SPICE deck shown here,
parallel transistors have been merged.
*** 3-D ring oscillator
M29 3 ring_3D GND! 0 NTFT W=1200u L=10u
M41 3 Vdd! Vdd! 0 NTFT W=200u L=10u
M42 2 3 GND! 0 NTFT W=1200u L=10u
M47 2 Vdd! Vdd! 0 NTFT W=200u L=10u
M48 ring_3D 2 GND! 0 NTFT W=1200u L=10u
M53 ring_3D Vdd! Vdd! 0 NTFT W=200u L=10u
Functional simulation can then be performed using this netlist to verify the performance of the circuits.
The circuit can now be extracted to output that can be used for fabrication. The
process in [17, 16] uses elastomeric liquid embossing to form circuit patterns. Each
material layer (gate, source/drain, etc.) is patterned using a unique part of the stamp.
This stamp is created using a wafer as a mold. Thus, FluidLayout is used to extract
the circuit to a GDSII binary stream that can be used to fabricate a wafer mask.
This mask pattern is shown in Figure 3-5.
From this mask, an elastomeric stamp is created. This stamp is used to pattern
solution-processed materials, which are then cured. The resulting structures form the
gate, source/drain, semiconductor, and interconnect for the circuits. For example,
Figure 3-6 shows the gate metal for the NOR gate, SRAM cell, and 2-D ring oscillator.
Figure 3-7 shows the source/drain metal for these circuits. Thus, FluidLayout is useful
both for laying out circuit structures and for fabrication of these circuits.
For the remainder of this thesis, some designs will be examined that utilize more of
the potential of FluidLayout. In particular, the focus is no longer on rapid prototyping
of three-dimensional integrated circuits, but instead on exploring the architectural
ramifications of being able to lay out circuits in 3-D.
Figure 3-5: Stamp pattern with spatially-separated material patterns.
Figure 3-6: Patterned gate for a NOR gate, SRAM cell, and ring oscillator.
Figure 3-7: Patterned source/drain for a NOR gate, SRAM cell, and ring oscillator.
Chapter 4
The Static Random-Access Memory
The technology for fabricating true three-dimensional integrated circuits is very much
a new technology; it has yet to approximate tried-and-true 2-D fabrication in terms
of feature size and circuit speed. However, preliminary research ([17, 16]) suggests
that the fabrication of multilayer transistor circuits with transistor properties rivaling
those of conventional silicon MOSFETs is neither unreasonable as a research goal, nor
unrealistic as a commercial technology within the near future. This is one of the main
reasons that FluidLayout has been engineered to handle complex three-dimensional
integrated circuits while at the same time providing the file formats necessary for
rapid prototyping of basic circuits.
In order to demonstrate the capabilities of the FluidLayout software as well as
the benefits of the three-dimensional medium, FluidLayout has been used to design
a multilayer static random-access memory (SRAM). This implementation, shown in
Figure 4-5, is capable of statically storing 512 bits of data in a chip whose footprint
is the same as that of a 64-bit 2-D SRAM. Further, read/write power dissipation for
this chip is approximately that of the 64-bit 2-D SRAM as well.
It is helpful to examine this 3-D SRAM in the context of its 2-D counterpart.
4.1 Background and Motivation
A static random-access memory must provide certain features by virtue of its name.
Memory signifies that the circuit must have a read operation, and optionally, a write
operation, by which a user may store data via a write (for writable memories) and
retrieve the same data later, via a read. The memory may therefore be read-only
(ROM) or read-write.
Memories are further characterized by their access mechanism.
In a multibit
memory, the storage location of a particular bit or group of bits may be identified by an
address. The access mechanism may restrict the user to data retrieval from sequential
addresses; this is implemented in circuits such as the FIFO (first-in, first-out) and
the shift register. Alternatively, access to random addresses may be permitted. This
random-access memory generally requires more sophisticated control circuitry.
Finally, memories may be categorized according to the permanence of their storage. Non-volatile memories do not require external power to retain their contents.
For example, erasable, programmable read-only memories (EPROMs) implement the
write mechanism using special circuitry and/or voltages that are not accessible during
the normal read operation of the circuit. A specific, commonplace example of this
type of memory is the Flash ROM. By contrast, volatile memories lose the contents of
memory if the power supply is removed. Volatile memories may be further categorized
as static or dynamic depending on the mode of storage. In static memories, such as
SRAM, the circuit actively reinforces the value of the bit stored in memory. In order
to overwrite a bit, the write circuitry must be able to overcome the static protection
on the old bit in memory. Conversely, in dynamic memories, such as DRAM, bits are
stored using the capacitance of internal nodes (or using an explicit capacitor for each
bit). No circuitry protects the stored value; thus, the memory may be overwritten by
charging or discharging the capacitor. Since capacitors tend to discharge naturally
through leakage, dynamic memories must be refreshed periodically to prevent loss of
data. This refresh introduces an overhead that may be recouped due to the increased
storage density of bits in DRAM versus bits in SRAM.
Of the various types of memory, writable memories have seen the largest push for
increased density. There are multiple reasons for this: first, as computers become
faster and more sophisticated, the size of desirable computations grows, and thus
the need for computer memory grows also; increased DRAM density allows one to
increase the amount of main memory available to a computer, and increased SRAM
density allows computer architects to increase the amount of memory cache available
to a processor. Second, the advent of digital media has created a similar push in
the consumer goods market, as digital cameras store pictures on Flash ROMs, and
users of personal computers transfer media from laptops to palmtop organizers and
portable audio players using any number of memory-card interfaces.
Of these three types of memory (Flash ROM, DRAM, and SRAM), Flash ROM
and DRAM place the most emphasis on bit density, and thus would benefit most
from the area improvements that a three-dimensional process would bring. However,
both Flash ROM and DRAM utilize special fabrication processes. In the case of
Flash ROM, special transistor structures are used to provide the non-volatility, while
in DRAM, explicit capacitors are often used for the bit storage mechanism. On the
other hand, SRAM may be fabricated using the same technology as is used for logic,
which is why all embedded memory has been in the form of SRAM until quite recently.
It is for this reason that the layout of a three-dimensional SRAM is explored.
SRAM benefits from the third dimension as much as DRAM or Flash memory would;
and while the density of 3-D DRAM would exceed that of 3-D SRAM, the fabrication
technology for 3-D SRAM is expected to be more readily available.
4.2
SRAM operation
In SRAM of any dimensionality, the basic storage mechanism remains the same. The
core of bit storage in the SRAM is the coupled-inverter cell shown in Figure 4-1.
The bit may be thought of as being stored at node Q. Thus, in this circuit, the
coupled inverters provide positive feedback on the stored bit, ensuring that the bit
is not erased or overwritten unless so desired. As long as power is supplied to the
Figure 4-1: Six-transistor circuit for individual bit storage.
inverters, the state of Q and $\overline{Q}$ is maintained; this provides the static functionality of
the memory cell [23].
Access to the cell contents is provided through access FETs M1 and M2. To read
the cell contents, the word line WL is pulled high, turning on M1 and M2. The bit
lines BL and $\overline{BL}$ are then pulled to Q and $\overline{Q}$ by M1 and M2, respectively.
To write to the cell, the desired bit and its inverse are asserted on BL and $\overline{BL}$,
respectively. WL is then pulled high. M1 and M2 must then be able to reverse the
state of Q and $\overline{Q}$ in order to write to the cell.
Since each inverter comprises two transistors, the cell as a whole is referred to as a
6-T (six-transistor) cell. The cell may be compressed to five transistors by removing
M1 or M2; however, this requires very careful design of the cell, as reversing the
internal state during a write is now more difficult.
An SRAM comprises any number of these cells arranged into an array of rows and
columns. All of the cells in a given row share the same word line, and all of the cells
in a given column share the same bit lines. This has two consequences: first, read
and write operations act on whole words at a time, where a word consists of however
many bits there are in a single row of the memory. Second, since columns share bit
lines, no more than one row at a time may be permitted access to the bit lines of the
memory. Thus, an SRAM must contain control circuitry that limits bit-line access to
a single row. This circuitry is therefore called row-select or row-decode circuitry.
Further, in practical SRAM design, the word size is often much smaller than
the desired size in words of the memory. A straightforward implementation would
thus result in a memory that is much taller than it is wide. However, packaging
constraints generally favor circuits that are square. The standard approach to solving
this problem is to split the memory into N columns, where N is chosen to make
the layout approximately square. A given row selection will thus address N words
simultaneously. To refine this selection to the one desired word, column-select or
column-decode circuitry is added to the memory.
Typically, then, an SRAM interface consists of A address bits and D data bits.
The A address bits are split into R row bits and A - R column bits. The memory
core of the SRAM thus consists of $2^R$ rows and $D \times 2^{A-R}$ columns of individual 6-T
cells.
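The addressing arithmetic above can be sketched in a few lines of Python. This is only an illustration: the function names are hypothetical, and the choice of the R high-order address bits for the row is an assumption, since the text does not say which bits serve which purpose.

```python
def split_address(address: int, a_bits: int, r_bits: int) -> tuple[int, int]:
    """Split an A-bit address into (row index, column-group index).

    Assumption: the R most-significant bits select the row and the
    remaining A - R bits select the word within that row.
    """
    if not 0 <= address < (1 << a_bits):
        raise ValueError("address out of range")
    column_bits = a_bits - r_bits
    row = address >> column_bits
    column_group = address & ((1 << column_bits) - 1)
    return row, column_group


def core_dimensions(a_bits: int, r_bits: int, d_bits: int) -> tuple[int, int]:
    """Return (rows, cell columns) of the core: 2^R by D * 2^(A-R)."""
    return 1 << r_bits, d_bits * (1 << (a_bits - r_bits))
```

For example, with A = 11, R = 7, and D = 8, `core_dimensions(11, 7, 8)` gives a 128-by-128 array of cells.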
4.3
Extensions to 3-D
A cursory examination of the two-dimensional SRAM structure shows that to first
order, a particular cell is addressed via a row (word) line and a column (bit) line. The
2-D memory is thus addressed by a row-column matrix. It seems natural to extend
this idea to three dimensions by incorporating a third dimension into the matrix
addressing. At the cell level, this is done by adding extra transistors to the circuit.
The addition of transistors M3 and M4, as shown in Figure 4-2, allows a third-dimension word line, WL2, to control access to the bit lines. However, an immediate
objection to the circuit in Figure 4-2 is that it requires two additional transistors
per bit of storage. This overhead could be prohibitively expensive in high-density
SRAMs.
Proposed solutions include making the 3-D SRAM cell single-ended by
removing transistors M2 and M4 and bit line $\overline{BL}$. This results in a six-transistor cell
whose area is essentially that of the standard 2-D SRAM cell.
However, if the SRAM is to be extended to three dimensions, this must be done
in a way that considers the overall performance of the memory. In a conventional
Figure 4-2: Eight-transistor circuit for use in 3-D SRAM.
SRAM, each row must be coupled to an active row decoder circuit, often implemented
as an AND of address bits, that drives the word line for that row.
This active
decoder is necessary to prevent contention for the bit lines, which would happen if
two word lines were simultaneously high. (The columns, by contrast, may be decoded
using pass transistors since it is permissible to leave the bit lines in an unknown or
disconnected state.) Therefore, using the 3-D matrix-of-word-lines approach to 3-D
SRAM, there are necessarily more word lines to drive. In the single-ended SRAM
described above, for example, a second set of word lines (running perpendicular to
the first set) is needed. Thus, in addition to the decoder circuit for each row, this
SRAM implementation requires an active decoder for each column.
In addition, there are power considerations in choosing how to address the cells in
the SRAM matrix. The bit lines are usually high-capacitance wires whose switching
consumes a good deal of power. It is desirable to minimize the number and length
of these bit lines, and to avoid switching the bit lines if possible. For an N-bit data
bus, it would be optimal to have exactly 2N bit lines (N if the memory is single-ended). However, two considerations interfere as the number of words in the SRAM
grows: first, with the number of bit lines fixed, the length of the bit lines grows with
the number of words, and thus the switching time and power consumption increase.
Second, as the number of words grows, the aspect ratio (width-to-length ratio) of the
Figure 4-3: Proper cell distribution improves aspect ratio and decreases bit-line length.
SRAM deviates further from 1:1, which is the square footprint that is most desirable
for packaging. For high-density memory, having multiple sets of bit lines is a necessity.
For example, a commonplace 2Kx8 (16384-bit) two-dimensional SRAM may be
partitioned into 2048 rows and 8 columns. This would require only 16 bit lines, but
these wires would be prohibitively long, and the memory would be many times taller
than it would be wide. A better packaging scheme would be to have 128 rows and
128 columns and to use a 16:1 decoder to provide 8 bits of output. While this means
that potentially 16 times as many bit lines could switch simultaneously, the switched
capacitance is reduced by the same factor (since the length of the lines has been
reduced).
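The trade-off in this example can be checked numerically. The sketch below assumes unit-sized cells, two bit lines per cell column, and capacitance proportional to wire length; the function name and the returned field names are hypothetical.

```python
def partition(words: int, data_bits: int, mux: int) -> dict:
    """Geometry of a words x data_bits SRAM folded with a mux:1 column decoder.

    Assumptions: unit-sized cells, two bit lines per cell column
    (differential cell), capacitance proportional to wire length.
    """
    if words % mux:
        raise ValueError("mux ratio must divide the word count")
    rows = words // mux
    cols = data_bits * mux
    bit_lines = 2 * cols
    return {
        "rows": rows,
        "cols": cols,
        "bit_lines": bit_lines,
        "bit_line_length": rows,
        "total_bit_line_length": bit_lines * rows,
    }


tall = partition(2048, 8, 1)     # 2048 rows x 8 columns, 16 bit lines
square = partition(2048, 8, 16)  # 128 rows x 128 columns, 256 bit lines
```

Under these assumptions both configurations have the same total bit-line length: the square layout has 16 times as many bit lines, each one sixteenth as long.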
While the power consumption of the latter configuration is thus essentially the
same as that of the former, it is possible to utilize the benefits of the former to
reduce the power consumption of the latter. Specifically, the main benefit of the
2048x8 configuration is that for any given access, this configuration activates the
fewest memory cells (in this case, eight). The 128x128 approach activates
128 memory cells and then selects the relevant eight bits of those. The reason for this
is that the 16:1 decoder can be implemented more compactly using pass transistors
that come after the memory-cell activation, thus trading off chip area for power
consumption. If, instead, the decoder were implemented at the cell level, only eight
cells would need to be activated. This could be coupled with a redundant 16:1 decoder
so as not to increase the length of bit lines. (The in-cell decoder eliminates the need
for an external column decoder; the bit lines may be wired together in bundles of 16.
However, this increases the bit-line capacitance.)
In two-dimensional memories, in-cell decoding results in a number of trade-off
choices, as integrating another decoder into the cell increases the cell size and decreases the bit density. The issue is that the word lines are used to activate cells, and
in the 128x128 cell example, a single word line activates 128 cells. In-cell decoding
is thus just the use of an orthogonal set of word lines to select a subset of the 128
cells for activation. If, instead, parts of a word line could be independently tri-stated,
the 16:1 decoding could be done without increasing the cell size. For example, a
128-cell word line could be separated into 16 blocks of eight cells. When the address
decoder drives a word line high, block decoders could be used to select which of the
16 blocks receives the word line. The other 15 blocks would remain at their previous
state (possibly active). The previously redundant 16:1 decoder would then be used
to ensure that only the currently active cells have access to the data bus.
The difficulty lies in implementing this tri-state scheme in a two-dimensional memory. The motivation for using word lines and bit lines is that the memory cells are
self-wiring and the word line control is simple. Further, and more importantly, the
visibility of signals on the word lines is sequential. That is, the memory cells on a
single word line receive signals on that line in order from closest to the decoder to
farthest away. If any signal on a word line is not meant for a particular cell, there
must be a "pass-through" mechanism for the cell so that the signal may be transmitted to cells further down the chain. This mechanism requires that there be two word
lines for each row - a master word line that traverses the entire row, and a per-block
slave word line that is wired to the master through a pass transistor (which serves
as the tri-state switch). This additional complexity results in increased chip area or
diminished functionality (e.g. by sacrificing bit lines to recover area for the tri-state
transistors).
Further, the power requirements increase by a significant proportion
Figure 4-4: 3-D partitioning of rows allows for simple tri-stating of word lines.
since the master word line must be driven in both cases.
While this approach is complicated for two-dimensional memories, in three-dimensional
memories, the resulting tree structure of master and slave word lines can be geometrically
rearranged for a 3-D memory that is efficient in both area and power. As shown
in Figure 4-4, the blocks that share slave word lines may be treated as rows in a 3-D
SRAM. The tri-state circuitry may therefore be stacked along the third dimension,
thus reducing the master word line to a series of stacked vias. The total length of the
word lines is thus reduced back to essentially that of the original 2-D SRAM without
tri-stating. In addition, the power dissipation is reduced by a factor asymptotically
equal to the number of layers relative to a 2-D SRAM of the same capacity. This may
be formalized as follows: suppose a 2-D SRAM has been partitioned into $M\sqrt{L}$ rows
and $N\sqrt{L}$ columns; further assume that the SRAM cells are of unit length and width.
The word lines are then of length $N\sqrt{L}$ while the bit lines are of length $M\sqrt{L}$. The
SRAM is now to be repackaged into a 3-D SRAM with L layers, via the following
procedure. The 2-D SRAM is first repackaged into M rows and NL columns, with an
L:1 column decoder. This reduces the length of the bit lines to M; however, as $\sqrt{L}$
times as many bit lines are switching, no power savings are realized thus far. In fact,
the word lines are now longer by a factor of $\sqrt{L}$; to counteract this effect, the rows
are then partitioned into L blocks of N cells each, and each block is assigned a slave
word line that is fed from the master word line and tri-stated. (It is noted for the
time being that as stated before, this configuration still dissipates appreciably more
power in the 2-D case.) To construct the 3-D SRAM, each block of N cells in a row
is assigned to one of L layers along the third dimension, as illustrated in Figure 4-4.
This restores the original aspect ratio of M : N. The repackaging of the 2-D SRAM
into three dimensions is complete. The worst-case switched capacitance (two word
lines and all bit lines on a layer) is reduced from
$2N\sqrt{L} + 2MNL = 2N\sqrt{L}\,(1 + M\sqrt{L})$ to $2N + 2MN = 2N(1 + M)$.
This is therefore the approach that will be taken in laying out a 3-D SRAM, which
is explored in the following section.
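The switched-capacitance comparison can be evaluated numerically. The sketch below takes the worst-case expressions as $2N\sqrt{L} + 2MNL$ for the square 2-D array and $2N + 2MN$ for one layer of the 3-D array, with capacitance measured in unit cell lengths; the function names are hypothetical.

```python
import math


def switched_capacitance_2d(m: float, n: float, layers: int) -> float:
    """Worst case for the 2-D array of M*sqrt(L) rows by N*sqrt(L) columns."""
    root = math.sqrt(layers)
    word_lines = 2 * n * root               # two word lines of length N*sqrt(L)
    bit_lines = 2 * (n * root) * (m * root)  # two bit lines per column
    return word_lines + bit_lines


def switched_capacitance_3d(m: float, n: float) -> float:
    """Worst case for a single M x N layer of the 3-D array."""
    return 2 * n + 2 * m * n


def power_ratio(m: float, n: float, layers: int) -> float:
    """Ratio of 2-D to 3-D worst-case switched capacitance."""
    return switched_capacitance_2d(m, n, layers) / switched_capacitance_3d(m, n)
```

For M = N = 8 and L = 8 the ratio is a little under 7.5, and it approaches L as M grows, which is the sense in which the savings are asymptotically equal to the number of layers.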
4.4
Layout of a 3-D SRAM
The first and second layers of a 512-bit 3-D SRAM are shown in Figure 4-5. The
design of this SRAM is based on the concepts outlined above.
The memory cell used in the 3-D SRAM is a standard 6-T cell, shown in Figure 4-6. This means that volume utilization is asymptotically equal to the area utilization
of a 2-D SRAM of the same capacity.
In order to obtain the power savings available to the 3-D SRAM, each of eight
layers of the SRAM uses three address bits as a "layer select." That is, three address
bits are used to provide the tri-state signal for the entire layer. If a given layer is
not selected, all word lines and bit lines on that layer are disconnected from the row
decoders and bit-line drivers (located on the first layer). The row decoder outputs
and individual bits on the data bus are thus actually tri-stated buses wired along
the third dimension. Figure 4-7 shows the connection of a row decoder to the word
line through a pass transistor. Figure 4-8 shows the connection of bit lines to the
tri-stated buses making up the data lines.
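A behavioral sketch of the layer-select idea follows. It models only the gating: a shared row decoder produces a one-hot word-line bus, and a one-hot layer select decides which layer's pass transistors connect to it. The function names and the figure of eight rows per layer are assumptions for illustration and are not taken from the layout.

```python
def decode_one_hot(value: int, width: int) -> list[bool]:
    """One-hot decoder: exactly one output is True."""
    if not 0 <= value < width:
        raise ValueError("select value out of range")
    return [i == value for i in range(width)]


def driven_word_lines(layer_select: int, row: int,
                      layers: int = 8, rows: int = 8) -> list[list[bool]]:
    """Return word_line[layer][row]; True means that word line is driven high.

    The row decoder is shared by all layers; a layer's word lines see the
    decoder outputs only when that layer's tri-state enable is asserted.
    """
    layer_enable = decode_one_hot(layer_select, layers)
    row_decode = decode_one_hot(row, rows)
    return [[enabled and selected for selected in row_decode]
            for enabled in layer_enable]
```

Whatever the address, exactly one word line in the whole stack is driven, which is why only one layer dissipates switching power per access in this model.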
Thus, the SRAM effectively acts as eight 2-D SRAMs whose I/O pins are wired
Figure 4-5: First and second layers of an eight-layer 3-D SRAM.
Figure 4-6: 6-T SRAM cell layout.
Figure 4-7: Word-line tri-stating.
Figure 4-8: Bit-line decoding using the word-line tri-state control signal.
together, with the important difference that the row decoders and bit-line drivers are
reused for all eight layers. Like a stack of 2-D SRAM chips soldered together, only
one layer consumes power for any single operation. The power consumption of the
512-bit 3-D SRAM shown here is thus essentially that of a single 64-bit layer. The
circuit area, as can be seen in Figure 4-5, is little more than that of a 2-D SRAM with
an 8:64 aspect ratio, as the seven upper layers are dedicated to storage cells. The
area overhead incurred by the 3-D implementation is in the form of access transistors
that are clearly visible in the layout of the second layer; this overhead is not due to
the three-dimensionality of the architecture, but instead a tradeoff for maintaining
low power consumption as described above. Finally, while speed has not been the
primary consideration here, care has been taken not to sacrifice performance in this
regard. All word and bit lines are strictly shorter in the 3-D case, and the layer select
computation is done in parallel with the row decoding. At any rate, the use of sense
amplifiers would greatly reduce the dependency on bit line speed, and consequently
diminish any speed advantage presented by a 3-D SRAM with this architecture.
It is clear, then, that memory architectures benefit directly from implementation
in a three-dimensional medium. It is useful to consider now the suitability of this
medium for computation.
Chapter 5
The Cellular-Automata Machine
Ever since the advent of integrated-circuit technology, general-purpose computation
has been one of its main applications. Today, about half the world semiconductor
market is in computer chips. Further, research into novel computational architectures
is highly active. Many of these architectures could benefit from integration in a 3-D
technology; one such architecture is the cellular automata machine.
5.1
Background
Much of computational theory is centered around what are called finite automata.
In particular, interest is centered here on a particular class called deterministic finite
automata (DFAs), also known as finite-state machines (FSMs).
5.1.1
Finite-State Machines
A finite-state machine is a quintuple $(Q, \Sigma, \delta, q_0, F)$. Q is a finite set of states in which
the machine can be. $\Sigma$ is a finite alphabet from which the inputs to the machine are
taken. $\delta$ is the transition map, defined as a function $\delta : Q \times \Sigma \to Q$ that maps a
combination of the current state of the machine and the current input to the next
state of the machine. $q_0$ is the initial state of the machine, and $F \subseteq Q$ is the set of
accepted final states.
Figure 5-1: Finite state machine. (A) Finite-state machine. (B) Hardware implementation.
The finite-state machine, depicted in Figure 5-1, is thus designed for synchronous
operation. The FSM is set to state $q_0$ and an input is provided. At each tick of a
global clock, $\delta$ determines the next state, and the current input is discarded for the
next input. If at the end of the input stream, the state of the machine is in F, then
the machine is said to accept the input.
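The synchronous operation just described maps directly onto a short loop. The sketch below is illustrative only: the transition map is represented as a Python dictionary, and the parity machine used as an example is hypothetical rather than taken from the text.

```python
def dfa_accepts(delta: dict, q0, accepting: set, inputs) -> bool:
    """Run a deterministic finite-state machine over an input sequence.

    delta maps (current state, input symbol) to the next state; the
    machine accepts if it ends in one of the accepting states.
    """
    state = q0
    for symbol in inputs:          # one iteration per clock tick
        state = delta[(state, symbol)]
    return state in accepting


# Hypothetical example: accept bit strings containing an even number of 1s.
PARITY_DELTA = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}
```

Here `dfa_accepts(PARITY_DELTA, "even", {"even"}, "1001")` returns `True`, since the input contains two 1s.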
The FSM admits a straightforward hardware implementation, also shown in Figure 5-1. FSMs are thus used for many types of control hardware.
More important, however, is a modification of an FSM called a Turing machine.
A Turing machine comprises an FSM and an infinite tape. At each tick of a global
clock, the FSM reads from the tape (its input) and decides (1) its next state, (2)
an output to the tape, and (3) whether to move left or right on the tape. A Turing
machine can thus be defined by a septuple $(Q, \Sigma, \Gamma, \delta, q_0, B, F)$. Q is again the set of
states of the FSM. $\Gamma$ is the finite tape alphabet. B is a special character, the "blank."
$\Sigma \subseteq \Gamma - \{B\}$ is then the input alphabet. $\delta : Q \times \Gamma \to Q \times \Gamma \times \{\text{left}, \text{right}\}$ is then
the transition function for the Turing machine. F is again the set of accepted final
states.
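A minimal simulator makes the septuple concrete. This is a sketch under stated assumptions: the tape is stored sparsely in a dictionary, the machine halts when it enters an accepting state or when no transition is defined (a convention chosen here, not given in the text), and a step limit guards against non-termination. The bit-inverting machine is a hypothetical example.

```python
def run_turing_machine(delta: dict, q0, accepting: set, tape_input: str,
                       blank: str = "B", max_steps: int = 10_000):
    """Simulate a single-tape Turing machine; return (accepted, tape contents).

    delta maps (state, symbol read) to (next state, symbol written, move),
    where move is "left" or "right".
    """
    tape = dict(enumerate(tape_input))   # sparse tape, blank everywhere else
    state, head = q0, 0
    for _ in range(max_steps):
        if state in accepting:
            break
        key = (state, tape.get(head, blank))
        if key not in delta:             # assumed convention: halt, reject
            break
        state, written, move = delta[key]
        tape[head] = written
        head += 1 if move == "right" else -1
    contents = "".join(tape[i] for i in sorted(tape)).strip(blank)
    return state in accepting, contents


# Hypothetical example: invert every bit, then accept at the first blank.
INVERT_DELTA = {
    ("scan", "0"): ("scan", "1", "right"),
    ("scan", "1"): ("scan", "0", "right"),
    ("scan", "B"): ("done", "B", "right"),
}
```

Running `run_turing_machine(INVERT_DELTA, "scan", {"done"}, "1011")` returns `(True, "0100")`.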
Turing machines are important for general-purpose computation because for the
Figure 5-2: Turing machine.
appropriate choice of parameters, there exists a Turing machine that can take as
input a specification of an arbitrary Turing machine and an input to that machine,
and simulate the behavior of the specified machine on that input. The simulator is
thus called a Universal Turing Machine (UTM). It has been shown that a UTM is
capable of solving a vast class of useful problems [19].
In particular, a UTM is computationally equivalent to modern general-purpose
processors, in that a UTM can simulate the behavior of any of these processors, and
vice versa. Therefore, any architecture that is equivalent to a UTM is suitable for
general-purpose computation. Since certain cellular-automata machines can be shown
to be equivalent to a UTM, it is worthwhile to explore this architecture further.¹
5.1.2
Cellular-Automata Machines
The cellular-automata machine (CAM) is a variant on the finite-state machine much
in the same way a Turing machine varies from the FSM: the CAM adds unbounded
memory to the system. The chief difference is that in a Turing machine, one processing
element has access to the unbounded memory, but in a CAM, the memory is
distributed as finite chunks among an infinite array of processing elements.
Figure 5-3: Four cells of a cellular-automata machine.
¹ A more in-depth review of these concepts is available in many texts, such as [19].
Specifically, a CAM is a regular array of cells, each of which is an FSM. For any
given cell, the input to the cell is a set of states of other cells; the set of cells whose
states are polled is called the neighborhood of the given cell. CAMs generally also have
the property of uniformity; that is, each cell has the same FSM and the topology of
the neighborhood is invariant from cell to cell. It is, therefore, only the input (i.e. the
initial configuration of states) that varies from program to program, as implemented
on a CAM.
Part of a hypothetical CAM is thus shown in Figure 5-3. This CAM features a
square array and exhibits a common neighborhood known as a Moore neighborhood,
which contains all cells in the immediate "square". Thus, in this CAM, each cell has
eight neighbors.
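The Moore neighborhood is easy to enumerate as a set of coordinate offsets. The sketch below is illustrative and the function name is hypothetical; it generates every offset with components in {-1, 0, 1} except the all-zero offset, which is the cell itself.

```python
from itertools import product


def moore_offsets(dimensions: int) -> list[tuple[int, ...]]:
    """Offsets to all cells in the immediate square (or cube) around a cell."""
    return [offset
            for offset in product((-1, 0, 1), repeat=dimensions)
            if any(offset)]              # drop the all-zero offset
```

`len(moore_offsets(2))` is 8, matching the eight neighbors above; the same construction on a 3-D mesh yields 26 offsets.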
Unlike UTMs, CAM architectures are highly variable, in that there are many
choices for the cell topology and for the neighborhood.
For example, a hexagonal
grid is used for many problems in physics. The CAM shown in Figure 5-3 uses a
2-D mesh. As is no doubt evident, the topology to be considered here is the 3-D
rectangular mesh.
This topology is suitable for a number of reasons. First and foremost, it has been
shown that there exist CAMs using 2-D and 3-D meshes that are equivalent to a
UTM [20]. Secondly, it admits a more efficient layout in three dimensions than in
two.
To demonstrate these efficiencies, a sample CAM is laid out that executes the
Game of Life in three dimensions.
5.1.3
The Game of Life
The Game of Life, as devised by John Conway [6], is a simple specification for a
two-dimensional cellular-automata machine that exhibits a variety of interesting behaviors. The cells are arranged in a Moore configuration so that each cell talks to
its eight nearest neighbors. At any given time, each cell is in one of two states, alive
or dead. The transition rule for a cell is as follows: a living cell remains alive if and
only if it has exactly two or three living neighbors (the environment condition), and
a dead cell becomes alive if and only if it has exactly three living neighbors (the fertility
condition).
More formally, a candidate Game is specified by the quadruple $(E_l, E_u, F_l, F_u)$,
where a living cell remains alive if and only if it has m living neighbors with
$E_l \le m \le E_u$, and a dead cell becomes alive if and only if it has n living neighbors with
$F_l \le n \le F_u$. The Conway Game of Life is thus (2, 3, 3, 3), hereafter termed Life 2333
[1].
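The quadruple translates directly into a per-cell transition rule. In the sketch below the function name and argument layout are hypothetical; the rule is passed as a tuple in the order (environment lower bound, environment upper bound, fertility lower bound, fertility upper bound).

```python
def next_state(alive: bool, live_neighbors: int,
               rule: tuple[int, int, int, int] = (2, 3, 3, 3)) -> bool:
    """Apply a Life rule given as (E_l, E_u, F_l, F_u) to one cell."""
    e_l, e_u, f_l, f_u = rule
    if alive:
        return e_l <= live_neighbors <= e_u   # environment condition
    return f_l <= live_neighbors <= f_u       # fertility condition
```

With the default rule, a living cell with two or three living neighbors survives and a dead cell with exactly three living neighbors becomes alive.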
Life 2333 exhibits some important behaviors. For example, there exist cell configurations
that are stable (i.e. that maintain the same state indefinitely). Other
configurations oscillate (i.e. cycle through a finite set of states). Further, some
configurations, called gliders, translate across the plane. Finally, Life 2333 has configurations
called glider guns, which are stationary configurations that emit gliders
at regular intervals.
It has been shown that using these glider guns, Life 2333 can be made to emulate
Boolean logic, and thus is equivalent to a UTM [6]. Fabrication of a Game of Life
CAM is thus potentially more interesting than fabrication of some special-purpose
machines.
All that remains is to extend Life to three dimensions. The main difficulty is
that a Moore neighborhood on a 3-D mesh consists of 26 cells. Thus, the range of
candidate Games is greatly increased over the 2-D case. However, there do exist
several candidate Games in three dimensions. Two of these are Life 4555 and Life
5766. While Life 4555 exhibits more prolific behaviors, Life 5766 has the interesting
property that many special configurations in Life 2333 are extensible to Life 5766.
In fact, Life 5766 is provably most analogous to Life 2333. However, it is not known
whether a glider gun exists for either three-dimensional game [1].
Nonetheless, Life 5766 is used as the CAM to be laid out, as a wealth of configurations are already known.
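A software model of a Life generation, usable in two or three dimensions, can be sketched as follows. It assumes an unbounded grid represented as a set of live-cell coordinates, which differs from a finite hardware mesh whose boundary treatment is not modeled here; the function name is hypothetical.

```python
from collections import Counter
from itertools import product


def life_step(live: set, rule: tuple[int, int, int, int],
              dimensions: int) -> set:
    """Advance one generation of a Life game with rule (E_l, E_u, F_l, F_u).

    live is a set of coordinate tuples on an unbounded grid (assumption).
    """
    e_l, e_u, f_l, f_u = rule
    offsets = [o for o in product((-1, 0, 1), repeat=dimensions) if any(o)]
    # Count, for every cell adjacent to a live cell, its live neighbors.
    counts = Counter(
        tuple(c + d for c, d in zip(cell, offset))
        for cell in live
        for offset in offsets
    )
    survivors = {cell for cell in live if e_l <= counts.get(cell, 0) <= e_u}
    births = {cell for cell, n in counts.items()
              if cell not in live and f_l <= n <= f_u}
    return survivors | births
```

Under rule (2, 3, 3, 3) in two dimensions, a row of three live cells becomes a column of three and back again. Under rule (5, 7, 6, 6) in three dimensions, this function leaves a 2 x 2 x 2 block of live cells unchanged, since each cell has seven live neighbors and no outside cell has more than four.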
5.2
Layout of a 3-D Game of Life
The layout of a 3-D cellular-automata machine consists of two basic steps. First,
the individual cell must be laid out; second, the cells must be wired together, and
global signals must be distributed. It is in this second phase where layout efficiencies
will be examined. This is for several reasons: primarily, this is due to the fact that
the cell contents are highly variable from CAM to CAM; thus, optimizations for one
particular CAM may not generalize. Secondarily, the motivation behind the CAM is
towards simple cell design, and the design of simple FSMs is not terribly interesting.
Nevertheless, the cell design will be discussed.
5.2.1
Game of Life Cell
The cell behavior in the Game of Life, and in Life 5766 in particular, is easy to
specify in a rule-based form. This means that Life is easy to implement in software.
However, Life does not admit a simple expression in hardware. The hardware implementation in three dimensions is confounded by the fact that the neighborhood size is
26 cells as opposed to 8 in the two-dimensional case. Thus, neither a straightforward
combinational-logic approach nor a lookup-table approach is feasible.
The standard approach to Life-type problems in hardware is to emulate the software mechanism. In software implementations, the cell states are stored in memory
and an accumulator is used to count the number of live cells in the neighborhood. A
hardware implementation could then consist of a number of adders.
However, a neighborhood of 26 cells implies 25 additions, some of which are necessarily five bits wide. The resulting logic structure would both be unnecessarily large
and compute unnecessary information. The only way to implement such a cell using
logic or arithmetic is through hardware reuse; that is, computation of the state must
be done over several clock cycles, where different inputs are considered on each cycle.
By contrast, an area-efficient single-cycle implementation must recognize that permutations
in the input data are irrelevant and that only two result bits are necessary to
compute the output (these being whether $E_l \le n \le E_u$ and whether $F_l \le n \le F_u$).
Combinational logic is thus inefficient because each permutation of the input data
must be recognized by a separate pulldown or pullup path. Lookup-table implementations
fail for this reason as well. On the other hand, an adder implementation is
inefficient because it produces extraneous output; it is not possible to discard higher-order
bits of the sum, even though these bits are not desired for all practical Life
implementations.²
Thus, in order to simplify the FSM structure, non-computational approaches must
be considered. One such approach is through sorting nets. A sorting net is a hardware
structure that takes n unsorted inputs and produces n outputs that are the inputs
in ascending or descending numerical order. Sorting nets differ from conventional
sorting algorithms in two ways: first, comparisons are done in parallel, and second,
the sequence of comparisons is independent of the input. Sorting nets are therefore
ideally suited for hardware implementation.
Figure 5-4: Insertion sort bit-slice.
² In the construction of a practical CAM, i.e. one that is more adept than the Game of Life at
general-purpose computation, the computational structure of the FSM is paramount. That is, the
cell would be implemented via a lookup table or via pipelined combinational logic, and the ability
to do single-cycle computation on all 26 inputs would be sacrificed. The emphasis would thus be on
optimization of the logic, rather than on optimization of the overall CAM structure, which is the
focus here.
Life may be implemented by taking the input bus and sorting the bits on the
bus. If the nth-highest bit is a 1, this then guarantees that there are at least n
live neighbors. It is then straightforward to compute the next state of the cell by
examining the relevant wires in the bus.
An area-efficient wide sorting net can be implemented as an insertion sort. Insertions
are efficient because each bit is inserted either at the top of the list or at the
bottom. Consider the circuit in Figure 5-4. Bits $A_3$, $A_2$, and $A_1$ are sorted, and $A_0$
is to be inserted. Bits $B_3 \ldots B_0$ are thus also sorted.
A 26-bit sorter may be constructed using this method. Moreover, all but the fifth
through eighth highest bits may be discarded for Life 5766. The next state may then
be computed as a combinational function of these four bits and the current state.
The Game of Life cell is shown in Figure 5-5.
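The sorting approach can be modeled in software to check the logic. The sketch below is a behavioral model, not a description of the circuit in Figure 5-4: bits are kept in descending order, each new bit is placed at the top (a 1) or the bottom (a 0), and the next state for Life 5766 is read from the fifth through eighth highest bits. The function names and the particular Boolean expressions are assumptions derived from the rule (5, 7, 6, 6).

```python
def insert_bit(sorted_bits: list[int], new_bit: int) -> list[int]:
    """Insert one bit into a descending-sorted bit list (ones first)."""
    if new_bit:
        return [1] + sorted_bits         # a 1 goes to the top
    return sorted_bits + [0]             # a 0 goes to the bottom


def sort_bits(bits: list[int]) -> list[int]:
    """Insertion sort specialized to single-bit inputs."""
    result: list[int] = []
    for bit in bits:
        result = insert_bit(result, bit)
    return result


def next_state_5766(alive: bool, neighbors: list[int]) -> bool:
    """Next state for Life 5766 from the 26 neighbor bits."""
    ordered = sort_bits(neighbors)
    # The nth-highest bit is 1 exactly when at least n neighbors are alive.
    ge5, ge6, ge7, ge8 = ordered[4], ordered[5], ordered[6], ordered[7]
    if alive:
        return bool(ge5 and not ge8)     # 5 <= n <= 7
    return bool(ge6 and not ge7)         # n == 6
```

Only four wires of the sorted bus are examined, which is the point of the approach: the exact neighbor count is never computed.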
5.2.2
Game of Life CAM Architecture
Of vital interest is the usability of the 3-D medium for system-level optimizations of
the CAM architecture. In particular, the inter-cell communication and clock distribution both benefit from implementation in three dimensions. System I/O can also
be done in an efficient manner. To experiment with some of these optimizations, the
Game of Life cell is fitted into a 4 x 4 x 4 rectangular mesh.
Inter-cell communication is a difficult problem mainly because of the number of
wires - 26 per bit of state. This problem confounds implementation of a 3-D CAM
Figure 5-5: Game of Life cell.
in a 2-D medium. With a three-dimensional medium, the problem reduces to that of
wiring the eight-neighbor 2-D configuration. Each cell requires inputs from its eight
neighbors on the same layer, nine neighbors from one layer down, and nine neighbors
from one layer up. An efficient 3-D wiring scheme is to wire the eight same-layer
neighbor inputs to the cell, and then wire each input to three via stacks. The first
stack connects to the gate on that layer, the second stack goes up to the next layer,
and the third stack goes down to the layer below. Beneath the second stack lies a
via stack coming up from the layer below, and above the third stack lies a via stack
coming down from the layer above. Thus, when the cells are aligned along the third
dimension, the vias align to provide I/O between layers.
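The neighbor count behind the 26 wires can be checked with a short sketch. The function names and the treatment of the mesh boundary (no wraparound) are assumptions made for illustration; the text does not specify how edge cells are handled.

```python
from itertools import product


def neighbor_offsets():
    """All 26 (dx, dy, dz) offsets of the 3x3x3 neighborhood, center excluded."""
    return [(dx, dy, dz)
            for dz, dy, dx in product((-1, 0, 1), repeat=3)
            if (dx, dy, dz) != (0, 0, 0)]


def neighbors_in_mesh(x, y, z, size=4):
    """Neighbor coordinates inside a size^3 mesh (assumes no wraparound)."""
    return [(x + dx, y + dy, z + dz)
            for dx, dy, dz in neighbor_offsets()
            if 0 <= x + dx < size and 0 <= y + dy < size and 0 <= z + dz < size]


offsets = neighbor_offsets()
same_layer = [o for o in offsets if o[2] == 0]     # eight neighbors on the same layer
layer_below = [o for o in offsets if o[2] == -1]   # nine neighbors one layer down
layer_above = [o for o in offsets if o[2] == 1]    # nine neighbors one layer up
print(len(same_layer), len(layer_below), len(layer_above), len(offsets))  # 8 9 9 26
```

Under the no-wraparound assumption, a corner cell of the 4 x 4 x 4 mesh sees only 7 neighbors, while an interior cell sees all 26.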
Clock distribution is done through an H-tree network. The clock net on the first
two layers of the CAM is shown in Figure 5-6. The four peripheral vias connect the
four-cell H-trees on the second layer to the four-cell H-trees on the first layer, thus
forming a 3-D H-tree. The central via then connects the 32-cell H-tree shown in the
figure to the 32-cell H-tree on the third and fourth layers of the CAM.
Figure 5-6: Clock distribution in the 3-D Game of Life architecture.
The savings in wire length can be seen immediately. Distributing the clock signal
to an 8 x 8 2-D grid would require four H-trees like the one in the left half of Figure 5-6; this 3-D implementation requires two. Further, the four H-trees in the 2-D
implementation would have to be connected into an H-tree, requiring yet more wire.
In this 3-D circuit, the trees are connected by the central via stack. The 3-D
medium therefore offers wire-length savings, and hence power savings, that can be
readily exploited.
Of course, a 3-D architecture is useless if there is no way in which it can be
programmed. System I/O thus becomes an important concern. While various bit-serial methods such as shift-register implementations have been proposed for this
purpose, the approach taken here is to integrate a 3-D memory interface with the
CAM. Specifically, each row receives a word line and each column receives a bit line.
Each cell also receives a write-enable signal that is used to program the CAM and that
also serves to disconnect the state logic from the output register. External addressing
and control can then be used to select a multi-bit word for reading out of the CAM
while the CAM is operating, or can be used to stall the CAM and alter some or all of
its state. The CAM architecture shown here may thus be visualized as a 3-D memory
with integrated processing elements.
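A behavioral sketch of this memory-style interface is given below. It is an illustration only: the class and method names, the choice of (layer, row) as the word address, and the dead-boundary neighbor count are all assumptions, not details taken from the design.

```python
class CamModel:
    """Toy model of a size^3 CAM with a memory-style I/O interface (hypothetical)."""

    def __init__(self, size=4):
        self.size = size
        # state[layer][row][col]; a (layer, row) pair selects one word (assumption)
        self.state = [[[0] * size for _ in range(size)] for _ in range(size)]
        self.write_enable = False

    def read_word(self, layer, row):
        """Read one word; permitted while the CAM is running."""
        return list(self.state[layer][row])

    def write_word(self, layer, row, bits):
        """Program one word; requires write-enable, which also stalls the CAM."""
        if not self.write_enable:
            raise RuntimeError("write-enable must be asserted to program the CAM")
        if len(bits) != self.size:
            raise ValueError("word width must equal the mesh size")
        self.state[layer][row] = list(bits)

    def live_neighbors(self, layer, row, col):
        """Count live cells among the 26 neighbors (cells outside the mesh are dead)."""
        n = 0
        for dz in (-1, 0, 1):
            for dy in (-1, 0, 1):
                for dx in (-1, 0, 1):
                    if (dz, dy, dx) == (0, 0, 0):
                        continue
                    z, y, x = layer + dz, row + dy, col + dx
                    if 0 <= z < self.size and 0 <= y < self.size and 0 <= x < self.size:
                        n += self.state[z][y][x]
        return n

    def step(self, rule):
        """Advance one generation unless stalled; rule(alive, live_neighbors) -> 0/1."""
        if self.write_enable:
            return False   # state logic is disconnected from the output registers
        s = self.size
        self.state = [[[rule(self.state[z][y][x], self.live_neighbors(z, y, x))
                        for x in range(s)] for y in range(s)] for z in range(s)]
        return True
```

Passing the update rule as a function keeps the I/O model independent of the particular automaton being run.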
A single layer of the 4 x 4 x 4 Game of Life architecture is shown in Figure 5-7.
Chapter 6
Conclusion
Currently, integrated circuits are fabricated using a CMOS technology which constrains the circuits to a two-dimensional geometry. This places fundamental limits on
the density of both memory elements and processors in a single chip. The development of new fabrication technologies, however, has made feasible the implementation
of three-dimensional (multiple-layer) integrated circuits. The possibility of creating
such devices will open new doors for circuit design and fabrication.
In order to be prepared to utilize this technology, software design tools have been
developed that allow circuit designers to utilize their background in 2-D CMOS logic
design to develop 3-D circuits. This thesis centers on the development of such tools.
Specifically, a computer-aided design system called FluidLayout has been developed that integrates a familiar means of circuit layout with the means to handle 3-D
circuits of arbitrarily many layers. In addition to providing a familiar circuit design
interface, FluidLayout also provides useful system-level functions such as the ability
to extract circuit netlists and the ability to produce file formats useful for fabrication.
The tools and technology have been used in this thesis to design some interesting
circuits.
Some basic transistor circuits have been laid out to test the viability of
FluidLayout and of the medium. FluidLayout has been used to specify the circuit
arrangement and to verify functionality of the circuit through netlist extraction. A
wafer mask has been produced using the specification provided by FluidLayout.
To explore some interesting ramifications of the technology, a 3-D static RAM has
also been designed using FluidLayout. This SRAM has a 512-bit capacity, but has
the same footprint as that of a 64-bit 2-D SRAM. Further, wiring costs have been
reduced so that the power consumption of this memory is approximately that of the
64-bit 2-D SRAM as well. The I/O packaging of this 3-D SRAM is arranged so that
it may be used as a drop-in replacement for conventional SRAM; thus, immediate
benefits may be seen from this technology.
Finally, a 3-D cellular automata machine has been designed in order to explore
some system issues in 3-D design. This CAM implements a 3-D version of Conway's
Game of Life in hardware. The architecture features an efficiently-distributed global
clock signal and a 3-D memory-style I/O interface.
As the technology matures, circuits can be fabricated using the tools and techniques developed in this thesis. The potential benefits introduced by such new devices
are enormous.
Appendix A
FluidLayout User's Guide
A.1 Overview
FluidLayout is a design tool for geometric layout of three-dimensional integrated
circuits and basic microelectromechanical systems (MEMS). That is, within FluidLayout, it is possible to construct arbitrary metal-insulator-semiconductor structures
so long as the structures are confined to a Manhattan (90°) grid.
Figure A-1 shows the main appearance of FluidLayout. The components outlined
in the figure will be discussed in detail.
A.2 Basic Layout
Basic layout is done by painting rectangular boxes on a grid. A colored box is created
by first drawing the box outline (the edit box) and then selecting the paint with which
to fill it. The left mouse button is used to select the lower-left corner of the box and
the right mouse button is used to select the upper-right corner. The box, once sized,
may be moved by clicking the left mouse button at the desired lower-left coordinate.
Once the box has been sized, it must be painted. There are two ways of doing
this. The first is to select the desired material from the main toolbar, shown in
Figure A-2. Substrate materials are identified by their color and their associated tool
tips (for example, passing the mouse over the red button produces a "Gate" pop-up).
Via materials are identified by the color of the underlying metal along with an "X"
through the button.
Figure A-2: The main FluidLayout toolbar.
The other way to paint boxes is to use existing paint. The middle mouse button
acts as a copy button, in that if it is clicked over a painted region, the paint in that
region is added to the paint in the edit box. Also, if the middle mouse button is
clicked over empty space, the edit box is cleared.
To make rectangle placement easier, several features are supplied. The first is
a grid whose visibility may be toggled by the grid toolbar button or the 'g' key.
Rectangle placement is aligned to this grid regardless of its visibility. Another feature
is the current-coordinate window. The coordinates given are relative to the lower left
of the entire circuit, so this information is especially useful in coordinating object
locations across layers. Finally, zoom buttons are provided in order to allow the user
to view any part of the layout easily. Specifically, three buttons are provided: zoom
in (2:1), zoom out (1:2), and view all. Further, the 'z'/'Z' and 'v' keys are enabled
as shortcuts; 'z' zooms to the edit box, 'Z' is equivalent to "zoom out", and 'v' is
equivalent to "view all."
A.3 Higher-Level Functions
A.3.1 Node Labeling
For circuit verification purposes, it is desirable to label some of the nodes of a circuit.
This is done in FluidLayout using the "select" and "label" functionality.
A node that is to be labeled must first be selected. In Figure A-3, a selected node
is shown. FluidLayout outlines the selected rectangle in bold white. A rectangle may
be selected using the 's' key.
Figure A-3: Partial view showing node selection.
Once selected, a rectangle may be labeled by pressing the 't' key or the label
button (marked 'T') on the toolbar. Doing so causes the dialog in Figure A-4 to pop
up.
The user may then enter an appropriate label.
A.3.2 Translation, Rotation, and Reflection
The ability to move portions of a circuit around the layout is extremely useful. In
FluidLayout, this is accomplished with the Cut, Copy, and Paste facilities.
The
contents of the edit box are copied to the clipboard (and in the case of Cut, erased
from the layout), and may be placed at a new location by relocating the edit box
there.
Additionally, FluidLayout supports 90° rotations about the lower-left corner of
the edit box and reflections about the left side of the edit box. A rotation is depicted
in Figure A-5.
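The geometry of these two operations on a single rectangle can be sketched as follows. The rectangle representation (x0, y0, x1, y1), the function names, and the counterclockwise direction of rotation are assumptions for illustration; the text does not state the rotation direction or whether the result is re-anchored afterward.

```python
def rotate90_ccw(rect, pivot):
    """Rotate an axis-aligned rectangle 90 degrees counterclockwise about pivot.

    Each point (x, y) maps to (px - (y - py), py + (x - px)).
    """
    x0, y0, x1, y1 = rect
    px, py = pivot
    xs = (px - (y0 - py), px - (y1 - py))
    ys = (py + (x0 - px), py + (x1 - px))
    return (min(xs), min(ys), max(xs), max(ys))


def reflect_about_left(rect, left_x):
    """Mirror a rectangle about the vertical line x = left_x (the edit box's left side)."""
    x0, y0, x1, y1 = rect
    return (2 * left_x - x1, y0, 2 * left_x - x0, y1)


box = (0, 0, 4, 2)                      # a 4 x 2 rectangle at the origin
print(rotate90_ccw(box, (0, 0)))        # (-2, 0, 0, 4)
print(reflect_about_left(box, 0))       # (-4, 0, 0, 2)
```

Because the grid is Manhattan, a rotated rectangle is still axis-aligned, so the result can be stored in the same corner-pair form.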
Figure A-4: Label dialog.
Figure A-5: Subcircuit rotation.
Figure A-6: Cell hierarchy management toolbar.
A.4 Circuit-Level Tools
A.4.1 Circuit Traversal
As discussed before, FluidLayout provides zoom-in, zoom-out, and view-all functionality for circuits on a layer. The mechanism for creating and viewing different layers
of the circuit is provided from the pull-down menus.
For example, one can add a layer to the circuit by pulling down the Window menu
and selecting New Layer. One can then open a view window for any of the existing
layers by pulling down the View menu and selecting Switch to Layer. A dialog box
will then pop up in which the user can specify the layer to view.
A.4.2 Cell Hierarchy Management
Every FluidLayout design can be used as a subcell of another design. Thus, FluidLayout has full support for cell hierarchy. Cell manipulation is done through the cell
toolbar, shown in Figure A-6.
The cell toolbar has buttons for loading cells into the design, adding cells to the
layout, and for manipulation of individual cells. Specifically, the cell toolbar has the
following buttons from left to right:
Load Cell allows the user to load a cell from disk into memory and associate the
cell with the current working design.
Add Cell pops up a dialog box from which the user can select a loaded cell for
addition to the circuit layout.
The next three buttons specify the operation mode for cell manipulation. A cell
may be accessed by double-clicking on the cell. The access mode is specified by the
toolbar button that is currently depressed.
Copy Cell signifies that double-clicking should create a copy of the accessed cell.
Move Cell signifies that double-clicking should move the accessed cell.
Erase Cell signifies that double-clicking should erase the accessed cell.
With this interface, it is feasible to construct circuits with arbitrary hierarchical
depth.
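One way to picture a cell hierarchy of arbitrary depth is as a tree of cells, each holding its own rectangles plus placed instances of other cells. The sketch below is a hypothetical data structure for illustration only; the Cell class, its fields, and the flatten method are not taken from FluidLayout.

```python
from dataclasses import dataclass, field


@dataclass
class Cell:
    """A design usable as a subcell: its own rectangles plus placed child cells."""
    name: str
    rects: list = field(default_factory=list)       # (material, x0, y0, x1, y1)
    instances: list = field(default_factory=list)   # (child Cell, dx, dy)

    def flatten(self, dx=0, dy=0):
        """Return every rectangle in the hierarchy, translated to top-level coordinates."""
        out = [(m, x0 + dx, y0 + dy, x1 + dx, y1 + dy)
               for (m, x0, y0, x1, y1) in self.rects]
        for child, cx, cy in self.instances:
            out.extend(child.flatten(dx + cx, dy + cy))
        return out


inverter = Cell("inverter", rects=[("gate", 0, 0, 2, 10)])
row = Cell("row", instances=[(inverter, 0, 0), (inverter, 20, 0), (inverter, 40, 0)])
top = Cell("top", instances=[(row, 0, 0), (row, 0, 30)])
print(len(top.flatten()))   # 6
```

Storing only a reference and an offset per instance means a loaded cell is kept once in memory however many times it is placed.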
A.4.3 Magic Importation
FluidLayout has the ability to import circuit layouts created in Magic. Text files
with the .mag extension are recognized as Magic layout specifications, and those
rectangles with material types that correspond to FluidLayout materials are imported
to FluidLayout's native format.
This option is selected by pulling down the File menu and selecting the Import->MAGIC
file (.mag) option.
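The import step can be illustrated with a small reader for the rectangle sections of a Magic file, in which each "<< layer >>" header is followed by "rect xbot ybot xtop ytop" lines. This is a sketch under stated assumptions: the function name and the material_map correspondence between Magic layer names and FluidLayout materials are hypothetical, and the sample text is made up for the example.

```python
def parse_mag(text, material_map):
    """Collect rectangles from a Magic .mag file whose layers appear in material_map.

    material_map is a hypothetical Magic-layer -> target-material table;
    rectangles on layers without an entry are skipped.
    """
    rects = []
    layer = None
    for raw in text.splitlines():
        line = raw.strip()
        if line.startswith("<<") and line.endswith(">>"):
            layer = line[2:-2].strip()           # section header, e.g. "<< metal1 >>"
        elif line.startswith("rect ") and layer in material_map:
            x0, y0, x1, y1 = (int(v) for v in line.split()[1:5])
            rects.append((material_map[layer], x0, y0, x1, y1))
    return rects


SAMPLE = """magic
tech scmos
timestamp 958000000
<< polysilicon >>
rect 0 0 2 10
<< ndiffusion >>
rect -3 2 5 6
<< metal1 >>
rect 4 0 8 4
<< end >>
"""

# Hypothetical mapping; ndiffusion is deliberately left unmapped here.
MAPPING = {"polysilicon": "gate", "metal1": "metal1"}
print(parse_mag(SAMPLE, MAPPING))
```

Skipping unmapped layers mirrors the behavior described above, where only rectangles with corresponding materials are imported.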
A.4.4 Circuit Netlist Extraction
FluidLayout is able to extract the connectivity information in VLSI circuit layouts
and produce a netlist in SPICE deck format. This format is suitable for functional
and timing simulation using the Berkeley SPICE circuit simulator or its variants.
(The user must supply SPICE models for the transistors; this information cannot be
provided by FluidLayout.)
The user may select this option by pulling down the File menu and selecting the
Export->SPICE deck (.sp) option.
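The core of connectivity extraction is grouping rectangles that touch into nets. The sketch below uses a union-find structure over same-material rectangles; it illustrates the general technique and is not FluidLayout's extractor. It ignores vias and transistor recognition, and it assumes that rectangles meeting only at a corner are not connected.

```python
def touches(a, b):
    """True if two (x0, y0, x1, y1) rectangles overlap or share an edge segment."""
    wx = min(a[2], b[2]) - max(a[0], b[0])
    wy = min(a[3], b[3]) - max(a[1], b[1])
    # Corner-only contact (wx == 0 and wy == 0) is treated as unconnected (assumption).
    return (wx > 0 and wy >= 0) or (wx >= 0 and wy > 0)


def extract_nets(rects):
    """Group rectangles (material, x0, y0, x1, y1) into nets of touching same-material shapes."""
    parent = list(range(len(rects)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]   # path halving
            i = parent[i]
        return i

    for i in range(len(rects)):
        for j in range(i + 1, len(rects)):
            if rects[i][0] == rects[j][0] and touches(rects[i][1:], rects[j][1:]):
                parent[find(i)] = find(j)

    nets = {}
    for i in range(len(rects)):
        nets.setdefault(find(i), []).append(i)
    return sorted(nets.values())


layout = [("metal1", 0, 0, 4, 2), ("metal1", 4, 0, 8, 2),
          ("metal1", 8, 2, 10, 4), ("gate", 0, 0, 4, 2)]
print(extract_nets(layout))   # [[0, 1], [2], [3]]
```

The pairwise comparison is quadratic in the number of rectangles, which is adequate for a sketch; a production extractor would use a spatial index.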
A.4.5 VLSI/MEMS Fabrication
At some point, it is desirable to target a design for fabrication. There are two steps
that must be taken: first, the fabrication technology must be specified, and second,
the circuit must be extracted to a suitable output file.
Figure A-7: Edit->Properties->Laser Setup (dialog fields: Lambda, Laser position, Laser current (mA), Printing feed rate).
Fabrication Technology Specification
FluidLayout is designed to understand fabrication technologies on a system-by-system
basis. This is mainly because the fabrication methods are drastically different from
each other. For example, FluidLayout is able to handle both inkjet and embossing
means of fabrication.
FluidLayout accommodates this by maintaining a property sheet for each design.
This sheet may be accessed by pulling down the Edit menu and selecting Properties.
Figure A-7 shows the property sheet for laser rasterization of the layout.
The individual fabrication parameters are set as described here:
Inkjet Setup
The inkjet is a 48-nozzle liquid printing head mounted on an X-Y-Z
gantry. This property sheet therefore configures the size of the printing head and
various printing parameters.
Lambda corresponds to the size of the grid spacing. This should be set to the line
width of the inkjet, since rectangles are rastered, and it is preferable that rectangles
be printed as solid rectangles rather than as sets of snake-like lines.
Inter-nozzle spacing refers to the space between each of the 48 nozzles.
Printing feed rate is the rate of gantry movement during printing.
Non-printing feed rate is the rate of gantry movement when not printing.
Pre-print (acceleration) spacing is used to produce smooth, even lines. The
inkjet is given a small amount of space to accelerate to full speed before printing.
Laser Setup
The laser setup is similar to that of the inkjet, as the laser is also mounted
on the gantry.
Laser position is the (x, y) location of the laser in gantry coordinates.
Laser current is the operating current of the laser. This is used for automated
operation of the laser via the serial port of the controlling computer.
Stamp Setup
The stamp setup uses a flexible stamp to pattern liquid materials.
Each stamp contains all the patterns for a circuit (e.g., the gate pattern is followed
by the gate-via pattern); these patterns are spatially separated on the stamp. Each
pattern consists of rectangular outlines that are used to separate the desired material
rectangles from undesirable excess material.
Layer-to-layer spacing thus gives the spacing from a given layer on the stamp
to the next. For example, if the gate pattern is at (0, 0) and the spacing is set to 1000
microns, then the gate-via pattern is at (1000, 0).
Width of stamp outline refers to the width of the rectangular outlines in the
stamp patterns.
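The layer-to-layer spacing arithmetic can be written out as a short sketch. The function name and the assumption that successive patterns are laid out along the x axis are illustrative; the x-axis layout is inferred from the single example given above.

```python
def stamp_origins(patterns, spacing_um):
    """Origin of each pattern on the stamp, assuming patterns advance along x."""
    return {name: (i * spacing_um, 0) for i, name in enumerate(patterns)}


origins = stamp_origins(["gate", "gate-via"], 1000)
print(origins)   # {'gate': (0, 0), 'gate-via': (1000, 0)}
```

With a spacing of 1000 microns, the second pattern lands at (1000, 0), matching the example in the text.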
Printer Use
The gantry printing system is designed so that laser and inkjet printing
may be done interchangeably, meaning that different material parts of the same
circuit may be done with either inkjet or laser. Thus, this property sheet allows the
user to specify which method is to be used for printing.
Fabrication File Production
Once the technology is set, all that remains is to produce output that can be fabricated. This can be done in three ways, depending on the target process.
File->Export->MMI code is used to produce G-code for the gantry system.
File->Export->VLSI Nanoprinting GDSII produces GDSII binary stream data
that is almost universally accepted for mask fabrication.
File->Export->MEMS Nanoprinting GDSII produces GDSII binary data, but
the layout is interpreted as a MEMS process and the release layers are generated
accordingly.
A.5 Step-By-Step Design Walk-Through
As an example, a three-dimensional ring oscillator is laid out here. Once FluidLayout
is started, the user is presented with the screen in Figure A-1.
The grid is off by
default; it should be turned on by clicking on the grid toolbar button or by hitting
the 'g' key.
First, an NMOS transistor is created.
To do so, a source node is created by
left-clicking at some location and right-clicking to define a 4 x 2 box, as shown in
Figure A-8.
Clicking on the blue toolbar button fills in this box with source/drain material,
as shown in Figure A-9.
By left-clicking, the edit box can be relocated to the right of the new source node
as a suitable location for a drain node. Then, by middle-clicking over the source
node, the source node material can be copied to the new edit-box location, yielding
Figure A-10.
Figure A-8: 4 x 2 box used for the source of an NMOS transistor.
Figure A-9: Completed source node of the NMOS transistor.
Figure A-10: NMOS source and drain nodes.
Source and drain nodes for the PMOS transistor can be created using the same
method. The resulting pattern is shown in Figure A-11.
Figure A-11: Inverter source and drain nodes.
The common gate may be laid out using the edit box and clicking on the red
toolbar button to fill it. Similarly, the n-type and p-type semiconductor can be
placed using the green and brown buttons respectively. The power and ground lines
can also be laid out using the source/drain material. Finally, the transistor drains
can be wired together to form the output. This results in the image in Figure A-12,
which can be seen in its entirety by pressing the 'v' key or clicking on the 'V' toolbar
button.
It is now desired to create inverters on the second and third layers of this circuit,
thus making a three-dimensional ring-oscillator. This is started by creating a new
layer from the Window menu, shown in Figure A-13.
Figure A-12: Complete inverter.
Figure A-13: Window menu.
This layout process could then be repeated for each new layer. Instead, the Copy
facility is used to duplicate the inverter. This is done in four steps. First, the edit box
is used to outline the area to be copied. Then either control-C or the copy toolbar
button is used to copy the contents of the edit box. Next, in the view window for
the second layer, the edit box is placed at the desired location for the copy (it is only
necessary for the lower-left corner of the box to be in the correct place; this can be
verified from the current-coordinate indicator). Finally, the Paste function is used by
pressing control-V or the paste toolbar button.
Next, vias must be placed to connect the three layers. Starting with the first
layer, contact pads to the input, output, power, and ground are laid out, as shown in
Figure A-14.
Each contact pad may then be wired to a series of vias that connect it to the next
layer up. Specifically, this is done by outlining the contact pad with the edit box,
then selecting the via material to place. In this case, the gate pad needs gate-via,
source/drain-via, metal1-via, and metal2-via to connect it to layer 2. A via is placed
by clicking on the appropriate toolbar button. Specifically, the (xyz)-via button looks
like the (xyz) button with an 'X' through it.
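The set of vias needed to reach the next layer from a given pad can be sketched as a lookup into an ordered material stack. The stack order below is inferred from the gate-pad example above and is an assumption, as is the function name; semiconductor materials are omitted.

```python
# Assumed bottom-to-top order within one circuit layer, inferred from the gate-pad example.
STACK = ["gate", "source/drain", "metal1", "metal2"]


def vias_to_next_layer(material):
    """List the vias to place, in order, to connect a pad of this material upward."""
    start = STACK.index(material)
    return [m + "-via" for m in STACK[start:]]


print(vias_to_next_layer("gate"))
# ['gate-via', 'source/drain-via', 'metal1-via', 'metal2-via']
```

Under this assumed ordering, a pad drawn in a higher material needs correspondingly fewer vias to reach the layer above.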
Upon placing the metal2 via, which connects to the next layer up, a corresponding
via is marked on the next layer. This is shown in Figure A-15.
Figure A-14: First-layer inverter with contact pads.
Figure A-15: First-layer inverter with via stacks to the second layer.
The inverters on the second and third layers may be wired similarly. This completes the ring-oscillator structure.
Finally, the nodes may be labeled for circuit-verification purposes. This is done
by placing the mouse pointer over the desired node and pressing the 's' key to select
it. Then, pressing the 't' key or clicking on the label button (marked 'T') pops up
a dialog box in which the user can label the node. Vdd, ground, and the oscillator
output are labeled in this way, as shown in Figure A-16.
84
1
I
I
II
-
I
I
4
1
I
I
I
L
N_-
-
i-
I_
I
~
I
I
L---I---
-
m
I
J_
I
-1
-y
a
--
1
-I
*
-'------I
I-I
I
I
-I
I
I
I
I
I
I
. I:
I
I
I
I
- .
--
II-
I
. I
II
I
I
II
--
r
i
r
--------
~
... Ji..
~
a
A
i
i
-i
L
IL
Ie
A
a
a
I
ir
-.-
J-L
Figure A-16: Labeled first-layer inverter.
85
I-
I
t
i
I
I
J
--
L. --
The circuit is then extracted to a SPICE deck via the File->Export menu, shown
in Figure A-17.
Figure A-17: File->Export menu (entries: MMI code (.prg), VLSI Nanoprinting GDSII (.gds), MEMS Nanoprinting GDSII (.gds), SPICE deck (.sp)).
This produces the SPICE circuit netlist shown here.
***** C:\WINNT\Profiles\shamikd\Desktop\ring-oscillator.sp
***** Created by FluidLayout
***** Created on 5/1/2000
M1 3 ring GND! 0 NTFT W=10u L=10u
M2 3 ring Vdd! Vdd! PTFT W=30u L=10u
M3 2 3 GND! 0 NTFT W=10u L=10u
M4 2 3 Vdd! Vdd! PTFT W=30u L=10u
M5 ring 2 GND! 0 NTFT W=10u L=10u
M6 ring 2 Vdd! Vdd! PTFT W=30u L=10u
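As a quick sanity check, the extracted netlist can be read back and tested for the expected ring of three inverters. The sketch below is an illustration, not part of FluidLayout: the function names are hypothetical, and it relies on the standard SPICE MOSFET node order (drain, gate, source, bulk) and on the NTFT and PTFT model names that appear in the deck.

```python
NETLIST = """\
M1 3 ring GND! 0 NTFT W=10u L=10u
M2 3 ring Vdd! Vdd! PTFT W=30u L=10u
M3 2 3 GND! 0 NTFT W=10u L=10u
M4 2 3 Vdd! Vdd! PTFT W=30u L=10u
M5 ring 2 GND! 0 NTFT W=10u L=10u
M6 ring 2 Vdd! Vdd! PTFT W=30u L=10u
"""


def parse_mosfets(text):
    """Parse MOSFET cards: Mname drain gate source bulk model [params]."""
    devices = []
    for line in text.splitlines():
        f = line.split()
        if not f or not f[0].upper().startswith("M"):
            continue                      # skip comments and blank lines
        devices.append({"name": f[0], "drain": f[1], "gate": f[2],
                        "source": f[3], "bulk": f[4], "model": f[5]})
    return devices


def inverter_edges(devices):
    """Map input node -> output node for each NTFT/PTFT pair sharing gate and drain."""
    pairs = {}
    for d in devices:
        pairs.setdefault((d["gate"], d["drain"]), set()).add(d["model"])
    return {g: o for (g, o), models in pairs.items() if models == {"NTFT", "PTFT"}}


def ring_length(edges, start):
    """Number of inverters traversed before returning to start, or None if no ring."""
    node, count = start, 0
    while count <= len(edges):
        node = edges.get(node)
        if node is None:
            return None
        count += 1
        if node == start:
            return count
    return None


edges = inverter_edges(parse_mosfets(NETLIST))
print(ring_length(edges, "ring"))   # 3
```

An odd ring length is what makes the loop oscillate rather than latch, so a result of 3 agrees with the intended three-layer ring oscillator.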
Finally, it is desired to extract the circuit to output that can be fabricated. Using
the same File->Export menu, VLSI nanoprinting GDSII is selected. This produces
a binary stream that can be used to turn a mask, which is used to produce the
wafer mold from which a circuit stamp can be made. The circuit design flow is thus
complete.
A.6 Summary of Useful Commands
Desired Action                  Method, Shortcut, or Menu
place edit box                  left-click at lower-left corner
resize edit box                 right-click at upper-right corner
paint box                       click on applicable toolbar button, or
                                middle-click over an area with the same paint
zoom to edit box                'z' key
zoom in                         zoom-in toolbar button (+ magnifying glass)
zoom out                        zoom-out toolbar button (- magnifying glass)
view entire circuit             'v' key or view-all toolbar button (marked 'V')
toggle grid                     grid toolbar button or 'g' key
cut contents of edit box        Cut toolbar button or ctrl-X
copy contents of edit box       Copy toolbar button or ctrl-C
paste contents of edit box      Paste toolbar button or ctrl-V
rotate contents of edit box     Rotate toolbar buttons (circular arrows)
select rectangle/node           's' key
label rectangle/node            't' key or Label toolbar button (marked 'T')
manipulate cell                 double-click left mouse button over cell
load cell                       Load Cell toolbar button (image of cell from floppy disk)
add cell                        Add Cell toolbar button (image of cell from list)
copy cell                       Copy Cell toolbar button (image of duplicate cells)
move cell                       Move Cell toolbar button (arrow)
erase cell                      Erase Cell toolbar button (obliterated cell image)
edit fabrication parameters     Edit->Properties menu
import Magic layout             File->Import menu
export SPICE deck               File->Export->SPICE
export fabrication files        File->Export menus
Bibliography
[1] Carter Bays. Candidates for the game of life in three dimensions. Complex
Systems, 1:373-400, 1987.
[2] Carter Bays. A new game of three-dimensional life. Complex Systems, 5:15-18,
1991.
[3] Carter Bays. A new candidate rule for the game of three-dimensional life. Complex Systems, 6:433-441, 1992.
[4] Claude Bertin et al. Integrated multichip memory module structure. United
States Patent 5,502,667, March 1996.
[5] A. R. Brown et al. Logic gates made from polymer transistors and their use in
ring oscillators. Science, 270:972-974, 1995.
[6] J. H. Conway, E. R. Berlekamp, and R. K. Guy. Winning Ways for Your Mathematical Plays. Academic Press, New York, 1983.
[7] Sawyer Fuller and Joseph Jacobson. Ink jet fabricated nanoparticle MEMS. In
13th Annual IEEE Conf. on MEMS, 2000.
[8] Information Mechanics Group. CAM8: A parallel, uniform, scalable architecture
for cellular automata experimentation. On the World-Wide-Web at
http://www.im.lcs.mit.edu/cam8.
[9] Roger T. Howe and Charles G. Sodini. Microelectronics: An Integrated Approach.
Prentice-Hall, NJ, 1997.
[10] F. T. Leighton and Arnold L. Rosenberg. Three-dimensional circuit layouts.
SIAM Journal on Computing, 15(3):793-813, 1986.
[11] Charles E. Leiserson. VLSI theory and parallel supercomputing. MIT/LCS/TM
402, Massachusetts Institute of Technology Laboratory for Computer Science,
May 1989.
[12] Magic - a VLSI layout system. On the World-Wide-Web at
http://research.compaq.com/wrl/projects/magic/index.html.
[13] C. A. Mead and L. A. Conway. Introduction to VLSI Systems. Addison-Wesley,
Reading, MA, 1980.
[14] J. Ousterhout. Corner stitching: A data structuring technique for VLSI layout
tools. IEEE Trans. Computer-Aided Design, CAD-3(1):87-100, 1984.
[15] J. Ousterhout et al. Magic: A VLSI layout system. In Proceedings of the 21st
IEEE Design Automation Conference, pages 152-159, 1984.
[16] B. A. Ridley et al. Solution-processed inorganic transistors and sub-micron nonlithographic patterning using nanoparticle inks. In Materials Research Society
Proceedings, Fall 1999.
[17] B. A. Ridley, B. Nivi, and J. M. Jacobson. All-inorganic field-effect transistors
fabricated by printing. Science, 286:746-749, 1999.
[18] Arnold L. Rosenberg. Three-dimensional integrated circuitry, pages 69-80. VLSI
Systems and Computations. Computer Science Press, Rockville, MD, 1981.
[19] Michael Sipser. Introduction to the Theory of Computation. PWS Pub., Boston,
1997.
[20] Françoise F. Soulié, Yves Robert, and Maurice Tchuente, editors. Automata
Networks in Computer Science: Theory and Applications. Princeton University
Press, Princeton, NJ, 1987.
[21] Andrew C. Tickle. Thin-Film Transistors: A New Approach to Microelectronics.
1961.
[22] Tomaso Toffoli and Norman Margolus. Cellular Automata Machines: A New
Environment for Modeling. MIT Press, Cambridge, MA, 1991.
[23] Neil H. E. Weste and Kamran Eshraghian. Principles of CMOS VLSI Design:
A Systems Perspective. Addison-Wesley, Reading, MA, 1993.
[24] Ronald Williams and Ogden Marsh. Future WSI technology: stacked monolithic
WSI. IEEE Transactions on Components, Hybrids, and Manufacturing Technology, 16:610-614, 1993.
[25] P. Zavracky. 3D microelectronics. On the World-Wide-Web at
http://www.ece.neu.edu/edsnu/zavracky/mfl/programs/3d/3dmicro.html.