Design and Implementation of Three-Dimensional Logic Structures

by Shamik Das

Submitted to the Department of Electrical Engineering and Computer Science in Partial Fulfillment of the Requirements for the Degrees of Bachelor of Science in Electrical Science and Engineering and Master of Engineering in Electrical Engineering and Computer Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY, June 2000

© Shamik Das, MM. All rights reserved. The author hereby grants to MIT permission to reproduce and distribute publicly paper and electronic copies of this thesis document in whole or in part, and to grant others the right to do so.

Author: Shamik Das, Department of Electrical Engineering and Computer Science, May 22, 2000
Certified by: Joseph Jacobson, Associate Professor, Media Arts and Sciences, Thesis Supervisor
Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students

Abstract

In this thesis, a computer-aided-design (CAD) system is developed that assists in the design of novel three-dimensional integrated circuits. The software tools allow for the specification of a multilayer transistor circuit by means that are readily accessible to those familiar with two-dimensional CMOS VLSI design. This software system provides desirable features such as SPICE circuit extraction and the ability to produce the design formats necessary for automated fabrication (e.g. mask specifications for lithography or Gerber data for inkjet printing).
Finally, in this thesis, the software tools are used to design a ring oscillator, a 3-D static RAM, and a 3-D cellular automata machine.

Thesis Supervisor: Joseph Jacobson
Title: Associate Professor, Media Arts and Sciences

Acknowledgments

I am grateful to many people for their support in the development of this thesis. My thesis advisor, Joe Jacobson, deserves thanks for his guidance and motivation, as well as for many helpful discussions about the research. Babak Nivi, Colin Bulthaup, and Eric Wilhelm were instrumental in fabricating test structures from the software-produced specifications. I also appreciate the many TFT discussions with Babak, Colin, and Brent Ridley, as these were important for shaping the form the circuit-design process was to take. Saul Griffith and Sawyer Fuller deserve thanks for their input on laser and inkjet patterning of functional materials.

In addition, this thesis would not have been completed without the support of many friends, brothers, and loved ones. I would especially like to thank my family - my parents, Dilip and Mala, and my sister, Alina - for their inspiration, direction, and support.

Contents

1 Introduction
1.1 Design of the Layout Software
1.2 Implementation of Test Circuits
1.2.1 Ring Oscillator
1.2.2 Static Random-Access Memory
1.2.3 Cellular Automata Machine
2 FluidLayout - The Layout Software
2.1 Overall Considerations
2.2 Implementation
2.2.1 2-D Slice Manipulation
2.2.2 Circuit Partitioning
2.3 Circuit Verification
2.4 Circuit Fabrication
2.5 Design Walk-Through
3 Some Basic Transistor Circuits
3.1 Minimum Criteria for the Technology
3.2 Design of Basic Circuits
4 The Static Random-Access Memory
4.1 Background and Motivation
4.2 SRAM operation
4.3 Extensions to 3-D
4.4 Layout of a 3-D SRAM
5 The Cellular-Automata Machine
5.1 Background
5.1.1 Finite-State Machines
5.1.2 Cellular-Automata Machines
5.1.3 The Game of Life
5.2 Layout of a 3-D Game of Life
5.2.1 Game of Life Cell
5.2.2 Game of Life CAM Architecture
6 Conclusion
A FluidLayout User's Guide
A.1 Overview
A.2 Basic Layout
A.3 Higher-Level Functions
A.3.1 Node Labeling
A.3.2 Translation, Rotation, and Reflection
A.4 Circuit-Level Tools
A.4.1 Circuit Traversal
A.4.2 Cell Hierarchy Management
A.4.3 Magic Importation
A.4.4 Circuit Netlist Extraction
A.4.5 VLSI/MEMS Fabrication
A.5 Step-By-Step Design Walk-Through
A.6 Summary of Useful Commands
List of Figures

2-1 CLayer object with embedded CRectangle objects.
2-2 Corner-stitched CRectangle object.
2-3 Area enumeration of CRectangle objects within a bounding rectangle.
2-4 Canonical technology used in FluidLayout.
2-5 Box-outlining is used to place materials in FluidLayout.
2-6 Complete NMOS pulldown path.
2-7 Placement of a metal2->gate via results in a gate->metal2 hint on the second layer.
2-8 The complete inverter.
3-1 NMOS inverter.
3-2 NMOS inverter small-signal model about $V_M$.
3-3 Layout of test devices.
3-4 3-D layout of a ring oscillator. The three inverters shown are stacked to form the 3-D layout.
3-5 Stamp pattern with spatially-separated material patterns.
3-6 Patterned gate for a NOR gate, SRAM cell, and ring oscillator.
3-7 Patterned source/drain for a NOR gate, SRAM cell, and ring oscillator.
4-1 Six-transistor circuit for individual bit storage.
4-2 Eight-transistor circuit for use in 3-D SRAM.
4-3 Proper cell distribution improves aspect ratio and decreases bit-line length.
4-4 3-D partitioning of rows allows for simple tri-stating of word lines.
4-5 First and second layers of an eight-layer 3-D SRAM.
4-6 6-T SRAM cell layout.
4-7 Word-line tri-stating.
4-8 Bit-line decoding using the word-line tri-state control signal.
5-1 Finite state machine.
5-2 Turing machine.
5-3 Four cells of a cellular-automata machine.
5-4 Insertion sort bit-slice.
5-5 Game of Life cell.
5-6 Clock distribution in the 3-D Game of Life architecture.
5-7 Layer of a 3-D Game of Life architecture comprising a 4 x 4 array of cells.
A-1 FluidLayout screenshot.
A-2 The main FluidLayout toolbar.
A-3 Partial view showing node selection.
A-4 Label dialog.
A-5 Subcircuit rotation.
A-6 Cell hierarchy management toolbar.
A-7 Edit->Properties->Laser Setup.
A-8 4 x 2 box used for the source of an NMOS transistor.
A-9 Completed source node of the NMOS transistor.
A-10 NMOS source and drain nodes.
A-11 Inverter source and drain nodes.
A-12 Complete inverter.
A-13 Window menu.
A-14 First-layer inverter with contact pads.
A-15 First-layer inverter with via stacks to the second layer.
A-16 Labeled first-layer inverter.
A-17 File->Export menu.

Chapter 1 Introduction

CMOS integrated circuits are traditionally fabricated on crystalline silicon wafers. Transistor structures are created on the surface of these wafers by implantation into the wafer and growth and deposition of material over the surface of the wafer. This fundamentally results in a two-dimensional circuit layout, as the transistors are confined to inhabit the boundary of the silicon substrate.

However, a number of advances in solid-state technology have made possible the development of more complicated three-dimensional transistor structures. All of these advances rely on the creation of two-dimensional circuit "layers" by standard means, and then interconnecting these layers into a multi-layer structure. For example, the development of wafer-scale integration (WSI) allows for the creation of three-dimensional circuits by stacking wafers using a lift-off and bonding process [24]. Also, the advent of silicon-on-insulator (SOI) technology allows for a third dimension of circuitry by encapsulating an existing two-dimensional circuit with insulating material, planarizing this material, and placing the next layer of silicon on this insulator.

A very promising path to multi-layer transistor circuits involves the use of solution-processed metals, semiconductors, and insulators. Transistors can be laid out by depositing the appropriate solutions onto an insulating surface, followed by curing to produce the desired materials [17, 16]. This approach has the advantage that it does not require integration on the wafer scale (which itself requires novel means of wafer verification and packaging) and is theoretically extensible to thousands of layers.

Having multiple layers in which to fabricate transistors gives the circuit designer the potential to improve the efficiency of circuits in terms of area, power, and speed.
The savings in area are two-fold: first, by utilizing the third dimension as space for additional circuitry, integrated circuits can be made more dense without expanding the "footprint" of the circuit and without having to improve the process technology. This form of area improvement is best for memory devices such as SRAM, and also good for DRAM and EEPROM, where the goal is to fit as many bits of memory as possible into a given chip. Specifically, the use of $n$ active layers allows for an $n$-fold improvement in the storage capacity of a memory chip, with little area overhead in the control circuitry.

The other approach to area savings lies in the retargeting of two-dimensional circuit layouts for a three-dimensional process technology. Theoretical results indicate that for many interesting 2-D circuit layouts, there exist corresponding 3-D layouts that are more efficient in terms of area (by which, in the three-dimensional context, we mean the aggregate area of all layers of the circuit) and maximum wire-run (the longest length of wire between any two active nodes). For example, the $n$-point Fast Fourier Transform (FFT) network can be implemented in area $O(n^{3/2})$ with maximum wire-run $O(n^{1/2})$ with three dimensions of active circuitry, while in a standard 2-D process, the same circuit would occupy area $\Omega(n^2)$ with maximum wire-run $\Omega(n/\log n)$ [18]. A hypercube network of $n$ nodes, used in many parallel-processing schemes, can also be implemented in area $O(n^{3/2})$ using three dimensions, but requires area $\Omega(n^2)$ using the standard two dimensions [11]. Finally, results in [10] indicate that any $n$-device circuit that can be laid out in area $A$ in two dimensions can be laid out in area approximately $(nA)^{1/2}$ using three.

The savings in power may be realized by reducing the switched capacitance in circuits. By reducing the lengths of interconnect, the capacitance of internal nodes can be reduced, thus reducing the dynamic power dissipation of the circuit.
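The link between switched capacitance and dynamic power can be made explicit with the standard first-order CMOS switching-power model. This is a textbook approximation offered as a sketch, not a result taken from this thesis; the activity factor $\alpha$, the switched capacitance $C_{sw}$, the supply voltage $V_{DD}$, and the clock frequency $f$ are symbols introduced here for illustration.

```latex
P_{\mathrm{dyn}} \approx \alpha \, C_{sw} \, V_{DD}^{2} \, f
```

With $\alpha$, $V_{DD}$, and $f$ held fixed, $P_{\mathrm{dyn}}$ is linear in $C_{sw}$, so under this model any fractional reduction in the capacitance of a node (for instance from a shorter wire) gives the same fractional reduction in the power switched at that node.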
For example, a potentially important savings can be realized in the layout of H-trees, which as indicated by [18] can be laid out with maximum wire-run $O(n^{1/3})$ in three dimensions but require wire runs of $\Omega(n^{1/2}/\log n)$ in two dimensions. Since clock distribution nets are often realized as H-trees, the potential exists to save power by utilizing the more efficient distribution architectures available with three dimensions.

Finally, as circuits get more and more complicated, lengths of interconnect will affect timing characteristics. Reducing the wire run of circuits will reduce the charge and discharge times on these wires and enable faster operation of circuits.

There are many reasons to develop the technology to fabricate three-dimensional CMOS devices. While developing such technology is beyond the scope of this thesis, it is important to realize that the ability to design logic with this technology must be developed simultaneously. Therefore, in this thesis, software tools are developed that are used to target circuit designs for a three-dimensional MOS process that has been developed contemporaneously. These tools are then used to synthesize some circuits that demonstrate the viability of the layout tools and some of the benefits of the new medium.

1.1 Design of the Layout Software

Digital system design is usually done at three levels: behavioral, structural, and physical [23]. At the behavioral level, a digital system is specified by what it is intended to do; at the structural level, by what functional building blocks (e.g. gates, adders, registers, IP cores) are to be used; and at the physical level, by what construction materials are to be used and in what geometry they are to be configured. Corresponding to this division are several layers of abstraction - architectural, register-transfer-language (RTL), logical, and circuit layers - at which the designer can work [23].
In a typical design flow, the designer will often work through both of these chains in parallel; for example, he or she may start by developing a behavioral and architectural specification for a system and proceed to flesh out the implementation details down to the circuit layer and at the physical level. Much of the task of fleshing out the details of a system is done with computer-aided design (CAD) tools. The goal of any suite of CAD tools for digital system design is to produce a working circuit, that is, to specify fully a working design at the physical level and at the lowest abstraction layer.

Typically, the design flow can be broken up into two phases - technology-independent design and technology-dependent design. For example, the process of behavioral specification is ideally technology-independent, whereas physical design at the circuit level is clearly technology-dependent. Each phase has an associated set of algorithms that are generally implemented in separate CAD tools. The process of targeting an abstract system design for a particular technology is called technology mapping, and can be done by a third set of CAD tools or as a final- or initial-stage operation of the two phases of design. In order to maximize the usefulness of any new technology, design tools must be developed that allow both for the use of new features of the technology and for the seamless integration of the technology with existing means of technology-independent design.

In this thesis, the focus is on the technology-dependent phase of design. CAD tools are developed that allow the designer to work at the physical and structural levels at arbitrary levels of abstraction. The emphasis is twofold: first, familiar graphical user interfaces (GUIs) are adapted for use in working in a three-dimensional environment; second, the CAD tool allows for maximal use of the new features of this environment.
Specifically, a CAD tool for 3-D circuit layouts is designed using the open-source Magic VLSI layout system as a basis [15, 12]. The primary features in Magic that will be relied upon are the speed of the central algorithms and the familiarity of the GUI. Magic uses a geometric representation of the physical layout of the system that is based upon a scheme devised by Mead and Conway [13].

At the physical level, the design approach used for this thesis is to design each layer of a multi-layer circuit as a distinct two-dimensional circuit. What this means is that from a physical perspective, each individual layer of a multi-layer circuit looks like a traditional two-dimensional circuit; therefore, the layout of each layer can theoretically be done using available CAD tools. In fact, 3-D circuit design using this approach is the subject of ongoing research, where a small number of layers is considered [25]. However, the use of existing CAD tools becomes infeasible as the number of layers becomes large. Thus, in this thesis, a CAD system is developed that integrates the familiarity of two-dimensional circuit design with a means of managing large numbers of layers and a direct means of wiring between the layers.

1.2 Implementation of Test Circuits

The viability of the layout software is best tested by using it to implement various test circuits. These circuits should be chosen both to exhibit features of the medium and to exhibit useful properties of the software. To this end, the layout of three circuits has been carried out using the new software: a ring oscillator, a static random-access memory (SRAM), and a simple cellular automata machine (CAM).

1.2.1 Ring Oscillator

In the development of any new technology in which circuits are to be fabricated, the ring oscillator is a fundamental circuit in that it is the simplest circuit to demonstrate the ability to cascade logic gates.
That is, in any such technology, while the first goal is always to fabricate individual transistors, it does not necessarily follow that these transistors can be fashioned into a suitable multi-transistor logic gate. An individual logic gate must provide gain from the input to the output, or else when the gates are cascaded, the signals eventually decay to an ambiguous logic level [5].

A ring oscillator consists of an odd number of inverters cascaded in series into a ring. Once the ring oscillator is powered, any latent signal is able to propagate through the ring; this signal is inverted as it passes through each gate. When the signal returns to its starting point, it returns as the inverse of the original signal since there is an odd number of gates in the ring. So if the voltage at any particular node of the circuit is observed as a function of time, the result is an oscillation with period equal to twice the transit time of the signal through the ring.

Since there is an odd number of inverters in the ring, the circuit acts to provide negative feedback on the signal. Oscillations are produced if the feedback loop is unstable. Thus, if the inverters that make up the ring do not provide sufficient gain (i.e. if the loop gain never exceeds 1), the signal stabilizes to an ambiguous logic level midway between the supply voltage and ground.

In particular, it is desirable for the individual transistors to have as large a transconductance, $g_m$, as possible, where $g_m$ is measured at the voltage corresponding to this ambiguous logic level. Having a sufficient $g_m$ produces the necessary gain to drive the output signals away from ambiguous logic levels and towards the voltage extremes (i.e. low or high voltage). Since $g_m$ is directly proportional to the mobility, $\mu$, of carriers in the transistor channel, having a large mobility is desired. However, it is possible to overcome the absence of large mobilities to some extent, because of other factors on which $g_m$ is dependent.
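These dependencies can be read off the first-order (long-channel, square-law) MOSFET model in saturation. This is a textbook approximation offered as a sketch, not a device model taken from this thesis, and it may describe solution-processed thin-film transistors only loosely. The symbols $C_{ox}$ (gate-oxide capacitance per unit area), $\varepsilon_{ox}$, $t_{ox}$, $W$, $L$, $V_{GS}$, and $V_T$ are introduced here for illustration.

```latex
g_m = \frac{\partial I_D}{\partial V_{GS}}
    = \mu \, C_{ox} \, \frac{W}{L} \, \left( V_{GS} - V_T \right),
\qquad
C_{ox} = \frac{\varepsilon_{ox}}{t_{ox}}
```

Under this model, $g_m$ scales linearly with the mobility $\mu$, with the width-to-length ratio $W/L$, with the inverse of the oxide thickness $t_{ox}$, and with the gate overdrive $V_{GS} - V_T$ at the bias point where it is evaluated.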
For example, one may either increase the width-to-length ratio of the transistor channel, decrease the thickness of the gate oxide, or increase the supply voltage (thereby increasing the midpoint voltage where $g_m$ is measured). Further, in any technology, the proper choice of pullup (for an n-channel technology) or pulldown (for a p-channel technology) can reduce the dependency on device parameters. Nonetheless, the efficacy of these maneuvers is limited, due to circuit-area constraints, device-breakdown limits, and second-order effects. Also, in a complementary technology, which is most desirable due to power considerations, having high-quality basic device parameters is essential.

Since the ring oscillator is a planar circuit, there exists a "natural" two-dimensional layout for the circuit at the transistor level. However, having a third dimension presents the opportunity to examine some new layout strategies.

1.2.2 Static Random-Access Memory

One immediate application of a viable three-dimensional integration technology is in memories. The density of memory arises directly from the ability to pack as many homogeneous cells into a given chip as possible. So the availability of multiple layers in a chip allows for a direct approach to increasing density - simply stacking 2-D memory circuits into a 3-D chip gives the desired increase in density.

This approach has been implemented at the system level by physically stacking chips and using off-chip circuitry to control the chip-enable signals and I/O [4]. This is, of course, not extensible to arbitrarily many layers. However, the same approach can be taken at the chip level by internally wiring control signals to each layer of the circuit and wiring the data lines together in the same way that the data pins have been soldered together.

In this thesis, a simple 3-D static random-access memory (SRAM) is designed and laid out using the CAD software. SRAM is chosen for several reasons.
First, it can be fabricated using MOS technology and does not require special transistor structures. While other memories such as EEPROM and DRAM may see a greater push for increased density, these memories require dedicated fabrication technologies. SRAM, on the other hand, can be fabricated in a standard logic technology. Secondly, while read-only memories (ROMs) can also be fabricated using only standard MOS transistors, the need for high density is more prevalent in systems with writable storage. So for this thesis, an SRAM is designed that exhibits writable and retrievable storage and uses multiple layers of active material while using a standard two-dimensional pinout. Such an SRAM can therefore be made into a drop-in replacement for currently available SRAM.

1.2.3 Cellular Automata Machine

While digital system design at the architecture level ideally is done without the fabrication technology in mind, the limitations of the technology inevitably play a role in the selection of a computational architecture. For example, in designing a multiprocessing architecture, physical constraints to two dimensions lead to architectural constraints in terms of the number of processors that can be embedded in a given area [11].

In particular, design choices are often driven by the problem to be solved by the system. There are many computational problems that can be described efficiently using certain physical architectures and thus solved by modelling the architecture by a digital system. For example, single-instruction multiple-data (SIMD) architectures, and in particular cellular automata, have been shown to effectively model the (inherently three-dimensional) dynamics of many problems in physics [22, 8]. Additionally, it has been shown that there exist cellular automata machine (CAM) architectures that can do general-purpose computations, i.e. that are equivalent to a Universal Turing Machine.
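To make the notion of a cellular automaton concrete, the following is a minimal Python sketch of one synchronous update of Conway's Game of Life, a well-known two-state automaton in which the next state of each cell depends only on the current states of its eight neighbours. It is an illustration only, not the circuit designed in this thesis; the function name `life_step` and the choice of wrap-around (toroidal) edges are assumptions made here.

```python
def life_step(grid):
    """Return the next generation of a 2-D Game of Life grid.

    `grid` is a list of equal-length rows of 0 (dead) and 1 (alive).
    Edges wrap around (an assumption made for this sketch).
    """
    rows, cols = len(grid), len(grid[0])
    nxt = [[0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            # Count live cells among the eight neighbours of (r, c).
            n = sum(
                grid[(r + dr) % rows][(c + dc) % cols]
                for dr in (-1, 0, 1)
                for dc in (-1, 0, 1)
                if (dr, dc) != (0, 0)
            )
            # Standard rule: birth on 3 neighbours, survival on 2 or 3.
            alive = grid[r][c] == 1
            nxt[r][c] = 1 if n == 3 or (alive and n == 2) else 0
    return nxt


# A vertical "blinker": three live cells in a column.
blinker = [
    [0, 0, 0, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 1, 0, 0],
    [0, 0, 0, 0, 0],
]
after_one = life_step(blinker)    # becomes a horizontal row of three
after_two = life_step(after_one)  # returns to the vertical column
```

Every cell applies the same local rule at the same time, which is why such a machine maps naturally onto a regular array of identical hardware cells wired only to their neighbours.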
The CAM architecture therefore can serve as a potential alternative to the architectures found in traditional computer processors [20].

For CAMs that are designed to model 3-D physical processes, it becomes infeasible to map the 3-D CAM into a two-dimensional circuit as the number of cells becomes large - the cost of interconnect becomes prohibitive. However, with a true three-dimensional technology, the mapping of the CAM architecture to an integrated circuit is direct, thereby allowing the physical construction of machines that are impossible to integrate onto a single 2-D chip. Therefore, in this thesis, the software tools are used to design a simple 3-D CAM.

One of the simplest cellular automata machines to exhibit interesting global behaviors is the Game of Life, devised by John Conway in 1970 [20]. The Game of Life is specified for a two-dimensional architecture, but can be extended to three dimensions [1, 2, 3]. It describes the cells as having one of two states (either "alive" or "dead"), with the state of any given cell on the next cycle of the game being determined by the states of its neighbors on the current cycle. The behavior of the machine as a whole can thus be observed by visually inspecting the cells ("alive" being indicated by a color or dot). Since a circuit that simulates the Game of Life can be readily verified, such a circuit is designed in this thesis.

Chapter 2 FluidLayout - The Layout Software

There is a clear potential for circuit design innovation above and beyond what is possible with a two-dimensional fabrication technology. Since all known routes to three-dimensional circuit fabrication involve the construction of multiple layers of two-dimensional circuits, with inter-layer interconnect done by vias, it is tempting to propose that the design of three-dimensional circuits be carried out using existing software tools for each two-dimensional layer of the circuit.
This approach provides the fastest route to working 3-D prototype circuits. However, there are several drawbacks to this approach that will severely limit its usability for designing complex 3-D circuits. First, as each layer of the circuit must be managed as a separate design, the management overhead increases with the number of layers. The designer is responsible for keeping track of interconnect between each pair of layers. While this may be feasible for a fixed number of layers, it becomes intractable for an arbitrary number of layers. Second, the automation of system-level tasks such as circuit netlist extraction and mask generation becomes difficult or impossible without additional software scripts or programs.

A better approach is to design CAD software with integrated support for designing three-dimensional circuits. From a system perspective, software with this capability can provide the designer both with needed assistance in 3-D design management and with useful system-level design tools. Simultaneously, the software can utilize algorithms written for two-dimensional circuit design, as the individual operations that will be carried out by the designer are the same in both 2-D and 3-D design.

In this thesis, such a software tool, named FluidLayout owing to the solution-processing fabrication technology being used, has been developed. FluidLayout provides designers of 3-D circuits an integrated environment for laying out all layers of a circuit and for verification and fabrication of three-dimensional layouts.

2.1 Overall Considerations

Much as circuit fabrication has been limited to constructing two-dimensional devices, circuit design has been fraught with limitations imposed by two-dimensional design methodologies. Traditional pen-and-paper circuit design, for example, requires partitioning a 3-D circuit into 2-D layers, either by using separate sheets of paper or by spatially separating the layers on a single sheet.
In the case of circuit fabrication, the gains introduced with a third dimension justify the expense of developing the fabrication technology. However, in the case of circuit design, it is better to make optimal use of familiar design techniques rather than impose new design methodologies with associated learning curves. In addition, the costs of implementing a truly 3-D user interface are prohibitive.

In FluidLayout, therefore, three-dimensional circuit layout is done by managing an arbitrarily large set of individual two-dimensional layers. The layout of each layer is performed in the same manner as a two-dimensional layout would be performed in many existing software packages. The consequences of this design decision are twofold. First, efficient algorithms for 2-D layout have already been developed and the corresponding source code may be reused. Thus, the core layout manipulation routines do not have to be redeveloped. Second, it is desirable to implement many whole-circuit algorithms such as netlist extraction and mask generation. These algorithms can be extended from 2-D to 3-D while maintaining their efficiency in terms of order-of-growth as a function of the number of transistors in the design.

There are two stated goals to be achieved with FluidLayout. First, the user should be able to manage a true three-dimensional circuit with an arbitrary number of layers. Second, the design process for individual layers should be familiar to designers of two-dimensional VLSI circuits. These goals have been taken into consideration at all stages of the software design process of FluidLayout.

For example, in typical two-dimensional design formats, the representation of a VLSI layout is encoded as a list of material regions. Each region contains data that identifies the type of material (e.g. polysilicon) and the boundary coordinates of the material (e.g. the corners of a rectangle). The VLSI layout may then be stored as a file containing a list of regions.
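As a minimal sketch of such a flat format, the Python below stores a layout as a list of (material, rectangle) regions and round-trips it through a plain-text encoding. The `Region` class, the helper names `dump_regions` and `load_regions`, and the one-region-per-line text layout are hypothetical, chosen for illustration; they are not the file format of Magic or of any particular tool.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Region:
    """One material region: a material name plus two rectangle corners."""
    material: str  # e.g. "polysilicon"
    x0: int        # lower-left corner
    y0: int
    x1: int        # upper-right corner
    y1: int


def dump_regions(regions):
    """Encode a layout as text, one region per line (hypothetical format)."""
    return "\n".join(
        f"{r.material} {r.x0} {r.y0} {r.x1} {r.y1}" for r in regions
    )


def load_regions(text):
    """Decode the text produced by dump_regions back into Region objects."""
    regions = []
    for line in text.splitlines():
        if not line.strip():
            continue  # skip blank lines
        material, *coords = line.split()
        regions.append(Region(material, *map(int, coords)))
    return regions


layout = [
    Region("polysilicon", 0, 0, 2, 8),
    Region("metal1", 0, 3, 10, 5),
]
text = dump_regions(layout)
restored = load_regions(text)
```

Every coordinate in a record is two-dimensional, so nothing in this encoding says which transistor layer a region belongs to.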
Thus, if an existing software tool uses this format as its native format, extension of this software for use in designing 3-D circuits becomes difficult; there is no means for differentiating the regions in the data file with respect to their locations along the third dimension if the coordinates used are 2-D coordinates. However, there are several remedies of varying efficacy.

The first is to differentiate the material types by layer. For example, polysilicon on the first layer of transistors might be assigned material type poly-1 while polysilicon on the second layer might be assigned type poly-2. Since many software packages have support for adding or changing material types, this approach is straightforward to implement. However, the approach also has several drawbacks. First, the user must define material types for each layer of transistors, a process that becomes tedious as the number of layers grows. Second, since the user interface is not 3-D-aware (i.e. not cognizant of the fact that poly-2 corresponds to a different layer than poly-1), all transistor layers will be displayed simultaneously. While this may be acceptable for two layers of transistors, it becomes unmanageable for more.

Another approach to handling 3-D circuits in existing software packages is to manage each third-dimension layer as a separate circuit. Small helper programs may be written to perform the inter-layer registration, circuit netlist extraction, and preparation of fabrication-ready output. While this approach is more sophisticated than the previous approach, it has the drawback that the user has to run multiple programs in order to obtain a working circuit; each program will have its associated learning curve.

By contrast, FluidLayout organizes the material regions by layer. Rather than store a VLSI layout as a collection of rectangles, FluidLayout stores a collection of layers, where each layer is stored as a collection of rectangles.
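The layer-oriented organization can be sketched as follows. This is an illustrative Python model under stated assumptions, not FluidLayout's actual C++ data structures; the names `Rect`, `Layer`, `Circuit3D`, `add_layer`, `import_2d`, and `materials_on` are all hypothetical.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass(frozen=True)
class Rect:
    """A rectangle of one material, in 2-D coordinates within its layer."""
    material: str
    x0: int
    y0: int
    x1: int
    y1: int


@dataclass
class Layer:
    """One 2-D slice of the circuit: a collection of rectangles."""
    rects: List[Rect] = field(default_factory=list)


@dataclass
class Circuit3D:
    """A 3-D circuit: an ordered collection of layers."""
    layers: List[Layer] = field(default_factory=list)

    def add_layer(self):
        """Append an empty layer and return its index."""
        self.layers.append(Layer())
        return len(self.layers) - 1

    def import_2d(self, rects):
        """Add an existing 2-D layout as a new layer; return its index."""
        index = self.add_layer()
        self.layers[index].rects.extend(rects)
        return index

    def materials_on(self, index):
        """Sorted material names used on the given layer."""
        return sorted({r.material for r in self.layers[index].rects})


# The same 2-D inverter geometry, imported twice as two stacked layers.
inverter = [Rect("gate", 0, 0, 2, 8), Rect("metal2", 0, 3, 10, 5)]
circuit = Circuit3D()
first = circuit.import_2d(inverter)
second = circuit.import_2d(inverter)
```

In this sketch the material name `gate` is reused unchanged on both layers; the position of the layer in the list, rather than a per-layer name such as poly-1 or poly-2, carries the third coordinate.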
Storing the layout by layer is equivalent to packing a collection of 2-D layout files into a single meta-file; in fact, FluidLayout has the means to import two-dimensional circuits (created in a traditional software package) as individual layers of a three-dimensional circuit. This approach has advantages over the others. First, a circuit in FluidLayout may contain arbitrarily many layers. FluidLayout provides an easy means of adding layers to a circuit while obviating the task of managing as many files as there are layers. Second, since FluidLayout is aware of the three-dimensionality of its circuits, the user interface can display the individual layers separately while still indicating inter-layer interconnections.

This display interface is another area of FluidLayout where careful consideration was given to the 3-D nature of the circuits. The user must be able to manage all the layers of a circuit without having to view them simultaneously. Similarly, when laying out circuits, it must be clear both to the user and to FluidLayout which layer is the target of the user's instructions. Two approaches to solving these problems were considered.

One way is to allow the user to set a "visible range" of materials, where the materials are ordered according to their physical order in the technology. For example, the user might wish to view all materials between the gate on layer 7 and metal 2 on layer 9. This allows the user to manage the entire circuit within a single document window, and also permits the user to view as little as one material or as much as the entire circuit all at once. However, there are several issues with this design. For example, if the user elects to view a range of materials that spans more than one layer, there is potential ambiguity when the user decides to place certain materials.
For example, if the visible range encompasses gate material on layers 2 and 3, and the user wants to place a new gate, it is not clear to the interface whether this is a 2nd-layer gate or a 3rd-layer gate. A similar manifestation of this problem is that, in the same situation, it is difficult to distinguish visually the two gate layers that are being viewed simultaneously. The only corrective means is to restrict the visible range to at most one material of any given type. However, this then prohibits the user from viewing and editing different layers of the circuit simultaneously.

The second approach is to maintain a separate view window for each layer of the circuit. Each view may then be treated exactly as a two-dimensional circuit. The only additional function to be performed is to manage the inter-layer interconnect, which can be done by message-passing between the views. This approach has the drawback that the view windows number as many as the layers, meaning that simultaneous editing of more than a few layers is prohibitively complicated. However, each layer may be edited without ambiguity, and with the assumption that the user will not want to edit more than three or so layers at a time, this option becomes the more desirable choice for implementation.

2.2 Implementation

FluidLayout was written in Microsoft Visual C++ (version 5.0) for the Microsoft Windows operating systems. In the graphical user interface (GUI), a three-dimensional integrated circuit is represented as an ordered set of 2-D slices. Each slice may be manipulated as an individual 2-D circuit. The representation of a slice in the GUI is the traditional mask representation, i.e. a top-down viewpoint with metals and semiconductor represented as colored rectangular paths on a Manhattan grid. Thus, the manipulation of individual slices should be familiar to those experienced in 2-D integrated-circuit design.

A circuit layout is stored internally in FluidLayout as a CCellDef object.
A CCellDef object contains several 2-D circuit slices implemented as sets of CLayer objects. Each CLayer object consists of a set of CRectangle objects that represent the 2-D circuit slice materials. Additionally, a circuit layout may contain discrete layouts within it as subcells of the layout; these are referenced via CRectangle objects of type CELL. Thus, the designer can maintain a hierarchy of CCellDef objects that represents designs at various levels of integration, and any given CCellDef object may be used as a subcell of another CCellDef object.

The CCellDef methods are mainly used for editing the circuit layout. CCellDef has methods for adding rectangles and subcells to the layout and erasing rectangles and subcells from the layout. There are also methods for producing copies of individual slices with or without the subcell contents flattened into the slice. Most of the layout manipulation is done within the CLayer object.

2.2.1 2-D Slice Manipulation

The design interface for an individual layer is modelled after that of the Magic VLSI CAD system [15, 12]. Magic is a geometric box-painting tool that has algorithms for interpreting box paintings as integrated-circuit layouts. A circuit layout is represented as a set of colored rectangles in a 2-D coordinate system; the core algorithm in Magic is thus an efficient means of rectangle manipulation [14].

Within FluidLayout, each 2-D slice of a 3-D circuit is represented as a collection of CLayer objects. The different mask layers for each slice of a circuit (e.g. gate, source/drain, metals, vias) are partitioned among different CLayer objects, with one CLayer for all metals and semiconductor, one CLayer for each type of via, and one CLayer for each type of subcell. This partitioning is done to maximize the efficiency of top-level algorithms such as rendering the layout in the GUI and extracting the circuit netlist, while not consuming excessive amounts of memory in overhead.
The CLayer object, depicted in Figure 2-1, contains a pointer to a CRectangle object. Each CRectangle object contains its integer coordinates, its material type (e.g. gate, source/drain, semiconductor, or via) and pointers to CRectangle objects adjacent at the upper-right and lower-left corners. Thus, a CLayer object may be thought of as a collection of disjoint rectangles that tiles a plane. Further, there are efficient algorithms to traverse the plane from any particular starting point to a given finish point and to iterate through all rectangles within a bounding rectangle [14]. These algorithms have been implemented in the Magic source code [15, 12] and are readily implemented in FluidLayout.

Figure 2-1: CLayer object with embedded CRectangle objects.

Figure 2-2: Corner-stitched CRectangle object, with top, right, left, and bottom pointers.

Specifically, FluidLayout represents a VLSI layout as a set of corner-stitched rectangles, as discussed in [14]. Figure 2-2 illustrates the corner-stitching of a rectangle. This corner-stitching allows for linear-time searching of a CLayer object and linear-time area enumeration. To do this, each CRectangle object has a GotoPoint method. Given a CRectangle R and a destination point in the CLayer containing R, GotoPoint follows the top and bottom pointers to reach the desired ordinate and then follows the left and right pointers to reach the desired abscissa. Since following left and right may cause deviation from the desired ordinate, this procedure must be iterated until the rectangle at the destination point is found. However, since the stitched objects are convex, the algorithm is guaranteed to terminate [14].

Similarly, each CLayer object has a Paint method derived from [14]. Paint allows the caller to paint a rectangular region of the layer. All CRectangle objects that intersect this rectangular region are clipped against it, and the material types of the resulting pieces are adjusted to perform the painting.
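The GotoPoint search described above can be sketched in Python as follows. This is a minimal illustration, not FluidLayout's C++ code: the Tile class, the goto_point function, and the hand-stitched four-tile plane are hypothetical, and the sketch assumes that the destination point lies inside the tiled plane and that the stitches are already correct.

```python
class Tile:
    """A corner-stitched rectangle covering [x0, x1) x [y0, y1)."""

    def __init__(self, name, x0, y0, x1, y1):
        self.name = name
        self.x0, self.y0, self.x1, self.y1 = x0, y0, x1, y1
        # Stitches held at the lower-left corner.
        self.left = None     # tile to the left of the lower-left corner
        self.bottom = None   # tile below the lower-left corner
        # Stitches held at the upper-right corner.
        self.right = None    # tile to the right of the upper-right corner
        self.top = None      # tile above the upper-right corner

    def contains(self, x, y):
        return self.x0 <= x < self.x1 and self.y0 <= y < self.y1


def goto_point(tile, x, y):
    """Walk the stitches from `tile` to the tile containing (x, y)."""
    while not tile.contains(x, y):
        # Move vertically until the tile's vertical span covers y.
        while y < tile.y0:
            tile = tile.bottom
        while y >= tile.y1:
            tile = tile.top
        # Move horizontally until the horizontal span covers x. This may
        # leave the desired vertical span, hence the enclosing loop.
        while x < tile.x0:
            tile = tile.left
        while x >= tile.x1:
            tile = tile.right
    return tile


def example_plane():
    """A hand-stitched 10 x 10 plane: strip A on top, B and C side by
    side in the middle, strip D at the bottom."""
    a = Tile("A", 0, 6, 10, 10)
    b = Tile("B", 0, 3, 4, 6)
    c = Tile("C", 4, 3, 10, 6)
    d = Tile("D", 0, 0, 10, 3)
    a.bottom = b
    b.bottom, b.right, b.top = d, c, a
    c.left, c.bottom, c.top = b, d, a
    d.top = c
    return a, b, c, d
```

Starting from the bottom strip D, a search for the point (2, 4) first moves up through the top stitch to C and then left to B, which contains the point.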
Within Paint, the enumeration of rectangles within the clipping region is done in linear time by the following procedure: by following down and right pointers from the upper-leftmost rectangle, the rectangles along the left edge of the clipping rectangle may be identified. For each of these, horizontal swaths of the clipping region may be enumerated by following right pointers. A sample area enumeration is shown in Figure 2-3.

Figure 2-3: Area enumeration of CRectangle objects within a bounding rectangle.

The relevant VLSI algorithms can be expressed in terms of searches and area enumerations and are thus carried out efficiently in FluidLayout. For example, placement of a wire is done by selecting the rectangular areas where metal is desired and calling the Paint method on those areas. Viewing the circuit in the GUI is done by enumerating the rectangles within the view rectangle. For each enumeration, Windows drawing methods are called to render the rectangle. All that remains for implementation is the interconnection of distinct 2-D layers into a 3-D circuit.

2.2.2 Circuit Partitioning

There are many possible approaches to the problem of partitioning 3-D circuits among 2-D slices. From a design tools standpoint, the crux of the issue is that in 2-D design, new fabrication technologies require extensive modification to the technology support in the software. For example, suppose that a 2-D design package has support for a two-poly, three-metal CMOS process. If the technology for four metal layers is developed, the design package must be modified to accommodate the new metal layer, both in the internal representation and in the GUI. While the internal representation may be readily modifiable or may already support arbitrary technologies, modifying or extending the GUI is nontrivial, and in fact, 2-D design packages generally do not have support in the GUI for arbitrarily many material layers.
On the other hand, FluidLayout can support arbitrarily many material layers without modification of the 2-D slice object structure and without extensions to the GUI. This is accomplished by assigning a finite number of material layers to each 2-D slice and allowing the number of slices to vary arbitrarily. Each 2-D slice is assigned gate, source/drain, and semiconductor material as well as two metal layers and all inter-layer vias. The slice is also provided with vias from the top metal layer to the gate layer of the next slice, thus forming the inter-slice interconnect. This material set is necessary and sufficient to create arbitrary circuits within an individual 2-D slice.

Arbitrary technology mappings can then be implemented by ignoring portions of slices as necessary. For example, a 3-D process with six interconnect layers per slice may be implemented by pairing adjacent CLayer objects and ignoring the semiconductor material on the second CLayer of each pair. Also, a 2-D process with arbitrarily many metal layers can be implemented by ignoring all semiconductor material except on the lowest slice.

The canonical technology in FluidLayout is centered around a bottom-gate thin-film transistor structure [21]. Each layer of transistors is thus represented as a set of TFTs along with two metal layers. This results in the technology shown in Figure 2-4. With the implementation framework thus described, FluidLayout is able to perform various system-level procedures, such as circuit verification through netlist extraction and circuit fabrication.

Figure 2-4: Canonical technology used in FluidLayout. Labels, top to bottom: gate (next layer), metal2, metal1, semiconductor, source/drain, gate.

2.3 Circuit Verification

One of the main advantages of having CAD software with integrated 3-D capabilities is that the software can perform tasks such as circuit netlist extraction. FluidLayout is able to extract connectivity information from 3-D integrated-circuit layouts.
In order to perform this netlist extraction, FluidLayout separates a given layout into planes, where each plane contains a given material (gate, source/drain, metal1, or metal2) and the vias that connect that material to the next higher material along the third dimension. Then, FluidLayout enumerates all the rectangles in each plane, starting with the lowest. Each rectangle is checked to see if it belongs to a previously defined electrical node. FluidLayout then checks adjacencies to determine if two nodes have been assigned to a single wire, and if so, merges the nodes. Finally, if the rectangle is not assigned to a node, and is not adjacent to a node, a new node is created. Further, if the rectangle is part of a via, the corresponding rectangle on the next plane up is marked and added to the node. This allows the extraction procedure to maintain electrical connectivity along the third dimension.

In FluidLayout, a CWire object is used to identify a node. Each CWire object contains a list of pointers to the CRectangle objects associated with the node. Each CRectangle object has a generic pointer that is used in netlist extraction to point to the CWire object to which the rectangle belongs. Thus, when the area enumeration is complete, all electrical nodes in the circuit have been identified, and each rectangle is able to identify the node to which it belongs.

Once this is complete, FluidLayout extracts the semiconductor planes from the layout. The semiconductor rectangles are enumerated, and surrounding gate and source/drain rectangles are identified. Each valid combination of gate, source, and drain is used to construct a CTransistor object that identifies the electrical nodes for the gate, source, and drain and the width and length of the transistor channel. The list of CTransistor objects is then written to a text file using the standard SPICE format for MOSFETs.
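The node-merging step of the extraction can be illustrated with a small Python sketch. It is written under several assumptions that are not taken from FluidLayout: rectangles are plain (x0, y0, x1, y1) tuples, each plane is a dictionary with hypothetical "rects" and "vias" keys, and a brute-force pairwise comparison with a union-find structure stands in for the corner-stitched area enumeration, so the sketch shows the connectivity logic rather than the efficient algorithm. The spice_mosfet helper is likewise hypothetical and only reproduces the line format of the SPICE decks shown in this thesis.

```python
from itertools import combinations


def _overlap(a, b):
    """Signed extents of the x and y overlap of two rectangles."""
    ox = min(a[2], b[2]) - max(a[0], b[0])
    oy = min(a[3], b[3]) - max(a[1], b[1])
    return ox, oy


def touches(a, b):
    """True if the rectangles overlap or share part of an edge
    (contact at a single corner does not count)."""
    ox, oy = _overlap(a, b)
    return (ox > 0 and oy >= 0) or (ox >= 0 and oy > 0)


def overlaps(a, b):
    """True if the rectangles share interior area."""
    ox, oy = _overlap(a, b)
    return ox > 0 and oy > 0


def extract_nodes(planes):
    """planes: list of {'rects': [...], 'vias': [...]}, lowest plane first.
    Returns a dict mapping (plane_index, rect_index) to a node number."""
    parent = {(p, i): (p, i)
              for p, plane in enumerate(planes)
              for i in range(len(plane["rects"]))}

    def find(k):
        while parent[k] != k:
            parent[k] = parent[parent[k]]  # path compression
            k = parent[k]
        return k

    def union(a, b):
        parent[find(a)] = find(b)

    for p, plane in enumerate(planes):
        rects = plane["rects"]
        # Adjacent rectangles on the same plane belong to the same node.
        for i, j in combinations(range(len(rects)), 2):
            if touches(rects[i], rects[j]):
                union((p, i), (p, j))
        # A via joins what lies under it to what lies over it on the next plane.
        if p + 1 < len(planes):
            upper = planes[p + 1]["rects"]
            for via in plane["vias"]:
                below = [i for i, r in enumerate(rects) if overlaps(r, via)]
                above = [j for j, r in enumerate(upper) if overlaps(r, via)]
                for i in below:
                    for j in above:
                        union((p, i), (p + 1, j))

    ids, nodes = {}, {}
    for key in sorted(parent):
        nodes[key] = ids.setdefault(find(key), len(ids) + 1)
    return nodes


def spice_mosfet(index, drain, gate, source, bulk, model, w_um, l_um):
    """Format one MOSFET line in the style of the decks in this thesis."""
    return f"M{index} {drain} {gate} {source} {bulk} {model} W={w_um}u L={l_um}u"
```

In this sketch, two rectangles that occupy the same x-y position on different planes remain separate nodes unless a via joins them, which is the behaviour the plane-by-plane procedure is meant to preserve.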
2.4 Circuit Fabrication

One motivation for writing FluidLayout is to have the ability to support any fabrication technologies that emerge in the laboratory. For example, it is desirable to be able to target designs for an inkjet nanoparticle MEMS process [7] or a VLSI/MEMS liquid embossing process [16], without having to modify the layout. Support in FluidLayout for these facilities is provided by integrating methods that handle the circuit extraction into the document class.

For each fabrication process, different file formats must be exported to support the fabrication. For example, the target inkjet process is a 3-D gantry system that is computer-controlled and requires G-code. The process prints materials in the same way that an inkjet printer prints rectangles, so in FluidLayout, there are methods to extract the separate material layers and raster the individual rectangles.

Similarly, the liquid embossing process uses an elastomeric stamp to separate a film of solution into desired and undesired regions. A rectangular wire is thus created using a stamp whose raised surface is the outline of the rectangle. When pressed onto a uniform film, the stamp then drives away liquid corresponding to the wire outline. These stamps are created from wafers, which are created using lithographic masks. These masks are specified using the GDSII binary file format, so FluidLayout has methods for converting a set of rectangles to their outlines and writing these outline rectangles to GDSII binary output.

Figure 2-5: Box-outlining is used to place materials in FluidLayout.

2.5 Design Walk-Through

To demonstrate the use and capabilities of the FluidLayout software, FluidLayout is used here to lay out an inverter. A more complete walk-through is available in Appendix A. Figure 2-5 shows the use of box outlines, drawn by left-clicking at the lower left and right-clicking at the upper right, to place materials in the layout.
By using the materials toolbar, layout of an n-channel MOSFET is completed on the first layer, as shown in Figure 2-6. Interconnect to layer 2 is done through a via from the top metal on layer 1 (i.e., metal2) to the gate metal on layer 2. As shown in Figure 2-7, the via is placed on layer 1. FluidLayout marks a corresponding via on the second layer, as can be seen in the view window for the second layer. A p-channel MOSFET is laid out on the second layer, and the complete inverter is shown in Figure 2-8.

Figure 2-6: Complete NMOS pulldown path.

Figure 2-7: Placement of a metal2→gate via results in a gate→metal2 hint on the second layer.

Figure 2-8: The complete inverter.

To verify that the layout corresponds to an inverter, the SPICE deck extraction feature is used. The feature size is set to 5 microns per unit grid length (lambda). The following is the circuit netlist produced:

    ***** C:\WINNT\Profiles\shamikd\Desktop\inverter.sp
    ***** Created by FluidLayout
    ***** Created on 5/1/2000
    M1 2 3 GND! 0 NTFT W=15u L=10u
    M2 2 3 Vdd! Vdd! PTFT W=15u L=10u

This SPICE deck may then be used for verification of the layout. A more comprehensive guide to using FluidLayout can be found in Appendix A.

As shown here, FluidLayout is a useful software CAD system for laying out and fabricating 3-D circuits. These capabilities will now be demonstrated with several test circuits.

Chapter 3

Some Basic Transistor Circuits

The immediate application of FluidLayout is in targeting simple, commonly known circuits for an emerging three-dimensional fabrication technology. This allows both for testing the functionality of FluidLayout and for exploring the viability of the technology.

3.1 Minimum Criteria for the Technology

A new transistor technology is viable for computation only if suitable multi-transistor devices can be fabricated.
In particular, it is possible for transistors to provide nonlinear input-output behavior, yet still be unsuitable for multi-transistor circuitry. There are several criteria that need to be met. These criteria are evaluated within the context of the metal-insulator-semiconductor field-effect technology discussed in [17, 16].

Consider, for example, the NMOS inverter in Figure 3-1. The desired function of this inverter is to take the signal represented by in and produce the logical negation of that signal at out. Represented using voltages, therefore, if $V_{in}$ is below $V_M$, then $V_{out}$ should be above $V_M$, and vice versa, where $V_M$ is some midpoint voltage between $V_{dd}$ and ground. To verify that this circuit produces the desired behavior, the characteristics of the individual transistors must be examined.

Figure 3-1: NMOS inverter.

The governing I-V relationships for the n-channel FET are given as follows:

$$I_D = \mu_n C_{ins} \left(\frac{W}{L}\right) \left(V_{GS} - V_{Tn} - \frac{V_{DS}}{2}\right) V_{DS}$$

for $V_{GS} - V_{Tn} \geq V_{DS}$ (the linear regime) and

$$I_{D,SAT} = \frac{1}{2} \mu_n C_{ins} \left(\frac{W}{L}\right) \left(V_{GS} - V_{Tn}\right)^2 \left(1 + \lambda_n V_{DS}\right)$$

for $V_{GS} - V_{Tn} < V_{DS}$ (the saturation regime), where $\mu_n$ is the field-effect mobility, $C_{ins}$ is the gate insulator capacitance, $W/L$ is the transistor channel width-to-length ratio, $V_{GS}$ is the gate-source voltage, $V_{Tn}$ is the threshold voltage, $V_{DS}$ is the drain-source voltage, and $\lambda_n$ is the channel-length modulation parameter [9].

It is clear, then, that this type of circuit produces the desired operation: for a low input voltage, transistor 2 is turned off and transistor 1 pulls the output voltage high, and for a high input voltage, transistor 2 is turned on, thus pulling the output voltage low, provided transistor 2 is stronger than transistor 1. Transistor 2 is thus called a pulldown, while transistor 1 is called a pullup.

As important as functionality, however, is the ability of the inverter to restore logic levels.
That is, while a 0 is represented ideally by 0 volts and a 1 is represented ideally by $V_{dd}$, in actuality, this is not necessarily the case. However, a functioning logic gate should to some extent recognize and accommodate these deviations from ideality. Consider, for example, a series of cascaded inverters, each of which outputs 0 volts for an input of $V_{dd}$ and vice versa. Suppose the input to the first is slightly less than $V_{dd}$, say $V_{dd} - \Delta V_{in}$. The output of this inverter will then be greater than 0 volts by some amount, say $\Delta V_{out}$. If the inverter restores logic levels, then $\Delta V_{out} < \Delta V_{in}$. If this is not the case, then as the signal passes through each inverter, the deviation from the ideal will increase until the signal stabilizes at $V_M$.

Figure 3-2: NMOS inverter small-signal model about $V_M$.

Level restoration follows if the gain of the inverter at $V_M$ is greater than one in magnitude. Consider an inverter whose output is $V_{dd}$ minus the input. Then the gain at $V_M$ is identically $-1$, and deviations in the input are reflected exactly in the output. The gain of the inverter in Figure 3-1 at the midpoint voltage $V_M$ can be determined by examining the small-signal model of this inverter, shown in Figure 3-2. From the model, it follows that the gain of the inverter is

$$\frac{v_{out}}{v_{in}} = -\frac{g_{m2} r_o}{2 + g_{m1} r_o}$$

where $g_m$ is the small-signal transconductance, defined as $g_m = \sqrt{2 \mu_n C_{ins} (W/L) I_{D,SAT}}$, and $r_o$ is the small-signal output resistance, defined as $r_o = (\lambda_n I_{D,SAT})^{-1}$ [9].

Thus, having a large transconductance and a large output resistance is critical to the performance of the inverter: consider the cases where $g_{m1} r_o$ is small or is large. If the product $g_{m1} r_o$ is small compared to 1, then the gain is essentially $-g_{m2} r_o / 2$, which is also small. If, on the other hand, $g_{m1} r_o$ is sufficiently large, then the gain is approximately $-g_{m2}/g_{m1} = -\sqrt{(W/L)_2 / (W/L)_1}$. The transistor sizing is thus dictated by the need for high gain.
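The two limiting cases of the gain expression, and the effect of the gain magnitude on a signal passing through cascaded inverters, can be checked numerically with the short Python sketch below. The function names are hypothetical and any numerical values passed to them are illustrative, not measured device parameters. The cascade model is a linearisation about $V_M$, so it is meaningful only while the deviation from $V_M$ remains small.

```python
def inverter_gain(gm1, gm2, ro):
    """Small-signal gain of the NMOS inverter at the midpoint voltage:
    v_out / v_in = -gm2 * ro / (2 + gm1 * ro)."""
    return -gm2 * ro / (2.0 + gm1 * ro)


def cascade_deviation(gain, dv_in, stages):
    """Deviation from V_M after a signal passes through `stages` identical
    inverters, using the linearised model dv_out = gain * dv_in."""
    dv = dv_in
    for _ in range(stages):
        dv = gain * dv
    return dv


def restores_levels(gain):
    """A deviation from V_M is pushed toward a rail only if |gain| > 1;
    otherwise it decays back toward V_M."""
    return abs(gain) > 1.0
```

With a transconductance ratio of 6, for example, the gain approaches $-6$ only when $g_{m1} r_o$ is much larger than 2; for $g_{m1} r_o = 1$ the same ratio yields a gain of $-2$.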
The technology described in [17, 16] features transconductances on the order of $10^{-5}$ S and output resistances on the order of $10^{6}$ Ω for a device with W = 292.5 µm and L = 2 µm at a $V_M$ of about 10 volts. This indicates that a gain of greater than unity is achievable with device sizes that currently can be fabricated.

3.2 Design of Basic Circuits

In order to test both the fabrication technology and the capabilities of FluidLayout, some simple circuits are laid out using FluidLayout, and fabrication-ready output is produced. The layout used here consists of an inverter, a NOR gate, a basic static memory cell, and two ring oscillators.

The inverter implementation used in this layout is that in Figure 3-1. The device sizes used are a channel length of 10 µm, a channel width of 200 µm for the pullup FET, and a channel width of 1200 µm for the pulldown FET. Using the above device parameters, this should provide a gain of approximately -1.5. The NOR gate uses two pulldown transistors wired in parallel. Each is identical in size to the inverter pulldown. The memory cell, discussed in detail in Chapter 4, uses a pair of coupled inverters together with two access transistors.

Finally, the two ring oscillators are different layouts of the same circuit. This circuit comprises three inverters wired in a series loop. Provided that the inverters have sufficient gain, a latent signal on an input to one of the inverters is amplified to $V_{dd}$ or to ground as it passes through the inverters. Further, when the signal returns to its starting point, it does so as its inverse, so that the signal oscillates when viewed at any fixed point. However, if the inverters do not have sufficient gain, the signal will decay to the midpoint voltage, $V_M$. Thus, a ring oscillator is an ideal circuit to test intrinsic device parameters. Further, it is possible to examine different layout strategies with this multi-gate circuit.

Figure 3-3: Layout of test devices.
In particular, the second ring oscillator, though electrically identical to the first, is partitioned along the third dimension into three layers, with one inverter on each layer. Power and ground signals are distributed through vias to the upper layers, and the return signal from the output of the last inverter to the input of the first inverter travels through a via stack located near the outputs.

Figure 3-4: 3-D layout of a ring oscillator. The three inverters shown are stacked to form the 3-D layout.

The first layer of the layout of the entire structure is shown in Figure 3-3. Figure 3-4 shows the 3-D layout of the ring oscillator, including the 2nd and 3rd layers of the layout. In order to verify the functionality of the devices in this structure, the SPICE netlist is extracted using FluidLayout.¹

    ***** C:\WINNT\Profiles\shamikd\Desktop\Mask1.sp
    ***** Created by FluidLayout
    ***** Created on 5/1/2000

    *** inverter
    M1 invout inv_in GND! 0 NTFT W=1200u L=10u
    M34 invout Vdd! Vdd! 0 NTFT W=200u L=10u

    *** NOR gate
    M2 norout norA GND! 0 NTFT W=1200u L=10u
    M3 norout norB GND! 0 NTFT W=1200u L=10u
    M35 norout Vdd! Vdd! 0 NTFT W=200u L=10u

    *** SRAM cell (4,7 are internal bit storage nodes)
    M4 7 sramWL sramBL 0 NTFT W=200u L=10u
    M13 sramBLBAR sramWL 4 0 NTFT W=200u L=10u
    M5 7 4 GND! 0 NTFT W=1150u L=10u
    M6 GND! 7 4 0 NTFT W=1150u L=10u
    M36 7 Vdd! Vdd! 0 NTFT W=150u L=10u
    M37 Vdd! Vdd! 4 0 NTFT W=150u L=10u

    *** 2-D ring oscillator (three inverters)
    M14 6 ring GND! 0 NTFT W=1200u L=10u
    M38 6 Vdd! Vdd! 0 NTFT W=200u L=10u
    M15 5 6 GND! 0 NTFT W=1200u L=10u
    M39 5 Vdd! Vdd! 0 NTFT W=200u L=10u
    M16 ring 5 GND! 0 NTFT W=1200u L=10u
    M40 ring Vdd! Vdd! 0 NTFT W=200u L=10u

¹ This SPICE deck has been edited for clarity. For example, with typical circuit layouts, the area enumeration algorithm results in all the n-channel devices grouped together and all the p-channel devices grouped together. The SPICE deck shown here has the transistors grouped by function.
Also, since one of the features of the technology is the ability to route gate across source or drain, the netlist extraction will sometimes output a transistor as a parallel combination of two or more smaller transistors. This will allow for more accurate capacitance modelling once the relevant parameters have been obtained from the technology. However, in the SPICE deck shown here, parallel transistors have been merged.

    *** 3-D ring oscillator
    M29 3 ring_3D GND! 0 NTFT W=1200u L=10u
    M41 3 Vdd! Vdd! 0 NTFT W=200u L=10u
    M42 2 3 GND! 0 NTFT W=1200u L=10u
    M47 2 Vdd! Vdd! 0 NTFT W=200u L=10u
    M48 ring_3D 2 GND! 0 NTFT W=1200u L=10u
    M53 ring_3D Vdd! Vdd! 0 NTFT W=200u L=10u

Functional simulation can then be performed using this netlist to verify the performance of the circuits.

The circuit can now be extracted to output that can be used for fabrication. The process in [17, 16] uses elastomeric liquid embossing to form circuit patterns. Each material layer (gate, source/drain, etc.) is patterned using a unique part of the stamp. This stamp is created using a wafer as a mold. Thus, FluidLayout is used to extract the circuit to a GDSII binary stream that can be used to fabricate a wafer mask. This mask pattern is shown in Figure 3-5.

From this mask, an elastomeric stamp is created. This stamp is used to pattern solution-processed materials, which are then cured. The resulting structures form the gate, source/drain, semiconductor, and interconnect for the circuits. For example, Figure 3-6 shows the gate metal for the NOR gate, SRAM cell, and 2-D ring oscillator. Figure 3-7 shows the source/drain metal for these circuits.

Thus, FluidLayout is useful both for laying out circuit structures and for fabrication of these circuits. For the remainder of this thesis, some designs will be examined that utilize more of the potential of FluidLayout.
In particular, the focus is no longer on rapid prototyping of three-dimensional integrated circuits, but instead on exploring the architectural ramifications of being able to lay out circuits in 3-D.

Figure 3-5: Stamp pattern with spatially-separated material patterns.

Figure 3-6: Patterned gate for a NOR gate, SRAM cell, and ring oscillator.

Figure 3-7: Patterned source/drain for a NOR gate, SRAM cell, and ring oscillator.

Chapter 4

The Static Random-Access Memory

The technology for fabricating true three-dimensional integrated circuits is very much a new technology; it has yet to approximate tried-and-true 2-D fabrication in terms of feature size and circuit speed. However, preliminary research ([17, 16]) suggests that the fabrication of multilayer transistor circuits with transistor properties rivaling those of conventional silicon MOSFETs is neither unreasonable as a research goal, nor unrealistic as a commercial technology within the near future. This is one of the main reasons that FluidLayout has been engineered to handle complex three-dimensional integrated circuits while at the same time providing the file formats necessary for rapid prototyping of basic circuits.
In order to demonstrate the capabilities of the FluidLayout software as well as the benefits of the three-dimensional medium, FluidLayout has been used to design a multilayer static random-access memory (SRAM). This implementation, shown in Figure 4-5, is capable of statically storing 512 bits of data in a chip whose footprint is the same as that of a 64-bit 2-D SRAM. Further, read/write power dissipation for this chip is approximately that of the 64-bit 2-D SRAM as well. It is helpful to examine this 3-D SRAM in the context of its 2-D counterpart.

4.1 Background and Motivation

A static random-access memory must provide certain features by virtue of its name. Memory signifies that the circuit must have a read operation, and optionally, a write operation, by which a user may store data via a write (for writable memories) and retrieve the same data later, via a read. The memory may therefore be read-only (ROM) or read-write.

Memories are further characterized by their access mechanism. In a multibit memory, the storage location of a particular bit or group of bits may be identified by an address. The access mechanism may restrict the user to data retrieval from sequential addresses; this is implemented in circuits such as the FIFO (first-in, first-out) and the shift register. Alternatively, access to random addresses may be permitted. This random-access memory generally requires more sophisticated control circuitry.

Finally, memories may be categorized according to the permanence of their storage. Non-volatile memories do not require external power to retain their contents. For example, erasable, programmable read-only memories (EPROMs) implement the write mechanism using special circuitry and/or voltages that are not accessible during the normal read operation of the circuit. A specific, commonplace example of this type of memory is the Flash ROM. By contrast, volatile memories lose the contents of memory if the power supply is removed.
Volatile memories may be further categorized as static or dynamic depending on the mode of storage. In static memories, such as SRAM, the circuit actively reinforces the value of the bit stored in memory. In order to overwrite a bit, the write circuitry must be able to overcome the static protection on the old bit in memory. Conversely, in dynamic memories, such as DRAM, bits are stored using the capacitance of internal nodes (or using an explicit capacitor for each bit). No circuitry protects the stored value; thus, the memory may be overwritten by charging or discharging the capacitor. Since capacitors tend to discharge naturally through leakage, dynamic memories must be refreshed periodically to prevent loss of data. This refresh introduces an overhead that may be recouped due to the increased storage density of bits in DRAM versus bits in SRAM.

Of the various types of memory, writable memories have seen the largest push for increased density. There are multiple reasons for this. First, as computers become faster and more sophisticated, the size of desirable computations grows, and thus the need for computer memory grows also; increased DRAM density allows one to increase the amount of main memory available to a computer, and increased SRAM density allows computer architects to increase the amount of memory cache available to a processor. Second, the advent of digital media has created a similar push in the consumer goods market, as digital cameras store pictures on Flash ROMs, and users of personal computers transfer media from laptops to palmtop organizers and portable audio players using any number of memory-card interfaces.

Of these three types of memory (Flash ROM, DRAM, and SRAM), Flash ROM and DRAM place the most emphasis on bit density, and thus would benefit most from the area improvements that a three-dimensional process would bring. However, both Flash ROM and DRAM utilize special fabrication processes.
In the case of Flash ROM, special transistor structures are used to provide the non-volatility, while in DRAM, explicit capacitors are often used for the bit storage mechanism. On the other hand, SRAM may be fabricated using the same technology as is used for logic, which is why all embedded memory has been in the form of SRAM until quite recently. It is for this reason that the layout of a three-dimensional SRAM is explored. SRAM benefits from the third dimension as much as DRAM or Flash memory would; and while the density of 3-D DRAM would exceed that of 3-D SRAM, the fabrication technology for 3-D SRAM is expected to be more readily available.

4.2 SRAM Operation

In SRAM of any dimensionality, the basic storage mechanism remains the same. The core of bit storage in the SRAM is the coupled-inverter cell shown in Figure 4-1. The bit may be thought of as being stored at node $Q$. Thus, in this circuit, the coupled inverters provide positive feedback on the stored bit, ensuring that the bit is not erased or overwritten unless so desired. As long as power is supplied to the inverters, the state of $Q$ and $\overline{Q}$ is maintained; this provides the static functionality of the memory cell [23].

Figure 4-1: Six-transistor circuit for individual bit storage.

Access to the cell contents is provided through access FETs M1 and M2. To read the cell contents, the word line WL is pulled high, turning on M1 and M2. The bit lines BL and $\overline{BL}$ are then pulled to $Q$ and $\overline{Q}$ by M1 and M2 respectively. To write to the cell, the desired bit and its inverse are asserted on BL and $\overline{BL}$ respectively. WL is then pulled high. M1 and M2 must then be able to reverse the state of $Q$ and $\overline{Q}$ in order to write to the cell. Since each inverter comprises two transistors, the cell as a whole is referred to as a 6-T (six-transistor) cell.
The cell may be compressed to five transistors by removing M1 or M2; however, this requires very careful design of the cell, as reversing the internal state during a write is now more difficult. An SRAM comprises any number of these cells arranged into an array of rows and columns. All of the cells in a given row share the same word line, and all of the cells in a given column share the same bit lines. This has two consequences: first, read and write operations act on whole words at a time, where a word consists of however many bits there are in a single row of the memory. Second, since columns share bit lines, no more than one row at a time may be permitted access to the bit lines of the memory. Thus, an SRAM must contain control circuitry that limits bit-line access to a single row. This circuitry is therefore called row-select or row-decode circuitry. Further, in practical SRAM design, the word size is often much smaller than the desired size in words of the memory. A straightforward implementation would thus result in a memory that is much taller than it is wide. However, packaging constraints generally favor circuits that are square. The standard approach to solving this problem is to split the memory into N columns, where N is chosen to make the layout approximately square. A given row selection will thus address N words simultaneously. To refine this selection to the one desired word, column-select or column-decode circuitry is added to the memory. Typically, then, an SRAM interface consists of A address bits and D data bits. The A address bits are split into R row bits and A - R column bits. The memory core of the SRAM thus consists of $2^R$ rows and $D \times 2^{A-R}$ columns of individual 6-T cells.

4.3 Extensions to 3-D

A cursory examination of the two-dimensional SRAM structure shows that to first order, a particular cell is addressed via a row (word) line and a column (bit) line. The 2-D memory is thus addressed by a row-column matrix.
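The address split described in Section 4.2 can be illustrated with a short Python sketch. It is not part of FluidLayout; the function names, and the choice of the high-order address bits as the row bits, are assumptions made only for illustration.

```python
def core_dimensions(a_bits, r_bits, d_bits):
    """Return (rows, columns) of the 6-T cell array.

    a_bits: total address bits A
    r_bits: address bits used for row selection R
    d_bits: data bits D
    """
    if not 0 <= r_bits <= a_bits:
        raise ValueError("row bits must lie between 0 and the address width")
    rows = 2 ** r_bits
    columns = d_bits * 2 ** (a_bits - r_bits)
    return rows, columns


def split_address(address, a_bits, r_bits):
    """Split an address into (row index, column-select index).

    Assumption: the row bits are the high-order bits of the address.
    """
    c_bits = a_bits - r_bits
    if not 0 <= address < 2 ** a_bits:
        raise ValueError("address out of range")
    row = address >> c_bits
    column_select = address & ((1 << c_bits) - 1)
    return row, column_select
```

With an 11-bit address and 8 data bits, for instance, choosing R = 7 gives a 128-by-128 core, while R = 11 gives a 2048-by-8 core with no column decoding at all.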
It seems natural to extend this idea to three dimensions by incorporating a third dimension into the matrix addressing. At the cell level, this is done by adding extra transistors to the circuit. The addition of transistors M3 and M4, as shown in Figure 4-2, allows a third-dimension word line, WL2, to control access to the bit lines. However, an immediate objection to the circuit in Figure 4-2 is that it requires two additional transistors per bit of storage. This overhead could be prohibitively expensive in high-density SRAMs. Proposed solutions include making the 3-D SRAM cell single-ended by removing transistors M2 and M4 and bit line $\overline{BL}$. This results in a six-transistor cell whose area is essentially that of the standard 2-D SRAM cell. However, if the SRAM is to be extended to three dimensions, this must be done in a way that considers the overall performance of the memory.

Figure 4-2: Eight-transistor circuit for use in 3-D SRAM.

In a conventional SRAM, each row must be coupled to an active row decoder circuit, often implemented as an AND of address bits, that drives the word line for that row. This active decoder is necessary to prevent contention for the bit lines, which would happen if two word lines were simultaneously high. (The columns, by contrast, may be decoded using pass transistors since it is permissible to leave the bit lines in an unknown or disconnected state.) Therefore, using the 3-D matrix-of-word-lines approach to 3-D SRAM, there are necessarily more word lines to drive. In the single-ended SRAM described above, for example, a second set of word lines (running perpendicular to the first set) is needed. Thus, in addition to the decoder circuit for each row, this SRAM implementation requires an active decoder for each column. In addition, there are power considerations in choosing how to address the cells in the SRAM matrix.
The bit lines are usually high-capacitance wires whose switching consumes a good deal of power. It is desirable to minimize the number and length of these bit lines, and to avoid switching the bit lines if possible. For an N-bit data bus, it would be optimal to have exactly 2N bit lines (N if the memory is single-ended). However, two considerations interfere as the number of words in the SRAM grows: first, with the number of bit lines fixed, the length of the bit lines grows with the number of words, and thus the switching time and power consumption increase. Second, as the number of words grows, the aspect ratio (width-to-length ratio) of the SRAM deviates further from 1:1, which is the square footprint that is most desirable for packaging.

Figure 4-3: Proper cell distribution improves aspect ratio and decreases bit-line length.

For high-density memory, having multiple sets of bit lines is a necessity. For example, a commonplace 2Kx8 (16384-bit) two-dimensional SRAM may be partitioned into 2048 rows and 8 columns. This would require only 16 bit lines, but these wires would be prohibitively long, and the memory would be many times taller than it would be wide. A better packaging scheme would be to have 128 rows and 128 columns and to use a 16:1 decoder to provide 8 bits of output. While this means that potentially 16 times as many bit lines could switch simultaneously, the switched capacitance is reduced by the same factor (since the length of the lines has been reduced). While the power consumption of the latter configuration is thus essentially the same as that of the former, it is possible to utilize the benefits of the former to reduce the power consumption of the latter. Specifically, the main benefit of the 2048x8 configuration is that for any given access, this configuration activates the fewest memory cells (in this case, eight).
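The comparison between the 2048x8 and 128x128 organizations can be checked with a few lines of Python. This is only an arithmetic sketch: the function name is hypothetical, cells are assumed to be of unit height so that bit-line length equals the number of rows, and total bit-line length is used as a stand-in for switched capacitance.

```python
def organization_stats(rows, columns, word_bits, differential=True):
    """Summarize the bit-line cost of one SRAM organization.

    Assumptions (for illustration only): unit-height cells, and
    capacitance proportional to wire length.
    """
    lines_per_column = 2 if differential else 1   # BL and its complement
    bit_lines = lines_per_column * columns
    bit_line_length = rows
    return {
        "bit_lines": bit_lines,
        "bit_line_length": bit_line_length,
        "total_bit_line_length": bit_lines * bit_line_length,
        # One word line turns on every cell in its row.
        "cells_activated": columns,
        # Column decoder ratio needed to reduce a row to one word.
        "decoder_ratio": columns // word_bits,
    }


tall = organization_stats(2048, 8, 8)
square = organization_stats(128, 128, 8)
```

Under these assumptions both organizations have the same total bit-line length, but the square organization activates 128 cells per access where the tall one activates eight.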
The 128x128 approach activates 128 memory cells and then selects the relevant eight bits of those. The reason for this is that the 16:1 decoder can be implemented more compactly using pass transistors that come after the memory-cell activation, thus trading off chip area for power consumption. If, instead, the decoder were implemented at the cell level, only eight cells would need to be activated. This could be coupled with a redundant 16:1 decoder so as not to increase the length of bit lines. (The in-cell decoder eliminates the need for an external column decoder; the bit lines may be wired together in bundles of 16. However, this increases the bit-line capacitance.) In two-dimensional memories, in-cell decoding results in a number of trade-off choices, as integrating another decoder into the cell increases the cell size and decreases the bit density. The issue is that the word lines are used to activate cells, and in the 128x128 cell example, a single word line activates 128 cells. In-cell decoding is thus just the use of an orthogonal set of word lines to select a subset of the 128 cells for activation. If, instead, parts of a word line could be independently tri-stated, the 16:1 decoding could be done without increasing the cell size. For example, a 128-cell word line could be separated into 16 blocks of eight cells. When the address decoder drives a word line high, block decoders could be used to select which of the 16 blocks receives the word line. The other 15 blocks would remain at their previous state (possibly active). The previously redundant 16:1 decoder would then be used to ensure that only the currently active cells have access to the data bus. The difficulty lies in implementing this tri-state scheme in a two-dimensional memory. The motivation for using word lines and bit lines is that the memory cells are self-wiring and the word line control is simple. Further, and more importantly, the visibility of signals on the word lines is sequential.
That is, the memory cells on a single word line receive signals on that line in order from closest to the decoder to farthest away. If any signal on a word line is not meant for a particular cell, there must be a "pass-through" mechanism for the cell so that the signal may be transmitted to cells further down the chain. This mechanism requires that there be two word lines for each row - a master word line that traverses the entire row, and a per-block slave word line that is wired to the master through a pass transistor (which serves as the tri-state switch). This additional complexity results in increased chip area or diminished functionality (e.g. by sacrificing bit lines to recover area for the tri-state transistors). Further, the power requirements increase by a significant proportion since the master word line must be driven in both cases.

Figure 4-4: 3-D partitioning of rows allows for simple tri-stating of word lines.

While this approach is complicated for two-dimensional memories, in three-dimensional memories, the resulting tree structure of master and slave word lines can be geometrically rearranged for a 3-D memory that is efficient in both area and power. As shown in Figure 4-4, the blocks that share slave word lines may be treated as rows in a 3-D SRAM. The tri-state circuitry may therefore be stacked along the third dimension, thus reducing the master word line to a series of stacked vias. The total length of the word lines is thus reduced back to essentially that of the original 2-D SRAM without tri-stating. In addition, the power dissipation is reduced by a factor asymptotically equal to the number of layers relative to a 2-D SRAM of the same capacity. This may be formalized as follows: suppose a 2-D SRAM has been partitioned into $M\sqrt{L}$ rows and $N\sqrt{L}$ columns (where $L$ will be the number of layers in the 3-D version); further assume that the SRAM cells are of unit length and width.
The word lines are then of length $N\sqrt{L}$ while the bit lines are of length $M\sqrt{L}$. The SRAM is now to be repackaged into a 3-D SRAM with $L$ layers, via the following procedure. The 2-D SRAM is first repackaged into $M$ rows and $NL$ columns, with an $L$:1 column decoder. This reduces the length of the bit lines to $M$; however, as $\sqrt{L}$ times as many bit lines are switching, no power savings are realized thus far. In fact, the word lines are now longer by a factor of $\sqrt{L}$; to counteract this effect, the rows are then partitioned into $L$ blocks of $N$ cells each, and each block is assigned a slave word line that is fed from the master word line and tri-stated. (It is noted for the time being that as stated before, this configuration still dissipates appreciably more power in the 2-D case.) To construct the 3-D SRAM, each block of $N$ cells in a row is assigned to one of $L$ layers along the third dimension, as illustrated in Figure 4-4. This restores the original aspect ratio of $M : N$. The repackaging of the 2-D SRAM into three dimensions is complete. The worst-case switched capacitance (two word lines and all bit lines on a layer) is reduced from $2N\sqrt{L} + 2MNL = 2N\sqrt{L}\,(1 + M\sqrt{L})$ to $2N + 2MN = 2N(1 + M)$. This is therefore the approach that will be taken in laying out a 3-D SRAM, which is explored in the following section.

4.4 Layout of a 3-D SRAM

The first and second layers of a 512-bit 3-D SRAM are shown in Figure 4-5. The design of this SRAM is based on the concepts outlined above. The memory cell used in the 3-D SRAM is a standard 6-T cell, shown in Figure 4-6. This means that volume utilization is asymptotically equal to the area utilization of a 2-D SRAM of the same capacity. In order to obtain the power savings available to the 3-D SRAM, each of eight layers of the SRAM uses three address bits as a "layer select." That is, three address bits are used to provide the tri-state signal for the entire layer.
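The switched-capacitance comparison at the end of Section 4.3 can be evaluated numerically. The Python sketch below simply codes the two expressions, in units of capacitance per unit wire length; the function names are hypothetical, and the choice of M = N = 8 for an eight-layer memory is an illustrative assumption rather than a dimension taken from the layout.

```python
import math


def switched_capacitance_2d(m, n, layers):
    """Worst case for the 2-D SRAM with M*sqrt(L) rows and N*sqrt(L) columns."""
    root = math.sqrt(layers)
    word_lines = 2 * n * root                # two word lines of length N*sqrt(L)
    bit_lines = 2 * (m * root) * (n * root)  # 2 lines per column, N*sqrt(L) columns
    return word_lines + bit_lines


def switched_capacitance_3d(m, n):
    """Worst case for one layer of the 3-D SRAM: 2N + 2MN."""
    return 2 * n + 2 * m * n


def power_ratio(m, n, layers):
    """Factor by which the 3-D arrangement reduces switched capacitance."""
    return switched_capacitance_2d(m, n, layers) / switched_capacitance_3d(m, n)


# Illustrative assumption: eight layers with M = N = 8.
ratio_small = power_ratio(8, 8, 8)
# As M grows, the ratio approaches the number of layers.
ratio_large = power_ratio(10**6, 8, 8)
```

For small arrays the ratio falls somewhat short of the layer count because the word-line term does not shrink as quickly as the bit-line term; for large M it approaches L, in agreement with the asymptotic statement in the text.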
If a given layer is not selected, all word lines and bit lines on that layer are disconnected from the row decoders and bit-line drivers (located on the first layer). The row decoder outputs and individual bits on the data bus are thus actually tri-stated buses wired along the third dimension. Figure 4-7 shows the connection of a row decoder to the word line through a pass transistor. Figure 4-8 shows the connection of bit lines to the tri-stated buses making up the data lines.

Figure 4-5: First and second layers of an eight-layer 3-D SRAM.

Figure 4-6: 6-T SRAM cell layout.

Figure 4-7: Word-line tri-stating.

Figure 4-8: Bit-line decoding using the word-line tri-state control signal.

Thus, the SRAM effectively acts as eight 2-D SRAMs whose I/O pins are wired together, with the important difference that the row decoders and bit-line drivers are reused for all eight layers. Like a stack of 2-D SRAM chips soldered together, only one layer consumes power for any single operation. The power consumption of the 512-bit 3-D SRAM shown here is thus essentially that of a single 64-bit layer. The circuit area, as can be seen in Figure 4-5, is little more than that of a 2-D SRAM with an 8:64 aspect ratio, as the seven upper layers are dedicated to storage cells. The area overhead incurred by the 3-D implementation is in the form of access transistors that are clearly visible in the layout of the second layer; this overhead is not due to the three-dimensionality of the architecture, but instead a tradeoff for maintaining low power consumption as described above. Finally, while speed has not been the primary consideration here, care has been taken not to sacrifice performance in this regard.
All word and bit lines are strictly shorter in the 3-D case, and the layer select computation is done in parallel with the row decoding. At any rate, the use of sense amplifiers would greatly reduce the dependency on bit line speed, and consequently diminish any speed advantage presented by a 3-D SRAM with this architecture. It is clear, then, that memory architectures benefit directly from implementation in a three-dimensional medium. It is useful to consider now the suitability of this medium for computation.

Chapter 5 The Cellular-Automata Machine

Ever since the advent of integrated-circuit technology, general-purpose computation has been one of its main applications. Today, about half the world semiconductor market is in computer chips. Further, research into novel computational architectures is highly active. Many of these architectures could benefit from integration in a 3-D technology; one such architecture is the cellular automata machine.

5.1 Background

Much of computational theory is centered around what are called finite automata. In particular, interest is centered here on a particular class called deterministic finite automata (DFAs), also known as finite-state machines (FSMs).

5.1.1 Finite-State Machines

A finite-state machine is a quintuple $(Q, \Sigma, \delta, q_0, F)$. $Q$ is a finite set of states in which the machine can be. $\Sigma$ is a finite alphabet from which the inputs to the machine are taken. $\delta$ is the transition map, defined as a function $\delta : Q \times \Sigma \to Q$ that maps a combination of the current state of the machine and the current input to the next state of the machine. $q_0$ is the initial state of the machine, and $F \subseteq Q$ is the set of accepted final states.

Figure 5-1: Finite state machine. (A) Finite-state machine. (B) Hardware implementation.

The finite-state machine, depicted in Figure 5-1, is thus designed for synchronous operation. The FSM is set to state $q_0$ and an input is provided.
At each tick of a global clock, $\delta$ determines the next state, and the current input is discarded for the next input. If at the end of the input stream, the state of the machine is in $F$, then the machine is said to accept the input. The FSM admits a straightforward hardware implementation, also shown in Figure 5-1. FSMs are thus used for many types of control hardware. More important, however, is a modification of an FSM called a Turing Machine. A Turing machine comprises an FSM and an infinite tape. At each tick of a global clock, the FSM reads from the tape (its input) and decides (1) its next state, (2) an output to the tape, and (3) whether to move left or right on the tape. A Turing machine can thus be defined by a septuple $(Q, \Sigma, \Gamma, \delta, q_0, B, F)$. $Q$ is again the set of states of the FSM. $\Gamma$ is the finite tape alphabet. $B$ is a special character, the "blank." $\Sigma \subseteq \Gamma - \{B\}$ is then the input alphabet. $\delta : Q \times \Gamma \to Q \times \Gamma \times \{\text{left}, \text{right}\}$ is then the transition function for the Turing machine. $F$ is again the set of accepted final states.

Figure 5-2: Turing machine.

Turing machines are important for general-purpose computation because for the appropriate choice of parameters, there exists a Turing machine that can take as input a specification of an arbitrary Turing machine and an input to that machine, and simulate the behavior of the specified machine on that input. The simulator is thus called a Universal Turing Machine (UTM). It has been shown that a UTM is capable of solving a vast class of useful problems [19]. In particular, a UTM is computationally equivalent to modern general-purpose processors, in that a UTM can simulate the behavior of any of these processors, and vice versa. Therefore, any architecture that is equivalent to a UTM is suitable for general-purpose computation.
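As an illustration of the quintuple definition in Section 5.1.1, the following Python sketch steps a finite-state machine through an input string and reports acceptance. The dictionary representation of the transition map and the example machine (which accepts bit strings containing an even number of 1s) are hypothetical and are introduced here only for illustration.

```python
def fsm_accepts(delta, q0, accepting, inputs):
    """Run a deterministic FSM and report whether it accepts the input.

    delta:     dict mapping (state, symbol) -> next state
    q0:        initial state
    accepting: set of accepted final states F
    inputs:    iterable of symbols drawn from the input alphabet
    """
    state = q0
    for symbol in inputs:
        # One clock tick: the transition map picks the next state,
        # and the current input symbol is discarded.
        state = delta[(state, symbol)]
    return state in accepting


# Hypothetical example machine: tracks the parity of the number of 1s seen.
EVEN_ONES_DELTA = {
    ("even", "0"): "even",
    ("even", "1"): "odd",
    ("odd", "0"): "odd",
    ("odd", "1"): "even",
}
```

In the hardware picture of Figure 5-1, the dictionary lookup plays the role of the combinational logic and the variable `state` plays the role of the register.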
Since certain cellular-automata machines can be shown to be equivalent to a UTM, it is worthwhile to explore this architecture further.¹

¹ A more in-depth review of these concepts is available in many texts, such as [19].

5.1.2 Cellular-Automata Machines

The cellular-automata machine (CAM) is a variant on the finite-state machine much in the same way a Turing machine varies from the FSM: the CAM adds unbounded memory to the system. The chief difference is that in a Turing machine, one processing element has access to the unbounded memory, but in a CAM, the memory is distributed as finite chunks among an infinite array of processing elements.

Figure 5-3: Four cells of a cellular-automata machine.

Specifically, a CAM is a regular array of cells, each of which is an FSM. For any given cell, the input to the cell is a set of states of other cells; the set of cells whose states are polled is called the neighborhood of the given cell. CAMs generally also have the property of uniformity; that is, each cell has the same FSM and the topology of the neighborhood is invariant from cell to cell. It is, therefore, only the input (i.e. the initial configuration of states) that varies from program to program, as implemented on a CAM. Part of a hypothetical CAM is thus shown in Figure 5-3. This CAM features a square array and exhibits a common neighborhood known as a Moore neighborhood, which contains all cells in the immediate "square". Thus, in this CAM, each cell has eight neighbors. Unlike UTMs, CAM architectures are highly variable, in that there are many choices for the cell topology and for the neighborhood. For example, a hexagonal grid is used for many problems in physics. The CAM shown in Figure 5-3 uses a 2-D mesh. As is no doubt evident, the topology to be considered here is the 3-D rectangular mesh. This topology is suitable for a number of reasons.
First and foremost, it has been shown that there exist CAMs using 2-D and 3-D meshes that are equivalent to a UTM [20]. Secondly, it admits a more efficient layout in three dimensions than in two. To demonstrate these efficiencies, a sample CAM is laid out that executes the Game of Life in three dimensions.

5.1.3 The Game of Life

The Game of Life, as devised by John Conway [6], is a simple specification for a two-dimensional cellular-automata machine that exhibits a variety of interesting behaviors. The cells are arranged in a Moore configuration so that each cell talks to its eight nearest neighbors. At any given time, each cell is in one of two states, alive or dead. The transition rule for a cell is as follows: a living cell remains alive if and only if it has exactly two or three living neighbors (the environment condition), and a dead cell becomes alive if and only if it has exactly three living neighbors (the fertility condition). More formally, a candidate Game is specified by the quadruple $(E_l, E_u, F_l, F_u)$, where a living cell remains alive if and only if it has $m$ living neighbors with $E_l \le m \le E_u$, and a dead cell becomes alive if and only if it has $n$ living neighbors with $F_l \le n \le F_u$. The Conway Game of Life is thus (2, 3, 3, 3), hereafter termed Life 2333 [1]. Life 2333 exhibits some important behaviors. For example, there exist cell configurations that are stable (i.e. that maintain the same state indefinitely). Other configurations oscillate (i.e. cycle through a finite set of states). Further, some configurations, called gliders, translate across the plane. Finally, Life 2333 has configurations called glider guns, which are stationary configurations that emit gliders at regular intervals. It has been shown that using these glider guns, Life 2333 can be made to emulate Boolean logic, and thus is equivalent to a UTM [6]. Fabrication of a Game of Life CAM is thus potentially more interesting than fabrication of some special-purpose machines.
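The quadruple form of the transition rule lends itself to a compact software statement. The Python sketch below is a minimal illustration, assuming an unbounded plane represented as a set of live-cell coordinates; the names `next_state`, `step_2d`, and `LIFE_2333` are hypothetical.

```python
from collections import Counter
from itertools import product

LIFE_2333 = (2, 3, 3, 3)  # (E_l, E_u, F_l, F_u) for Conway's Game of Life


def next_state(alive, living_neighbors, rule):
    """Apply the environment and fertility conditions to one cell."""
    e_l, e_u, f_l, f_u = rule
    if alive:
        return e_l <= living_neighbors <= e_u
    return f_l <= living_neighbors <= f_u


def step_2d(live_cells, rule=LIFE_2333):
    """Advance a set of live (x, y) cells by one generation.

    Assumption: the plane is unbounded and all unlisted cells are dead.
    """
    # Count, for every cell adjacent to a live cell, its live Moore neighbors.
    counts = Counter(
        (x + dx, y + dy)
        for (x, y) in live_cells
        for dx, dy in product((-1, 0, 1), repeat=2)
        if (dx, dy) != (0, 0)
    )
    candidates = set(counts) | set(live_cells)
    return {c for c in candidates if next_state(c in live_cells, counts[c], rule)}
```

A row of three live cells, for example, alternates between horizontal and vertical orientations under `step_2d`, which is one of the oscillating configurations mentioned above, and a two-by-two block is stable.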
All that remains is to extend Life to three dimensions. The main difficulty is that a Moore neighborhood on a 3-D mesh consists of 26 cells. Thus, the range of candidate Games is greatly increased over the 2-D case. However, there do exist several candidate Games in three dimensions. Two of these are Life 4555 and Life 5766. While Life 4555 exhibits more prolific behaviors, Life 5766 has the interesting property that many special configurations in Life 2333 are extensible to Life 5766. In fact, Life 5766 is provably most analogous to Life 2333. However, it is not known whether a glider gun exists for either three-dimensional game [1]. Nonetheless, Life 5766 is used as the CAM to be laid out, as a wealth of configurations are already known.

5.2 Layout of a 3-D Game of Life

The layout of a 3-D cellular-automata machine consists of two basic steps. First, the individual cell must be laid out; second, the cells must be wired together, and global signals must be distributed. It is in this second phase where layout efficiencies will be examined. This is for several reasons: primarily, this is due to the fact that the cell contents are highly variable from CAM to CAM; thus, optimizations for one particular CAM may not generalize. Secondarily, the motivation behind the CAM is towards simple cell design, and the design of simple FSMs is not terribly interesting. Nevertheless, the cell design will be discussed.

5.2.1 Game of Life Cell

The cell behavior in the Game of Life, and in Life 5766 in particular, is easy to specify in a rule-based form. This means that Life is easy to implement in software. However, Life does not admit a simple expression in hardware. The hardware implementation in three dimensions is confounded by the fact that the neighborhood size is 26 cells as opposed to 8 in the two-dimensional case. Thus, neither a straightforward combinational-logic approach nor a lookup-table approach is feasible.
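To make the rule-based software form concrete, the following Python sketch applies Life 5766 to a small cubic mesh using the 26-cell Moore neighborhood. It is a behavioral illustration only: the names are hypothetical, and the treatment of cells outside the mesh as permanently dead is an assumption, since the boundary behavior is not specified in the text.

```python
from itertools import product

# All 26 offsets of the 3-D Moore neighborhood (the cell itself is excluded).
MOORE_3D = [d for d in product((-1, 0, 1), repeat=3) if d != (0, 0, 0)]

LIFE_5766 = (5, 7, 6, 6)  # (E_l, E_u, F_l, F_u)


def step_3d(live_cells, size, rule=LIFE_5766):
    """Advance a size x size x size mesh by one generation.

    live_cells: set of (x, y, z) tuples with coordinates in range(size)
    Assumption: cells outside the mesh are always dead.
    """
    e_l, e_u, f_l, f_u = rule
    new_live = set()
    for cell in product(range(size), repeat=3):
        # Software accumulator: count live cells among the 26 neighbors.
        n = sum(
            (cell[0] + dx, cell[1] + dy, cell[2] + dz) in live_cells
            for dx, dy, dz in MOORE_3D
        )
        if cell in live_cells:
            if e_l <= n <= e_u:
                new_live.add(cell)
        elif f_l <= n <= f_u:
            new_live.add(cell)
    return new_live
```

Under this model a 2x2x2 block of live cells is left unchanged, since every live cell has seven live neighbors and no dead cell has more than four.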
The standard approach to Life-type problems in hardware is to emulate the software mechanism. In software implementations, the cell states are stored in memory and an accumulator is used to count the number of live cells in the neighborhood. A hardware implementation could then consist of a number of adders. However, a neighborhood of 26 cells implies 25 additions, some of which are necessarily five bits wide. The resulting logic structure would both be unnecessarily large and compute unnecessary information. The only way to implement such a cell using logic or arithmetic is through hardware reuse; that is, computation of the state must be done over several clock cycles, where different inputs are considered on each cycle. By contrast, an area-efficient single-cycle implementation must recognize that permutations in the input data are irrelevant and that only two result bits are necessary to compute the output (these being whether $E_l \le n \le E_u$ and whether $F_l \le n \le F_u$). Combinational logic is thus inefficient because each permutation of the input data must be recognized by a separate pulldown or pullup path. Lookup-table implementations fail for this reason as well. On the other hand, an adder implementation is inefficient because it produces extraneous output; it is not possible to discard higher-order bits of the sum, even though these bits are not needed in any practical Life implementation.² Thus, in order to simplify the FSM structure, non-computational approaches must be considered. One such approach is through sorting nets. A sorting net is a hardware structure that takes n unsorted inputs and produces n outputs that are the inputs in ascending or descending numerical order. Sorting nets differ from conventional sorting algorithms in two ways: first, comparisons are done in parallel, and second, the sequence of comparisons is independent of the input. Sorting nets are therefore ideally suited for hardware implementation.

² In the construction of a practical CAM, i.e. one that is more adept than the Game of Life at general-purpose computation, the computational structure of the FSM is paramount. That is, the cell would be implemented via a lookup table or via pipelined combinational logic, and the ability to do single-cycle computation on all 26 inputs would be sacrificed. The emphasis would thus be on optimization of the logic, rather than on optimization of the overall CAM structure, which is the focus here.

Life may be implemented by taking the input bus and sorting the bits on the bus. If the nth-highest bit is a 1, this then guarantees that there are at least n live neighbors. It is then straightforward to compute the next state of the cell by examining the relevant wires in the bus. An area-efficient wide sorting net can be implemented as an insertion sort. Insertions are efficient because each bit is inserted either at the top of the list or at the bottom. Consider the circuit in Figure 5-4. Bits $A_3$, $A_2$, and $A_1$ are sorted, and $A_0$ is to be inserted. Bits $B_3 \ldots B_0$ are thus also sorted.

Figure 5-4: Insertion sort bit-slice.

A 26-bit sorter may be constructed using this method. Moreover, all but the fifth through eighth highest bits may be discarded for Life 5766. The next state may then be computed as a combinational function of these four bits and the current state. The Game of Life cell is shown in Figure 5-5.

5.2.2 Game of Life CAM Architecture

Of vital interest is the usability of the 3-D medium for system-level optimizations of the CAM architecture. In particular, the inter-cell communication and clock distribution both benefit from implementation in three dimensions. System I/O can also be done in an efficient manner. To experiment with some of these optimizations, the Game of Life cell is fitted into a 4 x 4 x 4 rectangular mesh. Inter-cell communication is a difficult problem mainly because of the number of wires - 26 per bit of state.
This problem confounds implementation of a 3-D CAM in a 2-D medium. With a three-dimensional medium, the problem reduces to that of wiring the eight-neighbor 2-D configuration. Each cell requires inputs from its eight neighbors on the same layer, nine neighbors from one layer down, and nine neighbors from one layer up. An efficient 3-D wiring scheme is to wire the eight same-layer neighbor inputs to the cell, and then wire each input to three via stacks. The first stack connects to the gate on that layer, the second stack goes up to the next layer, and the third stack goes down to the layer below. Beneath the second stack lies a via stack coming up from the layer below, and above the third stack lies a via stack coming down from the layer above. Thus, when the cells are aligned along the third dimension, the vias align to provide I/O between layers.

Figure 5-5: Game of Life cell.

Clock distribution is done through an H-tree network. The clock net on the first two layers of the CAM is shown in Figure 5-6. The four peripheral vias connect the four-cell H-trees on the second layer to the four-cell H-trees on the first layer, thus forming a 3-D H-tree. The central via then connects the 32-cell H-tree shown in the figure to the 32-cell H-tree on the third and fourth layers of the CAM.

Figure 5-6: Clock distribution in the 3-D Game of Life architecture.

The savings in wire length can be seen immediately. Distributing the clock signal to an 8 x 8 2-D grid would require four H-trees like the one in the left half of Figure 5-6; this 3-D implementation requires two. Further, the four H-trees in the 2-D implementation would have to be connected into an H-tree, requiring yet more wire. In this 3-D circuit, the trees are connected by the central via stack. Thus, the 3-D medium features wire-length savings, and thus power savings, that can be exploited readily.
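Returning briefly to the cell logic of Section 5.2.1, the sorting-based computation of the next state can be modeled in a few lines of Python. This is a behavioral sketch only and does not describe the transistor-level bit-slice of Figure 5-4; the function names are hypothetical. It relies on the observation from the text that, once the 26 neighbor bits are sorted, the nth-highest bit is 1 exactly when at least n neighbors are alive, so that only the fifth through eighth highest bits are needed for Life 5766.

```python
def insert_bit(sorted_bits, new_bit):
    """Insert one bit into a list kept in descending order (1s before 0s).

    A 1 always belongs at the top of the list and a 0 at the bottom.
    """
    if new_bit:
        return [1] + sorted_bits
    return sorted_bits + [0]


def sort_bits(bits):
    """Sort a sequence of bits by repeated insertion."""
    result = []
    for bit in bits:
        result = insert_bit(result, bit)
    return result


def next_state_5766(alive, neighbor_bits):
    """Next state of a Life 5766 cell from its 26 neighbor states."""
    if len(neighbor_bits) != 26:
        raise ValueError("a 3-D Moore neighborhood has 26 cells")
    s = sort_bits(neighbor_bits)

    def at_least(n):
        # The nth-highest sorted bit is 1 iff at least n neighbors are alive.
        return s[n - 1] == 1

    if alive:
        # Environment condition: 5 <= m <= 7.
        return at_least(5) and not at_least(8)
    # Fertility condition: n == 6.
    return at_least(6) and not at_least(7)
```

Only `s[4]` through `s[7]` are ever examined, which mirrors the statement that all other sorted outputs may be discarded.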
Of course, a 3-D architecture is useless if there is no way in which it can be programmed. System I/O thus becomes an important concern. While various bit-serial methods such as shift-register implementations have been proposed for this purpose, the approach taken here is to integrate a 3-D memory interface with the CAM. Specifically, each row receives a word line and each column receives a bit line. Each cell also receives a write-enable signal that is used to program the CAM and that also serves to disconnect the state logic from the output register. External addressing and control can then be used to select a multi-bit word for reading out of the CAM while the CAM is operating, or can be used to stall the CAM and alter some or all of its state. The CAM architecture shown here may thus be visualized as a 3-D memory with integrated processing elements. A single layer of the 4 x 4 x 4 Game of Life architecture is shown in Figure 5-7.

Chapter 6 Conclusion

Currently, integrated circuits are fabricated using a CMOS technology which constrains the circuits to a two-dimensional geometry. This places fundamental limits on the density of both memory elements and processors in a single chip. The development of new fabrication technologies, however, has made feasible the implementation of three-dimensional (multiple-layer) integrated circuits. The possibility of creating such devices will open new doors for circuit design and fabrication. In order to be prepared to utilize this technology, software design tools have been developed that allow circuit designers to utilize their background in 2-D CMOS logic design to develop 3-D circuits. This thesis centers on the development of such tools. Specifically, a computer-aided design system called FluidLayout has been developed that integrates a familiar means of circuit layout with the means to handle 3-D circuits of arbitrarily many layers.
In addition to providing a familiar circuit design interface, FluidLayout also provides useful system-level functions such as the ability to extract circuit netlists and the ability to produce file formats useful for fabrication.

The tools and technology have been used in this thesis to design some interesting circuits. Some basic transistor circuits have been laid out to test the viability of FluidLayout and of the medium. FluidLayout has been used to specify the circuit arrangement and to verify functionality of the circuit through netlist extraction. A wafer mask has been produced using the specification provided by FluidLayout.

To explore some interesting ramifications of the technology, a 3-D static RAM has also been designed using FluidLayout. This SRAM has a 512-bit capacity, but has the same footprint as that of a 64-bit 2-D SRAM. Further, wiring costs have been reduced so that the power consumption of this memory is approximately that of the 64-bit 2-D SRAM as well. The I/O packaging of this 3-D SRAM is arranged so that it may be used as a drop-in replacement for conventional SRAM; thus, immediate benefits may be seen from this technology.

Finally, a 3-D cellular automata machine has been designed in order to explore some system issues in 3-D design. This CAM implements a 3-D version of Conway's Game of Life in hardware. The architecture features an efficiently-distributed global clock signal and a 3-D memory-style I/O interface.

As the technology matures, circuits can be fabricated using the tools and techniques developed in this thesis. The potential benefits introduced by such new devices are enormous.

Appendix A

FluidLayout User's Guide

A.1 Overview

FluidLayout is a design tool for geometric layout of three-dimensional integrated circuits and basic microelectromechanical systems (MEMS).
That is, within FluidLayout, it is possible to construct arbitrary metal-insulator-semiconductor structures so long as the structures are confined to a Manhattan (90°) grid. Figure A-1 shows the main appearance of FluidLayout. The components outlined in the figure will be discussed in detail.

A.2 Basic Layout

Basic layout is done by painting rectangular boxes on a grid. A colored box is created by first drawing the box outline (the edit box) and then selecting the paint with which to fill it. The left mouse button is used to select the lower-left corner of the box and the right mouse button is used to select the upper-right corner. The box, once sized, may be moved by clicking the left mouse button at the desired lower-left coordinate.

Once the box has been sized, it must be painted. There are two ways of doing this. The first is to select the desired material from the main toolbar, shown in Figure A-2. Substrate materials are identified by their color and their associated tool tips (for example, passing the mouse over the red button produces a "Gate" pop-up). Via materials are identified by the color of the underlying metal along with an "X" through the button.

Figure A-2: The main FluidLayout toolbar.

The other way to paint boxes is to use existing paint. The middle mouse button acts as a copy button, in that if it is clicked over a painted region, the paint in that region is added to the paint in the edit box. Also, if the middle mouse button is clicked over empty space, the edit box is cleared.

To make rectangle placement easier, several features are supplied. The first is a grid whose visibility may be toggled by the grid toolbar button or the 'g' key.
Rectangle placement is aligned to this grid regardless of its visibility. Another feature is the current-coordinate window. The coordinates given are relative to the lower left of the entire circuit, so this information is especially useful in coordinating object locations across layers. Finally, zoom buttons are provided in order to allow the user to view any part of the layout easily. Specifically, three buttons are provided: zoom in (2:1), zoom out (1:2), and view all. Further, the 'z'/'Z' and 'v' keys are enabled as shortcuts; 'z' zooms to the edit box, 'Z' is equivalent to "zoom out", and 'v' is equivalent to "view all."

A.3 Higher-Level Functions

A.3.1 Node Labeling

For circuit verification purposes, it is desirable to label some of the nodes of a circuit. This is done in FluidLayout using the "select" and "label" functionality. A node that is to be labeled must first be selected. In Figure A-3, a selected node is shown. FluidLayout outlines the selected rectangle in bold white. A rectangle may be selected using the 's' key.

Figure A-3: Partial view showing node selection.

Once selected, a rectangle may be labeled by pressing the 't' key or the label button (marked 'T') on the toolbar. Doing so causes the dialog in Figure A-4 to pop up. The user may then enter an appropriate label.

Figure A-4: Label dialog.

A.3.2 Translation, Rotation, and Reflection

The ability to move portions of a circuit around the layout is extremely useful. In FluidLayout, this is accomplished with the Cut, Copy, and Paste facilities. The contents of the edit box are copied to the clipboard (and in the case of Cut, erased from the layout), and may be placed at a new location by relocating the edit box there. Additionally, FluidLayout supports 90° rotations about the lower-left corner of the edit box and reflections about the left side of the edit box. A rotation is depicted in Figure A-5.

Figure A-5: Subcircuit rotation.
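The rotation and reflection operations described in A.3.2 amount to simple coordinate transforms on Manhattan rectangles. The following Python sketch illustrates the geometry only. The (x0, y0, x1, y1) rectangle representation, the function names, and the counterclockwise rotation direction are assumptions made for illustration; the text does not state which direction FluidLayout rotates.

```python
def rotate90_ccw(rect, pivot):
    """Rotate a rectangle 90 degrees counterclockwise about pivot.

    rect is (x0, y0, x1, y1) with (x0, y0) the lower-left corner;
    pivot is the lower-left corner of the edit box.
    """
    x0, y0, x1, y1 = rect
    px, py = pivot
    # A point (x, y) maps to (px - (y - py), py + (x - px)).
    ax, ay = px - (y0 - py), py + (x0 - px)
    bx, by = px - (y1 - py), py + (x1 - px)
    # Re-normalize so the result is again (lower-left, upper-right).
    return (min(ax, bx), min(ay, by), max(ax, bx), max(ay, by))


def reflect_about_left(rect, left_x):
    """Mirror a rectangle about the vertical line x = left_x."""
    x0, y0, x1, y1 = rect
    return (2 * left_x - x1, y0, 2 * left_x - x0, y1)


box = (1, 1, 5, 3)                      # a 4 x 2 rectangle
rotated = rotate90_ccw(box, (1, 1))     # pivot at the box's lower-left corner
mirrored = reflect_about_left(box, 1)   # mirror about the box's left side
```

Because every transformed rectangle is re-normalized to lower-left and upper-right corners, the results stay on the Manhattan grid and can be painted like any other box.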
A.4 Circuit-Level Tools

A.4.1 Circuit Traversal

As discussed before, FluidLayout provides zoom-in, zoom-out, and view-all functionality for circuits on a layer. The mechanism for creating and viewing different layers of the circuit is provided from the pull-down menus. For example, one can add a layer to the circuit by pulling down the Window menu and selecting New Layer. One can then open a view window for any of the existing layers by pulling down the View menu and selecting Switch to Layer. A dialog box will then pop up in which the user can specify the layer to view.

A.4.2 Cell Hierarchy Management

Every FluidLayout design can be used as a subcell of another design. Thus, FluidLayout has full support for cell hierarchy. Cell manipulation is done through the cell toolbar, shown in Figure A-6.

Figure A-6: Cell hierarchy management toolbar.

The cell toolbar has buttons for loading cells into the design, adding cells to the layout, and for manipulation of individual cells. Specifically, the cell toolbar has the following buttons from left to right:

Load Cell allows the user to load a cell from disk into memory and associate the cell with the current working design.

Add Cell pops up a dialog box from which the user can select a loaded cell for addition to the circuit layout.

The next three buttons specify the operation mode for cell manipulation. A cell may be accessed by double-clicking on the cell. The access mode is specified by the toolbar button that is currently depressed.

Copy Cell signifies that double-clicking should create a copy of the accessed cell.

Move Cell signifies that double-clicking should move the accessed cell.

Erase Cell signifies that double-clicking should erase the accessed cell.

With this interface, it is feasible to construct circuits with arbitrary hierarchical depth.

A.4.3 Magic Importation

FluidLayout has the ability to import circuit layouts created in Magic.
Text files with the .mag extension are recognized as Magic layout specifications, and those rectangles with material types that correspond to FluidLayout materials are imported to FluidLayout's native format. This option is selected by pulling down the File menu and selecting the Import->MAGIC file (.mag) option.

A.4.4 Circuit Netlist Extraction

FluidLayout is able to extract the connectivity information in VLSI circuit layouts and produce a netlist in SPICE deck format. This format is suitable for functional and timing simulation using the Berkeley SPICE circuit simulator or its variants. (The user must supply SPICE models for the transistors; this information cannot be provided by FluidLayout.) The user may select this option by pulling down the File menu and selecting the Export->SPICE deck (.sp) option.

A.4.5 VLSI/MEMS Fabrication

At some point, it is desirable to target a design for fabrication. There are two steps that must be taken: first, the fabrication technology must be specified, and second, the circuit must be extracted to a suitable output file.

Fabrication Technology Specification

FluidLayout is designed to understand fabrication technologies on a system-by-system basis. This is mainly because the fabrication methods are drastically different from each other. For example, FluidLayout is able to handle both inkjet and embossing means of fabrication. FluidLayout accommodates this by maintaining a property sheet for each design. This sheet may be accessed by pulling down the Edit menu and selecting Properties. Figure A-7 shows the property sheet for laser rasterization of the layout.

Figure A-7: Edit->Properties->Laser Setup

The individual fabrication parameters are set as described here:

Inkjet Setup
The inkjet is a 48-nozzle liquid printing head mounted on an X-Y-Z gantry.
This property sheet therefore configures the size of the printing head and various printing parameters.

Lambda corresponds to the size of the grid spacing. This should be set to the line width of the inkjet, since rectangles are rastered and it is preferable to print them as solid rectangles rather than as sets of snake-like lines.

Inter-nozzle spacing refers to the space between each of the 48 nozzles.

Printing feed rate is the rate of gantry movement during printing.

Non-printing feed rate is the rate of gantry movement when not printing.

Pre-print (acceleration) spacing is used to produce smooth, even lines. The inkjet is given a small amount of space to accelerate to full speed before printing.

Laser Setup
The laser setup is similar to that of the inkjet, as the laser is also mounted on the gantry.

Laser position is the (x, y) location of the laser in gantry coordinates.

Laser current is the operating current of the laser. This is used for automated operation of the laser via the serial port of the controlling computer.

Stamp Setup
The stamp setup uses a flexible stamp to pattern liquid materials. Each stamp contains all the patterns for a circuit (e.g., the gate pattern is followed by the gate-via pattern); these patterns are spatially separated on the stamp. Each pattern consists of rectangular outlines that are used to separate the desired material rectangles from undesirable excess material.

Layer-to-layer spacing thus gives the spacing from a given layer on the stamp to the next. For example, if the gate pattern is at (0, 0) and the spacing is set to 1000 microns, then the gate-via pattern is at (1000, 0).

Width of stamp outline refers to the width of the rectangular outlines in the stamp patterns.

Printer Use
The gantry printing system is designed so that laser and inkjet printing may be done interchangeably, meaning that different material parts of the same circuit may be done with either inkjet or laser.
Thus, this property sheet allows the user to specify which method is to be used for printing.

Fabrication File Production

Once the technology is set, all that remains is to produce output that can be fabricated. This can be done in three ways, depending on the target process.

File->Export->MMI code is used to produce G-code for the gantry system.

File->Export->VLSI Nanoprinting GDSII produces GDSII binary stream data that is almost universally accepted for mask fabrication.

File->Export->MEMS Nanoprinting GDSII produces GDSII binary data, but the layout is interpreted as a MEMS process and the release layers are generated accordingly.

A.5 Step-By-Step Design Walk-Through

As an example, a three-dimensional ring oscillator is laid out here. Once FluidLayout is started, the user is presented with the screen in Figure A-1. The grid is off by default; it should be turned on by clicking on the grid toolbar button or by hitting the 'g' key.

First, an NMOS transistor is created. To do so, a source node is created by left-clicking at some location and right-clicking to define a 4 x 2 box, as shown in Figure A-8. Clicking on the blue toolbar button fills in this box with source/drain material, as shown in Figure A-9. By left-clicking, the edit box can be relocated to the right of the new source node as a suitable location for a drain node. Then, by middle-clicking over the source node, the source node material can be copied to the new edit-box location, yielding Figure A-10.

Figure A-8: 4 x 2 box used for the source of an NMOS transistor.

Figure A-9: Completed source node of the NMOS transistor.

Figure A-10: NMOS source and drain nodes.

Source and drain nodes for the PMOS transistor can be created using the same method. The resulting pattern is shown in Figure A-11.

Figure A-11: Inverter source and drain nodes.
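As a rough illustration of the edit-box steps used so far (size a box, paint it, move the box, copy existing paint into it), the following Python sketch models a layer as a list of painted rectangles. The class and function names, the material string, and the coordinates are all hypothetical; this is not FluidLayout's internal representation, only a minimal model of the workflow.

```python
class Layer:
    """A hypothetical stand-in for one layout layer."""

    def __init__(self):
        # Each entry is (x0, y0, x1, y1, material) in grid units.
        self.rects = []

    def paint(self, box, material):
        """Fill the edit box with a material (toolbar click)."""
        x0, y0, x1, y1 = box
        self.rects.append((x0, y0, x1, y1, material))

    def material_at(self, x, y):
        """Return the most recently painted material at a grid point."""
        for x0, y0, x1, y1, material in reversed(self.rects):
            if x0 <= x < x1 and y0 <= y < y1:
                return material
        return None


def move_box(box, lower_left):
    """Relocate the edit box, keeping its size (left-click)."""
    x0, y0, x1, y1 = box
    nx, ny = lower_left
    return (nx, ny, nx + (x1 - x0), ny + (y1 - y0))


layer = Layer()
source_box = (2, 2, 6, 4)                  # a 4 x 2 edit box
layer.paint(source_box, "source/drain")    # paint the source node
drain_box = move_box(source_box, (8, 2))   # move the box to the right
copied = layer.material_at(3, 3)           # middle-click over the source
layer.paint(drain_box, copied)             # drain gets the same paint
```

The gap between the two boxes (x from 6 to 8) is left unpainted in this sketch; in the walk-through that region is where the gate and semiconductor materials are placed later.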
The common gate may be laid out using the edit box and clicking on the red toolbar button to fill it. Similarly, the n-type and p-type semiconductor can be placed using the green and brown buttons respectively. The power and ground lines can also be laid out using the source/drain material. Finally, the transistor drains can be wired together to form the output. This results in the image in Figure A-12, which can be seen in its entirety by pressing the 'v' key or clicking on the 'V' toolbar button.

Figure A-12: Complete inverter.

It is now desired to create inverters on the second and third layers of this circuit, thus making a three-dimensional ring-oscillator. This is started by creating a new layer from the Window menu, shown in Figure A-13.

Figure A-13: Window menu.

This layout process could then be repeated for each new layer. Instead, the Copy facility is used to duplicate the inverter. This is done in four steps. First, the edit box is used to outline the area to be copied. Then either control-C or the copy toolbar button is used to copy the contents of the edit box. Next, in the view window for the second layer, the edit box is placed at the desired location for the copy (it is only necessary for the lower-left corner of the box to be in the correct place; this can be verified from the current-coordinate indicator). Finally, the Paste function is used by pressing control-V or the paste toolbar button.

Next, vias must be placed to connect the three layers. Starting with the first layer, contact pads to the input, output, power, and ground are laid out, as shown in Figure A-14. Each contact pad may then be wired to a series of vias that connect it to the next layer up. Specifically, this is done by outlining the contact pad with the edit box, then selecting the via material to place.
In this case, the gate pad needs gate-via, source/drain-via, metal1-via, and metal2-via to connect it to layer 2. A via is placed by clicking on the appropriate toolbar button. Specifically, the (xyz)-via button looks like the (xyz) button with an 'X' through it. Upon placing the metal2 via, which connects to the next layer up, a corresponding via is marked on the next layer. This is shown in Figure A-15.

Figure A-14: First-layer inverter with contact pads.

Figure A-15: First-layer inverter with via stacks to the second layer.

The inverters on the second and third layers may be wired similarly. This completes the ring-oscillator structure.

Finally, the nodes may be labeled for circuit-verification purposes. This is done by placing the mouse pointer over the desired node and pressing the 's' key to select it. Then, pressing the 't' key or clicking on the label button (marked 'T') pops up a dialog box in which the user can label the node. Vdd, ground, and the oscillator output are labeled in this way, as shown in Figure A-16.

Figure A-16: Labeled first-layer inverter.

The circuit is then extracted to a SPICE deck via the File->Export menu, shown in Figure A-17.

Figure A-17: File->Export menu.

This produces the SPICE circuit netlist shown here.

    ***** C:\WINNT\Profiles\shamikd\Desktop\ring-oscillator.sp
    ***** Created by FluidLayout
    ***** Created on 5/1/2000
    M1 3 ring GND! 0 NTFT W=10u L=10u
    M2 3 ring Vdd! Vdd! PTFT W=30u L=10u
    M3 2 3 GND! 0 NTFT W=10u L=10u
    M4 2 3 Vdd! Vdd! PTFT W=30u L=10u
    M5 ring 2 GND! 0 NTFT W=10u L=10u
    M6 ring 2 Vdd! Vdd! PTFT W=30u L=10u

Finally, it is desired to extract the circuit to output that can be fabricated. Using the same File->Export menu, VLSI nanoprinting GDSII is selected. This produces a binary stream that can be used to turn a mask, which is used to produce the wafer mold from which a circuit stamp can be made. The circuit design flow is thus complete.
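The extracted netlist can be sanity-checked before simulation. The Python sketch below parses the six MOSFET cards from the deck, assuming the standard SPICE field order (name, drain, gate, source, bulk, model), pairs each NTFT pull-down with its PTFT pull-up to find the inverters, and confirms that they close into a three-stage ring. The function names are hypothetical and this check is not part of FluidLayout; it is only one way a user might verify the connectivity.

```python
DECK = """\
***** Created by FluidLayout
M1 3 ring GND! 0 NTFT W=10u L=10u
M2 3 ring Vdd! Vdd! PTFT W=30u L=10u
M3 2 3 GND! 0 NTFT W=10u L=10u
M4 2 3 Vdd! Vdd! PTFT W=30u L=10u
M5 ring 2 GND! 0 NTFT W=10u L=10u
M6 ring 2 Vdd! Vdd! PTFT W=30u L=10u
"""


def parse_mosfets(deck):
    """Parse MOSFET cards: name drain gate source bulk model ..."""
    devices = []
    for line in deck.splitlines():
        fields = line.split()
        if not fields or not fields[0].upper().startswith("M"):
            continue  # skip comment lines and blanks
        name, drain, gate, source, bulk, model = fields[:6]
        devices.append({"name": name, "drain": drain, "gate": gate,
                        "source": source, "bulk": bulk, "model": model})
    return devices


def inverter_stages(devices):
    """Map each inverter input node to its output node."""
    pull_down = {(d["gate"], d["drain"]) for d in devices
                 if d["model"] == "NTFT" and d["source"] == "GND!"}
    pull_up = {(d["gate"], d["drain"]) for d in devices
               if d["model"] == "PTFT" and d["source"] == "Vdd!"}
    # An inverter needs both a pull-down and a pull-up on the same nodes.
    return dict(pull_down & pull_up)


def ring_length(stages, start):
    """Follow inverters from start; return the stage count if they loop back."""
    node = start
    for steps in range(1, len(stages) + 1):
        if node not in stages:
            return None
        node = stages[node]
        if node == start:
            return steps
    return None


devices = parse_mosfets(DECK)
stages = inverter_stages(devices)
length = ring_length(stages, "ring")
```

An odd stage count is what makes the loop oscillate rather than latch, so checking that the length is odd is a quick test that the three layers were wired as intended.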
A.6 Summary of Useful Commands

Desired Action                  Method, Shortcut, or Menu
place edit box                  left-click at lower-left corner
resize edit box                 right-click at upper-right corner
paint box                       click on applicable toolbar button or middle-click over an area with the same paint
zoom to edit box                'z' key
zoom in                         zoom-in toolbar button (+ magnifying glass)
zoom out                        zoom-out toolbar button (- magnifying glass)
view entire circuit             'v' key or view-all toolbar button (marked 'V')
toggle grid                     grid toolbar button or 'g' key
cut contents of edit box        Cut toolbar button or ctrl-X
copy contents of edit box       Copy toolbar button or ctrl-C
paste contents of edit box      Paste toolbar button or ctrl-V
rotate contents of edit box     Rotate toolbar buttons (circular arrows)
select rectangle/node           's' key
label rectangle/node            't' key or Label toolbar button (marked 'T')
manipulate cell                 double-click left mouse button over cell
load cell                       Load Cell toolbar button (image of cell from floppy disk)
add cell                        Add Cell toolbar button (image of cell from list)
copy cell                       Copy Cell toolbar button (image of duplicate cells)
move cell                       Move Cell toolbar button (arrow)
erase cell                      Erase Cell toolbar button (obliterated cell image)
edit fabrication parameters     Edit->Properties menu
import Magic layout             File->Import menu
export SPICE deck               File->Export->SPICE
export fabrication files        File->Export menus

Bibliography

[1] Carter Bays. Candidates for the game of life in three dimensions. Complex Systems, 1:373-400, 1987.
[2] Carter Bays. A new game of three-dimensional life. Complex Systems, 5:15-18, 1991.
[3] Carter Bays. A new candidate rule for the game of three-dimensional life. Complex Systems, 6:433-441, 1992.
[4] Claude Bertin et al. Integrated multichip memory module structure. United States Patent 5,502,667, March 1996.
[5] A. R. Brown et al. Logic gates made from polymer transistors and their use in ring oscillators. Science, 270:972-974, 1995.
[6] J. H. Conway, E. R. Berlekamp, and R. K. Guy. Winning Ways for Your Mathematical Plays. Academic Press, New York, 1983.
[7] Sawyer Fuller and Joseph Jacobson. Ink jet fabricated nanoparticle MEMS. In 13th Annual IEEE Conf. on MEMS, 2000.
[8] Information Mechanics Group. CAM8: A parallel, uniform, scalable architecture for cellular automata experimentation. On the World-Wide-Web at http://www.im.lcs.mit.edu/cam8.
[9] Roger T. Howe and Charles G. Sodini. Microelectronics: An Integrated Approach. Prentice-Hall, NJ, 1997.
[10] F. T. Leighton and Arnold L. Rosenberg. Three-dimensional circuit layouts. SIAM Journal on Computing, 15(3):793-813, 1986.
[11] Charles E. Leiserson. VLSI theory and parallel supercomputing. MIT/LCS/TM 402, Massachusetts Institute of Technology Laboratory for Computer Science, May 1989.
[12] Magic - a VLSI layout system. On the World-Wide-Web at http://research.compaq.com/wrl/projects/magic/index.html.
[13] C. A. Mead and L. A. Conway. Introduction to VLSI Systems. Addison-Wesley, Reading, MA, 1980.
[14] J. Ousterhout. Corner stitching: A data structuring technique for VLSI layout tools. IEEE Trans. Computer-Aided Design, CAD-3(1):87-100, 1984.
[15] J. Ousterhout et al. Magic: A VLSI layout system. In Proceedings of the 21st IEEE Design Automation Conference, pages 152-159, 1984.
[16] B. A. Ridley et al. Solution-processed inorganic transistors and sub-micron nonlithographic patterning using nanoparticle inks. In Materials Research Society Proceedings, Fall 1999.
[17] B. A. Ridley, B. Nivi, and J. M. Jacobson. All-inorganic field-effect transistors fabricated by printing. Science, 286:746-749, 1999.
[18] Arnold L. Rosenberg. Three-dimensional integrated circuitry, pages 69-80. VLSI Systems and Computations. Computer Science Press, Rockville, MD, 1981.
[19] Michael Sipser. Introduction to the Theory of Computation. PWS Pub., Boston, 1997.
[20] Francoise F. Soulié, Yves Robert, and Maurice Tchuente, editors. Automata Networks in Computer Science: Theory and Applications. Princeton University Press, Princeton, NJ, 1987.
[21] Andrew C. Tickle. Thin-Film Transistors: A New Approach to Microelectronics. 1961.
[22] Tomaso Toffoli and Norman Margolus. Cellular Automata Machines: A New Environment for Modeling. MIT Press, Cambridge, MA, 1991.
[23] Neil H. E. Weste and Kamran Eshraghian. Principles of CMOS VLSI Design: A Systems Perspective. Addison-Wesley, Reading, MA, 1993.
[24] Ronald Williams and Ogden Marsh. Future WSI technology: stacked monolithic WSI. IEEE Transactions on Components, Hybrids, and Manufacturing Technology, 16:610-614, 1993.
[25] P. Zavracky. 3D microelectronics. On the World-Wide-Web at http://www.ece.neu.edu/edsnu/zavracky/mfl/programs/3d/3dmicro.html.