Multiple Drain Transistor Based FPGA Architecture

Drew Carlson and Pankaj Kalra
University of California at Berkeley; Cory Hall #1770; Berkeley, CA 94720
Tel: (510) 643-2558, Fax: (510) 643-2636, E-mail: acarlson@eecs.berkeley.edu

Abstract: A novel device structure, the multiple drain transistor (MDT), has been proposed for reconfigurable applications. The MDT incorporates multiplexing and memory elements into a single device, giving it an area advantage over existing switching solutions. Switching circuit and programmable block designs for field-programmable gate arrays (FPGAs) are reviewed, and MDT-based circuits are proposed as an alternative to these designs.

Key Words: FPGA, Multiple drain transistor, Multiplexing, Non-volatile memory

Introduction

Reconfigurable circuits, especially field-programmable gate arrays (FPGAs), are proposed as a low-cost alternative to ASIC designs because of reduced design time, flexibility of application, and increased robustness to programming errors. One challenge in FPGA architecture is to provide high-density designs for switching circuits, both in the programmable blocks and in the routing fabric that connects them. Multiple drain transistors (MDTs) are proposed as a low-cost alternative to existing switch designs.

The MDT is a MOSFET with multiple drain terminals, each of which can be independently turned off. Recently, charge storage in a silicon nitride sidewall has been proposed to shift the threshold voltage of a device for non-volatile memory [1]. The MDT uses stored charge in the sidewall to electrically isolate each drain. It can therefore be used as a multiplexer that stores its configuration in built-in memory. This functionality is fundamental for reconfigurable circuits.

In this paper, we give a general overview of FPGA design, with an emphasis on the challenge of high-density switching. Recent designs and theory are surveyed for both the routing fabric and the logic blocks of an FPGA.
It is found that the layout area of the switches is a limiting constraint for many aspects of FPGA design, especially routability. MDT-based circuits are suggested, and a study is proposed to quantify their advantages in terms of cost (layout area), performance, and design flexibility.

The Multiple Drain Transistor

The MDT (Figs. 1 & 2) uses two types of switching mechanisms to form connections between independent drain terminals and a shared source terminal. The primary mechanism is the gate-induced channel of a conventional MOSFET. A secondary mechanism uses stored electrons in the drain sidewall to deplete the lightly doped drain (LDD) region, thus forming a barrier to conduction between the drain and the channel. The LDD must be very lightly doped and shallow to ensure that the sidewall charge, which may be limited by trap state density, is sufficient to prevent conduction.

The MDT is programmed by hot carrier injection at each drain that is to be separated from the channel. Once it has been programmed in advance, an MDT with n drains may be used as an n-to-1 multiplexer or as an n-bit non-volatile memory element. Unlike other sidewall memories [1, 2], the MDT is both programmed and read in a forward mode of operation; that is, it is not necessary to exchange the source and drain connections to operate the device. This method of sidewall memory storage could also be used in a conventional (single drain) device, with a tradeoff in saturation drain current (IDSAT) due to the higher series resistance of the LDD.

Background: FPGAs

FPGAs consist primarily of two types of circuits: logic blocks, which execute a (possibly complex) logical function, and routing fabric, which connects the outputs of each block to the appropriate inputs of other blocks. Each type can be programmable. Logic blocks are often implemented as look-up tables that are programmed with the truth table of a particular function. The connections of the routing fabric can be individually programmed as well.
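As an illustration of how a look-up table is programmed with the truth table of a function, the following is a minimal Python sketch. It is a purely logical model, not a circuit description; the helper names `program_lut` and `read_lut` and the choice of a 3-input majority function are assumptions made for this example only.

```python
from itertools import product


def program_lut(func, k):
    """Return the 2**k configuration bits (the truth table) of a k-input function.

    The first input is treated as the most significant address bit
    (an arbitrary convention chosen for this sketch).
    """
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=k)]


def read_lut(table, inputs):
    """Evaluate the LUT: the input bits form an address into the stored table."""
    address = 0
    for bit in inputs:
        address = (address << 1) | (bit & 1)
    return table[address]


# Example: "program" a 3-input LUT with a majority function (hypothetical choice).
majority_table = program_lut(lambda a, b, c: (a + b + c) >= 2, 3)
result = read_lut(majority_table, (1, 0, 1))
```

Reprogramming the block with a different function only changes the stored bits; the read path is unchanged, which is why the memory cells and the multiplexing structure dominate the cost of a LUT.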
FPGAs are usually programmed off-line, due to the complexity of dynamically routing all of the signals on a chip. The programming and erasing delays of the switches in an FPGA are therefore less important than the operational delay of switching.

From the standpoint of layout area, the routing fabric is the dominant part of an FPGA design. The routing fabric typically consumes 70-90% of the layout area.

Routing Fabric

The routing fabric of an FPGA consists of vertical and horizontal channels and switching blocks, arranged as in Fig. 3. The channels contain connection blocks, which connect the wires in the channels to the inputs and outputs of the appropriate functional blocks. The switching blocks, located at the junctions of horizontal and vertical channels, connect the wires of one channel to those of another. The switching blocks are often designed so that there is one channel in each of the cardinal directions; that is, a horizontal channel at one side of the switching block will not necessarily connect to the horizontal channel on the other side.

The switching blocks are the limiting constraint on layout area for most FPGA fabric designs. Schmit and Chandra [3] examine the point at which switching block designs become wirebound, which occurs when the square of the width of the channel exceeds the area of the switching block needed to make the necessary connections. The width of the channel is determined by the number of wires W in each channel, the process rule-specified space between the wires, and the number of metal layers in the process. It is found that for a four-metal 0.13 µm process, the wirebound point is not reached until W > 169. For typical channel widths of W = 4-5, the limiting constraint is the area of the switching block.

There are two common strategies for minimizing switching block area. The first is to reduce the size of each switch in the block.
In a typical design, these switches are composed of a pass transistor, whose size directly determines the delay and power consumption of the routing path, and a memory element, typically an SRAM cell. Switch minimization has progressed to the level where switching blocks approach the layout density of an SRAM array. An optimal distribution for pass transistor sizing has also been proposed [4].

The second strategy is to reduce the number of switches in the switching block. In addition to power and delay metrics, approaches of this form are compared in terms of flexibility, commonly defined as the number of switches connected to each wire entering the switching block. Lower flexibility allows for area savings at the cost of design routability. Generally, a working minimum flexibility of 3 is used [5].

There are many techniques for maximizing routability subject to the constraint of low flexibility. Multilength wires in the channels can route signals to many places in a design while saving an extra switch in every switching block. Hard-wired routing pattern FPGAs extend this idea by including a handful of hard-wired connections based on statistics of a switching block [6]. Each hard-wired connection saves the layout of a switch at a small cost in the complexity of the routing. Since most routing is calculated off-line, the additional time and energy required are negligible.

There are also algorithms for arranging switches within a switching block to maximize density [3]. Given a certain switch area and a square switching block, the switches can be arranged in a non-periodic way such that the distance between them is maximized. This arrangement allows for the greatest density of switches within the block. For applications where routability is a critical constraint, optimal switch-level designs for universal switching blocks have been found [7].

Of these two strategies, the MDT is most promising for reducing the area of the switch.
Due to its compact design and incorporation of a memory element, a single MDT can replace the large switching cells in current use. This area savings will allow for an overall reduction of the routing fabric area and an increase in the flexibility of the designs. Specific circuits for use in a switching block are discussed in the Proposals section below.

Programmable Logic Block

A programmable logic block can be implemented in three ways: as a gate such as a NAND, as a 2-to-1 MUX-based look-up table, or as an SRAM- or Flash-based look-up table. The size of the logic block has a great impact on the area and functionality of the FPGA. In a 2-to-1 MUX-based implementation, each logic block can typically implement an arbitrary function of k variables using a k-input look-up table (k-LUT), where generally 3 ≤ k ≤ 5 [8]. A complete 2-to-1 MUX-based programmable logic block is shown in Fig. 4. Fig. 5 shows an FPGA architecture where each logic block contains a single 3-LUT that takes as input the signals (s0, s1, s2) and outputs any function F of the signals.

Each k-LUT generally consists of a number of transmission-gate-based 2-to-1 multiplexers arranged in a binary-tree structure, where the leaves of the tree are connected to memory cell elements (2^k bits in total) [9]. Usually, a latch is used as the memory cell element. Fig. 6 shows the use of SRAM for holding the programming bits and NMOS pass transistors or transmission gates for decoding the address [10]. The pass transistor implementation requires a lower transistor count, but degradation of signal integrity is a major concern. Recently, flash-based FPGAs have also become popular. The non-volatile nature of reprogrammable flash-based technology is an advantage over SRAM-based technology.

Proposals

In order to compare MDT-based designs quantitatively, a circuit-level simulation model has been developed. The model consists of a subcircuit of transistors and resistive elements.
Because the MDT is constructed with a CMOS-compatible process, an existing SPICE model for an NMOS transistor was used as a template. The model was completed with data from simulations using MEDICI, a device simulator, and with measured I-V characteristics from devices fabricated in the UC Berkeley Microlab.

The majority of the design effort of this project will go into subcircuits of the FPGA design. We look to replace the design for a switch in a switching block, and segments of the logic blocks, without getting into the complexity or specificity of a full FPGA design.

We propose an MDT-based switching block design with the drains connected to the wires of one or more channels and the shared source terminal connected to a single wire of another channel. The proposed topology of Fig. 7 is convenient for connecting two channels of one direction to one channel of an orthogonal direction. Fig. 8 illustrates how this configuration can be used for a low-area universal switching block in a process with two routing metal layers. Our proposal will examine this design in terms of the metrics of power and area. Other circuit designs of lower flexibility or with additional metal routing layers will also be considered. A common benchmark for comparison of switching block designs is the universal switching block of the XC4000 [11]. We will also compare this design against the layouts reported in the literature, e.g. [3]. A similar topology will be examined for use in the connection blocks inside the channels.

For the programmable logic block, the most common choices for implementing a function in a k-input LUT are to use memory cells based on SRAM or Flash, or to use a 2-to-1 multiplexer-based approach. The MDT can be used as an n-to-1 multiplexer or as an n-bit non-volatile memory element (Figs. 9 & 10). The charge stored in the sidewalls controls which drain is selected, resulting in a digital switch just like a pass transistor.
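The use of an MDT as an n-to-1 multiplexer with built-in configuration memory can be sketched behaviorally in Python. This is an idealized logical model only: it ignores threshold drops, series resistance, and all analog behavior, and the class `MDTModel` with its method names is hypothetical, introduced here purely for illustration. It assumes, as described above, that programming a drain isolates it from the channel and that the gate must be on for any drain to conduct.

```python
class MDTModel:
    """Idealized logical model of an n-drain MDT (hypothetical helper class)."""

    def __init__(self, n_drains):
        # True means the drain has been isolated by stored sidewall charge.
        self.isolated = [False] * n_drains

    def program(self, drain):
        """Model programming of one drain: isolate it from the channel."""
        self.isolated[drain] = True

    def erase(self):
        """Model an erase: every drain is reconnected to the channel."""
        self.isolated = [False] * len(self.isolated)

    def select(self, drain):
        """Configure as an n-to-1 mux by isolating all drains except one."""
        self.erase()
        for d in range(len(self.isolated)):
            if d != drain:
                self.program(d)

    def stored_bits(self):
        """Read the configuration as n memory bits (1 = isolated drain)."""
        return [int(flag) for flag in self.isolated]

    def source_value(self, gate_on, drain_values):
        """Logic value seen at the shared source terminal.

        Returns None when the gate is off, when no drain is connected,
        or when connected drains disagree (treated as undefined here).
        """
        if not gate_on:
            return None
        connected = {v for v, iso in zip(drain_values, self.isolated) if not iso}
        return connected.pop() if len(connected) == 1 else None


# Example: a 2-drain MDT configured to pass the signal on drain 1.
mux = MDTModel(2)
mux.select(1)
out = mux.source_value(True, [0, 1])
```

In this sketch the same stored state serves both roles described in the text: `select` configures the multiplexer, while `stored_bits` reads the device as an n-bit memory element.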
We propose an MDT-based LUT technology with built-in non-volatile memory and signal multiplexing capability within a single transistor. In a 2-to-1 MUX-based LUT implementation, it takes 6 (4+2) transistors to generate a non-inverting output from two multiplexed signals. With the MDT, the same goal can be achieved using only one transistor with two drains. We will investigate the potential area advantage of MDT-based technology over other memory-cell-based technologies such as SRAM and Flash.

It will also be necessary to create circuits to program and erase these MDT-based designs. Because MDT programming relies on hot carrier injection, it is necessary to generate a short pulse of relatively high voltage. To program multiple MDTs, the pulse must be applied to each drain that is to be turned off; in some cases, however, this may be done simultaneously. We expect the overhead area for these circuits to be comparable to that of current SRAM-based architectures, with the addition of charge pumps to generate the pulse. It is important to note that programming speed is not one of our metrics for comparison, since most FPGA programming is done off-line.

Metrics

The advantages of the proposed architecture will be quantified in terms of area savings, power, and switching delay. Qualitative comparisons of design flexibility will also be discussed. The comparison will be performed at the level of individual logic or switching blocks in the interests of time and feasibility. A variety of output loads and input conditions will be used to evaluate the robustness of the design and to simulate the environment within the FPGA.

Conclusion

A general survey of recent FPGA designs has been presented. The combination of a multiplexer with a memory unit was found to be ubiquitous and, compared with the MDT, relatively large in its implementation. The potential for cost savings, measured in layout area, is likely to be significant for an MDT-based architecture.
Further work will be conducted to quantify these savings, in addition to evaluating the advantages of an MDT-based design by other metrics, specifically power and delay.

References

[1] M. Fukuda et al., IEDM Technical Digest, pp. 909-912, 2003.
[2] Y.K. Lee et al., J. Vac. Sci. & Tech. B, vol. 22, pp. 2493-2498, 2004.
[3] H. Schmit and V. Chandra, “Layout Techniques for FPGA Switch Blocks,” IEEE Trans. VLSI Systems, vol. 13, pp. 96-105, Jan. 2005.
[4] V. Betz and J. Rose, “Circuit design, transistor sizing and wire layout of FPGA interconnect,” in Proc. IEEE Custom Integrated Circuits, 1999, pp. 171-174.
[5] J. Rose and S. Brown, “Flexibility of interconnection structures for field programmable gate arrays,” IEEE J. Solid-State Circuits, vol. 26, pp. 277-282, Mar. 1991.
[6] S. Sivaswamy et al., “HARP: Hard-wired Routing Patterns of FPGAs,” FPGA’05, 2005.
[7] H. Fan et al., “Reduction Design for Generic Universal Switch Blocks,” ACM Trans. Des. Auto. of Elec. Sys., vol. 7, pp. 526-546, Oct. 2002.
[8] G. Borriello, C. Ebeling, S. Hauck, and S. Burns, “The Triptych FPGA Architecture,” IEEE Trans. VLSI Systems, vol. 3, pp. 491-501, 1995.
[9] M. Roberts, L. Tomes, F. Moues, and D. Auvergne, “Influence of logic block layout architecture on FPGA performance,” in Proc. 4th Intl. Workshop on Field-Programmable Logic and Applications, pp. 34-44, Sep. 1994.
[10] P. Chow, S. Ong Seo, J. Rose, K. Chung, G. Paez-Monzon, and I. Rahardja, “The design of an SRAM-based field-programmable gate array. I. Architecture,” IEEE Trans. VLSI Syst., vol. 7, June 1999.
[11] Xilinx, “XC4000 Data Sheets,” available at http://www.xilinx.com/, 2005.

Fig. 1. Top view of the multiple drain transistor (MDT), extendable to n drains. Isolation of width WI separates the drains from each other. A 2-drain layout is illustrated in the inset.
Fig. 2. Cross-section of the MDT. Charge is stored in the sidewall, over a lightly doped LDD, to isolate the drain from the channel (inset).
Fig. 3. FPGA: Generic Structure
Fig. 4. Programmable Logic Block
Fig. 5. MUX based LUT implementation
Fig. 6. SRAM based LUT
Fig. 7. Possible topology for a routing switch using the MDT. In this example, the shared source connects to one wire of a horizontal channel (W), while the drains connect to the wires of two vertical channels (N, S).
Fig. 8. Proposed 4 x 4 universal switching block, implemented with 8 MDTs.
Fig. 9. MDT as non-volatile memory element
Fig. 10. Configuration of an MDT as a 2-to-1 MUX