Multi-Drain Transistors and their Use in FPGAs

Multiple Drain Transistor Based FPGA Architecture
Drew Carlson and Pankaj Kalra
University of California at Berkeley; Cory Hall #1770; Berkeley, CA 94720
Tel: (510) 643-2558, Fax: (510) 643-2636, E-mail: acarlson@eecs.berkeley.edu
Abstract—A novel device structure, the multiple
drain transistor (MDT), has been proposed for
reconfigurable applications. The MDT incorporates
multiplexing and memory elements into a single
device, giving it a unique area-savings advantage
over existing switching solutions. Switching circuit
and programmable block designs are reviewed for
field-programmable gate arrays (FPGAs). MDT-based circuits are proposed as an alternative to these
designs.
Key Words—Multiple drain transistor,
FPGA, Multiplexing, Non-volatile memory
Introduction
Reconfigurable circuits, especially field-programmable
gate arrays (FPGAs), are proposed as a
low cost alternative to ASIC designs because of reduced
design time, flexibility of application, and increased
robustness to programming errors. One challenge in
FPGA architecture is to provide high density designs for
switching circuits, in both programmable blocks and the
routing fabric that connects them.
Multiple drain transistors (MDTs) are proposed as a
low cost alternative to existing switch designs. The
MDT is a MOSFET with multiple drain terminals, each
of which can be independently turned off. Recently,
charge storage in a silicon nitride sidewall has been
proposed to shift the threshold voltage of a device for
non-volatile memory [1]. The MDT uses stored charge
in the sidewall to electrically isolate each drain. It can
therefore be used as a multiplexer that can store its
configuration in built-in memory. This functionality is
fundamental for reconfigurable circuits.
In this paper, we give a general overview of FPGA
design, with an emphasis on the challenge of high-density switching. Recent designs and theory are
surveyed for both the routing fabric and the logic blocks
of an FPGA. It is found that the layout area of the
switches is a limiting constraint for many aspects of
FPGA design, especially routability. MDT-based
circuits are suggested, and a study is proposed to
quantify their advantages, in terms of cost (layout area),
performance, and design flexibility.
The Multiple Drain Transistor
The MDT (Figs. 1 & 2) uses two types of switching
mechanisms to form connections between independent
drain terminals and a shared source terminal. The
primary mechanism is the gate-induced channel of a
conventional MOSFET. A secondary mechanism uses
stored electrons in the drain sidewall to deplete the
LDD, thus forming a barrier to conduction between the
drain and channel. The LDD must be very lightly doped
and shallow to ensure that the sidewall charge, which
may be limited by trap state density, can sufficiently
prevent conduction. The MDT is programmed by hot
carrier injection at each drain to separate it from the
channel. After a programming cycle performed in advance, an MDT
with n drains may be used as an n-to-1 multiplexer or as
an n-bit non-volatile memory element. Unlike other
sidewall memories [1, 2], the MDT is both programmed
and read in a forward mode of operation; that is, it is not
necessary to exchange the source and drain connections
to operate the device. This method of sidewall memory
storage could also be used in a conventional (single
drain) device, with a tradeoff in IDSAT due to higher
series resistance of the LDD.
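The dual role described above (n-to-1 multiplexer or n-bit non-volatile memory) can be illustrated with a small logic-level sketch. This is not a device model: the class name, the method names, and the bit polarity (1 for a drain that still conducts) are assumptions made for illustration only.

```python
class MultiDrainTransistor:
    """Logic-level sketch of an n-drain MDT (hypothetical, not a device model)."""

    def __init__(self, n_drains):
        if n_drains < 1:
            raise ValueError("an MDT needs at least one drain")
        # True means charge is stored in that drain's sidewall (drain isolated).
        self.isolated = [False] * n_drains

    def program(self, drain):
        """Model hot carrier injection at one drain: isolate it from the channel."""
        self.isolated[drain] = True

    def erase(self):
        """Model removal of the stored charge from every drain sidewall."""
        self.isolated = [False] * len(self.isolated)

    def select(self, drain):
        """Configure the device as an n-to-1 multiplexer passing only `drain`."""
        self.erase()
        for i in range(len(self.isolated)):
            if i != drain:
                self.program(i)

    def conducting_drains(self, gate_on):
        """Drains connected to the shared source for a given gate state."""
        if not gate_on:
            return []  # no gate-induced channel, so nothing conducts
        return [i for i, off in enumerate(self.isolated) if not off]

    def read_bits(self):
        """Read the device as n-bit memory (assumed polarity: 1 = conducting)."""
        return [0 if off else 1 for off in self.isolated]


mdt = MultiDrainTransistor(4)
mdt.select(2)
print(mdt.conducting_drains(gate_on=True))  # [2]
print(mdt.read_bits())                      # [0, 0, 1, 0]
```

Both switching mechanisms appear in `conducting_drains`: the gate controls the shared channel, while the stored sidewall charge decides which drains can reach it.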
Background: FPGAs
FPGAs consist primarily of two types of circuits:
logic blocks that execute a (possibly complex) logical
function, and routing fabric, that connects the outputs of
each block to the appropriate inputs of other blocks.
Each type can be programmable. Logic blocks are often
implemented as look up tables that are programmed with
the truth table of a particular function. The connections
of the routing fabric can be individually programmed as
well. FPGAs are usually programmed off-line, due to
the complexity of dynamically routing all of the signals
on a chip. The programming and erasing delays of the
switches in an FPGA are less important than the
operational delay of switching.
From the standpoint of layout area, the routing fabric
is the dominant part of an FPGA design. Routing fabric
layouts typically consume 70-90% of a layout area.
Routing Fabric
The routing fabric of an FPGA consists of vertical
and horizontal channels and switching blocks, arranged
as in Fig. 3. The channels contain connection blocks,
which connect the wires in the channels to the inputs and
outputs of the appropriate functional blocks. The
switching blocks, located at the junction of horizontal
and vertical channels, connect the wires of one channel
to those of another. The switching blocks are often
designed so that there is one channel in each of the
cardinal directions; that is, a horizontal channel at one
side of the switching block will not necessarily connect
to the horizontal channel on the other side.
The switching blocks are the limiting constraint to
layout area for most FPGA fabric designs. Ref. [3] examines
the point at which switching block designs become
wirebound, when the square of the width of the channel
exceeds the area of the switching block making the
necessary connections. The width of the channel is
determined by the number of wires W in each channel,
the process rule-specified space between the wires, and
the number of metal layers in the process. It is found
that for a four-metal 0.13 µm process, the wirebound
point is not reached until W > 169. For typical channel widths
of W = 4-5, the limiting constraint is the area of the
switching block.
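The wirebound test can be sketched numerically. The model below is a simplification and not the analysis of [3]: it assumes wires are split evenly across the routing layers at a fixed pitch, that the block has four sides, and that its area is the number of switches times a per-switch area. All function names and numeric values are hypothetical.

```python
import math


def channel_width_um(num_wires, wire_pitch_um, routing_layers):
    """Channel width, assuming wires are split evenly across routing layers."""
    wires_per_layer = math.ceil(num_wires / routing_layers)
    return wires_per_layer * wire_pitch_um


def switch_block_area_um2(num_wires, flexibility, switch_area_um2):
    """Area of the switches alone, assuming a four-sided block.

    Each switch joins two wire ends, so the switch count is
    4 * W * flexibility / 2.
    """
    num_switches = 4 * num_wires * flexibility // 2
    return num_switches * switch_area_um2


def is_wirebound(num_wires, wire_pitch_um, routing_layers,
                 flexibility, switch_area_um2):
    """True when the squared channel width exceeds the switch area."""
    width = channel_width_um(num_wires, wire_pitch_um, routing_layers)
    area = switch_block_area_um2(num_wires, flexibility, switch_area_um2)
    return width ** 2 > area


# Hypothetical process numbers, chosen only to show both regimes.
for w in (4, 400):
    print(w, is_wirebound(w, wire_pitch_um=0.5, routing_layers=2,
                          flexibility=3, switch_area_um2=2.0))
# 4 False
# 400 True
```

Under these assumptions the squared width grows as W^2 while the switch area grows as W, which is why narrow channels are limited by switch area rather than by wiring.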
There are two common strategies for minimizing
switching block area. The first is to reduce the size of
each switch in the block. In a typical design, these
switches are composed of a pass transistor, whose size
can directly determine the delay and power consumption
of the routing path, and a memory element, typically an
SRAM cell. Switch minimization has progressed to the
level where switching blocks approach the layout
density of an SRAM array. An optimal distribution for
pass transistor sizing has also been proposed [4].
The second strategy is to reduce the number of
switches in the switching block. In addition to power
and delay metrics, approaches of this form are compared
in terms of a metric of flexibility. Flexibility is
commonly defined as the number of switches connected
to each wire entering the switching block. Lower
flexibility allows for area savings at a tradeoff of design
routability. Generally a working minimum of 3 is used
for flexibility [5].
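The flexibility metric can be computed directly from a list of switches. The sketch below uses an illustrative topology in which track i on each side connects to track i on the other three sides; this topology and all names are assumptions chosen for the example, not a design taken from the references.

```python
from collections import Counter


def flexibility(switches):
    """Count the switches attached to each wire end entering the block."""
    counts = Counter()
    for end_a, end_b in switches:
        counts[end_a] += 1
        counts[end_b] += 1
    return dict(counts)


def same_track_block(width):
    """Hypothetical topology: track i on one side connects to track i
    on each of the other three sides."""
    sides = "NESW"
    switches = []
    for track in range(width):
        for i, side_a in enumerate(sides):
            for side_b in sides[i + 1:]:
                switches.append(((side_a, track), (side_b, track)))
    return switches


block = same_track_block(4)
per_wire = flexibility(block)
print(len(block))                                      # 24
print(min(per_wire.values()), max(per_wire.values()))  # 3 3
```

For W = 4 this topology uses 24 switches and gives every wire a flexibility of 3, the working minimum quoted above.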
There are many techniques for maximizing routability
subject to the constraint of low flexibility. Multi-length wires in the channels can route signals to many
places in a design while saving an extra switch in every
switching block. Hard-wired routing pattern FPGAs
extend this idea by including a handful of hard-wired
connections based on statistics of a switching block [6].
Each hard-wired connection saves the layout of a switch
at a small cost in the complexity of the routing. Since
most routing is calculated off-line, the additional time
and energy required is negligible. There are also
algorithms for arranging switches within a switching
block to maximize density [3]. Given a certain switch
area and a square switching block, the switches can be
arranged in a non-periodic way such that the distance
between them is maximized. This arrangement allows
for the greatest density of switches within the block. For
applications where routability is a critical constraint,
optimal switch-level designs for universal switching
blocks have been found [7].
Of these two strategies, the MDT is most promising
for reducing the area of the switch. Due to its compact
design and incorporation of a memory element, a single
MDT can replace the large switching cells in current
use. This area savings will allow for an overall
reduction of the routing fabric area and an increase in
the flexibility of the designs. Specific circuits for use in
a switching block are discussed in the proposals section
below.
Programmable Logic Block
A programmable logic block can be implemented in
three ways: as a gate such as a NAND, as a 2-to-1 MUX based
look-up table, or as an SRAM/Flash based look-up
table. The size of the logic block has a great impact on
the area and functionality of the FPGA.
In a 2-to-1 MUX based implementation, each logic
block typically can implement an arbitrary function of k
variables using a k-input look-up table (k-LUT), where
generally 3 ≤ k ≤ 5 [8]. A complete 2-to-1 MUX based
programmable logic block is shown in Fig. 4. Fig. 5
shows an FPGA architecture where each logic block
contains a single 3-LUT that takes as input the signals
(s0, s1, s2) and outputs any function F of the signals.
Each k-LUT generally consists of a number of
transmission-gate based 2-to-1 multiplexers arranged in
a binary-tree type structure where the leaves of the tree
are connected to memory cell elements (2^k bits in
length) [9]. Usually, a latch is used as the memory cell
element.
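The binary-tree structure of a k-LUT can be sketched in a few lines. The ordering convention (the first input drives the multiplexers nearest the memory cells) and the function names are assumptions for illustration. The signal names s0, s1, s2 follow the text.

```python
def mux2(select, in0, in1):
    """2-to-1 multiplexer: pass in1 when select is 1, otherwise in0."""
    return in1 if select else in0


def lut_eval(memory_bits, inputs):
    """Evaluate a k-LUT as a binary tree of 2-to-1 multiplexers.

    memory_bits holds the 2**k truth-table entries at the leaves.
    Assumed convention: inputs[0] selects at the level nearest the leaves.
    """
    k = len(inputs)
    if len(memory_bits) != 2 ** k:
        raise ValueError("a k-LUT needs exactly 2**k memory bits")
    level = list(memory_bits)
    for select in inputs:
        # Each tree level halves the number of candidate signals.
        level = [mux2(select, level[i], level[i + 1])
                 for i in range(0, len(level), 2)]
    return level[0]


# 3-LUT programmed with the majority function F(s0, s1, s2).
majority = [0, 0, 0, 1, 0, 1, 1, 1]
print(lut_eval(majority, (1, 0, 1)))  # 1
print(lut_eval(majority, (1, 0, 0)))  # 0
```

A tree over 2^k leaves contains 2^k - 1 two-input multiplexers, so a 3-LUT built this way uses 7 of them in addition to its 8 memory cells.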
Fig. 6 shows the use of SRAM for holding
programming bits and NMOS pass transistors or
transmission gates for decoding address [10]. The pass
transistor implementation requires a lower transistor
count, but degradation of signal integrity is a major
concern. Recently, flash-based FPGAs have also become
popular. The non-volatile nature of reprogrammable
flash-based technology is an advantage over SRAM-based technology.
Proposals
In order to compare MDT-based designs
quantitatively, a circuit-level simulation model has been
developed. The model consists of a subcircuit of
transistors and resistive elements. Because the MDT is
constructed with a CMOS-compatible process, an
existing SPICE model for an NMOS transistor was used
as a template. The model was completed with data from
simulations using MEDICI, a device simulator, and
measured I-V characteristics from devices fabricated in
the UC Berkeley Microlab.
The majority of the design effort of this project will
go into subcircuits of the FPGA design. We look to
replace the design for a switch in a switching block, and
segments of the logic blocks, without getting into the
complexity or specificity of a full FPGA design.
We propose an MDT-based switching block design
with drains connected to the wires of one or more
channels and the shared source terminal connected to a
single wire of another channel. The proposed topology
of Fig. 7 is convenient for connections of two channels
of one direction to one channel of an orthogonal
direction. Fig. 8 illustrates how this configuration can
be used for a low area universal switching block for a
process with two routing metal layers. Our proposal
will examine this design in terms of the metrics of power
and area. Other circuit designs of lower flexibility or
additional metal routing layers will also be considered.
A common benchmark for comparison of switching
block designs is the universal switching block of the
XC4000 [11]. We will also compare this design against
the layouts reported in the literature, e.g. [3]. A similar
topology will be examined for use in the connection
blocks inside the channels.
For the programmable logic block, the most common
choices for implementing a function in a k-input LUT are
memory cells based on SRAM or Flash, or a 2-to-1
multiplexer based approach. The MDT can be used as an n-to-1 multiplexer or as an n-bit non-volatile memory
element (Figs. 9 & 10). The charge stored in the
sidewalls provides the control that selects the desired drain,
resulting in a digital switch that behaves like a pass transistor.
We propose an MDT-based LUT technology with
built-in non-volatile memory and signal multiplexing
capability within a single transistor. In a 2-to-1 MUX
based LUT implementation, it takes 6 (4+2) transistors
to perform the task of generating non-inverting output
from two multiplexed signals. With the MDT,
we can achieve the same goal using only one
transistor with two drains. We will investigate the
potential area advantage of MDT-based technology over
other memory cell based technologies such as SRAM and
Flash.
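The device-count argument can be extended from two inputs to n as a rough sketch. It assumes that an n-to-1 multiplexer built from 2-to-1 stages needs n - 1 of them, each costing the 6 transistors quoted above, and that a single n-drain MDT replaces the whole tree. Device count is not layout area, and the MDT selection is a stored configuration rather than a live select signal, so these numbers are only indicative. The function names are hypothetical.

```python
TRANSISTORS_PER_MUX2 = 6  # the 4+2 figure quoted in the text


def mux_tree_transistors(n_inputs):
    """Transistors in an n-to-1 multiplexer built from 2-to-1 stages."""
    if n_inputs < 2:
        raise ValueError("a multiplexer needs at least two inputs")
    # A tree that reduces n signals to 1 contains n - 1 two-input stages.
    return (n_inputs - 1) * TRANSISTORS_PER_MUX2


def mdt_devices(n_inputs):
    """Assumption: one MDT with n drains serves as the n-to-1 multiplexer."""
    if n_inputs < 2:
        raise ValueError("a multiplexer needs at least two inputs")
    return 1


for n in (2, 4, 8):
    print(n, mux_tree_transistors(n), mdt_devices(n))
# 2 6 1
# 4 18 1
# 8 42 1
```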
It will also be necessary to create circuits to program
and erase these MDT-based designs. Because MDT
programming relies on hot carrier injection, it is
necessary to generate a short pulse of relatively high
voltage. To program multiple MDTs, it is necessary to
apply the pulse to each drain that will be turned off;
however, this may be done simultaneously for some
cases. We expect the overhead area for these circuits to
be comparable to that of current SRAM-based
architectures, with the addition of charge pumps to
generate the pulse. It is important to note that
programming speed is not one of our metrics for
comparison, since most FPGA programming is done off-line.
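As a sketch of what the programming circuitry must decide, the function below lists the drains that need a pulse for a desired configuration: every drain that should not remain connected. The configuration format and the device names are hypothetical, and the pulse generation itself (voltages, timing, charge pumps) is not modeled.

```python
def drains_to_pulse(n_drains, keep_connected):
    """Drains that must receive a programming pulse to be turned off."""
    keep = set(keep_connected)
    invalid = [d for d in keep if not 0 <= d < n_drains]
    if invalid:
        raise ValueError(f"drain index out of range: {invalid}")
    return [d for d in range(n_drains) if d not in keep]


def programming_plan(config):
    """Map each MDT name to the list of drains to pulse.

    config: {name: (n_drains, drains that stay connected)} (assumed format).
    """
    return {name: drains_to_pulse(n, keep)
            for name, (n, keep) in config.items()}


# Hypothetical devices: two 2-drain routing switches and a 4-drain multiplexer.
config = {"sw0": (2, [0]), "sw1": (2, []), "lut_mux": (4, [3])}
plan = programming_plan(config)
print(plan)  # {'sw0': [1], 'sw1': [0, 1], 'lut_mux': [0, 1, 2]}
```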
Metrics
The advantages of the proposed architecture will be
quantified in terms of area savings, power, and
switching delay. Qualitative comparisons of design
flexibility will be discussed. The comparison will be
performed at the level of individual logic or switching
blocks in the interests of time and feasibility. A variety
of output loads and input conditions will be used to
evaluate the robustness of the design and to simulate the
environment within the FPGA.
Conclusion
A general survey of recent FPGA designs has been
presented. The combination of a multiplexer with a
memory unit was found to be ubiquitous and relatively
large in its implementation, compared with the MDT.
The potential for cost savings, measured in layout area,
is likely to be significant for an MDT-based
architecture. Further work will be conducted to quantify
these savings, in addition to evaluating the advantages of
an MDT-based design by other metrics, specifically
power and delay.
References
[1] M. Fukuda et al., IEDM Technical Digest, pp. 909-912, 2003.
[2] Y.K. Lee et al., J. Vac. Sci. & Tech. B, vol. 22, pp. 2493-2498,
2004.
[3] H. Schmit and V. Chandra, “Layout Techniques for FPGA
Switch Blocks,” IEEE Trans. VLSI Systems, vol. 13, pp.96-105,
Jan. 2005.
[4] V. Betz and J. Rose, “Circuit design, transistor sizing and wire
layout of FPGA interconnect,” in Proc. IEEE Custom Integrated
Circuits, 1999, pp. 171–174.
[5] J. Rose and S. Brown, “Flexibility of interconnection structures
for field programmable gate arrays,” IEEE J. Solid-State
Circuits, vol. 26, pp. 277–282, Mar. 1991.
[6] S. Sivaswamy et al., “HARP: Hard-wired Routing Patterns of
FPGAs,” FPGA’05, 2005.
[7] H. Fan et al., “Reduction Design for Generic Universal Switch
Blocks,” ACM Trans. Des. Auto. of Elec. Sys., vol. 7, pp. 526-546, Oct. 2002.
[8] G. Borriello, C. Ebeling, S. Hauck, and S. Burns, “The Triptych
FPGA Architecture,” IEEE Trans. VLSI Systems, vol. 3, pp.
491-501, 1995.
[9] M. Roberts, L. Tomes, F. Moues, and D. Auvergne, “Influence
of logic block layout architecture on FPGA performance,” in
Proc. 4th Intl. Workshop on Field-Programmable logic and
applications, pp.34-44, Sep. 1994.
[10] P. Chow, S. Ong Seo, J. Rose, K. Chung, G. Paez-Monzon, and
I. Rahardja, “The design of an SRAM-based field-programmable
gate array. I. Architecture,” IEEE Trans. VLSI Syst., vol. 7, June
1999.
[11] Xilinx, “XC4000 Data Sheets,” available at
http://www.xilinx.com/, 2005
Fig. 1. Top view of the multiple drain transistor (MDT), extendable to n drains. Isolation of width WI separates the drains from each other. A 2-drain layout is illustrated in the inset.
Fig. 2. Cross-section of the MDT. Charge is stored in the sidewall, over a lightly doped LDD, to isolate the drain from the channel (inset).
Fig. 3. FPGA: Generic Structure
Fig. 4. Programmable Logic Block
Fig. 5. MUX based LUT implementation
Fig. 6. SRAM based LUT
Fig. 7. Possible topology for a routing switch using the MDT. In this example, the shared source connects to one wire of a horizontal channel (W), while the drains connect to the wires of two vertical channels (N, S).
Fig. 8. Proposed 4 x 4 universal switching block, implemented with 8 MDTs.
Fig. 9. MDT as non-volatile memory element
Fig. 10. Configuration of an MDT as 2-to-1 MUX