Low Power ROM Generation
by
Paul L. Chou
B.S. in Electrical Engineering, University of California at Berkeley (1994)
Submitted to the Department of Electrical Engineering and
Computer Science in partial fulfillment of the requirements for the
degree of
Master of Science
in Electrical Engineering and Computer Science
at the
Massachusetts Institute of Technology
June 1996
© 1996 Paul L. Chou. All rights reserved.
The author hereby grants to MIT permission to reproduce and to distribute
copies of this thesis document in whole or in part.
Author
Department of Electrical Engineering and Computer Science
June, 1996
Certified by
Anantha P. Chandrakasan
Assistant Professor of Electrical Engineering
Certified by
Chairman,
Low Power ROM Generation
by
Paul L. Chou
Submitted to the
Department of Electrical Engineering and Computer Science
in partial fulfillment of the requirements for the degree of
Master of Science in Electrical Engineering and Computer Science
Abstract
Recently, the reduction of power consumption in memory systems has been an active area
of research, due largely to the interest in portable computing devices, such as notebook
and palm top personal computers. This work is concerned with the implementation of a
low power read-only memory (ROM) generator with a simple interface and the ability to
optimize the ROM given power and process parameters.
A low power ROM generator is converted to the Cadence Design Environment. The application of low power techniques to the generator is investigated, including charge redistribution sense amplifiers, reduced-element word decoders, and other methods borrowed
from low-power I/O. These methods are examined in an attempt to minimize power dissipation while maintaining performance. Techniques found applicable to low power memories are implemented, and a generator, ROMGEN, is developed to create a ROM given
parameters supplied by the user.
In addition to these methods for low power dissipation, two tools are developed in an
effort to reduce design time. A ROM modeling program, ROMODEL, is developed to
model the power dissipation characteristics of a ROM. Also, an optimization program,
ROMOPT, is developed that applies the modelling program to find the optimal configuration for the ROM. The results gathered from these tools can then be used to generate a
ROM configured for low power.
Thesis Supervisor: Anantha P. Chandrakasan
Title: Professor of Electrical Engineering and Computer Science
Acknowledgments
I would like to thank a number of people who have contributed to the completion of this
project:
Anantha Chandrakasan for his insight, encouragement, and enthusiasm. Working with him
was an honor and a privilege.
Thucydides Xanthopoulos for the countless times that he has helped me. He patiently
answered my questions, gave me advice and encouragement, and proofread many
extremely rough drafts; none of this work would be possible without him.
Andrew Burstein and Tom Burd for help with the Berkeley low power cell library. Without their help, I would still be trying to create my own.
All my friends in Boston, especially Michelle Madriaga, James Kao, Francis Lau, Sherry
Whitley, and Lapoe Lynn for making sure I had some fun while at MIT.
My family for their invaluable support during these years far from home. My parents,
Peter and Mary, my brother Han, and my sister Christine are responsible for my accomplishments. They have given me more patience, love, and support than I deserve.
Table of Contents
Ch. 1 Introduction
  1.1 Low Power Fundamentals
  1.2 Contributions of this work
  1.3 Background
    1.3.1 ROM Architecture
    1.3.2 Memory Cell
    1.3.3 Peripheral Circuitry
Ch. 2 Magic to Cadence Cell Library Conversion
  2.1 Low Power ROM Library
  2.2 Cell Library Conversion
    2.2.1 m2s Technology File
    2.2.2 Cell Conversion and Generation
    2.2.3 Design Rule Check
    2.2.4 Tiling
    2.2.5 Verification
  2.3 Summary
Ch. 3 ROM Operation
  3.1 ROM Specification
  3.2 The ROM Block
    3.2.1 Terminals
  3.3 Basic Operation
  3.4 ROM Low Power Features
    3.4.1 Reduced Capacitance
    3.4.2 Reduced Action
    3.4.3 Reduced Swing
  3.5 ROM Block Architecture
    3.5.1 Control Signals
    3.5.2 Address Latch
    3.5.3 Row Decoder and Wordline Driver
    3.5.4 ROM Core
    3.5.5 Column Selection
    3.5.6 Sense Amplification
    3.5.7 XOR
    3.5.8 Self Timing
    3.5.9 Output Driver
  3.6 ROM Bus
    3.6.1 Bus Architecture
    3.6.2 Block Decode Logic
  3.7 Simulation
  3.8 ROM Performance
Ch. 4 ROM Generation
  4.1 Usage
    4.1.1 Parameter file
    4.1.2 Generation
  4.2 Cells
  4.3 ROMGEN
    4.3.1 Tiling Procedures
    4.3.2 ROM Block Tiling
    4.3.3 ROM Bus Tiling
    4.3.4 Multiblock ROM Tiling
Ch. 5 ROM Modelling
  5.1 Brief Description of Operation
  5.2 ROMODEL Limitations
  5.3 Overview
    5.3.1 ROMODEL Technology File
    5.3.2 Node Model
    5.3.3 Modelling Wiring Capacitances
    5.3.4 Modelling Gate Capacitances
    5.3.5 Modelling Junction Capacitances
  5.4 Modelling the ROM
    5.4.1 Control Signals
    5.4.2 Address Latches and Row Decoding
    5.4.3 ROM Core
    5.4.4 Sense Amplifiers and XOR Decoders
    5.4.5 Output Latch and Driver
  5.5 ROMODEL Usage
    5.5.1 ROMAVG: Analyzing ROM Core Data
    5.5.2 ROMODEL: Modelling the ROM
  5.6 Results: ROMODEL vs. HSPICE
    5.6.1 Inverter Chain
    5.6.2 ROM
  5.7 Optimization Procedure
  5.8 ROMOPT Usage
  5.9 Interpreting the ROMOPT Report
Ch. 6 Conclusion
Appendix A
Appendix B
List of Figures
FIGURE 1. Architectures for memory arrays
FIGURE 2. 4 x 4 NAND ROM [Rabaey95]
FIGURE 3. 4 x 4 NOR ROM [Rabaey95]
FIGURE 4. Part of a Magic layout of the column select cell
FIGURE 5. Part of a Cadence layout of the column select cell
FIGURE 6. The ROM diagram
FIGURE 7. A multiple block ROM
FIGURE 8. A ROM block
FIGURE 9. Basic timing diagram for a ROM block
FIGURE 10. Schematic of ROM control logic
FIGURE 11. A timing diagram of the ROM control signals
FIGURE 12. A single ROM address latch
FIGURE 13. ROM row decoder and wordline driver
FIGURE 14. A "0" cell and a "1" cell
FIGURE 15. Example showing the original ROM data vs. coded ROM core
FIGURE 16. Column selection, sense amplification, and XOR circuits
FIGURE 17. The effects of charge sharing of the bitline with the output node
FIGURE 18. Simplified view of a charge redistribution amplifier
FIGURE 19. XOR circuit used for decoding the ROM data
FIGURE 20. Self-timing circuits for Ready signal generation
FIGURE 21. Schematic of the output driver
FIGURE 22. A single ROM bus instance
FIGURE 23. A 1-bit block decoder
FIGURE 24. A 4-bit block decoder
FIGURE 25. HSPICE results from a ROM simulation
FIGURE 26. HSPICE results from a ROM simulation
FIGURE 27. ROM Core and Column Select Mirroring
FIGURE 28. Block 0 layout of a 2 block ROM
FIGURE 29. A typical ROM bus layout
FIGURE 30. The layout of a 256 word by 8 bit, 4 block ROM
FIGURE 31. Typical gate capacitance vs. gate voltage plot from HSPICE
FIGURE 32. Actual and model C-V characteristics of a NMOS device
FIGURE 33. Actual and model C-V characteristics of a PMOS device
FIGURE 34. Equivalent Gate to GND Capacitance
FIGURE 35. Calculation of total area under CV curve from HSPICE data
FIGURE 36. Supply current charging the drain capacitance
FIGURE 37. An inverter chain for HSPICE vs. ROMODEL comparisons
List of Tables
TABLE 1. Layer Equivalents
TABLE 2. Contact Equivalents
TABLE 3. Design Rules for Magic to Cadence Conversion
TABLE 4. HSPICE results for a 16 word x 4 bit ROM
TABLE 5. HSPICE results for a 256 word x 8 bit ROM
TABLE 6. HSPICE comparison of original ROM versus modified ROM
TABLE 7. Summary of ROMGEN parameters
TABLE 8. ROMGEN block cells
TABLE 9. ROMGEN bus cells
TABLE 10. Technology File Parameters
TABLE 11. Node Variables and Descriptions
TABLE 12. HSPICE simulation vs. ROMODEL for 5-inverter chain
TABLE 13. HSPICE simulation vs. ROMODEL for a 16 word x 4 bit ROM
TABLE 14. HSPICE simulation vs. ROMODEL for a 256 word x 8 bit ROM
Chapter 1
Introduction
As chip densities and system sizes continue to increase, the ability to provide adequate cooling for the system becomes a problem and results in a need for the development
of low power design strategies. The popularity of laptop computers and other portable
computing devices is an additional reason for interest in low power strategies, since low
power designs extend the battery life in these systems. Since control systems for these
applications typically require a substantial amount of non-volatile memory, development
of low power memory design strategies is important. This work is dedicated to the investigation and implementation of these low power read-only memories.
1.1 Low Power Fundamentals
Low power techniques generally involve reducing or eliminating the components
that contribute to power consumption. These can be classified into dynamic consumption,
consumption due to short circuit currents, and static consumption:
$P_{total} = P_{dynamic} + P_{sc} + P_{static}$
(EQ 1)
One property of CMOS designs that has made them so popular in low power applications
is that static power consumption is generally insignificant. The main contributors to static
consumption are: 1) circuits with static current, which are avoided when possible, 2) leakage due to reverse biased diodes at the junctions, which is typically very small and negligible, and 3) subthreshold currents, the effect of which is minimized by having a larger
threshold voltage (VT), typically about 0.75V in current technologies. Subthreshold currents can be reduced by employing dynamic optimization techniques such as MTCMOS in
a low VT system [Mutoh95].
Short circuit currents are due to direct current paths between VDD and GND. This
can occur, for example, when both the NMOS and PMOS devices of a logic gate are on during
an input transition. To minimize Psc, the rise and fall times of the input and output are
kept approximately equal.
The following is the expression for dynamic power for linear capacitors:
$P_{dyn} = C_L V_{DD} V_{swing} f$
(EQ 2)
where CL is the total capacitance of the node, Vswing is the voltage change seen at the
node, and f is the frequency of the transition. Since dynamic power dissipation constitutes
the most significant component of power dissipation, this work is focused on reducing and
modelling dynamic power consumption. Nonlinear capacitors such as parasitic junction
capacitances and gate capacitances are modelled so that the power dissipated can be calculated (Chapter 5). Examining the expression for dynamic power consumption, one can see
that reducing power consumption requires reducing the load capacitance,
reducing the supply voltage as much as possible (while meeting performance specifications),
reducing the voltage swing, and reducing the frequency at which the capacitances are charged
and discharged. These are the basic guidelines for low power CMOS design.
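As a rough numerical illustration of EQ 2, the following sketch evaluates the dynamic power of a single node at full swing and at a swing reduced by one threshold voltage. The load capacitance, supply voltage, and frequency are hypothetical values chosen only to show how the terms scale; they are not measurements from this work.

```python
def dynamic_power(c_load, v_dd, v_swing, freq):
    """Dynamic power (W) of a linear capacitor, per EQ 2: C_L * V_DD * V_swing * f."""
    return c_load * v_dd * v_swing * freq

# Hypothetical node: 1 pF load, 3.3 V supply, switching at 10 MHz.
C_LOAD = 1e-12
V_DD = 3.3
FREQ = 10e6
V_T = 0.75  # threshold voltage, the typical value quoted in the text

full_swing = dynamic_power(C_LOAD, V_DD, V_DD, FREQ)
reduced_swing = dynamic_power(C_LOAD, V_DD, V_DD - V_T, FREQ)

print(f"full swing:    {full_swing * 1e6:.1f} uW")
print(f"reduced swing: {reduced_swing * 1e6:.1f} uW")
```

Because every factor enters linearly, halving any one of the capacitance, swing, or frequency halves the power of the node.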
1.2 Contributions of this work
The focus of this work is to provide a low power ROM generator, ROMGEN, with
a simple interface, applying new techniques in low power design and modelling. This thesis includes the software to generate the U.C. Berkeley low power ROM in the Cadence
Design Environment. Ideas borrowed from low power I/O coding are implemented to
reduce the power of the ROM further. A ROM modeler, ROMODEL, is created to quickly
and accurately model the energy dissipated in the ROM. A number of models are presented and implemented to calculate the energy dissipated in the ROM. Finally, ROMOPT
is a tool that uses the results of the modelling tool to find the optimal partitioning scheme
prior to generation.
1.3 Background
Since the architecture of a ROM is generally well defined, identifying where
power is dissipated is straightforward. Architectural decisions, memory cell design, and
peripheral circuitry all affect power consumption. Much of the recent work on memories
has involved exploring methods to reduce power consumption in each of these areas.
1.3.1 ROM Architecture
Memory arrays are typically organized so that the dimensions are of the same
order of magnitude. The reasons for this can be understood by examining the basic array
structure depicted in Figure 1a. For a memory with M bits of address where each word is
N bits wide, the height (2^M) is many orders of magnitude greater than the width (N) for
typical memory sizes. Not only does this result in extremely long bitlines and poor performance, but the design is also not implementable due to the extremely large aspect ratio
[Rabaey95].
A partial solution to the problem is depicted in Figure 1b. By placing multiple
words on a single wordline and using a column address to multiplex the data, the aspect
ratio is reduced. This reduces the amount of row decoding circuitry necessary to select the
wordline and decreases the bitline length and capacitance. Consequently, power is reduced
and performance is increased. However, too much column decoding is undesirable for low
power memories, because the column decoding operation discards all but one word of the
data obtained during a read. This is discussed in more detail in Section 3.5.5.
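The effect of column decoding on the array shape can be checked with a short calculation. The sketch below is illustrative only: the function names are hypothetical, and the 256 word by 8 bit size is used simply because it is one of the ROM sizes discussed in this work.

```python
def array_dimensions(addr_bits, word_bits, col_bits=0):
    """Return (rows, columns) of the cell array.

    addr_bits: M, total address bits (2**M words)
    word_bits: N, bits per word
    col_bits:  address bits used for column selection (0 = basic array)
    """
    if not 0 <= col_bits <= addr_bits:
        raise ValueError("col_bits must be between 0 and addr_bits")
    rows = 2 ** (addr_bits - col_bits)
    columns = word_bits * 2 ** col_bits
    return rows, columns

def discarded_fraction(col_bits):
    """Fraction of the bits on a wordline that column selection throws away."""
    return 1 - 1 / 2 ** col_bits

# 256 words of 8 bits with 0 to 3 column address bits.
for col_bits in range(4):
    rows, columns = array_dimensions(8, 8, col_bits)
    print(col_bits, rows, columns, discarded_fraction(col_bits))
```

With two column address bits the array becomes 64 rows by 32 columns, which is much closer to square, but three quarters of the bits sensed on each read are discarded.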
Partitioning is another technique used to make the dimensions of the memory array
manageable, as illustrated in Figure 1c. By partitioning the memory into a number of
blocks, decoding logic and sense amplifiers in blocks which are not accessed can be disabled, resulting in substantial power savings, especially with large memories. Also,
because the row address is decoded locally within each block, the smaller row
decoding circuitry yields faster operation and lower power because less capacitance is
switched. This comes at a cost of area, however, since each ROM block must have its own copy of the row and column decoding circuitry. Dividing the memory into several smaller blocks
means that the overall performance of the memory is approximately that of a single block,
which can be a substantial improvement for large memories partitioned into several
blocks.
FIGURE 1. Architectures for memory arrays. (a) A basic array structure. (b) Reducing the aspect
ratio by dividing the address into a row address and column address. (c) Further reducing the
aspect ratio by dividing the memory array into multiple blocks and using a block address to enable
a single block.
However, partitioning increases overhead, which could affect performance and
power dissipation in a negative way. Partitioning means that capacitance on the address
and data lines is increased, since ROM blocks share the same buses. Also, dividing the
ROM into blocks requires additional block selection logic, which could increase
overall power dissipation and increases the setup time before a read cycle can begin.
These trade-offs are a main reason for the development of the ROM modeler and optimizer, which are used to find the optimal partitioning scheme.
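The address decomposition implied by Figure 1c can be sketched as follows. The field order (block bits in the most significant positions, then row bits, then column bits) is an assumption made for illustration and is not taken from the ROM described here; the function names are likewise hypothetical.

```python
def split_address(addr, block_bits, row_bits, col_bits):
    """Split a word address into (block, row, column) fields.

    Assumed layout, most significant bits first: block | row | column.
    """
    col = addr & ((1 << col_bits) - 1)
    row = (addr >> col_bits) & ((1 << row_bits) - 1)
    block = (addr >> (col_bits + row_bits)) & ((1 << block_bits) - 1)
    return block, row, col

def block_enables(block, block_bits):
    """One enable per block; only the addressed block is active."""
    return [index == block for index in range(1 << block_bits)]

# 256-word ROM split into 4 blocks, each 16 rows with 4 words per row.
block, row, col = split_address(0b10110110, block_bits=2, row_bits=4, col_bits=2)
print(block, row, col)
print(block_enables(block, 2))
```

Only the enabled block decodes the row and column fields, which is where the power saving of partitioning comes from; the shared address and data buses still see every access.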
1.3.2 Memory Cell
Memory cell designs typically do not change much and are heavily dependent on
process technology; however, there are some choices that can be made. The choice
between NAND and NOR arrays is usually simple: NAND arrays are typically too slow
to be practical.
FIGURE 2. 4 x 4 NAND ROM [Rabaey95].
Figure 2 shows an example NAND ROM array. All the wordlines are initially set
to "1", and the selected wordline is set to "0". If no transistor is present where the selected wordline crosses the bitline, the
bitline remains discharged by the series of NMOS devices, which is equivalent to reading
a "1". When a transistor is present at that location, setting the row to "0" turns off the
transistor, which eliminates the path to ground for the bitline. The PMOS pull-up raises
the bitline voltage, which is equivalent to reading a "0". This configuration produces a
very dense ROM core; however, due to the long chain of NMOS devices connected in
series, this NAND structure yields unacceptably slow read times for medium and large
sized ROMs.
An example NOR ROM is depicted in Figure 3. During a read cycle, the selected
wordline is set to "1". If a transistor is present, it is turned on and discharges the bitline,
which is equivalent to reading a "1". If a transistor is not present, the bitline remains
charged, meaning the data is a "0". This results in a substantial increase in worst case
performance over the NAND ROM because only a single NMOS transistor lies between the
bitline and GND. However, the size of a single cell doubles (as compared to the NAND
structure) because of the GND wires in the ROM core. For large ROMs, this increase
in size is worth the increase in performance.
FIGURE 3. 4 x 4 NOR ROM [Rabaey95].
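The read behaviour of the two array types can be summarized with a small logical model. This is a sketch of the logic only, following the convention used in the text that a discharged bitline reads as "1"; it ignores timing and precharge, and the 4 x 4 transistor pattern is hypothetical.

```python
def read_nor(cells, row):
    """NOR ROM: the selected wordline goes high; a transistor at the
    selected cell discharges the bitline, which reads as 1."""
    return [1 if present else 0 for present in cells[row]]

def read_nand(cells, row):
    """NAND ROM: every wordline is high except the selected one.
    The bitline is discharged (reads 1) only if the whole series chain
    conducts; a missing transistor acts as a short."""
    data = []
    for col in range(len(cells[row])):
        chain_conducts = all(
            not cells[r][col] or r != row for r in range(len(cells))
        )
        data.append(1 if chain_conducts else 0)
    return data

# cells[r][c] is True where a transistor is placed (hypothetical pattern).
CELLS = [
    [True, False, False, True],
    [False, True, True, False],
    [True, True, False, False],
    [False, False, True, True],
]

print(read_nor(CELLS, 0))
print(read_nand(CELLS, 0))
```

For the same transistor placement the two structures return complementary data, so the programming pattern must be chosen with the array type in mind.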
Instead of using PMOS devices as pull-ups in a pseudo-NMOS configuration (as
shown in Figure 3), NMOS devices will be used to precharge the bitline dynamically to
one threshold voltage (VT) below VDD. By eliminating the static PMOS pull-ups, the direct path from VDD to
GND is eliminated, saving power. By using an NMOS device to charge the bitline, the
worst case bitline swing is (VDD-VT), reducing the power dissipated in discharging a
bitline. Also, because the source of the NMOS precharge device is not at GND, the threshold
voltage VT is higher than VT0 due to the body effect, further decreasing the worst case
bitline swing.
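The precharged bitline level can be estimated with the standard body-effect expression, V_T = V_T0 + gamma * (sqrt(2*phi_F + V_SB) - sqrt(2*phi_F)), where V_SB equals the bitline voltage because the source of the precharge device is the bitline. The sketch below solves for the level by fixed-point iteration. The supply voltage, body-effect coefficient gamma, and surface potential term 2*phi_F are hypothetical values, not parameters of the process used in this work.

```python
import math

def precharge_level(v_dd, v_t0, gamma, two_phi_f, iterations=50):
    """Bitline voltage reached through an NMOS precharge device.

    Solves v = v_dd - V_T(v), where the body effect raises the threshold:
    V_T(v) = v_t0 + gamma * (sqrt(two_phi_f + v) - sqrt(two_phi_f)).
    """
    v = v_dd - v_t0  # starting guess: no body effect
    for _ in range(iterations):
        v_t = v_t0 + gamma * (math.sqrt(two_phi_f + v) - math.sqrt(two_phi_f))
        v = v_dd - v_t
    return v

# Hypothetical parameters: 3.3 V supply, V_T0 = 0.75 V, gamma = 0.5 V^0.5.
level = precharge_level(v_dd=3.3, v_t0=0.75, gamma=0.5, two_phi_f=0.6)
print(f"precharge level without body effect: {3.3 - 0.75:.2f} V")
print(f"precharge level with body effect:    {level:.2f} V")
```

The iteration converges quickly because the threshold changes only slowly with the bitline voltage; the resulting swing is smaller than VDD-VT0, which lowers the energy per bitline discharge further.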
In the interest of low power, reducing the number of transistors in the ROM core is
important for two reasons: 1) fewer transistors in the ROM means less capacitance
on the wordlines and bitlines, yielding faster switching and lower energy consumption, and 2) reading a "1" (transistor present) requires that the bitline be discharged, which
consumes energy, whereas reading a "0" (no transistor) consumes no energy. The importance of the latter is multiplied by the fact that, due to column selection, the majority of the
information read from the bitlines is discarded. Thus, to reduce the transistor count in the
ROM core, methods for low-power I/O [Stan89] [Tabor90] are adapted and applied.
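One way such a coding can reduce the number of stored "1"s is an invert-style code, in which a word with more ones than zeros is stored complemented together with a flag bit, and an XOR restores the data on read. The sketch below assumes this scheme purely for illustration; the exact coding adopted for the ROM core is described later in the thesis and may differ.

```python
def encode_word(bits):
    """Store the complement when more than half of the bits are 1.

    Returns (stored_bits, invert_flag). The flag costs one extra stored bit.
    """
    ones = sum(bits)
    if ones > len(bits) / 2:
        return [1 - b for b in bits], 1
    return list(bits), 0

def decode_word(stored_bits, invert_flag):
    """Recover the original word by XORing each stored bit with the flag."""
    return [b ^ invert_flag for b in stored_bits]

word = [1, 1, 1, 0]
stored, flag = encode_word(word)
print(stored, flag)               # stored pattern and its invert flag
print(decode_word(stored, flag))  # original word recovered
```

Counting the flag, the coded word never contains more ones than the original, so the number of transistors in the core, and the number of bitlines discharged per read, cannot increase.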
1.3.3 Peripheral Circuitry
Low power techniques in the peripheral circuitry also play an important role in low
power design. A cascode charge redistribution sense amplifier is utilized for low power
and quick sensing of bitline data. Glitches waste
power, and thus the peripheral circuitry is designed to reduce the number of glitches that
occur. Eliminating glitches in the output drivers is especially important, since the capacitance on the output devices is typically large, and glitches on the outputs would result in a
considerable waste of energy.
Chapter 2
Magic to Cadence Cell Library Conversion
The ROM generator is a tool for quickly and easily creating a ROM applying the
methods of low power design found to be effective in our investigation. Part of the structure of the ROM generator is based on work done by A. Burstein [Burstein96] using
the Berkeley low power cell library; however, a number of modifications were necessary to
make the ROM generator operational in Cadence. This chapter describes the conversion of
the cells into Cadence.
2.1 Low Power ROM Library
The Berkeley low power ROM library serves as a starting point for the ROM generator. Porting the cell library to the Cadence design environment involves a number of
steps, including the conversion of Magic cell layouts to Cadence, verification of design
rules (due to the different technologies being used), updating the layouts with circuit schematics, and finally verification of layouts versus schematics (LVS). HSPICE is used to
verify correct operation of the cells. Finally, sample ROMs were generated and simulated
with HSPICE to verify the conversion of the Berkeley ROM generator into Cadence.
2.2 Cell Library Conversion
The layouts were converted using a tool created by Thucydides Xanthopoulos
called m2s (magic to skill). This program translates Magic layout files (.mag) to Cadence
Virtuoso format. The following sections describe the steps used to convert the cells.
2.2.1 m2s Technology File
A technology file is needed to translate Magic layers to Cadence layers. The m2s
tool extracts the data from the Magic file and creates a generator for the Cadence cell
using the technology file to translate the different layer names. Thus, the technology file
must be edited to match the Cadence and Magic setup. For this cell conversion, the technology file provided was sufficient (See Appendix B). The layers relevant to this process
(HP26) are listed in Table 1.
Magic Layer        Cadence Layer(s)
pwell              none
nwell              nwell
polysilicon        poly
ndiffusion         ndiff
pdiffusion         pdiff
metal1             metal1
metal2             metal2
metal3             metal3
ntransistor        ndiff poly
ptransistor        pdiff poly
psubstratepdiff    psub
nsubstratendiff    nsub
glass              overgla
TABLE 1. Layer Equivalents.
In Magic, contacts have their own layer, whereas in Cadence, the contact as well as
the layers being connected must be specified. For example, in Magic, a minimum area
polycontact is (4λ)^2. In Cadence, however, the same contact consists of a (4λ)^2 poly layer
overlapping a (4λ)^2 metal1 layer with a (2λ)^2 contact. This conversion from contacts to
multiple layers is handled in the technology file and listed in Table 2.
Magic Contact Name     Cadence Layers
polycontact            poly metal1 cont
ndcontact              ndiff metal1 cont_aa
pdcontact              pdiff metal1 cont_aa
m2contact              metal1 metal2 via
psubstratepcontact     psub metal1 cont_aa
nsubstratencontact     nsub metal1 cont_aa
TABLE 2. Contact Equivalents.
In addition to specifying the layers, design rules need to be specified to indicate
contact size, spacing and overlap during the conversion of contacts from Magic to
Cadence. These are also contained in the technology file and shown in Table 3. Two layers
that needed to be added for the new process were the n-select and p-select layers. The
effects of this are described in more detail in Section 2.2.3.
Design Rule         Size or Length
CONTACT_SIZE        2
CONTACT_SPACING     2
CONTACT_OVERLAP     1
VIA_SIZE            2
VIA_SPACING         2
VIA_OVERLAP         1
SELECT_OVERLAP      3
TABLE 3. Design Rules for Magic to Cadence Conversion.
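The way the contact equivalents and the design rules combine can be illustrated with a short sketch. It is not the m2s implementation: the function name, the data layout, and the assumption that rule values are in units of lambda are all hypothetical. It only shows how a single Magic contact expands into several Cadence rectangles using the entries of Table 2 and Table 3.

```python
# Magic contact name -> (connected Cadence layers..., cut layer), from Table 2.
CONTACT_LAYERS = {
    "polycontact": ("poly", "metal1", "cont"),
    "ndcontact": ("ndiff", "metal1", "cont_aa"),
    "pdcontact": ("pdiff", "metal1", "cont_aa"),
    "m2contact": ("metal1", "metal2", "via"),
    "psubstratepcontact": ("psub", "metal1", "cont_aa"),
    "nsubstratencontact": ("nsub", "metal1", "cont_aa"),
}

# Subset of the design rules in Table 3 (units assumed to be lambda).
RULES = {"CONTACT_SIZE": 2, "CONTACT_OVERLAP": 1, "VIA_SIZE": 2, "VIA_OVERLAP": 1}

def expand_contact(name, x, y):
    """Expand a minimum-size Magic contact with lower-left corner (x, y)
    into a list of (layer, (x0, y0, x1, y1)) rectangles."""
    *conductors, cut = CONTACT_LAYERS[name]
    prefix = "VIA" if cut == "via" else "CONTACT"
    size = RULES[prefix + "_SIZE"]
    overlap = RULES[prefix + "_OVERLAP"]
    outer = size + 2 * overlap  # conductor must overlap the cut on every side
    shapes = [(layer, (x, y, x + outer, y + outer)) for layer in conductors]
    shapes.append((cut, (x + overlap, y + overlap,
                         x + overlap + size, y + overlap + size)))
    return shapes

for layer, rect in expand_contact("polycontact", 0, 0):
    print(layer, rect)
```

With a contact size of 2 and an overlap of 1, the conductor rectangles come out 4 units on a side around a 2 unit cut, which matches the minimum polycontact example given in the text.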
2.2.2 Cell Conversion and Generation
Once the technology file is specified, the m2s tool is executed with the Magic cell
and target Cadence library specified. A SKILL file that generates the layout in Cadence is
created with the same name as the cell and a ".il" extension. To generate the layout, one
must create a cell in the target Cadence library with the cell name. Then
the SKILL file is loaded and the layout is generated by typing "lg". All the cells were
correctly generated; however, a few errors occurred when performing the
design rule checks (DRC) described in the next section.
2.2.3 Design Rule Check
A number of design rule errors were found in the generated layouts using m2s. The
two most common problems were due to input/output terminals (pins), and the addition of
the select layers. Fixing these problems while maintaining ROM density was a primary
concern.
Pins/Terminals. m2s creates pins described in the Magic layout and conveniently adds a
text label as well. However, every terminal is assumed to have a size of (4λ)^2. Thus, when
the terminal is created, a layer of material is created overlapping the terminal location.
These (4λ)^2 terminals were often too large, and consequently resulted in a number of
design rule errors. These errors were fixed by adjusting or replacing the terminals.
FIGURE 4. Part of a Magic layout of the column select cell. An NMOS transistor with the source
tied to GND is depicted. The three contacts pictured above are two diffusion contacts (right of poly
and far right) and one well contact (middle). This process allows abutting diffusion and well
contacts.
Select Layers. The addition of the select layers to the cells introduced a large number of
design rule errors that were difficult to fix. Because a ROM is designed to maximize density, a majority of the cells had little or no space to add the necessary n-select and p-select
layers. These layers overlap diffusion regions and substrate contacts and degrade the density of the layouts because diffusions of opposite type cannot abut, though the original
cells allowed this. Permitting opposite diffusion types to abut is potentially dangerous
because contacts placed next to each other may become shorted. Adding the n-select
and p-select layers avoids this problem and allows this ROM library to be used with
other processes.
Maintaining a high density was important; however, increasing some cell sizes was
unavoidable. This degradation of the density is most pronounced in the size of the ground
wire in the ROM core. Due to the large sizes of the well contacts, the introduction of the
p-select and n-select layers, and the density of the column select cell, the density of the
ROM core was decreased slightly to accommodate the extra area needed for n-select and
p-select areas. This problem is exacerbated because the increase in the column width
(+4λ) is multiplied by the number of bits in the ROM. Other cells were increased in size;
however, these had only a minor effect on the overall ROM area. Figure 4 and Figure 5
illustrate the changes made in the ROM cells.
FIGURE 5. Part of a Cadence layout of the column select cell. The same NMOS transistor with the
source tied to GND is depicted. For tiling reasons, one of the diffusion contacts was deleted. The
two contacts pictured above are a diffusion contact (right of poly) and a well contact (far right).
The minimum spacing between well and diffusion contacts is larger and increases cell size.
2.2.4 Tiling
A number of tiling functions were also written for ROM generation (Chapter 3).
These tiling routines require that each cell has a special layer used to align each cell relative to other cells. This layer, "prBoundary", was added to each cell for tiling purposes.
Though the original Magic cells contained tiling information, the m2s tool does not generate prBoundary information with the technology file used in this library conversion.
Therefore, the prBoundary layers were added manually and adjusted as necessary.
2.2.5 Verification
A number of steps were taken to verify that the conversion of the cell library was
complete. Each of the cells was edited and modified to pass DRC. After passing DRC,
schematics were created for the cell library. The Berkeley low power library lacked schematics for each cell, although some general schematics detailing the ROM exist
[Burstein96]. Thus, it was necessary to reverse engineer most of the cells to generate schematics for the library. Finally, layout versus schematic checks (LVS) were executed to verify equivalence.
HSPICE was used to verify the operation of a number of larger blocks, such as the
address latch, control circuits, and column select cells. ROMs of various sizes (Chapter 4)
were generated and simulated with HSPICE, verifying that the ROM library conversion
was complete.
2.3 Summary
The steps used to convert the library are summarized below:
1. Create/obtain an m2s technology file.
2. Execute m2s on the magic cell:
m2s [magic filename]
[target Cadence library]
The magic filename does not include the .mag extension. This creates a generator for
this cell with a ".il" extension.
3. Target Cadence library is created (if necessary).
4. Target cell name is created in Cadence library.
5. Cell generator is loaded from the CIW:
CIW> (load "cellname.il")
6. Cell generator is executed from the CIW:
CIW> ig
7. Terminal sizes are fixed, prBoundary layer is added.
8. Cell is edited until design rule checks are passed (DRC).
9. A schematic for the cell is created.
10. Schematic versus layout is verified (LVS).
11. HSPICE simulations verify correct operation of major cells.
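Steps 2 and 5 above lend themselves to scripting when many cells must be converted. The following is a minimal sketch, assuming only the m2s invocation and CIW load command quoted in the steps; the function name and the example cell and library names are hypothetical, and the script only builds the command strings rather than running any tool.

```python
def conversion_commands(magic_file: str, cadence_library: str) -> dict:
    """Build the command strings for converting one Magic cell (steps 2 and 5).

    This helper is illustrative only: it formats strings and does not
    invoke m2s or Cadence.
    """
    cell = magic_file
    # Step 2 notes that the Magic filename is given without the .mag extension.
    if cell.endswith(".mag"):
        cell = cell[: -len(".mag")]
    return {
        "shell": f"m2s {cell} {cadence_library}",   # step 2
        "generator_file": f"{cell}.il",             # file created by m2s
        "ciw_load": f'(load "{cell}.il")',          # step 5, typed in the CIW
    }


if __name__ == "__main__":
    # "colsel" and "romlib" are made-up names used only for this example.
    for name, command in conversion_commands("colsel.mag", "romlib").items():
        print(f"{name}: {command}")
```

The remaining steps (fixing terminals, adding prBoundary, DRC, and LVS) were manual in this work and are not modeled here.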
Chapter 3
ROM Operation
This chapter explains the operation of the ROM at the circuit level and describes
the techniques used to reduce power consumption. After an overview of the ROM (based
in part on [Burstein96]), the architectures of the ROM block and ROM bus are examined.
3.1 ROM Specification
FIGURE 6. The ROM. (Block diagram with terminals CLK, A, D, and PORB.)
In an attempt to make the design useful to the user, a simple interface to the memory was chosen. The inputs to the system are an address and a clock that meets the maximum frequency requirement. In order to reduce power, the clock should be gated so that it
is asserted only when a read operation is taking place, thus preventing the ROM from executing a wasted read cycle. After a rising clock edge, the ROM will exit the precharge
phase, enter the active mode, and re-enter the precharge phase when the data has been
sensed. This will be described in more detail below. Figure 6 is a simple block diagram of
the ROM, where CLK is the clock input, A is the input address bus, D is the output data
bus, and PORB is the active-low power-on reset signal.
Typically, the ROM is broken up into a number of ROM blocks to reduce power
and to increase performance by reducing the aspect ratio. By having only a single block
enabled during a read, the energy dissipated per cycle of the entire ROM is approximately
that of a single block. A typical ROM configuration is depicted in Figure 7.
FIGURE 7. A multiple block ROM. (Even-numbered blocks block0 through block14 form one row and
odd-numbered blocks block1 through block15 form the other; between the two rows run the address
and data buses, block decoding circuitry, and static latches.)
Each block shares the same clock, address bus, data bus, and reset signals. Block
decoding circuitry ensures that only a single block is enabled at a time to eliminate the
problem of bus contention. Finally, static latches on the data bus ensure that the data
remains valid even if no reads are requested for a long amount of time. The number of
ROM blocks for a ROM can range from 1 to 16. The following sections will describe the
ROM blocks and ROM bus operation in further detail.
3.2 The ROM Block
The ROM block generated by ROMGEN is designed to be connected with other
blocks with the ROM bus. Figure 8 depicts a block diagram of a single ROM block (VDD
and GND not shown). The following sections describe the operation of a single ROM
block in more detail.
FIGURE 8. A ROM block. (Terminals shown: CLK, ENABLE, D, A, and PORB.)
3.2.1 Terminals
Address Bus. A is the input address bus. The width of the address bus is automatically set
by ROMGEN depending on the number of WORDS contained in the ROMGEN parameter file. For tiling reasons, the minimum number of address lines per block is 3.
Clock. CLK is the clock input. On the positive edge of CLK, the ROM output will
become tristated. If ENABLE=1 and PORB=1 on the rising clock edge, the ROM will
proceed with a read after tristating the data bus. The read data will remain latched at the
output until the next positive clock edge. CLK may be held high or low until the next read
operation.
ENABLE. ENABLE is used to enable the read operation. On the positive edge of CLK, if
ENABLE=1, the ROM tristates the output data bus and proceeds with a read operation. If
ENABLE=0 on the rising edge of CLK, the ROM only tristates its outputs. The main purpose of the enable signal is to allow multiple blocks to be tied to the same data and address
buses. Typically, higher order address bits are decoded to enable a single ROM block during a read cycle.
PORB. PORB is the active low reset signal. The purpose of this signal is to tristate the
outputs upon power up. This is important for ROMs with multiple blocks that share the
same data bus.
Data Bus. D is the output data bus, which is BITS wide. For tiling reasons, the number of
bits must be even. The LSB (rightmost bit) is D[0]; the MSB (leftmost bit) is D[BITS-1].
3.3 Basic Operation
The basic operation is straightforward. Once the address lines and enable lines are
set to proper values, CLK is raised. The data will appear after a delay, t_access. The ROM
will be ready for another read after cycle time t_cycle. Only positive edges of CLK are relevant for access and cycle times. Access time and energy per cycle are a function of
WORDS, BITS, load capacitance, and power supply voltage. Figure 9 is a timing diagram
that demonstrates basic operation.
FIGURE 9. Basic timing diagram for a ROM block. (Signals shown versus time: CLK, A, D, ENABLE,
and PORB.)
The first rising clock edge represents a read operation in which the ROM is
enabled. ENABLE is asserted, meaning that the block contains the data to be read. The
address must be stable when CLK is raised. The positive edge of CLK always tristates the
data bus. After an access delay, the data is available at the output bus, D, and after a cycle
time delay, the ROM is ready for another read.
The second rising clock edge represents a read operation in which the particular
ROM block is not enabled. The only effect of the rising clock edge is the tristating of the
output bus.
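The terminal behavior described in Sections 3.2.1 and 3.3 can be summarized with a small behavioral model. This is a sketch only, assuming a zero-delay view of the block: it ignores t_access and t_cycle, represents a tristated bus with None, and uses a class name and method names that are not part of ROMGEN.

```python
from typing import List, Optional


class RomBlockModel:
    """Zero-delay behavioral sketch of one ROM block (timing is not modeled)."""

    def __init__(self, words: List[int]):
        self.words = list(words)
        self.d: Optional[int] = None  # None models a tristated data bus

    def power_on_reset(self) -> None:
        # PORB low tristates the outputs on power-up.
        self.d = None

    def rising_edge(self, a: int, enable: int, porb: int = 1) -> Optional[int]:
        # Every positive clock edge first tristates the output bus.
        self.d = None
        # A read proceeds only if ENABLE=1 and PORB=1 on that edge.
        if enable == 1 and porb == 1:
            self.d = self.words[a]
        # The data stays latched until the next positive edge.
        return self.d


if __name__ == "__main__":
    block = RomBlockModel([0x5, 0x6, 0x7])
    print(block.rising_edge(a=1, enable=1))  # enabled read
    print(block.rising_edge(a=1, enable=0))  # block not enabled: bus tristated
```

The two calls in the usage example correspond to the two rising clock edges discussed for Figure 9.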
3.4 ROM Low Power Features
ROMGEN is designed to be a low power ROM generator. The following highlights the techniques used to lower power consumption in the ROM.
3.4.1 Reduced Capacitance
Low Power ROM Core. Techniques adapted from low power I/O coding can be applied
towards memories to decrease power consumption in the ROM. By coding the ROM data
such that the number of transistors is reduced, a significant reduction in power consumption is possible.
For a conventional ROM core with a worst case of N transistors per wordline, the
modified ROM core has a worst case of N/2 transistors per wordline. This is accomplished
by storing the complement of the word if the word is "heavy", meaning that more than
50% of the memory cells on the wordline contain transistors (data = "1"); otherwise the
data is stored directly into the ROM core. An Invert bit (INV) is stored in the ROM core to
flag the wordlines that have been inverted, and is used by the decoding circuit to restore
the original data. By coding the ROM core in this manner, the number of transistors ("1"s)
is reduced, which reduces the average capacitance on the wordlines and the bitlines, thus
reducing power. Also, by reducing the number of transistors on the wordlines, the number
of bitlines that are discharged decreases, saving power. And because the worst case delay
is decreased, the data can be sensed from the bitlines in a shorter amount of time, which
may result in faster overall access time, depending on the ROM dimensions.
3.4.2 Reduced Action
Multiple Block Capability. A ROM can be broken up into multiple ROM blocks that
share address and data buses. By decoding some of the address bits to enable a single
ROM block during a read, only one block will be activated during each read. Therefore,
the overall power consumption and access time of the entire ROM is approximately that of
a single ROM block.
Non-glitching Outputs. At the beginning of a read cycle, each ROM block tristates its
outputs. Using self timing circuitry, the output data is enabled when the data is stable. This
eliminates glitches on the highly capacitive output data bus, which would waste power.
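The multiple block capability can be illustrated with a short sketch of block enable generation. It assumes that every block holds a power-of-two number of words and that the high-order address bits select the block, which the text describes as the typical arrangement; the function and parameter names are hypothetical.

```python
from typing import List, Tuple


def decode_block(address: int, addr_bits_per_block: int,
                 num_blocks: int) -> Tuple[List[int], int]:
    """Split a full ROM address into one-hot block enables and a local address."""
    block = address >> addr_bits_per_block               # high-order bits
    local = address & ((1 << addr_bits_per_block) - 1)   # bits sent to every block
    if not 0 <= block < num_blocks:
        raise ValueError("address selects a block that does not exist")
    # Exactly one ENABLE is asserted, so only one block performs a read.
    enables = [1 if i == block else 0 for i in range(num_blocks)]
    return enables, local


if __name__ == "__main__":
    # Four blocks with 3 address lines each; address 0b10011 falls in block 2.
    print(decode_block(0b10011, addr_bits_per_block=3, num_blocks=4))
```

Because only the selected block leaves precharge, the energy per read tracks that of a single block rather than the whole ROM.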
3.4.3 Reduced Swing
Low Voltage Operation. The ROM uses self timing to maximize speed with various supply voltages. This allows operation at lower voltages, which reduces the maximum swing
of the node, and hence, lowers power consumption.
Reduced Swing Bitlines. Each bitline is precharged to an NMOS threshold voltage below
VDD. Thus, the maximum swing is reduced to (VDD - VT). Energy dissipated in charging
and discharging the bitline is given by Eq. 3:
$$E = C \, V_{DD} \, V_{swing} \qquad \text{(EQ 3)}$$
Since the swing is reduced from VDD to (VDD - VT), the reduction in energy is:
$$\text{Energy reduction factor} = \frac{V_{DD}}{V_{DD} - V_{T}} \qquad \text{(EQ 4)}$$
For low power supplies especially, this substantially reduces the voltage swing (and hence
the energy dissipated) in the bitlines (50% for VDD=1.5V and VT=0.75V).
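As a quick numerical check of Eq. 3 and Eq. 4, the sketch below evaluates the bitline energy for a full swing and for the reduced swing. The 1 pF bitline capacitance is an assumed illustrative value, not a figure from this work; the supply and threshold voltages are the ones quoted above.

```python
def bitline_energy(c_bitline: float, vdd: float, v_swing: float) -> float:
    """Energy to charge and discharge the bitline, E = C * VDD * Vswing (Eq. 3)."""
    return c_bitline * vdd * v_swing


def energy_reduction_factor(vdd: float, vt: float) -> float:
    """Ratio of full-swing energy to reduced-swing energy, VDD / (VDD - VT) (Eq. 4)."""
    return vdd / (vdd - vt)


if __name__ == "__main__":
    c = 1e-12              # assumed bitline capacitance, 1 pF (illustrative only)
    vdd, vt = 1.5, 0.75    # values quoted in the text
    full = bitline_energy(c, vdd, vdd)          # swing of VDD
    reduced = bitline_energy(c, vdd, vdd - vt)  # swing of VDD - VT
    print(full, reduced, energy_reduction_factor(vdd, vt))
```

For VDD=1.5V and VT=0.75V the factor is 2, which corresponds to the 50% reduction stated above.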
3.5 ROM Block Architecture
The ROM block consists of address latches, row decoding circuitry, wordline drivers, column select devices, sense amplifiers, output drivers, self-timing logic, control
logic, and the ROM core. The following sections describe each part in greater detail.
3.5.1 Control Signals
Figure 10 shows the control logic for the ROM. Upon power-up, PORB is low,
which sets the SR latch and drives OEN low. This tristates the bus and is independent of
A, CLK, and ENABLE. This ensures that there is no bus contention upon power-up.
FIGURE 10. Schematic of ROM control logic. (Signals labeled include ENABLE, PORB, Ready, and
EvalWord.)
On the rising edge of CLK, a low pulse is generated by the 2-input NAND gate
connected to CLK. This pulse sets the SR latch and drives OEN low which tristates the
output bus. If ENABLE is not asserted, then this is the only action that occurs.
If ENABLE is high during the rising edge of the clock, another low pulse is generated
by the 3-input NAND gate connected to CLK and ENABLE. This forces EvalAddr
high, which begins the address evaluation phase.
At the beginning of address evaluation, all address lines (A, ABar) are precharged
high. Address evaluation is completed when the PMOS device connected to EvalAddrBar
pulls the AddressReady node high. This can only occur when all the address latches have
pulled down either A or ABar. Once this occurs, AddressReady has no path to ground and
the PMOS device pulls AddressReady high. This signals that the address evaluation phase
is complete, and the word evaluation phase begins (EvalWord=1).
FIGURE 11. A timing diagram of the control signals (CLK, EvalAddr, AddressReady, EvalWord, and
Ready versus time). On the rising edge of CLK, EvalAddr is set,
which begins the address decoding operation. Once the addresses have been latched and A and
ABar have been generated, AddressReady is asserted. AddressReady signals the completion of the
address decoding, and begins the sensing operation by setting EvalWord high. EvalWord raises the
wordline and allows the sense operation to complete. Once the worst case reference signal, Ready,
is read, the sense operation is complete. Ready resets EvalAddr, AddressReady and EvalWord,
which returns the ROM to its initial (precharge) mode. Ready is deasserted once the Ready bitline
is precharged, and the ROM is ready for another read operation.
The assertion of EvalWord turns off the precharging of the bitlines and wordline
drivers, disables OEN, and allows the sense operation to proceed. The self-timed signal,
Ready, signals that the read operation is complete and resets EvalWord and EvalAddr back
to their initial precharge states. The ROM control logic is shown in Figure 10, and
Figure 11 is a diagram illustrating the timing of the control signals.
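The ordering of the control signals can be written down as an event list. This is a sketch of the sequence described above, not a timing-accurate simulation: the relative order in which Ready resets EvalAddr, AddressReady, and EvalWord is an assumption, and the function name is hypothetical.

```python
from typing import List, Tuple


def control_sequence(enable: bool) -> List[Tuple[str, int]]:
    """Return the ordered (signal, value) events following one rising CLK edge."""
    # Every rising edge sets the SR latch and drives OEN low (bus tristated).
    events = [("OEN", 0)]
    if not enable:
        return events  # with ENABLE low, nothing else happens
    events += [
        ("EvalAddr", 1),      # address evaluation begins
        ("AddressReady", 1),  # every latch has pulled A or ABar low
        ("EvalWord", 1),      # word evaluation: precharge off, sensing proceeds
        ("OEN", 1),           # OEN no longer holds the output latches reset
        ("Ready", 1),         # worst-case reference bitline has been sensed
        ("EvalAddr", 0),      # Ready returns the block to precharge
        ("AddressReady", 0),  # (reset order among these three is assumed)
        ("EvalWord", 0),
        ("Ready", 0),         # Ready bitline precharged; next read may start
    ]
    return events


if __name__ == "__main__":
    for signal, value in control_sequence(enable=True):
        print(f"{signal} -> {value}")
```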
3.5.2 Address Latch
Figure 12 shows the dynamic latch used in the ROM. When EvalAddr is low (initial
state), the two PMOS devices are on, and thus A and ABar are precharged to a logic 1.
This also means that the transmission gate is on, which passes the address input, Ain, to
the pull down devices for ABar and A.
A pulse generated on the rising edge of CLK begins the read operation by setting
an SR latch for EvalAddr. Once EvalAddr is asserted high, the precharge PMOS devices
shut off and the NMOS pull down device is turned on. The transmission gate is disabled,
thus latching the address data. Since the pull down device is now on, either A or ABar is
pulled low. A and ABar are then passed to the row decoding circuitry.
FIGURE 12. A single ROM address latch. [Burstein96] (Signals labeled include Ain, EvalAddrBar,
VDD, ABar, and A.)
Note that only NMOS devices are used during the evaluation step. This helps
speed up operation because the slower PMOS devices are used only in the precharge
stage, when speed is not as important. The PMOS devices can, however, affect the cycle
time, since A and ABar must be fully precharged before another read cycle can begin.
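The latch behavior, together with the AddressReady condition from Section 3.5.1, can be captured in a few lines of logic. This is a logic-level sketch under the assumption that the value of Ain passed in is the one captured when EvalAddr rose; the function names are hypothetical and no device-level effects are modeled.

```python
from typing import List, Tuple


def address_latch(ain: int, eval_addr: int) -> Tuple[int, int]:
    """Return (A, ABar) for one dynamic address latch."""
    if eval_addr == 0:
        # Precharge: both PMOS devices are on, so A and ABar are both high.
        return (1, 1)
    # Evaluation: the NMOS pull-down discharges exactly one of the two lines.
    return (1, 0) if ain == 1 else (0, 1)


def address_ready(lines: List[Tuple[int, int]]) -> bool:
    """AddressReady rises only when every latch has pulled A or ABar low."""
    return all(a == 0 or abar == 0 for a, abar in lines)


if __name__ == "__main__":
    address_bits = [1, 0, 1]
    precharged = [address_latch(b, 0) for b in address_bits]
    evaluated = [address_latch(b, 1) for b in address_bits]
    print(precharged, address_ready(precharged))
    print(evaluated, address_ready(evaluated))
```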
3.5.3 Row Decoder and Wordline Driver
The address latches drive the address to the row decoder. The address is decoded
with an NMOS pull down network, shown in Figure 13. When all the address lines
(A0, A0Bar, etc.) have settled, EvalWord is raised. A single address is decoded by the network and pulls down the node at the end of the row decoder chain. Note again that only
NMOS devices are used in the critical part of the address decoding circuit, which allows
faster operation and smaller devices. This PLA-type decoding style also simplifies the tiling, which makes generating the row decoding logic for different ROM sizes easier.
FIGURE 13. ROM row decoder and wordline driver (adapted from [Burstein96]). (Signals labeled
include EvalWord, A0, A0Bar, A1, and A1Bar.)
Figure 13 also depicts the wordline driver (only one wordline driver is drawn for
simplicity). EvalWord is initially low at the beginning of a read cycle. In this state, the
EvalWord PMOS is on, which precharges the gate of the PMOS wordline pullup high. The
wordline is discharged by the NMOS device while EvalWord is low. When EvalWord is
asserted, the address is decoded and one of the rows pulls down the gate of the PMOS
device, which drives the corresponding wordline high.
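The decoder and wordline driver can be modeled at the logic level as follows. This sketch assumes that the pull-down chain of row r is gated by A_k where bit k of r is 1 and by ABar_k where it is 0, which is the natural assignment but is not spelled out in the text; the function name is hypothetical.

```python
from typing import List, Tuple


def decode_wordlines(lines: List[Tuple[int, int]], eval_word: int) -> List[int]:
    """Return the wordline values given (A, ABar) pairs, listed LSB first."""
    num_rows = 2 ** len(lines)
    wordlines = []
    for row in range(num_rows):
        if eval_word == 0:
            # While EvalWord is low, the NMOS device holds each wordline low.
            wordlines.append(0)
            continue
        # The chain conducts only if every device in it has a high gate.
        conducts = all(
            (a if (row >> k) & 1 else abar) == 1
            for k, (a, abar) in enumerate(lines)
        )
        # A conducting chain pulls the PMOS gate low, driving the wordline high.
        wordlines.append(1 if conducts else 0)
    return wordlines


if __name__ == "__main__":
    # Address 2 on two bits: bit 0 is 0 -> (A, ABar) = (0, 1); bit 1 is 1 -> (1, 0).
    print(decode_wordlines([(0, 1), (1, 0)], eval_word=1))
```

With settled address lines exactly one chain conducts, so exactly one wordline is raised.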
3.5.4 ROM Core
The main idea in the design of the memory cell is to obtain the maximum signal
strength in the minimum area, due to the massive number of cells typical of a
ROM. Memory cells are dominated by technological considerations, and cell topologies
have not varied much over the last decades. Recent improvements in density have been
due to technology scaling and advanced manufacturing processes; however, there are still
design techniques that can reduce power consumption on this level of ROM design as
well. Figure 14 illustrates the sizes of the cells in the NOR-type ROM core.
FIGURE 14. A "0" cell (left) and a "1" cell (right). Note that the contact and diffusion wire (GND)
dominates cell size.
Methods investigated for low-power I/O [Stan89] [Tabor90] can be applied to
reduce the amount of capacitance on the wordlines and bitlines, which will speed up operation and lower the amount of power dissipated. This technique involves coding data such
that the data has less "weight". In the case of ROM core data, words that have fewer "1"s
than "0"s are considered low weight, because "1"s correspond to transistors being present
in the ROM core, which increases wordline and bitline capacitance. Thus, the low weight
codes are applied to the data such that a reduction of transistors in the ROM core is
obtained. Figure 15 is an example illustrating the process.
The ROM core data is coded such that the fraction of "1"s on a particular wordline
(and hence, the number of transistors on that wordline) is never greater than 50%. This is
accomplished in the following manner:
1. Assume N is the maximum number of transistors on a wordline.
2. If the number of transistors on a particular wordline is less than or equal to N/2, the
wordline remains unchanged and the INV bit is marked as a "0", meaning that the data
is not inverted.
3. If the number of transistors on a particular wordline is greater than N/2, the wordline is
inverted and the INV bit is set to a "1", meaning that the wordline has been inverted.
The additional bit, INV, is also stored in the ROM core on the corresponding wordline.
[Figure 15 panels: ROM Data (Conventional ROM Core); Coded ROM Core with INV bit; Decoded Data.]
FIGURE 15. Example showing the original data (also the configuration for a traditional ROM
core) versus the coded ROM core. A "1" corresponds to an NMOS on the wordline, and a "0"
means that there is no transistor present. In this case, the 19 transistors in the traditional ROM were
reduced to only 10 transistors in the coded ROM, an improvement of nearly 50%.
The coded core data must be XORed with the INV bit before being passed to the
output driver. It can be seen by inspection that XORing the data with the INV bit will yield
the uncoded data.
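The coding and decoding steps can be sketched in a few lines. This is an illustration of the rule stated above, assuming N is taken as the number of data bits in the word; the function names are hypothetical and the code is not taken from ROMGEN.

```python
from typing import List, Tuple


def encode_word(bits: List[int]) -> Tuple[List[int], int]:
    """Return (coded bits, INV) so that at most half of the coded bits are 1."""
    n = len(bits)
    if sum(bits) > n / 2:
        # Heavy word: store the complement and flag it with INV = 1.
        return [1 - b for b in bits], 1
    # Light word: store the data directly with INV = 0.
    return list(bits), 0


def decode_word(coded: List[int], inv: int) -> List[int]:
    """Recover the original word by XORing every sensed bit with INV."""
    return [b ^ inv for b in coded]


if __name__ == "__main__":
    for word in ([1, 1, 1, 0], [1, 0, 0, 0], [1, 1, 0, 0]):
        coded, inv = encode_word(word)
        assert decode_word(coded, inv) == word
        print(word, "->", coded, "INV =", inv)
```

Each "1" in the coded word, including a set INV bit, corresponds to one transistor in the core, so a heavy word costs one extra transistor for INV but saves more than that in data transistors except in the marginal case.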
By using low weight codes for the data, the number of transistors stored in the
ROM core is reduced. Fewer transistors in the ROM core means less capacitance on the
wordlines and bitlines, saving power and increasing performance. Also, a smaller number
of transistors on the wordline means that fewer transistors are turned on. Since there are
fewer transistors on the wordline, fewer bitlines will be discharged. This is especially
important because bitlines tend to be highly capacitive, and reducing the number of bitlines that switch can result in substantial power savings.
3.5.5 Column Selection
Column decoding is necessary to reduce the aspect ratio; however, this decoding
results in a waste of power. For an M bit column decoder, M-1 bits are discarded after the
read. Of those M-1 bits, the bitlines that switched (logic "1") contribute to wasted power.
It is evident that due to column selection, reducing the number of transistors in the ROM
core becomes even more important, since power dissipation in the bitlines becomes a
larger factor. Thus, it is clear that column decoding should be kept to a minimum. However, eliminating column decoding entirely is not a possibility because it would make the
aspect ratio very large and also make the pitch matching of the sense amplifiers impossible
[Burstein96].
Figure 16 shows the column selection and sense amplification circuitry. The two
low order address bits of the ROM are used to select which column to enable. One of the
select bits is enabled (Sel[3:0]), which allows the bitline data to pass through to the sense
amplifier. The sensing operation is described in more detail in the next section.
FIGURE 16. Column selection, sense amplification, and XOR circuits.
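The use of the two low order address bits for column selection can be sketched as below. The split of the remaining bits into a row address is an assumption made for illustration, as is the function name.

```python
from typing import List, Tuple


def column_select(address: int) -> Tuple[List[int], int]:
    """Return (Sel[0..3] as a one-hot list, row address) for a block address."""
    column = address & 0b11   # two low order address bits pick the column
    row = address >> 2        # remaining bits are assumed to pick the wordline
    sel = [1 if i == column else 0 for i in range(4)]  # sel[i] models Sel[i]
    return sel, row


if __name__ == "__main__":
    # Address 0b1101: column 1 of row 3.
    print(column_select(0b1101))
```

Only the selected bitline reaches the sense amplifier; the other three bitlines in each group may still discharge, which is the wasted energy discussed above.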
3.5.6 Sense Amplification
The sense amplifier used in the ROM is also depicted in Figure 16. A simplified
schematic of the charge redistribution sense amplifier is illustrated in Figure 18. Typically,
a cascode device with its gate connected to a fixed reference voltage Vref is inserted
between the load and driver transistor. In the ROM circuit, the column select device acts
as the cascode device with Vref equal to the column select gate voltage (VDD).
FIGURE 17. The effects of charge sharing of the bitline with the output node (Cbitline >> Cout;
Ctotal = Cout + Cbitline). At time t=0, the two nodes are effectively shorted (top). The equivalent
circuit at t=0+ is shown at bottom.
Figure 17 illustrates the concept of charge sharing. At the beginning of the sensing
operation (EvalWord=1), the drain of the NMOS column select device is precharged to
VDD, and the source (bitline) is precharged to VDD-VTN by an NMOS device. Once the
wordline is raised, the bitline starts discharging if the data is a logic "1" (transistor
present). This change in bitline voltage turns the column select device on and redistributes
the charge between the highly capacitive node (bitline) with the small capacitive node at
Vout. Before the column select device turns on, the charge on each of the capacitors is:
$$Q_{bitline} = C_{bitline} V_{bitline} \qquad \text{(EQ 5)}$$
$$Q_{out} = C_{out} V_{out} \qquad \text{(EQ 6)}$$
At time t=0+, the column select device is turned on and the nodes are shorted. By charge
conservation, the total charge is equal to the sum of the charge on both capacitors:
$$Q_{total} = Q_{bitline} + Q_{out} = C_{bitline} V_{bitline} + C_{out} V_{out} \qquad \text{(EQ 7)}$$
Since Cbitline and Cout are in parallel, the value of Ctotal is known:
$$C_{total} = C_{bitline} + C_{out} \qquad \text{(EQ 8)}$$
Using this expression, the total charge can also be expressed:
$$Q_{total} = C_{total} V_{final} = (C_{bitline} + C_{out}) V_{final} \qquad \text{(EQ 9)}$$
By equating Eq. 7 and Eq. 9, we obtain an expression for the final voltage, Vfinal:
$$V_{final} = \frac{C_{bitline} V_{bitline} + C_{out} V_{out}}{C_{bitline} + C_{out}} \qquad \text{(EQ 10)}$$
Since Cbitline >> Cout, Eq. 10 simplifies to:
$$V_{final} \approx V_{bitline} \qquad \text{(EQ 11)}$$
This shows that as soon as charge redistribution begins (when the bitline starts discharging), the column select device quickly equalizes the output voltage to the bitline voltage.
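Eq. 10 and the approximation in Eq. 11 can be checked numerically. The capacitance values below (1 pF for the bitline, 10 fF for the output node) are assumed purely for illustration; only the formula itself comes from the derivation above.

```python
def shared_voltage(c_bitline: float, v_bitline: float,
                   c_out: float, v_out: float) -> float:
    """Final voltage after charge redistribution between two capacitors (Eq. 10)."""
    total_charge = c_bitline * v_bitline + c_out * v_out  # Eq. 7
    total_capacitance = c_bitline + c_out                 # Eq. 8
    return total_charge / total_capacitance


if __name__ == "__main__":
    vdd, vtn = 1.5, 0.75
    c_bitline, c_out = 1e-12, 10e-15  # assumed values with Cbitline >> Cout
    # Output node starts at VDD, bitline at VDD - VTN.
    v_final = shared_voltage(c_bitline, vdd - vtn, c_out, vdd)
    print(f"Vfinal = {v_final:.4f} V, Vbitline = {vdd - vtn:.4f} V")
```

With a 100:1 capacitance ratio the final voltage lands within about 1% of the bitline voltage, consistent with Eq. 11.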
FIGURE 18. Simplified view of a charge redistribution amplifier [Rabaey95].
For low power supplies (1.5V), the precharged bitline voltage, VDD-VTN, is near the
switching threshold, VM, for the inverter (because VM is designed to be approximately
VDD/2), which reduces the amount of swing on the bitlines necessary to read the data.
Therefore, once the bitline starts discharging, the voltage at the input of the inverter
quickly drops to VDD-VTN. As the bitline continues to discharge and reaches a voltage
less than VM, the inverter output will switch and the read of the "1" is complete.
Although the NMOS device in the ROM core is fighting against the weak PMOS
pull-up at the drain of the column select device, the NMOS is much stronger due to larger
sizing, greater mobility, and lower threshold than the PMOS device [Burstein96]. This
introduces some static power dissipation once the bitline discharges, which is another reason why reducing the number of "1"s in the ROM core is important.
In the case when there is no transistor present on the bitline, the source of the column select device remains at an NMOS threshold below VDD. The column select NMOS
remains off, leaving the drain at VDD. The inverter completes the read of the stored "0".
By precharging the column to an NMOS threshold voltage below VDD, the maximum swing on the bitline is reduced to VDD-VTN (for a logic "1") while the best case
swing is still approximately 0 (for a logic "0"). Since the bitlines can be large, this results
in substantial power savings, especially for low power supplies. Another advantage of low
power supplies is that the threshold voltage of the inverter is very close to VDD-VTN,
which means that only a small swing on the bitline is necessary to switch the inverter and
complete the read operation.
3.5.7 XOR
An XOR gate was needed to restore the data if the particular wordline was inverted
in the ROM core. By XORing the INV bit stored in the ROM with the sensed data, the
original data is obtained. Figure 19 is the circuit schematic of the XOR shown in
Figure 16.
When A=0, the transmission gate is on, and B is passed through to F. However,
when A=1, the source of the PMOS device is a logic 1, and the source of the NMOS
device is a logic 0. In this configuration, the rightmost NMOS and PMOS devices can be
viewed as an inverter, with B as the input. Thus, if B=0, the PMOS device is turned on and
F=1. If B=1, the NMOS device is turned on and F=0.
FIGURE 19. XOR circuit used for decoding the ROM data. (Inputs A and B, output F.)
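The two operating modes of the XOR circuit can be mirrored directly in code. This is a logic-level sketch only, and the function name is hypothetical.

```python
def xor_pass_gate(a: int, b: int) -> int:
    """Logic-level model of the XOR circuit: returns F for inputs A and B."""
    if a == 0:
        # Transmission gate is on: B passes straight through to F.
        return b
    # A=1: the rightmost NMOS/PMOS pair acts as an inverter with B as input.
    return 1 - b


if __name__ == "__main__":
    for a in (0, 1):
        for b in (0, 1):
            print(f"A={a} B={b} F={xor_pass_gate(a, b)}")
```

Taking A as the INV bit and B as the sensed data gives the decoding operation described in Section 3.5.4.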
3.5.8 Self Timing
The ROM core contains an extra reference wordline and bitline used for self-timing. Figure 20 shows the circuits used to generate the self timed signal, Ready.
The Ready signal is generated using circuits identical to those used by the ROM to
sense and decode data, which can be seen by comparing Figure 20 with Figure 16. Ready
signals the completion of the data sensing operation. Therefore, the reference wordlines
and bitlines that generate the Ready signal must have the worst case delay to guarantee
that the sense operation is complete and that the data is stable.
For a ROM with M wordlines and N bitlines, the reference wordline has the worst
case number of NMOS gates (represented by the bottom row of transistors in Figure 20):
N/2 for the ROM core, and an additional transistor to pull down the bitline and generate
the self-timing signal, Ready. The bitline has a worst case number of NMOS drains connected to it, M. The column selection, sense amplification, and decoding are identical,
thus including the delays for a typical sense operation. Inverters were added to buffer
Ready and to generate ReadyBar. (4 inverters were necessary because ReadyBar must be
stable before the Ready signal can be asserted.)
FIGURE 20. Self-timing circuits for Ready signal generation.
By using identical circuits to read the data in the worst case, the assertion of Ready
guarantees that all the read operations are complete, and the data may be passed to the output devices. Speed is maximized at different voltages because data can be sent to output
devices and the address and word evaluation circuits can return immediately to precharge
state to prepare for the next read cycle. This Ready signal also guarantees that the data is
stable, which ensures glitch free operation of the output driver.
3.5.9 Output Driver
The output driver must be non-glitching because glitches on the large capacitance
of the data bus result in wasted power. Glitches can occur at the output of the XOR gate,
depending on the time required for the bitline data and the INV bit to be sensed. Additional pass gates placed before the output driver shown in Figure 21 ensure that the XOR
outputs are stable before being sent to the data latches.
FIGURE 21. Schematic of the output driver. (Signals labeled include Ready, EvalWord, DataIn,
EvalWordBar, and ReadyBar.)
At the beginning of a read cycle, OEN goes low, which sets the static latches to the
initial state, tristating the data outputs. Once the evaluation of the data begins (EvalWord
becomes asserted), OEN is asserted high so that it does not interfere with the sensing operation.
The data is taken from the XOR device only when EvalWord and Ready have both
been asserted, which is the reason for the pass transistors before the static latches. This is
to ensure that the XOR device has received stable INV and data before allowing the static
latches to latch the data.
Consider the case when the data is a logic "0". EvalWord and Ready are asserted
so the pass gates are enabled. The input of the latch leading to the NMOS output device is
brought low, which means that the output device is turned on, bringing the data output
low. For the PMOS output device, this case is the same as its initial precharged state. The
static latch does not switch and the PMOS device remains off.
In the case when the data is a logic "1", the latch leading to the PMOS output
device is raised, meaning the gate of the PMOS output device is low. This turns on the
PMOS device, which drives the data bus high. For the path to the NMOS device, this is
the same as its initial state, so the static latch does not switch and the NMOS output device
stays tristated.
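The two cases above can be summarized in a small behavioral sketch. It only illustrates the text and is not a model of the circuit itself: the function name and the tuple it returns are hypothetical, and the OEN timing and the weak bus latches are ignored.

```python
def output_driver(data_bit, eval_word, ready):
    """Behavioral sketch of one output driver (hypothetical helper).

    Returns (nmos_on, pmos_on, bus), where bus is "0", "1", or "Z" (tristated).
    """
    # The pass gates only open once both EvalWord and Ready are asserted.
    if not (eval_word and ready):
        return (False, False, "Z")
    if data_bit == 0:
        # Logic "0": the NMOS output device pulls the bus low; PMOS stays off.
        return (True, False, "0")
    # Logic "1": the PMOS output device drives the bus high; NMOS stays off.
    return (False, True, "1")
```

In this sketch at most one output device is ever on, which mirrors the description of the two latch paths.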
3.6 ROM Bus
The bus provides a path for CLK, PORB, and power supplies to the ROM blocks
and allows multi-block operation. The bus connects the address and data buses of the
ROM blocks together and provides the necessary block decode logic. This section is an
overview of the ROM bus.
3.6.1 Bus Architecture
A typical multi-block ROM is divided up into two rows of blocks, separated by the
bus, as shown in Figure 22. A single ROM bus instance connects together two ROM
blocks (known locally as bk0 and bk1) in a particular column and provides connectivity to
other (adjacent) bus instances. This connection of two ROM blocks with a bus is referred
to as a "block column".
The bus connects all relevant control signals and supply lines to each ROM block.
In addition, the block decode logic is generated to enable bk0, bk1, or neither. Weak static
latches are provided (only for the first block column) to ensure that the data remains valid
when the data bus is tristated.
[Figure 22 panels: "A block column" (block decode, with connections to other bus cells) and "A typical ROM configuration".]
FIGURE 22. A single ROM bus instance provides VDD, GND, CLK, PORB (not shown) to the
ROM blocks, connects the address and data buses, provides block enable information, and static
latches for the data bus (only for 1 block column). Block columns are then tiled together, connected
to other buses to complete the multi-block ROM.
3.6.2 Block Decode Logic
The block decoding circuitry is necessary to ensure that only 1 block is enabled
during a read. The high order address bits are used to decode the block to be enabled; this
group of address bits is called the block address. For an N-bit block address, N-1 bits are
used for global block decoding (selecting which block column is to be enabled) and 1 bit
is reserved for local block decoding (either bk0 or bk1). The current ROM bus has a limit
of 4 block address bits, yielding a maximum of 16 blocks per ROM.
FIGURE 23. A 1-bit block decoder. (2 blocks)
For a ROM with two ROM blocks, an inverter provides the necessary block enable
logic (Figure 23), since no global block decoding is necessary. However, for ROMs with
more than two ROM blocks, the circuits for block selection must be designed to be easily
programmed since each block column has different global decode logic.
FIGURE 24. A 4-bit block decoder. (9-16 blocks)
Figure 24 shows the circuit used for 4-bit block decoding. The block address is
A[3:0], and SEL0 and SEL1 are the block enable signals for bk0 and bk1, respectively.
The high order address bits (A[3:1]) are used for global block decoding, and A0 is used
for local block decoding. As illustrated in the schematic, the block address lines used for
global block selection (A[3:1]) are left unconnected. It is left up to ROMGEN to tile the
wires so that the appropriate global block decode logic is generated for each block column.
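The split between global and local block decoding can be illustrated with a short sketch. The function name is hypothetical, and the polarity of the local bit (A0 = 0 enabling bk0) is an assumption made for illustration; the text only states that one bit selects between bk0 and bk1.

```python
def decode_block(block_addr, n_bits):
    """Split an N-bit block address into (column, sel0, sel1)."""
    if not 1 <= n_bits <= 4:
        raise ValueError("the ROM bus supports 1 to 4 block address bits")
    if not 0 <= block_addr < (1 << n_bits):
        raise ValueError("block address out of range")
    column = block_addr >> 1   # upper N-1 bits: global block decode
    local = block_addr & 1     # lowest bit (A0): local block decode
    sel0 = (local == 0)        # assumption: A0 = 0 enables bk0
    sel1 = (local == 1)        # assumption: A0 = 1 enables bk1
    return column, sel0, sel1
```

For example, with a 4-bit block address of binary 1011, the upper three bits select block column 5 and the low bit selects the second block of that column.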
3.7 Simulation
Simulations using HSPICE verified correct operation of a 16 word by 4 bit ROM
with two ROM blocks and VDD = 1.5V. Figure 25 shows the timing of the control signals,
EvalAddr and EvalWord, for bk1 of the ROM. Since there are two blocks, A3 acts as the
block enable signal. Thus, EvalAddr and EvalWord are not asserted on the rising edge of
CLK when the block is not selected (A3=0). However, when the block is selected (A3=1),
EvalAddr is asserted, and EvalWord is asserted after the address decode operation is complete. Both EvalAddr and EvalWord are reset by the self-timed signal, Ready (not shown),
once the sensing operation is complete.
FIGURE 25. HSPICE results from a ROM 16 word x 4 bit simulation of node voltages. (from top
to bottom: CLK, A3, EvalAddr (block 1), and EvalWord (block 1))
Figure 26 shows the operation of the ROM during the time illustrated in Figure 25.
CLK (top) and EvalWord (second from top) are included for reference. The rise of the
selected wordline (third) and the reading of a "1" on a bitline (fourth) are shown. As
expected, the wordline is raised after EvalWord is asserted. This turns on the transistors
and discharges the corresponding bitline. The fifth waveform is the Ready signal,
which signals that the read operation is complete. The Ready signal resets EvalWord and
EvalAddr (shown in Figure 25) and allows the data to be passed to the output driver. Once
EvalWord and EvalAddr are reset, the bitlines return to their precharging state and Ready
is reset. Finally, D (last) depicts the output of the data "1". Note that there was a change of
data on the previous rising edge of the clock, though EvalAddr and EvalWord were not
asserted. This is because the other ROM block (bk0) was the block enabled for the particular address specified.
FIGURE 26. HSPICE results from a ROM 16 word x 4 bit simulation of node voltages. (from top
to bottom: CLK, EvalWord, wordline, bitline, Ready, and D)
3.8 ROM Performance
The performance and energy dissipated per cycle for various supply voltages and
ROM sizes are summarized below in Tables 4 and 5.
VDD (V)  Words  Bits  Blocks  Access Time (ns)  Cycle Time (ns)  Energy / Cycle (pJ)
1.5      16     4     1       25.7              33.9             10.91
1.5      16     4     2       25.8              32.0             12.58
2.2      16     4     1       10.9              14.8             23.61
2.2      16     4     2       11.0              13.9             26.27
3.3      16     4     1       6.1               8.4              55.88
3.3      16     4     2       5.9               7.9              60.50
5.0      16     4     1       4.0               5.7              141.39
5.0      16     4     2       3.9               5.4              150.43
TABLE 4. HSPICE results for a 16 word x 4 bit ROM
VDD (V)  Words  Bits  Blocks  Access Time (ns)  Cycle Time (ns)  Energy / Cycle (pJ)
1.5      256    8     1       47.4              58.3             28.8
1.5      256    8     2       39.4              48.9             30.6
1.5      256    8     4       33.9              43.2             35.9
2.2      256    8     1       18.1              23.9             59.9
2.2      256    8     2       15.4              20.2             61.0
2.2      256    8     4       13.7              18.4             69.3
3.3      256    8     1       9.6               11.1             137.9
3.3      256    8     2       8.1               11.0             135.3
3.3      256    8     4       7.5               10.3             148.9
5.0      256    8     1       6.2               8.7              348.3
5.0      256    8     2       5.4               7.4              330.9
5.0      256    8     4       5.0               6.9              357.2
TABLE 5. HSPICE results for a 256 word x 8 bit ROM
Note that increased partitioning decreases access and cycle times, but does not
necessarily decrease the energy dissipated per read cycle. Partitioning small ROMs into
more blocks does not significantly reduce energy consumption, due to the extra bus wiring
and capacitance on the output data bus (Table 4). However, for medium and large sized
ROMs, the partitioning scheme that dissipates the least power requires some investigation.
The original Berkeley ROM is compared to the improved ROM for various supply
voltages. The difference in performance between the two ROMs is heavily dependent on
the data. The results in Table 6 compare the two ROMs with the average data word having 75% "1"s, which favors the new ROM.
                       Energy / Cycle (pJ)
ROM                    VDD = 1.5V  VDD = 2.2V  VDD = 3.3V  VDD = 5.0V
Original Berkeley ROM  37.36       77.52       176.0       436.0
New ROM                28.54       60.00       139.5       354.1
TABLE 6. HSPICE comparison of original ROM versus modified ROM. ROM core is 256 word x 8
bit ROM with heavy data. Average data weight = 24 and average coded data weight = 9.
The reduction in power consumption is due to the low power coding of the
improved ROM core. In the worst case, the new ROM has no coded ROM data and therefore has slightly more power consumption than the Berkeley ROM, from the extra logic
necessary for data decoding.
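As a quick check of the size of this reduction, the relative saving can be computed directly from the Table 6 figures. The helper name below is hypothetical; the energy values are the ones reported in Table 6.

```python
# Energy per cycle (pJ) from Table 6, keyed by supply voltage (V).
BERKELEY = {1.5: 37.36, 2.2: 77.52, 3.3: 176.0, 5.0: 436.0}
NEW_ROM = {1.5: 28.54, 2.2: 60.00, 3.3: 139.5, 5.0: 354.1}


def energy_reduction(old_pj, new_pj):
    """Fractional reduction in energy per cycle."""
    return (old_pj - new_pj) / old_pj


for vdd in sorted(BERKELEY):
    saving = energy_reduction(BERKELEY[vdd], NEW_ROM[vdd])
    print(f"VDD = {vdd} V: {100 * saving:.1f}% less energy per cycle")
```

For this heavy data set the saving is roughly 19% to 24%, and is largest at the lowest supply voltage.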
Chapter 4
ROM Generation
ROMGEN is the tool used to generate the ROMs described in the previous chapter, "ROM Operation". ROMGEN is written in SKILL and is designed for use in the
Cadence Design Environment. The first section of this chapter (Section 4.1) provides an
overview of the generator and its usage. The sections that follow describe the generator in
more detail, including a summary of the cells in the ROMGEN library and specifics about
the SKILL code.
ROMGEN is a simple tool for generating low power ROMs in Cadence. The generator is executed on a parameter file including the details of the ROM. The ROM blocks
and ROM bus will be created and tiled automatically to complete the generation of the
ROM.
4.1 Usage
This section describes ROMGEN usage and is intended to be a user guide for
ROM generation.
Parameter  Description                          Value
libName    target library name                  any valid library name (in quotes)
romName    target ROM layout name               any valid ROM name (in quotes)
words      number of words specified            minimum 8 words (3 address lines) per block
bits       number of bits per word (wordsize)   positive integer (always even number created
                                                for tiling reasons)
blocks     number of blocks                     1 = single block mode, 2-16 = multiblock
dataBits   ROM data                             list of binary strings (in quotes) representing
                                                ROM data. First string corresponds to
                                                ADDRESS = 0.
TABLE 7. Summary of ROMGEN parameters.
4.1.1 Parameter file
A parameter file detailing the name, size, partitioning, contents, and library location must be created prior to generation. The syntax of the file is:
((parameter1 value1)
 (parameter2 value2)
 (parameter3 value3)
 ...
 (parameterN valueN))
The parameters are summarized in Table 7. (Note: parameter names are case sensitive.)
libName and romName specify the target Cadence library and name for the ROM. The
values must be included in double quotes.
words specifies the number of words in the entire ROM. For tiling reasons, the minimum
number of address lines per block is 3; therefore, the minimum number of words per block
is 8. Legal values for words include any integer greater than or equal to 8. Because the
number of addresses per block is a power of two, words may not exactly fill the ROM
blocks. In that case, ROMGEN automatically pads the ROM with enough "0"s to fill up the
rest of the last ROM block. A possible improvement to ROMGEN is to eliminate the
unnecessary wordlines containing empty data on the last block, as long as the minimum
number of wordlines is met.
The parameter bits specifies the number of bits per word of the ROM. For tiling reasons,
only ROMs with an even number of bits can be created. If bits is odd, a D[-1] bit is created
that the user may leave unconnected. Thus, the data bus is D[bits-1:0] if bits is even, and
D[bits-1:-1] if bits is odd.
If blocks is 1, then ROMGEN enters single block mode and only generates a single ROM
block, without generating a ROM bus, static latches, or block selection circuitry. If blocks
is greater than 1, the ROM blocks and bus are generated for multiblock operation. An odd
number of blocks is a legal value.
dataBits specifies the ROM data with a list containing at least words number of strings
consisting of "1"s and "0"s representing the ROM data in binary. Each string must be at
least bits long; any extra bits or words defined are ignored. The first string in the list corresponds to ROM ADDRESS=0, the second string corresponds to ADDRESS=1, and so on.
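The sizing rules described above can be sketched as follows. This is only an illustration under stated assumptions: the function name is hypothetical, and the rule used to pick the number of words per block (the smallest power of two, at least 8, that holds an even share of the words) is an assumption, since the text does not spell out how ROMGEN distributes words across blocks.

```python
def plan_rom(words, bits, blocks):
    """Sketch of block sizing, zero padding, and data bus naming."""
    if words < 8:
        raise ValueError("words must be at least 8")
    if bits < 1:
        raise ValueError("bits must be a positive integer")
    if not 1 <= blocks <= 16:
        raise ValueError("blocks must be between 1 and 16")
    share = -(-words // blocks)           # ceiling division
    words_per_block = 8                   # minimum: 3 address lines per block
    while words_per_block < share:        # assumption: next power of two
        words_per_block *= 2
    padding = blocks * words_per_block - words
    low = 0 if bits % 2 == 0 else -1      # odd word sizes gain a D[-1] bit
    return {
        "words_per_block": words_per_block,
        "padding_words": padding,
        "data_bus": f"D[{bits - 1}:{low}]",
    }
```

Under these assumptions a 20 word, 5 bit, 2 block ROM would use 16 words per block, 12 padded words, and a data bus named D[4:-1].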
The following is an example parameter file demonstrating proper parameter specification:
((libName "rom")                 ; output library
 (romName "rom4x4")              ; output romname
 (words 16)                      ; # of words in ROM
 (bits 4)                        ; # of bits per word
 (blocks 2)                      ; total number of blocks (1 to 16)
 (dataBits
   "0000" "1111" "1010" "0101"
   "1100" "0011" "0110" "1001"
   "0001" "0010" "0100" "1000"
   "1110" "1101" "1011" "0111"))
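For readers who want to check a parameter file outside Cadence, the following is a minimal Python sketch of a reader for this syntax. It is not part of ROMGEN (which is written in SKILL); the function name is hypothetical, and the sketch assumes that ";" starts a comment and that the dataBits strings may appear either directly after the parameter name or wrapped in one nested list.

```python
import re


def parse_params(text):
    """Parse a ROMGEN-style parameter file into a dict (illustrative only)."""
    # Assumption: ';' starts a comment that runs to the end of the line.
    stripped = "\n".join(line.split(";", 1)[0] for line in text.splitlines())
    tokens = re.findall(r'\(|\)|"[^"]*"|[^\s()"]+', stripped)
    pos = 0

    def read():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        if tok == "(":
            items = []
            while tokens[pos] != ")":
                items.append(read())
            pos += 1  # consume ")"
            return items
        if tok.startswith('"'):
            return tok[1:-1]      # quoted string, kept as text
        try:
            return int(tok)
        except ValueError:
            return tok            # bare symbol such as a parameter name

    params = {}
    for entry in read():
        name, values = entry[0], entry[1:]
        if name == "dataBits":
            # Accept both (dataBits "..." ...) and (dataBits ("..." ...)).
            if len(values) == 1 and isinstance(values[0], list):
                values = values[0]
            params[name] = values
        else:
            params[name] = values[0]
    return params


SAMPLE = '''
((libName "rom")      ; output library
 (romName "rom4x4")   ; output romname
 (words 16) (bits 4) (blocks 2)
 (dataBits "0000" "1111" "1010" "0101" "1100" "0011" "0110" "1001"
           "0001" "0010" "0100" "1000" "1110" "1101" "1011" "0111"))
'''
params = parse_params(SAMPLE)
```

A reader like this makes it easy to confirm, before running the generator, that dataBits holds at least words strings and that each string is at least bits characters long.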
4.1.2 Generation
Application of ROMGEN involves these steps:
1. Create the ROMGEN parameter file.
2. Start Cadence.
3. In the Cadence Interface Window (CIW), load ROMGEN:
CIW> (load "romgen.il")
4. Once loaded, execute ROM generation on the parameter file:
CIW> (romgen "romgen.param")
where romgen.param is the name of the parameter file.
A cell for the ROM, the bus, and for each block is created in the library specified in
the parameter file. The bus cell name is marked with a "_bus" extension after the romName, and each block has a "_bkn" extension after the romName, where n is the block
number from 0 to blocks-1. This completes the ROM generation.
4.2 Cells
This section lists cells in the ROMGEN library and briefly describes their function.
Table 8 lists the cells used to generate the ROM blocks and Table 9 catalogs the cells used
in ROM bus generation.
Cell Name          Description
romgen_GNDcell     GND cell used to create GND lines in the ROM core
romgen_aReady      2 NMOS devices per address bit used to generate AddressReady
                   (see Figure 10)
romgen_al          address latch (A input)
romgen_alEnd       end cell for address latch cells, romgen_al; connects EvalWord to
                   address decoding circuitry
romgen_botCell0    end wordline cell "0" for self-timing signal, Ready (no transistor)
romgen_botCell1    end wordline cell "1" for self-timing signal, Ready (transistor present)
romgen_botEndCell  end cell for reference wordline, Ready bit
romgen_cell0       ROM core "0" cell (no transistor)
romgen_cell1       ROM core "1" cell (transistor present)
romgen_cs          column select, sense amplifier, XOR, and output driver (D output)
romgen_csEnd       dummy column select, sense amplifier, XOR for Ready bit; also generates
                   OEN logic (PORB input)
romgen_csInv       sense amplifier for INV bit; also provides buffering for Ready and
                   ReadyBar
romgen_ctrl        control logic (VDD, CLK, GND, and ENABLE input)
romgen_endCell0    bitline for Ready, and INV bit information (INV = "0")
romgen_endCell1    bitline for Ready, and INV bit information (INV = "1")
romgen_nand0       address decode "0" cell
romgen_nand1       address decode "1" cell
romgen_nandBot     address decode for dummy wordline
romgen_nandGND     GND cell for address decoder
romgen_rs          row select and wordline driver
romgen_rsBot       reference row select and wordline driver
TABLE 8. ROMGEN block cells.
Cell Name          Description
romgen_aTop        tile aligned with romgen_al
romgen_aTopNC      tile aligned with romgen_al (no contact)
romgen_aspacer     spacer to align bus with address latches
romgen_bkSel1b     1-bit block enable logic
romgen_bkSel2b     2-bit block enable logic
romgen_bkSel3b     3-bit block enable logic
romgen_bkSel4b     4-bit block enable logic
romgen_blockEnp0   poly wire used to set a bit of global decode logic to "0"
romgen_blockEnp1   poly wire used to set a bit of global decode logic to "1"
romgen_EnpSpacer   spacer to position blockEnp at correct location to enable global
                   decode logic
romgen_csEndTop    tile aligned with romgen_csEnd
romgen_csEndTopNC  tile aligned with romgen_csEnd (no contact)
romgen_csTop       tile aligned with romgen_cs
romgen_csTopNC     tile aligned with romgen_cs (no contact)
romgen_csTopStat   static latches for data bus
romgen_ctlTop      tile aligned with romgen_ctrl
romgen_ctlTopNC    tile aligned with romgen_ctrl (no contact)
romgen_ctlTopNCNE  tile aligned with romgen_ctrl (no contact, no enable); block enable
                   logic overlaps this cell
romgen_invTop      tile aligned with romgen_csInv (no contact)
TABLE 9. ROMGEN bus cells.
4.3 ROMGEN
ROMGEN is a ROM generator written in SKILL and implements a modified version of the ROM tiling procedures designed for the LagerIV silicon assembler system
[Brodersen93]. LagerIV calls the TimLager tiling program and uses the "lprom2" function
to create the layout for the memory. Because TimLager utilizes a number of advanced tiling routines which were unavailable in Cadence, a tiling routine was created. This section
describes the tiling routine, and some details of its application in ROMGEN for ROM
block and bus generation.
4.3.1 Tiling Procedures
Because Cadence lacks a tiling routine like TimLager, a tiling routine and a number of helper functions were created to supply at least the minimum amount of functionality required to perform ROM generation.
The tiling function, romgen_tiler, places a simple mosaic (a homogeneous array
of cells, aligned using the prBoundary information) in a cell layout, given the mosaic size,
point of origin, and rotation. Regardless of rotation, the mosaic is always placed in the
upper right quadrant relative to the point. This is a very simple tiling procedure; however,
it is adequate for ROM generation. The tiling program makes no effort to align terminals;
only origin and prBoundary information is used when placing cells. Therefore, the cells
must be carefully designed and updated with alignment information (prBoundary) in order
to ensure correct operation.
[Figure 27 labels: N = # of bits per word; bit groups D[N-1], D[N-2], ..., D[0], INV; DUMMY; NMOS tied to GND. Column order along a typical wordline: 0 1 2 3 GND 3 2 1 0, repeated for each pair of bits. Mirrored column select devices lead to the sense amplifiers.]
FIGURE 27. ROM Core and Column Select Mirroring.
The placement of subsequent mosaics requires manipulating variables storing the
point of origin of the last mosaic placed. romgen_tiler returns a list containing the height
and width information for the mosaic (tile_size), which can be used to update the origin
for the next mosaic to be placed.
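The way the returned tile size is used to chain mosaics can be sketched in Python (the actual tiler is SKILL code operating on Cadence layouts). Everything here is illustrative: the function signature, the (height, width) return order, and the cell dimensions are assumptions, and rotation is ignored.

```python
def place_mosaic(layout, cell_name, rows, cols, origin, cell_width, cell_height):
    """Append a rows x cols array of one cell to layout; return (height, width)."""
    x0, y0 = origin
    for r in range(rows):
        for c in range(cols):
            # Each instance is placed up and to the right of the origin.
            layout.append((cell_name, x0 + c * cell_width, y0 + r * cell_height))
    return (rows * cell_height, cols * cell_width)


layout = []
origin = (0, 0)
# Place a column of GND cells, then start the next mosaic to its right.
height, width = place_mosaic(layout, "romgen_nandGND", 4, 1, origin, 10, 8)
origin = (origin[0] + width, origin[1])
place_mosaic(layout, "romgen_nand0", 4, 3, origin, 6, 8)
```

Because only the origin and the tile size are tracked, correct abutment depends entirely on the cell boundaries being drawn consistently, which matches the caveat about prBoundary above.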
4.3.2 ROM Block Tiling
This section describes the tiling for a ROM block. The following sections are presented in the order in which they are generated. The actual code for "romgen_blk.il" is
located in "Appendix A".
Address/Row Decode. The NMOS NAND pull-down structure of the address decode is
the first section to be generated. The GND cells are first placed using a single mosaic.
Then, the NAND cells are individually placed to decode each bit of the address (A or
ABar), so that successive rows decode binary 0, 1, 2, and so forth. The address for the self-timing
signal, Ready, is placed last.
Row Drivers. Each row driver drives two ROM wordlines, therefore a single mosaic is
created for placement of the row drivers. A special reference row driver is added for the
reference wordline used to generate Ready.
ROM Core. The ROM core contains the data, INV bit information, and reference wordlines and bitlines for self timing. However, the tiling of the cells requires special manipulation due to mirroring of the column select devices, and 4:1 column selection. Due to
column selection, there are 4 words per wordline. Figure 27 demonstrates the tiling of the
ROM core.
The figure shows how the mirroring of the column select devices affects the placement of the ROM cells. A typical wordline is shown, with bits corresponding to the
selected column (determined by A[1:0]) marked accordingly. The INV bit is placed at the
end of the wordline, with a "1" (transistor present) marking wordlines that need to be inverted
to decode the data. The calculations for mirroring the data bits and generating the INV bits
are done by ROMODEL prior to block generation.
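The role of the INV bit can be illustrated with a short sketch of inversion coding on one wordline. The function names are hypothetical, and the decision rule (invert when more than half of the bits are "1") is an assumption; the text only states that the INV bit marks wordlines that must be inverted on read.

```python
def encode_wordline(bits):
    """Return (coded_bits, inv) for one wordline of 0/1 values."""
    ones = sum(bits)
    if ones > len(bits) / 2:
        # Assumed rule: store the complement when "1"s are the majority.
        return [1 - b for b in bits], 1
    return list(bits), 0


def decode_wordline(coded_bits, inv):
    """Recover the data by XORing each stored bit with the INV bit."""
    return [b ^ inv for b in coded_bits]


# A 32-bit wordline (4 words x 8 bits) holding 24 ones and 8 zeros.
data = [1] * 24 + [0] * 8
coded, inv = encode_wordline(data)
stored_weight = sum(coded) + inv
```

With this rule the 32-bit wordline with 24 ones is stored with a weight of 9 (8 ones plus the INV bit), which is consistent with the average data weight of 24 and coded weight of 9 quoted in the Table 6 caption.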
A dummy row is created to simulate worst case delay. This row is tiled with
alternating "1" and "0" data in the ROM core, an INV bit, and a transistor used to pull
down the bitline for Ready. The bitline for Ready is generated by connecting the bitline to
the drain of a separate transistor with its gate and source tied to GND for each wordline to
simulate the worst case bitline.
Address Latches. A mosaic for the address latches is created and aligned with the address
decoding circuitry. The two least significant bits are contained within the control block for
column selection.
Control Block. The control block is placed adjacent to the address latches and aligned
with the row drivers. The control block has row drivers to drive the column select lines.
Column Selection. The column select devices contain column select, sense amplification,
XOR decoding, and output driver circuitry. These devices are mirrored to reduce the number of ground lines running through the ROM core. After bits number of column select
devices are placed, the column select devices for INV and Ready are placed.
This completes the generation of the ROM block. However, if the ROM is a multiblock
ROM, block decoding logic and static latches are also placed during block generation.
These cells are designed to overlap into the ROM bus area. In the case of a single block
ROM, these steps are skipped.
Block Decode Logic. If the current block being generated is the first ROM block in a column (locally, block 0), then the block decoding logic is placed. The block decoder used is
determined by ROMGEN, and depends on the number of blocks in the ROM. Once
placed, wires (poly) are placed to generate the correct block decode logic. This block
decode logic generates SEL0 and SEL1 outputs and is placed to align with the corresponding ENABLE inputs of block 0 and block 1, respectively.
Static Latches. The static latches are placed on the data bus for the first block of the entire
ROM only. These latches overlap with the ROM bus wiring area. Figure 28 depicts a typical first block of a small multiblock ROM:
FIGURE 28. Block 0 of a 2 block ROM. Note the decoding logic (an inverter and some bus wiring)
at the lower left and the 4 static latches attached to the D outputs (lower right).
4.3.3 ROM Bus Tiling
All the inputs and outputs of the ROM are located on a single side of the ROM.
Thus, BUS is designed to align with these signals and to overlap the block decoding logic
and static latches placed during ROM block generation. Each input or output cell of the
ROM block has two corresponding ROM bus cells that are sized to be aligned with it.
The bus cells either connect the input or output to the bus (cell contains a contact) or provide wiring (no contact). These cells are tiled to create the ROM bus. A typical ROM bus
is shown in Figure 29:
FIGURE 29. A typical ROM bus layout. Bus wires (horizontal) are metal2, connection wires
(vertical) are metal1.
FIGURE 30. The layout of a 256 word by 8 bit, 4 block ROM.
The bus wires are (listed from bottom to top): GND, VDD, CLK, A[N-1:0], D[M-1:0], and PORB, where N is the total number of address bits and M is the total number of data
bits.
4.3.4 Multiblock ROM Tiling
Once bus and block generation are complete, the two ROM blocks are connected
to the bus using a single bus instance, creating a ROM column. Subsequent ROM columns
are created and tiled to the right, connected by the bus. The last ROM column may have a
single block connected to a bus, in the case of an odd number of blocks specified in the
parameter file. A sample ROM layout is depicted in Figure 30.
Chapter 5
ROM Modelling
ROMODEL is a tool used to estimate the power dissipated in the low power ROM
described in the previous chapters. This tool estimates the energy dissipated per cycle by
modelling each node in the ROM and accumulating the energy dissipated. Many of these
techniques are borrowed from PYTHIA, a power estimation tool for Verilog
[Xanthopoulos96]. Since the architecture of the ROM is well defined, models for a large
number of identical nodes can be condensed into one and scaled accordingly, depending
on the size of the ROM. The dynamic energy dissipated is on the order of $C V_{DD}^2$ for each
transition ($C V_{DD}(V_{DD} - V_T)$ for the bitlines). Thus, the energy dissipated per ROM
access cycle is calculated for each node, scaled according to the size of the ROM, and
accumulated with all the other nodes. The result is reported as the total energy dissipated
per access.
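The accumulation described above can be sketched as follows. This is not the ROMODEL source: the function names, the dictionary fields, and the capacitance values are hypothetical and chosen only to show how a per-node energy is scaled by the number of identical nodes and summed.

```python
def node_energy(cap, vdd, vt=0.0, is_bitline=False):
    """Energy per transition: C*VDD^2, or C*VDD*(VDD-VT) for a bitline."""
    swing = (vdd - vt) if is_bitline else vdd
    return cap * vdd * swing


def total_energy(nodes, vdd, vt):
    """Scale each node model by its instance count and accumulate."""
    return sum(
        n["count"] * node_energy(n["cap"], vdd, vt, n.get("bitline", False))
        for n in nodes
    )


# Hypothetical node list: one control node and eight identical bitlines.
nodes = [
    {"name": "EvalWord", "cap": 100e-15, "count": 1},
    {"name": "bitline", "cap": 50e-15, "count": 8, "bitline": True},
]
energy = total_energy(nodes, vdd=1.5, vt=0.75)  # joules per access
```

Condensing identical nodes into one entry with a count is what lets the estimate scale with ROM size without enumerating every bitline or wordline.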
ROMOPT is a tool that determines the valid block configurations for a ROM and
the optimal ROM partitioning scheme that minimizes energy dissipated per read cycle.
The optimization process applies the ROMODEL tool to model the power and generates
an output report detailing information about the different ROM configurations.
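A minimal sketch of the selection step is shown below. It is an illustration, not the ROMOPT implementation: the function names and the validity rule (every block must hold at least 8 words, with at most 16 blocks) are assumptions drawn from the limits stated in Chapters 3 and 4, and the energy figures are the 1.5V HSPICE values from Table 5 standing in for ROMODEL estimates.

```python
def valid_block_counts(words, max_blocks=16, min_words_per_block=8):
    """Block counts whose even share of the words meets the per-block minimum."""
    return [
        b for b in range(1, max_blocks + 1)
        if -(-words // b) >= min_words_per_block   # ceiling division
    ]


def best_partition(energy_by_blocks):
    """Pick the block count with the lowest energy per read cycle."""
    return min(energy_by_blocks, key=energy_by_blocks.get)


# Energy per cycle (pJ) for the 256 word x 8 bit ROM at VDD = 1.5V (Table 5).
table5_at_1v5 = {1: 28.8, 2: 30.6, 4: 35.9}
choice = best_partition(table5_at_1v5)
```

For these three configurations the single block ROM has the lowest energy per cycle, even though the 4 block ROM is the fastest.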
5.1 Brief Description of Operation
ROMODEL is a simple program that models the nodes in the low power ROM.
The basic operation of the modeler is very simple and requires little knowledge of the
ROM itself. First, parameter values are read in from a technology file that needs to be
specified by the user. After the values are read, nodes are created and annotated with a
number of capacitance values. The capacitances that will be dealt with in the ROM are:
* wiring capacitance, which includes diffusion, poly, metal1, and metal2 layers. The layers are lumped to facilitate the accounting of wiring capacitances. These capacitances
are considered constant throughout the node swing.
* the total gate capacitance contributed by gates seen by the particular node. A piecewise constant model is used to model the nonlinear behavior of the gates, with some
correction factors for the Miller effect and gate-to-drain overlap capacitance.
* the total junction capacitance due to drain to substrate or drain to well junctions.
Drain junction capacitances are assumed to vary with voltage in the following manner:
$$C_J(V_R) = \frac{C_{J0}}{\left(1 + V_R/\phi\right)^{M_J}}$$ (EQ 12)
where $V_R$ is the reverse bias across the junction, $\phi$ is the built in potential, $C_{J0}$ is the
capacitance at zero bias, and $M_J$ is the grading coefficient of the junction.
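The voltage dependence can be evaluated with a few lines of Python. The sketch assumes the usual reverse-biased junction form, in which the zero bias capacitance is divided by (1 + VR/phi) raised to the grading coefficient; the function name and the numerical values are hypothetical.

```python
def junction_cap(cj0, v_reverse, phi, mj):
    """Junction capacitance at a given reverse bias (same units as cj0)."""
    if v_reverse < 0:
        raise ValueError("v_reverse is the magnitude of the reverse bias")
    return cj0 / (1.0 + v_reverse / phi) ** mj


# Hypothetical values: the capacitance halves at a reverse bias of 3*phi
# when the grading coefficient is 0.5.
c_zero = junction_cap(2e-4, 0.0, 0.8, 0.5)
c_biased = junction_cap(2e-4, 2.4, 0.8, 0.5)
```

The capacitance falls as the reverse bias grows, which is why a single fixed value cannot be used across the full node swing.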
The following sections describe these models and approximations in more detail.
Using these models, the energy dissipated per cycle in each node as a function of ROM
size is calculated. Finally, the energy dissipated in the ROM is calculated by scaling the
appropriate nodes and accumulating the result. The node information is sent to an output
file and the total energy per cycle is reported to the user.
5.2 ROMODEL Limitations
Before the power estimates can be used with confidence, a few remarks about simplifications and assumptions need to be noted:
1. The energy/cycle calculated by ROMODEL only accounts for dynamic power. Short
circuit power and power due to leakage currents are not taken into account. Sources of
static power dissipation exist in this ROM (AddressReady and bitlines); however, this
contribution of static power is assumed to be negligible and is ignored in this model.
2. All calculations of gate and junction capacitances employ simplifying assumptions.
The models are described in more detail in the following sections. The accuracy of
these models is validated using HSPICE.
3. The modeler makes a number of assumptions to provide average energy/cycle estimates; however, the user should be aware that the energy dissipated per cycle is dependent on the data and the frequency of each address access.
5.3 Overview
The user must provide a number of parameters: number of address lines per block,
wordsize, number of blocks, and supply voltage, VDD. A technology file must also be
specified which contains information about the process technology. Using this information, ROMODEL applies the models described later in this section to calculate the energy
dissipated.
5.3.1 ROMODEL Technology File
The technology file contains information about the process technology. The parameters are listed in Table 10. Typically, these values are obtained from the
SPICE model and parameter files for the particular process. A sample technology file and SPICE model are included in Appendix B.
Parameter         Description                                                      Units
lambda            technology parameter                                             m
phin              n-drain to p-substrate built in potential                        V
phip              p-drain to n-substrate built in potential                        V
vtn               threshold voltage for NMOS device                                V
vtp               threshold voltage for PMOS device (sign ignored)                 V
mjn               absolute value of MJ in Eq. 12 (NMOS area)                       -
mjp               absolute value of MJ in Eq. 12 (PMOS area)                       -
mjswn             absolute value of MJ in Eq. 12 (NMOS perimeter)                  -
mjswp             absolute value of MJ in Eq. 12 (PMOS perimeter)                  -
cj0n              NMOS zero bias area junction capacitance                         F/m^2
cj0p              PMOS zero bias area junction capacitance                         F/m^2
cjsw0n            NMOS zero bias perimeter junction capacitance                    F/m
cjsw0p            PMOS zero bias perimeter junction capacitance                    F/m
cgd0n             NMOS gate to drain overlap capacitance                           F/m
cgd0p             PMOS gate to drain overlap capacitance                           F/m
n_cv_bp           breakpoint for piecewise constant model of NMOS gate cap.        V
p_cv_bp           breakpoint for piecewise constant model of PMOS gate cap.        V
n_gate_cap_init   NMOS gate capacitance from 0 to n_cv_bp (0 for our model)        F/m^2
p_gate_cap_init   PMOS gate capacitance from 0 to p_cv_bp (Cox for our model)      F/m^2
n_gate_cap_final  NMOS gate capacitance from n_cv_bp to VDD (Cox for our model)    F/m^2
p_gate_cap_final  PMOS gate capacitance from (VDD-p_cv_bp) to VDD (0 for our       F/m^2
                  model)
ndiff_cap         NDIFF to substrate wiring capacitance per unit area              F/m^2
pdiff_cap         PDIFF to substrate wiring capacitance per unit area              F/m^2
poly_cap          Poly to substrate wiring capacitance per unit area               F/m^2
metal1_cap        Metal1 to substrate wiring capacitance per unit area             F/m^2
metal2_cap        Metal2 to substrate wiring capacitance per unit area             F/m^2
ndiff_fringe      NDIFF to substrate fringe capacitance                            F/m
pdiff_fringe      PDIFF to substrate fringe capacitance                            F/m
poly_fringe       Poly to substrate fringe capacitance                             F/m
metal1_fringe     Metal1 to substrate fringe capacitance                           F/m
metal2_fringe     Metal2 to substrate fringe capacitance                           F/m
TABLE 10. Technology File Parameters.
All values must be specified in the technology file before ROMODEL will proceed
with the ROM energy modelling. Each line of the technology file corresponds to a
[parameter] [value] pair, or a comment marked by a pound sign:
# This is a partial technology file
lambda 0.5e-6
vtn 0.75
All the parameters are straightforward with the exception of the CV-breakpoint for the
NMOS and PMOS piecewise constant models. These can be obtained by using the
CV_MODEL tool, to be described in further detail in Section 5.3.4. Once the technology
file is successfully read, the modelling of the nodes of the ROM begins.
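Reading a file in this format is straightforward. The following Python sketch is illustrative only: the function name and the `required` argument are hypothetical, and treating text after a pound sign on a parameter line as a comment is an assumption beyond what the format description states.

```python
def parse_tech_file(text, required=()):
    """Read "[parameter] [value]" lines, skipping '#' comments and blanks."""
    params = {}
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line:
            continue
        name, value = line.split()
        params[name] = float(value)
    # Modelling should not proceed until every required value is present.
    missing = [name for name in required if name not in params]
    if missing:
        raise ValueError("missing technology parameters: " + ", ".join(missing))
    return params


SAMPLE = """# This is a partial technology file
lambda 0.5e-6
vtn 0.75
"""
tech = parse_tech_file(SAMPLE, required=("lambda", "vtn"))
```

Checking for missing names up front mirrors the requirement that all values be specified before the energy modelling begins.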
5.3.2 Node Model
A node structure is created for every node in the ROM. Each node contains information regarding voltage swing, gate and drain sizes, and wiring capacitance. It also contains variables for equivalent capacitance and accumulated energy calculations to facilitate
the identification of energy-consuming nodes. Variable descriptions are listed in Table 11.
Variable            Description
name                array containing the name of the node
v_init              voltage at the beginning of the clock cycle
v_final             biggest change in voltage before being reset
area_nmos_gates     total NMOS gate area (W x L) seen by node
width_nmos_gates    total of all NMOS gate widths (W) seen by node
area_nmos_junct     total area of NMOS drain junctions
perim_nmos_junct    total perimeter of NMOS drain junctions
area_pmos_gates     total PMOS gate area (W x L) seen by node
width_pmos_gates    total of all PMOS gate widths (W) seen by node
area_pmos_junct     total area of PMOS drain junctions
perim_pmos_junct    total perimeter of PMOS drain junctions
area_ndiff          total area of NDIFF wiring
area_pdiff          total area of PDIFF wiring
area_poly           total area of POLY wiring
area_metal1         total area of METAL1 wiring
area_metal2         total area of METAL2 wiring
perim_ndiff         total perimeter of NDIFF wiring
perim_pdiff         total perimeter of PDIFF wiring
perim_poly          total perimeter of POLY wiring
perim_metal1        total perimeter of METAL1 wiring
perim_metal2        total perimeter of METAL2 wiring
wiring_cap          total wiring capacitance
wiring_energy       energy dissipated from wiring capacitance
total_cggnd         total gate to ground capacitance
total_cggnd_energy  energy dissipated from gate to ground capacitance
total_cj            total area junction capacitances
total_cj_energy     energy dissipated in area junction capacitances
total_cjsw          total sidewall capacitance
total_cjsw_energy   energy dissipated in sidewall junction capacitances
total_node_cap      total equivalent node capacitance
total_node_energy   total node energy
TABLE 11. Node Variables and Descriptions.
Each node is updated with data obtained from the ROM cell layouts. Wiring areas
and perimeters, transistor areas, transistor widths, junction areas, and junction perimeters
as a function of ROM size were determined for each node. An overview of the process is
described later, in Section 5.4.
5.3.3 Modelling Wiring Capacitances
ROMODEL includes wiring capacitances in the power estimate. For each node,
the amount of wiring area for each layer was determined. Because the ROM is tiled, the
expressions for wiring capacitance are a function of the number of bits, number of address
lines per block, and number of blocks. Thus, depending on the size of the ROM, the wiring area and length are scaled accordingly before the total capacitance is calculated. The
node wiring capacitance for each layer is:
$$C_{wiring} = AREA_{layer} \cdot C_{area} + PERIMETER_{layer} \cdot C_{fringe} \qquad \text{(EQ 13)}$$
where Carea is the capacitance per unit area of the layer and Cfringe is the fringe capacitance per unit length. The area and perimeter of each wire are determined as a function of ROM size. The layers that are considered are ndiff, pdiff, poly, metal1, and metal2.
[Figure 31 plot: HSPICE output showing the gate capacitance of devices MXN and MXP versus gate voltage, VOLTS (LIN), swept from 0 to 5 V.]
FIGURE 31. Typical gate capacitance vs. gate voltage plot from HSPICE. Both NMOS (solid) and
PMOS (dotted) C-V characteristics are shown above.
The wiring capacitance for each of the layers is assumed to be constant throughout
the voltage swing. Only layer-to-substrate capacitances are considered; other sources of
wiring capacitance, such as poly-to-metal1 capacitances, are assumed to be negligible.
Thus, the energy dissipated per power-consuming transition at the node is described by the
following equation:
$$E_{wiring} = C_{wiring} V_{DD} V_{swing} \qquad \text{(EQ 14)}$$
The voltage swing, Vswing, is VDD for all nodes except the bitlines, which swing from
(VDD-VT) to GND.
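As a worked example of Eq. 13 and Eq. 14, the sketch below recomputes the wiring capacitance and energy of the RowSelBar node from the sample report in Section 5.5.2, using the per-area capacitances from the sample technology file in Appendix B. It assumes that the areas in the report are expressed in units of lambda squared; that assumption is not stated in the text, but it reproduces the reported values. The function names are hypothetical.

```python
LAYERS = ("ndiff", "pdiff", "poly", "metal1", "metal2")


def wiring_cap(area, perim, c_area, c_fringe):
    """Eq. 13 summed over all layers (areas in m^2, perimeters in m)."""
    return sum(area[l] * c_area[l] + perim[l] * c_fringe[l] for l in LAYERS)


def wiring_energy(c_wiring, vdd, v_swing):
    """Eq. 14: energy per power-consuming transition."""
    return c_wiring * vdd * v_swing


LAMBDA = 0.5e-6  # m, from the sample technology file

# Capacitance per unit area (F/m^2) from Appendix B; fringe terms are zero there.
c_area = {"ndiff": 286e-6, "pdiff": 545e-6, "poly": 73e-6,
          "metal1": 35e-6, "metal2": 19e-6}
c_fringe = dict.fromkeys(LAYERS, 0.0)

# RowSelBar wiring areas from the sample report, assumed to be in lambda^2.
area_lambda2 = {"ndiff": 0.0, "pdiff": 0.0, "poly": 82.0,
                "metal1": 144.0, "metal2": 125.0}
area = {l: a * LAMBDA ** 2 for l, a in area_lambda2.items()}
perim = dict.fromkeys(LAYERS, 0.0)  # unused because the fringe terms are zero

c_w = wiring_cap(area, perim, c_area, c_fringe)
e_w = wiring_energy(c_w, vdd=1.5, v_swing=1.5)
print(f"wiring_cap = {c_w:.4e} F, wiring_energy = {e_w:.4e} J")
```

Under these assumptions the results agree with the report entries wiring_cap (3.350250E-15) and wiring_energy (7.538063E-15) to the printed precision.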
[Figure 32 plot: capacitance C versus gate voltage V, showing the actual C-V characteristic and the piecewise constant model, which steps up to Cox at Vbp.]
FIGURE 32. Actual and model C-V characteristics of a NMOS device.
5.3.4 Modelling Gate Capacitances
To model gate capacitances, ROMODEL makes the simplifying assumption that
the transistor is mostly in the linear region. This is a good estimate when VDD is large;
however, it is used by ROMODEL as a conservative estimate for low supply voltages as
well. Using this assumption, the total gate-to-channel capacitance can be assumed to be
split equally between the source and drain. The total gate capacitance is calculated in the
following manner:
$$C_{ox} = \frac{\varepsilon_{ox}}{t_{ox}} \qquad \text{(EQ 15)}$$
Cox is the oxide capacitance of the gate. However, the gate capacitance for varying voltages is nonlinear, as demonstrated in Figure 31. Therefore, an equivalent value, Coxequiv is
calculated, representing the average value of Cox over the voltage swing.
Because finding the actual average of Cox over the voltage swing is complex, a
piecewise constant model is developed to approximate the change in gate capacitance over
the voltage swing of the node, shown in Figure 32 and Figure 33.
For NMOS gates, Coxequiv is calculated in the following manner:
$$C_{oxequiv,NMOS} = \begin{cases} 0 & 0 \le V_{high} \le V_{bp} \\ C_{ox} \dfrac{V_{high} - V_{bp}}{V_{high}} & V_{high} > V_{bp} \end{cases} \qquad \text{(EQ 16)}$$
where Vbp is the breakpoint voltage and Vhigh is the maximum node voltage (the minimum node voltage is assumed to be 0).
[Figure 33 plot: capacitance C versus gate voltage V up to VDD, showing the actual C-V characteristic and the piecewise constant model, with the step at Vbp.]
FIGURE 33. Actual and model C-V characteristics of a PMOS device.
Similarly, Coxequiv for PMOS gates is:
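$$C_{oxequiv,PMOS} = \begin{cases} C_{ox} & 0 \le V_{high} \le V_{bp} \\ C_{ox} \dfrac{V_{bp}}{V_{high}} & V_{high} > V_{bp} \end{cases} \qquad \text{(EQ 17)}$$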
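The two piecewise constant models can be averaged over the swing in a few lines. The sketch below is an illustration under stated assumptions rather than the ROMODEL code: the NMOS capacitance is taken as 0 below the breakpoint and Cox above it, the PMOS capacitance as Cox below the breakpoint and 0 above it, and the swing runs from 0 to v_high. The function names are hypothetical.

```python
def cox_equiv_nmos(cox, v_high, v_bp):
    """Average NMOS gate capacitance over a swing from 0 to v_high."""
    if v_high <= 0:
        raise ValueError("v_high must be positive")
    if v_high <= v_bp:
        return 0.0  # the swing never reaches the breakpoint
    return cox * (v_high - v_bp) / v_high


def cox_equiv_pmos(cox, v_high, v_bp):
    """Average PMOS gate capacitance over a swing from 0 to v_high."""
    if v_high <= 0:
        raise ValueError("v_high must be positive")
    if v_high <= v_bp:
        return cox  # the whole swing lies below the breakpoint
    return cox * v_bp / v_high


# Breakpoints from the sample technology file, at VDD = 1.5 V.
# The PMOS entry is stored as VDD minus the actual breakpoint.
VDD = 1.5
n_fraction = cox_equiv_nmos(1.0, VDD, 0.1525147)
p_fraction = cox_equiv_pmos(1.0, VDD, VDD - 0.3359656)
print(n_fraction, p_fraction)
```

With Cox normalized to 1, the results are the fractions of Cox that the NMOS and PMOS gates present on average over a full 1.5 V swing.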
These values of Coxequiv for NMOS and PMOS devices are computed before proceeding
with the total gate-to-ground capacitance calculation:
$$C_{gs} = \frac{1}{2} (AREA_{gates}) \, C_{oxequiv} + W C_{gso} \qquad \text{(EQ 18)}$$
$$C_{gd} = \frac{1}{2} (AREA_{gates}) \, C_{oxequiv} + W C_{gdo} \qquad \text{(EQ 19)}$$
$$C_{g\text{-}gnd} = C_{gs} + 2 C_{gd} \qquad \text{(EQ 20)}$$
Cgs and Cgd represent the total channel and overlap capacitance between the two nodes.
The overlap capacitances, Cgso and Cgdo are assumed to be equal, which is why only Cgdo
is specified in the technology file. Coxequiv is split evenly between the Cgs and Cgd due to
the assumption that the transistor is always operating in the linear region. Cg-gnd, the total
gate to ground capacitance, is equal to the sum of Cgs and Cgd, after Cgd is multiplied by a
factor of 2 to account for the Miller effect. Figure 34 depicts the result of the calculations
of Eq. 18, Eq. 19, and Eq. 20.
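The gate-to-ground calculation can be written out directly from the description above. This is a minimal sketch, assuming SI units and a single overlap capacitance per unit width for both Cgso and Cgdo, as the text states; the function name and the example numbers are hypothetical.

```python
def gate_to_ground_cap(area_gates, width_gates, cox_equiv, c_overlap):
    """Equivalent gate-to-ground capacitance of the gates seen by a node.

    area_gates  -- total gate area (m^2)
    width_gates -- total gate width (m)
    cox_equiv   -- equivalent oxide capacitance per unit area (F/m^2)
    c_overlap   -- overlap capacitance per unit width (F/m), Cgso = Cgdo
    """
    c_gs = 0.5 * area_gates * cox_equiv + width_gates * c_overlap
    c_gd = 0.5 * area_gates * cox_equiv + width_gates * c_overlap
    return c_gs + 2.0 * c_gd  # Cgd is doubled to account for the Miller effect


# Hypothetical gate: 4e-12 m^2 of area, 4e-6 m of width.
c_example = gate_to_ground_cap(4e-12, 4e-6, 1.0e-3, 3.0e-10)
print(c_example)
```

Keeping Cgs and Cgd as separate terms mirrors the three equations, even though the equal-overlap assumption makes them numerically identical.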
[Figure 34 diagram: the gate capacitances Cgd and Cgs at a node are replaced by a single equivalent capacitance to ground, Cg-gnd = Cgs + 2Cgd.]
FIGURE 34. Equivalent Gate to GND Capacitance.
Determining the CV breakpoint. The piecewise constant model is accurate only if the
breakpoint voltages can be determined with a good deal of accuracy. A tool called
CV_MODEL was developed that uses HSPICE output to determine the breakpoint.
First, a file containing the CV plot data must be created. A sample HSPICE deck
that produces the necessary output is included in Appendix A. The following is a portion
of a sample CV plotfile (see Note 1):
0.            5.779e-16
1.00000e-3    5.777e-16
2.00000e-3    5.775e-16
3.00000e-3    5.774e-16
4.00000e-3    5.772e-16
5.00000e-3    5.771e-16
Note 1. All alphabetic scale suffixes must be converted to exponential form
before being sent to the CV modeler (i.e., m = e-3, u = e-6, f = e-15, etc.).
The first column represents the gate voltage and the second column is the capacitance of
the gate. CV_MODEL then calculates the area under the curve of the CV characteristic
provided in the CV plotfile and returns the breakpoint value. The syntax for CV_MODEL
is:
cv_model plotfile nmos/pmos
where plotfile is the filename of the HSPICE generated CV plot and nmos or pmos specifies what type of transistor is being modelled. The following demonstrates correct usage:
% cv_model plotfilel.cv nmos
total area: 3.439291E-12
Vdd: 5.000000E+00
delta_v: 1.000000E-03
Breakpoint: 1.525147E-01
Total area is the estimated total area under the CV curve. Vdd is the maximum sweep
voltage, which CV_MODEL assumes to be the supply voltage and reports in the output.
Delta_v is the step size of the input gate voltage, and Breakpoint is the calculated
breakpoint voltage.
[Figure 35 plot: HSPICE CV plot from 0 to VDD divided into rectangles of width ΔV, with the modeled CV breakpoint marked.]
FIGURE 35. Calculation of total area under CV curve from HSPICE data, used to find breakpoint
voltage in piecewise constant gate capacitance model.
The CV modeler calculates the area by accumulating the areas of the rectangles
with width ΔV, the simulation step size, and height Ci, the capacitance value at each
point. Figure 35 graphically demonstrates this procedure. CV_MODEL assumes that the
step size is small enough (about 0.001 V) that the error is negligible.
The sum of all the rectangles is the estimate for the area. Clearly, the accuracy of the area
under the curve (and consequently, the breakpoint) increases as ΔV decreases.
$$CVArea_{HSPICE} = \sum_i C_i \cdot \Delta V \qquad \text{(EQ 21)}$$
The area under the CV curve of the piecewise constant model is:
$$CVArea_{model} = (V_{DD} - V_{bp}) \, C_{ox} \qquad \text{(EQ 22)}$$
Equating the areas from Eq. 21 and Eq. 22, we obtain the expression for the breakpoint
voltage:
$$V_{bp} = V_{DD} - \frac{\Delta V \cdot \sum_i C_i}{C_{ox}} \qquad \text{(EQ 23)}$$
Since the breakpoint for PMOS devices depends on VDD, the PMOS breakpoint
value in the technology file (and the value reported by CV_MODEL) is specified as the
difference between VDD and the actual breakpoint. The application of CV_MODEL
below demonstrates correct usage for a PMOS device:
% cv_model plotfile2.cv pmos
total area: 3.282547E-12
Vdd: 5.000000E+00
delta_v: 1.000000E-03
Breakpoint: 3.359656E-01
The breakpoint reported by CV_MODEL is 0.336V, which is the value that should be
used in the ROMODEL technology file. When VDD = 1.5V, the actual modeled
breakpoint (1.5 - 0.336 = 1.164V) is used in all capacitance and energy calculations.
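The breakpoint calculation can be sketched as a rectangle sum followed by the area-matching step. The code below is an illustration, not the CV_MODEL source, and rests on several assumptions that the text does not confirm: Cox is taken as the largest capacitance in the plotfile, the last sweep voltage is taken as Vdd, and the same area-matching expression is used for both device types (for a PMOS model that is Cox below the breakpoint, matching areas gives the actual breakpoint as area/Cox, so VDD minus area/Cox is the reported difference). The function names are hypothetical.

```python
def reported_breakpoint(voltages, caps):
    """Breakpoint value as it would be entered in the technology file."""
    delta_v = voltages[1] - voltages[0]     # sweep step size
    vdd = voltages[-1]                      # maximum sweep voltage
    cox = max(caps)                         # assumption: plateau value of the curve
    area = sum(c * delta_v for c in caps)   # rectangle sum under the CV curve
    return vdd - area / cox                 # area-matching step


def actual_breakpoint(reported, vdd, device):
    """Breakpoint used in the capacitance calculations at supply vdd."""
    if device == "nmos":
        return reported
    if device == "pmos":
        return vdd - reported  # PMOS entries are stored relative to VDD
    raise ValueError("device must be 'nmos' or 'pmos'")


# Synthetic, ideal NMOS-like step at 1.0 V, swept from 0 to 5 V in 1 mV steps.
voltages = [i * 0.001 for i in range(5001)]
caps = [0.0 if i < 1000 else 1.0e-15 for i in range(5001)]
bp = reported_breakpoint(voltages, caps)

# PMOS entry from the text, evaluated at VDD = 1.5 V.
pmos_bp = actual_breakpoint(0.3359656, 1.5, "pmos")
print(bp, pmos_bp)
```

For the synthetic step the estimate lands one step below the true breakpoint, which shows the size of the rectangle-sum error that the text treats as negligible.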
Once the breakpoints for the capacitance models are calculated, the energy dissipated by the gate to ground capacitances is computed:
$$E_{gate} = C_{g\text{-}gnd} V_{swing} V_{DD} \qquad \text{(EQ 24)}$$
5.3.5 Modelling Junction Capacitances
Each node contains information about drain capacitances: NMOS drain area and
perimeter, and PMOS drain area and perimeter. The NMOS drain capacitances lie between
the node and GND. However the PMOS junction and sidewall capacitances lie between
the node and VDD. Since the PMOS drain capacitances are in series with a large capacitor
(Csupply) to GND, an assumption may be applied to simplify the analysis. The PMOS
drain capacitance is assumed to be much smaller than the supply capacitance, and therefore, the PMOS drain capacitance appears to be between the node and GND as well
[Xanthopoulos96]. The NMOS and PMOS capacitances can therefore be added in parallel, as long as it is remembered that the voltages across the n-drain and p-drain capacitances vary differently.
Figure 36 illustrates the charging of the drain capacitance. The power delivered by
the power supply is:
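$$P_{drain} = V_{DD} \, i_{supply} \qquad \text{(EQ 25)}$$
The energy drawn from the supply is therefore:
$$E_{drain} = \int V_{DD} \, i_{supply} \, dt \qquad \text{(EQ 26)}$$
$$= V_{DD} \int i_{supply} \, dt \qquad \text{(EQ 27)}$$
$$= V_{DD} \int dq \qquad \text{(EQ 28)}$$
Since dq/dV = C,
$$= V_{DD} \int C(V) \, dV \qquad \text{(EQ 29)}$$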
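Substituting Eq. 12 for C(V), the expression for Edrain is obtained:
$$E_{drain} = V_{DD} \left[ \frac{CJ \, \phi}{1 - m_j} \left( \left(1 + \frac{V_{Rfinal}}{\phi}\right)^{1 - m_j} - \left(1 + \frac{V_{Rinit}}{\phi}\right)^{1 - m_j} \right) + \frac{CJSW \, \phi}{1 - m_{jsw}} \left( \left(1 + \frac{V_{Rfinal}}{\phi}\right)^{1 - m_{jsw}} - \left(1 + \frac{V_{Rinit}}{\phi}\right)^{1 - m_{jsw}} \right) \right] \qquad \text{(EQ 30)}$$
where VRfinal and VRinit are the reverse biases at the final and initial node voltages,
respectively. CJ and CJSW must be scaled by the appropriate drain areas and perimeters,
respectively.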
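[Figure 36 diagram: inverter with input Vin and output Vout, with the supply VDD charging the drain capacitance at the output.]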
FIGURE 36. Supply current charging the drain capacitance.
This assumes that all the current from the supply is used to charge the junction
capacitance. Ideally this is true; however, if VDD > (|VTP| + VTN), some direct path current
is lost through the NMOS device during the non-ideal input transition. This contribution,
however, is assumed to be small and is neglected by ROMODEL.
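The junction energy integral can be checked numerically. The sketch below assumes that Eq. 12 has the usual junction capacitance form, C(VR) = C0 / (1 + VR/phi)^m, which matches the parameters in the sample technology file (cj0n, mjn, phin) but is not shown in this section. It evaluates VDD times the integral of C(VR) in closed form for one term and compares it with a midpoint-rule integration. The function names are hypothetical, and the junction area of 20 lambda squared is taken from the sample report purely as an illustrative input.

```python
def junction_energy(vdd, c0, phi, m, vr_init, vr_final):
    """Closed form of VDD * integral of c0 / (1 + VR/phi)**m dVR (requires m != 1)."""
    k = 1.0 - m
    return vdd * c0 * phi / k * ((1.0 + vr_final / phi) ** k
                                 - (1.0 + vr_init / phi) ** k)


def junction_energy_numeric(vdd, c0, phi, m, vr_init, vr_final, steps=20000):
    """Midpoint-rule evaluation of the same integral, used as a cross-check."""
    h = (vr_final - vr_init) / steps
    total = 0.0
    for i in range(steps):
        vr = vr_init + (i + 0.5) * h
        total += c0 / (1.0 + vr / phi) ** m * h
    return vdd * total


LAMBDA = 0.5e-6
area = 20 * LAMBDA ** 2   # NMOS junction area, assumed to be 20 lambda^2
c0 = 2.6473e-4 * area     # cj0n scaled by the drain area (F)

closed = junction_energy(1.5, c0, 0.6, 0.9561, 0.0, 1.5)
numeric = junction_energy_numeric(1.5, c0, 0.6, 0.9561, 0.0, 1.5)
print(closed, numeric)
```

The sidewall term has the same form with cjsw0n, mjswn, and the drain perimeter in place of the area quantities, so the same function can be reused for it.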
5.4 Modelling the ROM
Each node of the ROM is modelled using the model described in Section 5.3.2
with some simplifying assumptions. Since the capacitance on a particular node depends
on the size of the ROM, gate and junction sizes are written as a function of bits, address
lines, number of blocks, and data. This ensures that ROMODEL is accurate for different ROM sizes.
Because of the density of a ROM, the number of nodes to be modelled quickly
becomes too large, and it is not convenient to create a node model for every node in the ROM.
Because the ROM is very regular in nature, some assumptions can be made to simplify the modelling of the nodes:
1. Identical nodes can be modelled with a single node.
2. The power dissipated in the nodes equals the modelled node power scaled by the number of nodes that switch and the number of power consuming transitions per read
access.
Using these assumptions, modelling each node is unnecessary, and ROM energy can be
calculated without a large number of redundant calculations.
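The two assumptions amount to a weighted sum over groups of identical nodes. The following sketch illustrates the bookkeeping only; the `NodeGroup` type, its field names, and the numeric values are hypothetical and are not taken from ROMODEL.

```python
from dataclasses import dataclass


@dataclass
class NodeGroup:
    name: str
    energy_per_transition: float   # energy of the single modelled node (J)
    switching_nodes: float         # expected number of identical nodes that switch
    transitions_per_access: float  # power-consuming transitions per read access


def access_energy(groups):
    """Scale each modelled node by how many nodes switch and how often."""
    return sum(g.energy_per_transition * g.switching_nodes * g.transitions_per_access
               for g in groups)


# Hypothetical values: two enabled wordlines and an expected 6.25 discharged bitlines.
groups = [
    NodeGroup("wordline", 2.0e-13, 2, 1),
    NodeGroup("bitline", 1.0e-13, 6.25, 1),
]
total = access_energy(groups)
print(total)
```

Allowing a fractional number of switching nodes lets expected values, such as an average word weight, be used directly.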
Most of the nodes in the ROM, such as the control signals, word lines, and bitlines,
undergo either no power-consuming transitions or a single power-consuming transition.
Eventually, all the nodes return to their initial (precharge) state, with the exception
of the address bits and the data outputs.
5.4.1 Control Signals
The control signals, mainly Ready, EvalAddr, EvalWord, and OEN always
undergo a single energy consuming transition in the selected ROM block and return to
their initial state (except for OEN, which is asserted in all ROM blocks). Therefore, the
energy consumed in these nodes is simply accumulated.
5.4.2 Address Latches and Row Decoding
ROMODEL makes the assumption that approximately 1/2 of the address bits are
high. Due to the symmetry of the address latches and row decoding circuitry (see
Figure 12 and Figure 13), ROMODEL makes the assumption that power consumption is
approximately the same regardless of the input address. The energy consumed in changing
the address inputs, A[alines-1:0] is not considered as part of ROM power dissipation.
5.4.3 ROM Core
Using the input parameters, the capacitances on the wordlines and bitlines can be estimated, as well as the probability of undergoing a power-consuming
transition. The two wordlines that are enabled during a ROM read cycle are the data wordline
and the reference wordline. The reference wordline and bitline have a constant (worst
case) number of transistors and always contribute to energy dissipation during a read
cycle. For the data wordline however, ROMODEL uses the word_weight parameter to
estimate the number of transistor gates on the enabled wordline, the number of junction
capacitances on the bitlines, and the number of bitlines that will be discharged during a
ROM read access.
To estimate the capacitance of the INV bitline, inv_wordlines is used to obtain the
average number of drains per block on the INV bitline as well as the probability that INV
is asserted, assuming that all address locations are equally likely.
5.4.4 Sense Amplifiers and XOR Decoders
The number of sense amplifier outputs that switch is estimated to be the ratio of
the coded word weight to the total number of columns, scaled by the number of bits. Sense
amplifier outputs that do not switch are simply "logic 0" and do not contribute to dissipated energy. The expected number of XOR decoder outputs that switch is similarly determined by the uncoded word weight (data_weight). Glitches are assumed to be negligible
and are ignored.
5.4.5 Output Latch and Driver
The number of data outputs that are "1" is approximated with the data_weight
parameter, from which the average weight of uncoded data can be calculated. ROMODEL
uses this information to calculate the probability that each data bit undergoes an energy
consuming transition, and assumes that half of the energy is consumed per change in output bit.
5.5 ROMODEL Usage
Since the energy of the ROM is dependent on the data, the core data must be analyzed before modelling can begin. ROMAVG is a tool that simplifies analysis of ROM
data. Then, using the data obtained from ROMAVG, the minimum, maximum, and average access energies can be modelled by ROMODEL.
5.5.1 ROMAVG: Analyzing ROM Core Data
ROMAVG is a tool that analyzes the ROM contents and calculates the minimum,
maximum, and average number of transistors on a wordline. It is important to note that
ROMAVG assumes that there are four words per wordline and that the core data is stored
using the low weight coding techniques described in Chapter 3.
The syntax for ROMAVG is:
romavg param_file
where param_file is the ROMGEN parameter file (see Section 4.1.1) containing the ROM
data to be analyzed. The parameters that ROMAVG uses are words, bits, and dataBits; the
other parameters in the parameter file are ignored.
ROMAVG analyzes the core data and returns values for ROMODEL and
ROMOPT. The following demonstrates ROMAVG usage:
% romavg romgen_paraml
bits: 8 words: 256
Total number of 1's in ROM data (uncoded): 739
Total number of 1's in ROM core after coding (data+Inv): 695
Wordlines inverted: 2
Average data weight (uncoded): 11.546875
Average coded wordline weight: 10.859375
Minimum coded wordline weight: 2
Maximum coded wordline weight: 16
Analysis of the ROM data yields minimum, maximum, and average weights of the coded
wordlines. The average weight of the uncoded data and the number of wordlines inverted are
also calculated. These values can then be used with ROMODEL and ROMOPT to determine best case, worst case, and average case energy per access.
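The statistics that ROMAVG reports can be illustrated with a short sketch. The coding rule used here is an assumption, since the low weight coding technique is defined in Chapter 3 and not in this section: a wordline is stored inverted, at the cost of one extra Inv transistor, whenever more than half of its data bits are 1. The function name and the example data are hypothetical. The averaging per wordline does follow the sample output, where 739 ones over 64 wordlines gives 11.546875.

```python
def analyze_core(words, bits, words_per_wordline=4):
    """Return wordline weight statistics for a list of integer data words."""
    columns = bits * words_per_wordline
    uncoded, coded, inverted = [], [], 0
    for start in range(0, len(words), words_per_wordline):
        group = words[start:start + words_per_wordline]
        weight = sum(bin(w).count("1") for w in group)
        uncoded.append(weight)
        if weight > columns // 2:                 # assumed inversion rule
            coded.append(columns - weight + 1)    # inverted data plus the Inv bit
            inverted += 1
        else:
            coded.append(weight)
    n = len(uncoded)
    return {
        "data_weight": sum(uncoded) / n,   # average uncoded wordline weight
        "word_weight": sum(coded) / n,     # average coded wordline weight
        "min_word_weight": min(coded),
        "max_word_weight": max(coded),
        "inv_wordlines": inverted,
    }


# Hypothetical 8-word, 8-bit ROM: one heavy wordline and one light wordline.
stats = analyze_core([0xFF, 0xFF, 0xFF, 0x0F, 0x01, 0x00, 0x00, 0x00], bits=8)
print(stats)
```

In this example the first wordline has weight 28 and is stored inverted with weight 5, while the second wordline is left unchanged.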
5.5.2 ROMODEL: Modelling the ROM
The syntax for ROMODEL is:
romodel n_alines wordsize blocks vdd data_weight word_weight
inv_wordlines [techfile]
where n_alines is the total number of address bits, wordsize is the number of bits per word
(must be even), blocks is the number of ROM blocks in the ROM, vdd is the power supply
voltage (in V), data_weight is the average number of transistors per wordline without coding, word_weight is the number (minimum, maximum, or average) of transistors on a
coded wordline, inv_wordlines is the number of wordlines that will be inverted in the
ROM (also obtained using ROMAVG), and techfile is the technology file to be used, which defaults
to "romodel.tf". Data_weight, word_weight, and inv_wordlines are typically determined
by ROMAVG (Section 5.5.1).
If the technology file is valid, ROMODEL proceeds with the ROM modelling and
returns the energy dissipated per access:
% romodel 4 4 1 1.5 8 6.25 1 romodel.tf
Energy per cycle: 1.468764e-11J
A report including information about each node is generated after the modelling
has completed and is located in "romodel.out". The following is a sample report on a typical node:
***** RowSelBar *****
v_init: 0.000000E+00
v_final: 1.500000E+00
area Ngates: 16.000000
width Ngates: 8.000000
area Njunct: 20.000000
perim Njunct: 14.000000
area Pgates: 76.000000
width Pgates: 38.000000
area Pjunct: 42.000000
perim Pjunct: 12.000000
area ndiff: 0.000000
area pdiff: 0.000000
area poly: 82.000000
area m1: 144.000000
area m2: 125.000000
wiring_cap: 3.350250E-15
wiring_energy: 7.538063E-15
total_cggnd: 5.333653E-14
total_cggnd_energy: 1.200072E-13
total_cj: 4.773610E-15
total_cj_energy: 1.350579E-14
total_cjsw: 3.223089E-15
total_cjsw_energy: 8.370037E-15
total_node_cap: 6.468347E-14
total_node_energy: 1.494211E-13
The units for these values are given in Table 11.
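The entries of the sample report can be cross-checked against each other. The sketch below uses only the numbers printed above and verifies three relations that hold to the printed precision: the node capacitance is the sum of the four component capacitances, the node energy is the sum of the four component energies, and the gate energy equals the gate capacitance times Vswing times VDD. The dictionary keys copy the report labels; everything else is illustrative.

```python
import math

# Values copied from the RowSelBar sample report.
report = {
    "v_init": 0.0, "v_final": 1.5,
    "wiring_cap": 3.350250e-15, "wiring_energy": 7.538063e-15,
    "total_cggnd": 5.333653e-14, "total_cggnd_energy": 1.200072e-13,
    "total_cj": 4.773610e-15, "total_cj_energy": 1.350579e-14,
    "total_cjsw": 3.223089e-15, "total_cjsw_energy": 8.370037e-15,
    "total_node_cap": 6.468347e-14, "total_node_energy": 1.494211e-13,
}

cap_sum = (report["wiring_cap"] + report["total_cggnd"]
           + report["total_cj"] + report["total_cjsw"])
energy_sum = (report["wiring_energy"] + report["total_cggnd_energy"]
              + report["total_cj_energy"] + report["total_cjsw_energy"])

vdd = 1.5
v_swing = report["v_final"] - report["v_init"]
gate_energy = report["total_cggnd"] * v_swing * vdd

caps_agree = math.isclose(cap_sum, report["total_node_cap"], rel_tol=1e-6)
energies_agree = math.isclose(energy_sum, report["total_node_energy"], rel_tol=1e-6)
gate_agrees = math.isclose(gate_energy, report["total_cggnd_energy"], rel_tol=1e-6)
print(caps_agree, energies_agree, gate_agrees)
```

The junction energies are not checked this way because the junction capacitances vary with voltage and do not follow a simple C times Vswing times VDD product.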
[Figure 37 diagram: chain of five inverters with outputs at nodes 1 through 5 and a load capacitor CL = 0.5pF on the last output; lambda = 0.5e-6.]
FIGURE 37. An inverter chain for HSPICE vs. ROMODEL comparisons.
5.6 Results: ROMODEL vs. HSPICE
A number of simulations using HSPICE were performed to verify the accuracy of
ROMODEL. The nodes of an inverter chain with increasingly larger inverters were simulated and compared. HSPICE simulations were then run on ROMs of different sizes and
compared to ROMODEL results. The files used for simulation are included in Appendix
A.
5.6.1 Inverter Chain
The circuit depicted in Figure 37 was simulated in HSPICE for one cycle using 0.5
ns rise times and a 0.5pF load capacitor on the last inverter output. Each inverter had a separate power supply, and thus the power consumed in the charging and discharging of each
node (1, 2, 3, 4, and 5) was the power consumed by the respective power supply.
The same circuit was simulated in ROMODEL and the results were compared for each
node. These simulations were run with four different supply voltages (1.5V, 2.2V, 3.3V,
and 5.0V). These results are summarized in Table 12.
All entries other than VDD are energy per cycle (E/cyc) in pJ.
VDD (V)   H1      R1      H2      R2      H3      R3      H4      R4      H5      R5
1.5       .0798   .0745   .1520   .1436   .2936   .2818   .5829   .5580   1.527   1.209
2.2       .1788   .1683   .3438   .3256   .6672   .6403   1.324   1.270   3.282   2.583
3.3       .4230   .3890   .810    .7571   1.578   1.492   3.141   2.960   7.377   5.768
5.0       1.095   .9096   2.00    1.772   3.855   3.495   7.650   6.943   17.28   13.14
TABLE 12. HSPICE simulation vs. ROMODEL results for power estimation for a 5-inverter
chain. Energies for nodes 1, 2, 3, 4, and 5 are listed above. HSPICE results are prefixed with an "H"
before the node number. ROMODEL results are prefixed with an "R".
The values predicted by ROMODEL are slightly below HSPICE estimates
because direct path currents during input and output transitions are neglected by
ROMODEL. Note that ROMODEL and HSPICE differ most on the energy calculations for
node 5. This is because HSPICE assumes that signals that change quickly dissipate more
energy, and since node 5 has no gate capacitance loading the output of the inverter, the
node voltage changes abruptly, resulting in different energy estimates.
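The size of the gap between the two tools can be read off Table 12 with a short calculation. The sketch below copies the table values and computes, for each entry, how far the ROMODEL estimate falls below the HSPICE result as a percentage of the HSPICE result; the names `TABLE12` and `percent_below` are introduced here for illustration.

```python
# (HSPICE, ROMODEL) energy per cycle in pJ for nodes 1..5, keyed by VDD in volts.
TABLE12 = {
    1.5: [(0.0798, 0.0745), (0.1520, 0.1436), (0.2936, 0.2818),
          (0.5829, 0.5580), (1.527, 1.209)],
    2.2: [(0.1788, 0.1683), (0.3438, 0.3256), (0.6672, 0.6403),
          (1.324, 1.270), (3.282, 2.583)],
    3.3: [(0.4230, 0.3890), (0.810, 0.7571), (1.578, 1.492),
          (3.141, 2.960), (7.377, 5.768)],
    5.0: [(1.095, 0.9096), (2.00, 1.772), (3.855, 3.495),
          (7.650, 6.943), (17.28, 13.14)],
}


def percent_below(hspice, romodel):
    """How far ROMODEL falls below HSPICE, in percent of the HSPICE value."""
    return 100.0 * (hspice - romodel) / hspice


shortfall = {vdd: [percent_below(h, r) for h, r in row]
             for vdd, row in TABLE12.items()}

for vdd, row in shortfall.items():
    print(vdd, " ".join(f"{s:5.1f}" for s in row))
```

At 1.5V, for example, nodes 1 through 4 come out roughly 4 to 7 percent below HSPICE, while node 5 is about 21 percent below, which is the discrepancy discussed above.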
5.6.2 ROM
VDD (V)   Words   Bits   Blocks   HSPICE Energy / Cycle (pJ)   ROMODEL Energy / Cycle (pJ)
1.5       16      4      2        12.58                        11.08
2.2       16      4      2        26.27                        25.11
3.3       16      4      2        60.50                        58.25
5.0       16      4      2        150.43                       135.92
TABLE 13. HSPICE simulation vs. ROMODEL results for power estimation for a 16 word x 4
bit ROM.
VDD (V)   Words   Bits   Blocks   HSPICE Energy / Cycle (pJ)   ROMODEL Energy / Cycle (pJ)
1.5       256     8      1        28.80                        25.65
2.2       256     8      1        59.85                        58.62
3.3       256     8      1        137.91                       136.34
5.0       256     8      1        348.28                       317.22
TABLE 14. HSPICE simulation vs. ROMODEL results for power estimation for a 256 word x 8
bit ROM.
Power dissipation for different supply voltages and different ROM sizes was simulated using HSPICE and ROMODEL. Fringe capacitance was ignored for these simulations because the netlist extractor in Cadence does not include fringe capacitance
parasitics. The energy per cycle was obtained from HSPICE results by accessing a number
of addresses containing random data. The energy dissipated in those cycles was averaged
and compared to ROMODEL results. A summary of these simulations is listed in Table 13
and Table 14. ROMODEL gives lower values than HSPICE because static currents and
short circuit currents are ignored. Also, HSPICE results are averaged over eight successive ROM reads, so the average depends heavily on which address locations are accessed.
5.7 Optimization Procedure
The process for determining the optimal ROM block organization begins with
some calculations to determine valid numbers of block address bits and block decoding
bits. ROMOPT then applies ROMODEL and iterates through the partitioning
schemes to find an optimal solution.
First, the minimum number of address lines required for the number of ROM data
words specified is calculated. Since the minimum number of address lines per block is 3,
there must be at least 8 words in the ROM. Also, because the ROM is limited to 16 blocks
with each block containing at least 8 words, the maximum number of bits that can be used
for block decoding can be determined.
For each ROM configuration, the number of address lines and words per block is
computed and used to calculate the minimum number of blocks required to store the
number of words in the ROM. This is done to ensure that no unneeded blocks are created
containing only empty ROM data. ROMOPT then applies ROMODEL, varying the number of blocks until all valid partitioning schemes have been tried and the optimal solution has been found.
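The search described in this section can be sketched as an enumeration followed by a minimum. The code below is an illustration of the procedure, not the ROMOPT source: it assumes the limits stated above (at least 3 address lines per block and at most 16 blocks, i.e. 4 block decode bits), and the energy model is passed in as a function because ROMODEL itself is not reproduced here. For the usage example, the energies are simply the values printed in the sample report of Section 5.9. All names are hypothetical.

```python
import math

MIN_BLOCK_ADDRESS_LINES = 3   # at least 8 words per block
MAX_BLOCK_DECODE_BITS = 4     # at most 16 blocks


def partition_schemes(words):
    """Return (blocks, words_per_block, decode_bits) for each valid scheme."""
    if words < 2 ** MIN_BLOCK_ADDRESS_LINES:
        raise ValueError("the ROM must contain at least 8 words")
    address_lines = (words - 1).bit_length()   # minimum address lines for the ROM
    max_decode = min(MAX_BLOCK_DECODE_BITS,
                     address_lines - MIN_BLOCK_ADDRESS_LINES)
    schemes = []
    for decode_bits in range(max_decode + 1):
        words_per_block = 2 ** (address_lines - decode_bits)
        blocks = math.ceil(words / words_per_block)  # no empty blocks are created
        schemes.append((blocks, words_per_block, decode_bits))
    return schemes


def optimize(words, energy_of):
    """Pick the scheme with the lowest energy per cycle."""
    return min(partition_schemes(words), key=energy_of)


# Energies per cycle (J) from the sample ROMOPT report, keyed by block count.
REPORT_ENERGY = {1: 7.980468e-11, 2: 4.717560e-11, 4: 3.393365e-11,
                 8: 3.233820e-11, 16: 4.089522e-11}

schemes = partition_schemes(2000)
best = optimize(2000, lambda scheme: REPORT_ENERGY[scheme[0]])
print(schemes, best)
```

Taking the ceiling of words divided by words per block is what allows block counts that are not powers of two when the ROM does not fill the address space.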
5.8 ROMOPT Usage
The syntax for ROMOPT is:
romopt words wordsize vdd data_weight word_weight inv_wordlines
[techfile]
where words is the total number of words in the ROM, wordsize is the number of bits per
word (must be even), vdd is the power supply voltage (in V), data_weight is the average
weight of an uncoded wordline, word_weight is the average weight of a coded wordline,
inv_wordlines is the number of wordlines inverted, and techfile is the ROMODEL technology file to be used (described in more detail in Section 5.3.1), which defaults to
"romodel.tf". Data_weight, word_weight, and inv_wordlines are usually determined by
ROMAVG.
If the technology file is valid, ROMOPT proceeds with the ROM modelling and
optimization. The following demonstrates ROMOPT usage:
% romopt 2000 8 1.5 11.9 10.3 5 romodel.tf
Optimal number of blocks: 8
Minimum energy per cycle: 3.233820e-11J
ROMOPT output sent to "romopt.out".
The optimal number of blocks and minimum energy per cycle are calculated and displayed. An output report is also generated and contains more detailed information about
the different partitioning schemes for the ROM.
5.9 Interpreting the ROMOPT Report
A report file, "romopt.out", is generated during the optimization process and
includes information about the different ROM configurations. A sample report is shown
below:
words: 2000
wordsize: 8
vdd: 1.500000
data weight: 11.900000 word weight: 10.300000
techfile: romodel.tf
Required number of address lines: 11
Maximum number of address bits used for block decode: 4
Valid ROM partitioning schemes listed below.
***** blocks: 1 *****
Words per block: 2048
Number of block decode bits: 0
Energy per cycle for this ROM configuration: 7.980468e-11J
***** blocks: 2 *****
Words per block: 1024
Number of block decode bits: 1
Energy per cycle for this ROM configuration: 4.717560e-11J
***** blocks: 4 *****
Words per block: 512
Number of block decode bits: 2
Energy per cycle for this ROM configuration: 3.393365e-11J
***** blocks: 8 *****
Words per block: 256
Number of block decode bits: 3
Energy per cycle for this ROM configuration: 3.233820e-11J
***** blocks: 16 *****
Words per block: 128
Number of block decode bits: 4
Energy per cycle for this ROM configuration: 4.089522e-11J
Optimal number of blocks: 8
Minimum energy per cycle: 3.233820e-11J
This example report shows that for a ROM with 2000 words, partitioning the
ROM into eight blocks is recommended to minimize power consumption. This shows that
the maximum partitioning does not always yield the ROM with the lowest energy per
cycle. This is especially true for smaller ROMs, in which the energy dissipated in the
block decoding circuitry and extra data bus capacitance outweighs the gains in reducing
ROM block size. However, for small and medium sized ROMs, the difference in energy
per cycle for different partitioning schemes may be slight. Thus, the user may choose to
use less partitioning in exchange for a smaller ROM area.
Chapter 6
Conclusion
Low power design has become increasingly important as chip density, system
sizes, and the popularity of portable applications continue to rise. The significance of low
power memory design in particular has become more important, since most designs
require a large amount of non-volatile memory. Therefore, the focus of this work is on the
development of a ROM generator and supporting tools to model and optimize the ROM
for low power.
A ROM library from U.C. Berkeley [Burstein96] in Magic format is converted for
use in the Cadence Design Environment. Modifications appropriate for use with the HP26
process technology were made and schematics were created for each layout. Cell layouts
were checked for design rule violations (DRC) and verified versus schematics (LVS).
Large cells were verified using HSPICE simulations. The cells passed all these checks,
indicating the completion of the library conversion.
Once the library was converted, changes were made to the ROM to further
decrease power consumption. The ROM core is coded with an inverting bit, lowering the
number of transistors in the ROM core and hence, reducing the power dissipation in the
wordlines and bitlines. This also reduces the worst case wordline delays, possibly decreasing access time. Also, the self-timing logic is altered to include the bitline delay in the
self-timed signal generation. The application of a reduced element XOR gate minimizes
the amount of additional logic required to decode the ROM data. This introduction of
decoding logic requires additional circuitry to eliminate glitches on the output bus, which
can waste power.
ROMGEN is a tool created for generating the ROMs described above. Each cell
was updated with tiling information and a tiling routine was written in SKILL to facilitate
the placement of the cells. Improvements were made to the generator, such as simplifying
the interface and allowing the user to specify an odd number of blocks, which can save
area for some ROMs.
ROMODEL is a tool that models power dissipation in the ROM. Models were
developed for gate, drain, and wiring capacitance to estimate each component of power
consumption. Furthermore, the repetitive structure of the ROM allows assumptions to be
made which eliminate a number of redundant calculations and reduce computation time.
Simulations using HSPICE were used to verify the accuracy of the results. Finally, to help
the user determine the partitioning scheme for the ROM, ROMOPT applies ROMODEL
to exhaustively evaluate the valid configurations, identifying the power dissipation
associated with each one.
Appendix A
HSPICE Decks
A.1 Spice Deck for CV plot
**This is the full input file used for hspice simulation runs.
**Project:Low Power ROM Generation
**Owner:Paul Chou
**Description:ROM address latch
**
**Include model files and netlist files here**
*.include '/amd/sick-puppy/a/jimg/tsmc/Nominal.model'
.include '/amd/sick-puppy/a/pchou/sim/models/hp26.models'
mxn gnd 2 gnd gnd nmos w=.5e-06 l=.5e-06 ad=0 as=0 pd=0 ps=0
mxp vdd 3 vdd vdd pmos w=.5e-06 l=.5e-06 ad=0 as=0 pd=0 ps=0
**Voltage supplies and Input Stimulus
Vvdd vdd gnd 5V
Vn 2 gnd 5V
Vp 3 gnd 5V
**Options and Analysis
.options post nomod dccap brief
.DC Vn 0 5 .001
.print LX18(mxn)
.DC Vp 0 5 .001
.print LX18(mxp)
.probe LX18(mxn)
.probe LX18(mxp)
.end
A.2 Spice Deck for Inverter Chain
**
** Project: Low Power ROM Generation
** Owner: Paul Chou
**
** Include model files and netlist files here
**
.include '/amd/sick-puppy/a/pchou/sim/models/hp26.models'
**
** Voltage supplies and Input Stimulus
Vin 2 0 PWL(0ns 1.5V 10ns 1.5V 10.5ns 0V
+ 20ns 0V 20.5ns 1.5V 30ns 1.5V)
vdd1 11 0 1.5V
vdd2 12 0 1.5V
vdd3 13 0 1.5V
vdd4 14 0 1.5V
vdd5 15 0 1.5V
mx1 3 2 0 0 nmos w=2e-06 l=1e-06 ad=5e-12 pd=7e-6 as=5e-12 ps=7e-6
mx2 3 2 11 11 pmos w=4e-06 l=1e-06 ad=10e-12 pd=9e-6 as=10e-12 ps=9e-6
mx3 4 3 0 0 nmos w=4e-06 l=1e-06 ad=10e-12 pd=9e-6 as=10e-12 ps=9e-6
mx4 4 3 12 12 pmos w=8e-06 l=1e-06 ad=20e-12 pd=13e-6 as=20e-12 ps=13e-6
mx5 5 4 0 0 nmos w=8e-06 l=1e-06 ad=20e-12 pd=13e-6 as=20e-12 ps=13e-6
mx6 5 4 13 13 pmos w=16e-06 l=1e-06 ad=40e-12 pd=21e-6 as=40e-12 ps=21e-6
mx7 6 5 0 0 nmos w=16e-06 l=1e-06 ad=40e-12 pd=21e-6 as=40e-12 ps=21e-6
mx8 6 5 14 14 pmos w=32e-06 l=1e-06 ad=80e-12 pd=37e-6 as=80e-12 ps=37e-6
mx9 7 6 0 0 nmos w=32e-06 l=1e-06 ad=80e-12 pd=37e-6 as=80e-12 ps=37e-6
mx10 7 6 15 15 pmos w=64e-06 l=1e-06 ad=160e-12 pd=69e-6 as=160e-12
+ ps=69e-6
cl 7 0 .5e-12
**
** Options and Analysis
.options post nomod method=gear
.measure tran avgpowervin avg p(vin) from=0ns to=30ns
.measure tran avgpower1 avg p(vdd1) from=0ns to=30ns
.measure tran avgpower2 avg p(vdd2) from=0ns to=30ns
.measure tran avgpower3 avg p(vdd3) from=0ns to=30ns
.measure tran avgpower4 avg p(vdd4) from=0ns to=30ns
.measure tran avgpower5 avg p(vdd5) from=0ns to=30ns
.tran .01n 30ns
.end
A.3 Spice Deck for ROM Simulations
This is the full input file used for hspice simulation runs.
Project:
Low Power ROM Generation
Paul Chou
Owner:
**
**
Include model files and netlist files here
**
.include '/amd/sick-puppy/a/pchou/sim/rom.run3/netlist'
.include '/amd/sick-puppy/a/pchou/sim/models/hp26.models'
**
Voltage supplies and Input Stimulus
Vdd
vdd gnd 1.5V
porb gnd PWL( On OV
Vporb
Venable enable gnd 1.5V
5n OV
5.5n 1.5V
1600n 1.5V)
Va3 a3 gnd PWL( On Ov 800n Ov 800.5n 1.5V
+Va2
1600n 1.5V
1600.5n OV)
Va2 a2 gnd PWL( On Ov 400n Ov 400.5n 1.5V
+
800n 1.5V
800.5n OV)
+
1200n OV
1200.5n 1.5V)
+
1600n 1.5'VI
1600.5n OV)
200n 1.5v
200.5n OV
Val al gnd PWL( On 1.5v
+
400n OV
400.5n 1.5V
600.5n OV
+
600n 1.5v
+
800n OV
800.5n 1.5V)
1000.5n OV
+
1000n 1.57Ir
+
1200n OV
1200.5n 1.5V)
+
1400n 1.5'Ir
1400.5n OV
+
1600n OV
1600.5n 1.5V)
*Va1 a1 gnd PWL( 0n 0V 100n 0V 100.5n 1.5V
*+ 200n 1.5V 200.5n 0V
*+ 300n 0V 300.5n 1.5V
*+ 400n 1.5V 400.5n 0V
*+ 500n 0V 500.5n 1.5V
*+ 600n 1.5V 600.5n 0V
*+ 700n 0V 700.5n 1.5V
*+ 800n 1.5V 800.5n 0V)
Va0 a0 gnd 1.5V
Vclk clk gnd PWL( 0n 0V 0.5n 0V
+ 20n 0V 20.5n 1.5V
+ 120n 1.5V 120.5n 0V
+ 220n 0V 220.5n 1.5V
+ 320n 1.5V 320.5n 0V
+ 420n 0V 420.5n 1.5V
+ 520n 1.5V 520.5n 0V
+ 620n 0V 620.5n 1.5V
+ 720n 1.5V 720.5n 0V
+ 820n 0V 820.5n 1.5V
+ 920n 1.5V 920.5n 0V
+ 1020n 0V 1020.5n 1.5V
+ 1120n 1.5V 1120.5n 0V
+ 1220n 0V 1220.5n 1.5V
+ 1320n 1.5V 1320.5n 0V
+ 1420n 0V 1420.5n 1.5V
+ 1520n 1.5V 1520.5n 0V
+ 1620n 0V 1620.5n 1.5V)
** Options and Analysis
.probe
.options post nomod
.measure tran avgcurrent avg i(vdd) from=0ns to=1600ns
.measure tran avgpower avg p(vdd) from=0ns to=1600ns
.tran .1n 1620ns
.end
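The clock and address sources in the deck above are square waves written out point by point as piecewise-linear (PWL) lists, with a 0.5 ns edge at each transition. Writing such lists by hand is error-prone, so the following Python sketch shows one way they could be generated. The helper `pwl_square` and all of its parameters are hypothetical names introduced for this illustration; the helper is not part of ROMGEN, and it emits the whole source on a single line rather than using `+` continuation lines.

```python
def pwl_square(name, node, first_edge_ns, half_period_ns, n_edges,
               vhigh=1.5, edge_ns=0.5, start_high=False):
    """Build a SPICE PWL voltage source describing a square wave.

    The source holds its level until each edge begins, then ramps to
    the opposite level over edge_ns nanoseconds.
    """
    level = vhigh if start_high else 0.0
    points = [(0.0, level)]
    for k in range(n_edges):
        t = first_edge_ns + k * half_period_ns
        points.append((t, level))            # hold until the edge starts
        level = vhigh - level                # toggle between 0 and vhigh
        points.append((t + edge_ns, level))  # end of the ramp
    body = " ".join(f"{t:g}n {v:g}V" for t, v in points)
    return f"V{name} {node} gnd PWL( {body})"


if __name__ == "__main__":
    # Clock: first rising edge at 20 ns, then a toggle every 100 ns.
    print(pwl_square("clk", "clk", 20, 100, 17))
    # Address bit a1: starts high, toggles every 200 ns from 200 ns on.
    print(pwl_square("a1", "a1", 200, 200, 8, start_high=True))
```

The two calls shown reproduce the transition times of `Vclk` and `Va1` in the listing (apart from the extra `0.5n 0V` point at the start of the clock source, which does not change the waveform).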
Appendix B
Sample Parameter files
B.1 Sample technology file for ROMODEL, romodel.tf
lambda .5e-6
# Wiring capacitance (F/m^2)
ndiff_cap 286e-6
pdiff_cap 545e-6
poly_cap 73e-6
metal1_cap 35e-6
metal2_cap 19e-6
# Fringe capacitance (F/m) (Unavailable)
ndiff_fringe 0
pdiff_fringe 0
poly_fringe 0
metal1_fringe 0
metal2_fringe 0
# built in potential
phin 0.6
phip 0.6
# nmos threshold voltage
vtn .7623
# nmos junction cap (F/m^2)
cj0n 2.6473e-04
mjn 0.9561
# nmos sidewall cap (F/m)
cjsw0n 4.0556E-10
mjswn 0.270227
# pmos threshold voltage (abs value)
vtp 0.8814
# pmos junction cap (F/m^2)
cj0p 5.5813e-04
mjp 0.4968
# pmos sidewall cap (F/m)
cjsw0p 2.0919e-10
mjswp 0.463227
# nmos and pmos overlap cap (F/m)
cgd0n 3.4599e-10
cgd0p 1.3214e-10
# piecewise constant parameters for nmos
# CV-plot breakpoint
n_cv_bp 0.1525147
# n_cv_bp 0
# Initial gate capacitance per (m^2) (0 for NMOS)
n_gate_cap_init 0
# Final gate capacitance per (m^2) (Cox for NMOS)
n_gate_cap_final 1.939045e-3
# piecewise constant parameters for pmos
# CV-plot breakpoint
p_cv_bp 0.3359656
# p_cv_bp 0
# Initial gate capacitance per (m^2) (Cox for PMOS)
p_gate_cap_init 1.939045e-3
# Final gate capacitance per (m^2) (0 for PMOS)
p_gate_cap_final 0
# All comments must begin the line with a '#'
# Reminder: EOX == 34.515e-12 F/m
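The technology file is a flat list of `name value` pairs with `#` comment lines, and the closing reminder about EOX suggests that the gate capacitance entries are Cox = EOX / TOX. The Python sketch below reads a fragment in this format and checks that relation against the oxide thickness TOX = 1.78e-8 m given in the sample SPICE model of Section B.2. The function `parse_tech` is a hypothetical reader written for this illustration; how ROMODEL itself parses the file is not shown in this appendix.

```python
def parse_tech(text):
    """Parse 'name value' lines, skipping blank lines and '#' comments."""
    params = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        key, value = line.split(None, 1)
        params[key] = float(value)
    return params


# A short fragment copied from the sample file above.
SAMPLE = """\
lambda .5e-6
# Final gate capacitance per (m^2) (Cox for NMOS)
n_gate_cap_final 1.939045e-3
# Initial gate capacitance per (m^2) (Cox for PMOS)
p_gate_cap_init 1.939045e-3
"""

EOX = 34.515e-12   # F/m, from the reminder at the end of the file
TOX = 1.78e-8      # m, TOX in the sample SPICE model (Section B.2)

params = parse_tech(SAMPLE)
cox = EOX / TOX    # oxide capacitance per unit area, F/m^2

if __name__ == "__main__":
    print(f"Cox from EOX/TOX : {cox:.6e} F/m^2")
    print(f"n_gate_cap_final : {params['n_gate_cap_final']:.6e} F/m^2")
```

The computed value agrees with the file entries to the precision given, which is a useful sanity check when the file is adapted to a different process.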
B.2 Sample SPICE Model
.MODEL NMOS NMOS LEVEL=3 PHI=0.600000 TOX=1.7800E-08 XJ=0.200000U TPG=1
+ VTO=0.7623 DELTA=7.6940E-01 LD=1.1890E-07 KP=1.2379E-04
+ UO=638.1 THETA=1.2160E-01 RSH=6.5980E+00 GAMMA=0.5942
+ NSUB=4.0030E+16 NFS=7.0730E+12 VMAX=1.9160E+05 ETA=4.3410E-02
+ KAPPA=1.0510E-01 CGDO=3.4599E-10 CGSO=3.4599E-10
+ CGBO=4.1520E-10 CJ=2.6473E-04 MJ=0.9561 CJSW=4.0556E-10
+ MJSW=0.270227 PB=0.800000
* Weff = Wdrawn - Delta_W
* The suggested Delta_W is 2.8000E-07
.MODEL PMOS PMOS LEVEL=3 PHI=0.600000 TOX=1.7800E-08 XJ=0.200000U TPG=-1
+ VTO=-0.8814 DELTA=1.2220E+00 LD=4.5410E-08 KP=3.6685E-05
+ UO=189.1 THETA=1.7250E-01 RSH=5.5000E-01 GAMMA=0.4652
+ NSUB=2.4540E+16 NFS=7.7440E+12 VMAX=3.7770E+05 ETA=8.1730E-02
+ KAPPA=9.9830E+00 CGDO=1.3214E-10 CGSO=1.3214E-10
+ CGBO=4.2612E-10 CJ=5.5813E-04 MJ=0.4968 CJSW=2.0919E-10
+ MJSW=0.463227 PB=0.850000
* Weff = Wdrawn - Delta_W
* The suggested Delta_W is 3.0400E-07
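The CJ, MJ, and PB entries in these models describe the voltage dependence of the junction capacitance. In the standard SPICE junction model, the area capacitance under a reverse bias V is CJ / (1 + V/PB)^MJ, and the sidewall term has the same form with CJSW and MJSW. The Python sketch below evaluates that expression for the sample NMOS values. It is offered only as an illustration of the standard formula; whether ROMODEL evaluates exactly this expression is an assumption, and the function name `junction_cap` is hypothetical.

```python
def junction_cap(cj0, mj, pb, v_reverse):
    """Junction capacitance per unit area (or length) at a reverse bias.

    cj0       zero-bias capacitance (CJ in F/m^2 or CJSW in F/m)
    mj        grading coefficient (MJ or MJSW)
    pb        built-in potential in volts (PB)
    v_reverse reverse bias across the junction in volts (>= 0)
    """
    return cj0 / (1.0 + v_reverse / pb) ** mj


# Sample NMOS values from the model above.
CJ_N, MJ_N, PB_N = 2.6473e-4, 0.9561, 0.8

if __name__ == "__main__":
    for v in (0.0, 0.75, 1.5):
        c = junction_cap(CJ_N, MJ_N, PB_N, v)
        print(f"V = {v:4.2f} V  ->  Cj = {c:.4e} F/m^2")
```

Because the capacitance falls as the reverse bias grows, a power estimate that uses the zero-bias value over the full 1.5 V swing would overstate the diffusion contribution.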
B.3 Sample technology file for Magic to Skill, m2s.tech
# magic2cadence "Technology File"
# D. Xanthopoulos 1996
# this must be set to 1 if the select mask must be included
DERIVESELECT
# Necessary Design Rules
CONTACT_SIZE 2
CONTACT_SPACING 2
CONTACT_OVERLAP 1
VIA_SIZE 2
VIA_SPACING 2
VIA_OVERLAP 1
SELECT_OVERLAP 3
#Local Name  Magic layer         Cadence layer(s)
PW           pwell               none
NW           nwell               nwell
POLY         polysilicon         poly
NDIFF        ndiffusion          ndiff
PDIFF        pdiffusion          pdiff
M1           metal1              metal1
M2           metal2              metal2
M3           metal3              metal3
NT           ntransistor         ndiff poly
PT           ptransistor         pdiff poly
PSUB         psubstratepdiff     psub
NSUB         nsubstratendiff     nsub
GLASS        glass               overgla
# VERY IMPORTANT!!!!
# Contacts must be specified as layer1 layer2 contact_cut
PC           polycontact         poly metal1 cont
NDC          ndcontact           ndiff metal1 cont_aa
PDC          pdcontact           pdiff metal1 cont_aa
M2C          m2contact           metal1 metal2 via
PSC          psubstratepcontact  psub metal1 cont_aa
NSC          nsubstratencontact  nsub metal1 cont_aa
end
Bibliography
[Brodersen93]
R.W. Brodersen, "Anatomy of a Silicon Compiler," Kluwer, Boston,
1993.
[Burstein96]
A. Burstein, "Speech Recognition for Portable Multimedia Terminals,"
Ph.D. thesis, University of California, Berkeley, pp. 69-92, February
1996.
[Hirose90]
T. Hirose et al., "A 20-ns 4-Mb CMOS SRAM with Hierarchical Word
Decoding Architecture," IEEE J. Solid State Circuits, Vol. 25,
pp. 1068-1074, Oct. 1990.
[Hoff89]
D. Hoff et al., "A 23-ns 256K EPROM with Double-Layer Metal and
Address Transition Detection," IEEE J. Solid State Circuits, Vol. 24, pp.
1250-1259, Oct. 1989.
[Knecht83]
M. Knecht et al., "A High-Speed Ultra-Low Power 64K CMOS
EPROM with On-Chip Test Functions," IEEE J. Solid State Circuits,
Vol. SC-18, pp. 554-561, Oct. 1983.
[Kuriyama90]
M. Kuriyama et al., "A 16-ns 1-Mb CMOS EPROM," IEEE J. Solid
State Circuits, Vol. 25, pp. 1141-1146, October 1990.
[Murakami90]
S. Murakami et al., "A 21-mW 4-Mb CMOS SRAM for Battery Operation," IEEE J. Solid State Circuits, Vol. 26, pp. 1563-1570, October
1990.
[Mutoh95]
S. Mutoh, T. Douseki, Y. Matsuya, T. Aoki, S. Shigematsu, J. Yamada,
"1-V Power Supply High-Speed Digital Circuit Technology with
Multithreshold-Voltage CMOS," IEEE J. Solid State Circuits, Vol. 30,
No. 8, pp. 847-854, August 1995.
[Ohtsuka87]
N. Ohtsuka et al., "A 4-Mbit CMOS EPROM," IEEE J. Solid State Circuits, Vol. SC-22, pp. 669-675, October 1987.
[Rabaey95]
Rabaey, "Digital Integrated Circuits: A Design Perspective," Prentice
Hall, 1995.
[Sakurai84]
T. Sakurai et al., "A Low Power 46 ns 256 kbit CMOS Static RAM with
Dynamic Double Word Line," IEEE J. Solid State Circuits, Vol. SC-19,
pp. 578-585, Oct. 1984.
[Sasaki90]
K. Sasaki et al., "A 23-ns 4Mb CMOS SRAM with 0.2-uA Standby Current," IEEE J. Solid State Circuits, Vol. 25, pp. 1075-1081, October
1990.
[Stan89]
M. Stan and W. Burleson, "Limited-weight Codes for Low-power I/O,"
1994 International Workshop on Low-power Design, pp. 209-214, April
1994.
[Tabor90]
J. Tabor, "Noise Reduction Using Low Weight and Constant Weight
Coding Techniques," Master's thesis, 75 pages, June 1990.
[Weste93]
N. Weste and K. Eshraghian, "Principles of CMOS VLSI Design", Sec.
Ed., Addison-Wesley, 1993.
[Xanthopoulos96]
T. Xanthopoulos, "PYTHIA: A Power Estimator for Structural Verilog,"
25 pages, April 1996.
[Yoshimoto83]
M. Yoshimoto et al., "A Divided Word-Line Structure in the Static
RAM and Its Application to a 64K Full CMOS RAM," IEEE J. Solid
State Circuits, Vol. 18, pp. 479-485, October 1983.