A Dual-VDD Low Power FPGA Architecture

A. Gayasen1, K. Lee1 , N. Vijaykrishnan1 , M. Kandemir1 , M.J. Irwin1 , and T. Tuan2
1
Dept. of Computer Science and Engineering
Pennsylvania State University
University Park, PA 16802
{gayasen,kiylee,vijay,kandemir,mji}@cse.psu.edu
2
Xilinx Research Labs
2100 Logic Dr.
San Jose, CA 95124
Tim.Tuan@xilinx.com
Abstract. The continuing increase in FPGA size and complexity and the emergence of sub-100nm technology have made FPGA power consumption, both dynamic and static, an important design consideration. In this work, we propose
a programmable dual-VDD architecture in which the supply voltages of the logic
blocks and routing blocks are programmed to reduce power consumption: low-VDD is assigned to non-critical paths in the design, while high-VDD is assigned to
the timing-critical paths to meet timing constraints. We evaluate the
effectiveness of different VDD assignment algorithms and architectural implementations. Our experimental results show that selectively reducing the supply voltage
of the non-critical paths provides significant power savings with minimal impact
on performance. One of our VDD-assignment techniques provides an average power
saving of 61% across different MCNC benchmarks.
1 Introduction
In modern FPGAs, power consumption has become an important design consideration.
Increasing performance and complexity have raised the dynamic power consumed per chip,
while the use of deep sub-micron processes has resulted in higher static power in the forms
of sub-threshold leakage and gate leakage. High power consumption requires expensive
packaging and cooling solutions. In battery-powered applications, high power consumption
may prohibit the use of FPGA altogether. Consequently, solutions for reducing FPGA
power are needed.
Reducing the supply voltage (VDD) is an effective technique for reducing both dynamic
and static power. Dynamic power has a quadratic dependency on supply voltage, while
both sub-threshold leakage (due to Drain Induced Barrier Lowering, DIBL) and gate
leakage exhibit exponential dependencies on the supply voltage. However, reducing supply
voltage also negatively affects circuit performance. A well-known technique to reap the
benefits of voltage scaling without the performance penalty is the use of dual-VDD. The
timing-critical blocks in the design operate on the normal VDD (or VDDH), while non-critical blocks operate on a second supply rail with a lower voltage (or VDDL). While
dual-VDD ICs have been successfully used in low-power ASICs and custom ICs [17], no
commercial FPGA today uses multiple VDDs for power reduction.3
Footnote 3: Xilinx Virtex-II FPGAs use different supply voltages for I/O and the core. Pass transistors
used for interconnects are also supplied higher gate voltages to eliminate the VT drop. But this
is not targeted at reducing power.
The difficulty of designing a dual-VDD FPGA is that the optimal VDD assignment
changes from one design to another. Consequently, if logic blocks are statically determined
to be operating at low or high VDD, the placement and routing algorithms need to be
modified accordingly, as in [11]. However, static assignment of VDD to the blocks may
make it impossible to reduce power consumption or to meet timing constraints for some
designs. In contrast, the use of VDD-programmability for each block helps to tune the
number of high- and low-VDD blocks as desired by the application. In this approach,
the challenge is in determining the VDD assignment of each block. The need for level
converters wherever a low-VDD logic block drives a high-VDD block, and the associated
delay and energy overheads, are an important consideration when performing these VDD
assignments. Furthermore, the positioning of the level converters influences the ability to assign
lower VDDs to the routing blocks.
In this work, we propose a programmable dual-VDD architecture in which the supply
voltages of the logic and routing blocks are programmed to reduce power consumption by
assigning low-VDD to non-critical paths in the design, while assigning high-VDD to the
timing-critical paths in the design to meet timing constraints. In our programmable dual-VDD architecture (see Figure 1), the VDD of a circuit block is selected between VDDH and
VDDL by using two high-VT transistors (supply transistors) connecting the block to the
supplies. The state (ON/OFF) of each supply transistor is controlled by a configuration
bit, which is set by the VDD assignment algorithm. The configuration bits are set either
to connect the block to one of the power supplies or completely disconnect the block
from both the power supply lines when the block is unused or idle. We evaluate the
effectiveness of different VDD assignment algorithms and implementation choices for an
island style FPGA architecture designed in 65nm technology. Our results indicate that
one of our VDD -assignment techniques provides an average power saving of 61% across
different MCNC benchmarks.
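The three block states described above can be illustrated with a minimal Python sketch. It is not taken from the paper's tool flow: the names `SupplyState` and `decode_supply_bits` and the convention that a bit value of 1 turns a supply transistor ON are assumptions made for this example only.

```python
from enum import Enum


class SupplyState(Enum):
    """Possible supply states of one circuit block (CLB or routing mux)."""
    OFF = "off"    # both supply transistors off: the block is put to sleep
    VDDH = "VDDH"  # block connected to the high supply rail
    VDDL = "VDDL"  # block connected to the low supply rail


def decode_supply_bits(bit_h: int, bit_l: int) -> SupplyState:
    """Map the two configuration bits (one per supply transistor) to a state.

    Assumption for this sketch: a bit value of 1 means the corresponding
    supply transistor is ON.
    """
    if bit_h and bit_l:
        # Enabling both transistors would connect the two rails together.
        raise ValueError("both supply transistors enabled")
    if bit_h:
        return SupplyState.VDDH
    if bit_l:
        return SupplyState.VDDL
    return SupplyState.OFF
```

Under this encoding, an unused block is simply configured with both bits cleared, which corresponds to the sleep state used later for leakage reduction.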
Fig. 1. Supply transistors used for programmable VDD (panels (a), (b) and (c))
The remainder of this paper is organized as follows. In Section 2, we review the related
work, focusing in particular on power optimizations for FPGAs. In Section 3, we discuss
our dual-VDD FPGA architecture. Section 4 describes the experimental methodology,
the VDD assignment algorithms, and the power estimation technique
we used. Section 5 presents experimental results and Section 6 concludes the paper.
2 Related Work
Most previous work on power modeling, estimation and reduction in FPGAs has
focused primarily on dynamic power. In [9], the dynamic power of a Xilinx XC4003A
FPGA was analyzed by taking measurements of test designs. [15] analyzes dynamic power
consumption in the Virtex-II FPGA family. [12, 10] evaluate different FPGA architectures for
power efficiency. [16] presents a routability-driven bottom-up clustering technique for area
and power reduction in clustered FPGAs.
Leakage in FPGAs has captured interest only very recently. [18] makes a detailed
analysis of leakage power in Xilinx CLBs. It concludes that significant reduction of FPGA
leakage is needed to enable the use of FPGAs in mobile applications. [3] presents a fine-grained leakage control scheme using sleep transistors at gate level. [14] evaluates several
low-leakage design techniques for FPGAs and shows that using multiple VT switch blocks
reduces leakage significantly. [1] selects the polarities of logic signals to reduce active
leakage power in FPGAs. [5] presents a cut enumeration algorithm targeting low power
technology mapping for FPGA architectures with dual supply voltages. [6] presents a
region-constrained placement approach to reduce leakage in FPGAs.
Dual-VDD techniques have been proposed previously for ASICs [19, 17]. Recently, a low-power FPGA using pre-defined dual-VDD/dual-VT fabrics has been proposed in [11]. However,
that work focuses on reducing only dynamic power, while keeping the leakage constant.
Further, it uses a fixed dual-VDD/dual-VT fabric, keeping all the routing resources
at high-VDD, which limits the power savings significantly.
3 Architecture
The proposed dual-VDD architecture is built on a cluster-based island-style FPGA architecture, with the configuration stored in SRAM cells. It provides a configurable supply voltage
for logic blocks and routing multiplexers. Figure 2 gives an overview of the architecture.
The basic logic element (BLE) consists of a 4-input LUT and a flip-flop. Eight such BLEs
cluster together to form a logic block (CLB). Figure 2(a) shows how the CLB is configured
using high-VT supply transistors to operate at two different voltages.
Fig. 2. Dual-VDD architecture: (a) dual-VDD CLB; (b) dual-VDD routing mux
As mentioned in section 1, a dual-VDD design needs level conversion when a low-VDD
block drives a block operating at high-VDD . In our dual-VDD architecture, level conversion
takes place only at CLB pins. For this purpose, CLB pins have level converters (LCs)
attached to them. A multiplexer allows the level converter to be bypassed if level conversion is
not needed at that pin. Placing the level converter only at CLB pins reduces the complexity
of the routing fabric, and at the same time, limits the overheads due to level converters.
We experimented with two architectures differing in the placement of the level converters. The first architecture places LCs at the output pins of CLBs, while the second
architecture places them at CLB input pins. Figure 2(a) shows the first case, where only the
output pins of a CLB have LCs attached to them. In this case, a net with multiple fanouts
operates at high VDD if any one of the CLBs driven by this net is at high VDD (since
the signal’s voltage level does not change in the routing fabric). This limits the number of
routing muxes that can be operated at low VDD , and therefore, is less effective in reducing
routing power compared to the case when LCs are attached to CLB input pins. But, the
drawback of keeping LCs at input pins of CLBs (apart from area penalty) is that a larger
number of LCs are needed, which increases the leakage in logic blocks. Our results support
this reasoning, but show that overall leakage is lower for the second case.
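The two placement rules can be illustrated with a short Python sketch that derives the voltage of a net from the CLBs it connects. This is an illustration only, not part of the evaluated tool flow: the function name `net_is_high` and the boolean encoding (True for high VDD) are assumptions, and the sketch encodes nothing beyond the two rules stated in the text.

```python
from typing import Sequence


def net_is_high(driver_high: bool, sinks_high: Sequence[bool], lc_at: str) -> bool:
    """Return True if the routing muxes of a net must run at high VDD.

    driver_high: whether the driving CLB is at high VDD.
    sinks_high:  for each CLB driven by the net, whether it is at high VDD.
    lc_at:       "output" (LCs at CLB output pins) or "input" (LCs at CLB input pins).
    """
    if lc_at == "output":
        # The signal level cannot change inside the routing fabric, so one
        # high-VDD sink forces the whole net to high VDD.
        return any(sinks_high)
    if lc_at == "input":
        # Conversion happens at each sink's input pin, so the net simply
        # follows the voltage of the CLB that drives it.
        return driver_high
    raise ValueError("lc_at must be 'output' or 'input'")
```

For a low-VDD driver with three sinks of which one is at high VDD, the sketch returns True for LCs at outputs and False for LCs at inputs, which is the reason the second placement leaves more routing muxes at low VDD.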
Figure 2(b) shows a routing multiplexer (mux) in the dual-VDD architecture. The
multiplexer’s output is connected to a level-restoring buffer to restore the VT -drop through
the NMOS-based multiplexer. Note that the same set of supply transistors control the
voltage of configuration SRAM cells and the level-restoring buffer. Since the configuration
SRAM is not timing critical, the supply transistors need to be sized just enough to supply
the maximum current needed by the level-restoring buffer connected to it.
If a circuit block (CLB or routing multiplexer) is completely unused, then in order to
save leakage, it is desirable to completely switch-off that block. This is achieved by keeping
a separate configuration bit for every supply transistor.4 Although this incurs more area
overhead, it results in significant leakage savings, since resource utilization in an FPGA is
typically low [18]. Due to the area overhead of level converters and supply transistors, the
dual-VDD FPGA takes approximately 21% more area than a single-VDD FPGA when LCs
are at CLB outputs. For the case when LCs are at CLB inputs, this number is estimated
to be around 23%.
The majority of leakage in an FPGA occurs in the configuration SRAM cells. It has been
previously shown in [6] that by increasing the threshold voltage of the configuration SRAM,
its leakage can be reduced by 98%, while increasing configuration time by 20%. Since
configuration time is not critical in most of our target designs, this tradeoff for power
savings is reasonable. In order to see the effect of dual-VDD on power consumption, we
have neglected the configuration SRAM leakage both for single supply design, and for
the dual supply design (since the reduction of configuration SRAM leakage is achieved by
increasing its threshold voltage, and is equally applicable to both single and dual supply
designs).
3.1 Level Conversion
Level converters have been studied widely ever since multi-VDD circuits were proposed
[19, 13]. The area, delay and power overheads of level converters prohibit random VDD
assignment to logic elements of a circuit. For the present work, we have used the level
converter circuit shown in Figure 3, and a 65nm BSIM4 SPICE model to simulate it. For
an FPGA architecture where level converters are placed at CLB input pins, four level
converters are required per BLE. For a VDDH of 1.1V and VDDL of 0.9V, the LC delay is
almost 17% of the delay of an LUT, and as much as 41% of the clock-to-Q delay of the
flip-flop. This significant delay in the LC prohibits the use of many LCs within a logical
path of the circuit. In contrast to delay, power consumption in an LC was observed to be
negligible (< 1%) compared to a BLE. This allows us to place LCs at all input pins of a
CLB and still get power savings.
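To give a feel for why the LC delay limits the number of LCs on a path, the following Python sketch computes a rough delay overhead. It is a back-of-the-envelope illustration under strong assumptions that are not in the paper: only LUT delays are counted (routing and flip-flop delays are ignored), and the LC delay is taken as exactly 17% of one LUT delay. The function name `lc_delay_overhead` is hypothetical.

```python
# LC delay as a fraction of one LUT delay (VDDH = 1.1V, VDDL = 0.9V, from the text).
LC_DELAY_PER_LUT = 0.17


def lc_delay_overhead(num_luts: int, num_lcs: int) -> float:
    """Fractional increase in logic delay caused by num_lcs level converters
    on a path that passes through num_luts LUTs.

    Simplification: routing and flip-flop delays are ignored, so this is an
    upper-bound style estimate of the relative overhead.
    """
    if num_luts <= 0:
        raise ValueError("a path must contain at least one LUT")
    return num_lcs * LC_DELAY_PER_LUT / num_luts


# Example: three LCs on a path of five LUTs add roughly 10% to the logic delay.
overhead = lc_delay_overhead(num_luts=5, num_lcs=3)
```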
4 Methodology
We used VPR and its power model [2, 12] for this work. MCNC benchmarks were used for
experimentation to evaluate the dual-VDD architecture and VDD assignment algorithms.
The routing architecture that we supplied to VPR closely resembles a modern FPGA,
with a routing channel width of 200, and buffered segments of lengths 1, 2, 6 and “long”.
A LUT size of 4 and a cluster size of 8 LUTs were chosen to match a Xilinx Virtex-II
device.
Footnote 4: In the case of a routing mux, we may need to pull down the control signals when the mux is unused.
The pull-down transistors can be sized very small.
Fig. 3. Level converter circuit
Fig. 4. Experimental Flow
Circuit simulations were performed in SPICE using 65nm BSIM4 device models. Delays
of BLE and LC were obtained from these simulations. Power consumption, both static
and dynamic, of the LC was also obtained by simulating in SPICE using BSIM4 models.
Figure 4 shows the experimental flow. The flow deviates from a normal VPR flow after
the place and route stage. We first assign voltage to all CLBs using algorithms that are
discussed below, and then estimate power of the design placed and routed on the target
dual-VDD architecture. Assigning voltages after routing makes the timing analysis more
accurate, since all the routing delays get incorporated in the timing graph.
4.1 VDD assignment
In order to be effective, a dual-VDD scheme requires that paths in the circuit vary in their
delays. If all paths have the same delay, then all circuit elements will require high VDD to
maintain the performance of the design.
Figure 5 shows the distribution of path delays averaged over MCNC benchmarks which
we used for all our experimentation. It is evident from the figure that path delays in a
circuit vary considerably. Therefore, a dual-VDD scheme can be expected to reduce the
power consumption significantly. Figure 5 also shows the path delays after using our dual-VDD assignment algorithms.
Optimal assignment of VDD to gates in a circuit is known to be an NP-complete
problem. We use the heuristic shown in Figure 6 for VDD assignment. Initially we assign
low VDD to all CLBs in the FPGA, and find those paths whose delays become greater
than the desired clock period. We call such paths “critical”. Those CLBs which do not
belong to any of the critical paths can be kept at low voltage without affecting performance
of the design. Some of the remaining CLBs and routing muxes need to operate at high-VDD so that the design’s performance target is met. The order in which these CLBs are
analyzed is crucial for the performance of the heuristic. We define the “criticality” of a CLB
as the number of critical paths that pass through this CLB.5 The CLBs within a path
are analyzed in decreasing order of their criticalities. We started with CLBs on the most
critical path, and proceeded to smaller paths in decreasing order of their delay. Figure 6
shows the algorithm for the case when LCs are at CLB inputs. In that case all routing
muxes driven by a CLB have the same voltage as the CLB. For the other situation, when
LCs are at CLB outputs, the voltage of routing muxes driving a CLB is the same as that of
the CLB.
Footnote 5: This definition of criticality can potentially be improved by assigning priorities to paths depending on their delay or other parameters.
Fig. 5. Distribution of path delays
Table 1. Comparison of High-to-Low and Low-to-High algorithms (LC at CLB inputs, VDDH = 1.1V, VDDL = 0.9V)
Design      # CLBs   # VDDL CLBs
                     Low-to-High   High-to-Low
alu4           191            51            65
apex2          235            54            74
apex4          158            46            26
bigkey         214            24            81
des            200            87           127
dsip           172            12            31
elliptic       451           339           327
ex1010         575           177           185
ex5p           133            30            24
misex3         175            18            16
pdc            572           400           405
s38584.1       806           724           739
seq            219            61            58
spla           462           215           226
tseng          131           102           114
Assign VDDL to all CLBs and routing muxes
P = list of all paths in the design
T = longest delay path when all circuit blocks operate at VDDH
Td = x · T, where x ≥ 1 is a user-defined performance metric
critical_path = {Pi ∈ P | delay(Pi) > Td}
for each CLB
    criticality(CLB) = number of paths passing through it
while (critical_path not empty) {
    Pk = path ∈ critical_path with maximum delay
    N = all blocks through which Pk flows
    Sort N based on criticality (first entry has most paths)
    while (delay(Pk) > Td) {
        Ni = first(N)
        N = N - Ni
        Assign VDDH to Ni and all the routing muxes driven by Ni
        update delay of all paths passing through Ni
    }
    critical_path = critical_path - {Pk}
}
Fig. 6. Algorithm for VDD assignment: Low-to-High (assuming LCs at CLB input pins)
In order to enumerate all paths whose delays become larger than the required clock
time period, we used the algorithm proposed in [8]. It maintains all paths in a heap data
structure with their delays as the keys. Each path also maintains all the branch-points in
the path in increasing order of their branch-slacks 6 .
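The Low-to-High heuristic of Fig. 6 can be sketched in Python as follows. This is a simplified illustration, not the implementation used in the experiments: it assumes the paths are already enumerated as lists of block names, that a path's delay is the sum of per-block delays (routing and level-converter delays are folded into the blocks or ignored), and that criticality counts critical paths as in the text's definition. The names `low_to_high`, `delay_low` and `delay_high` are hypothetical.

```python
def low_to_high(paths, delay_low, delay_high, x=1.0):
    """Return the set of blocks that must be raised to VDDH.

    paths:      list of paths, each a list of block names.
    delay_low:  dict block -> delay when the block runs at VDDL.
    delay_high: dict block -> delay when the block runs at VDDH.
    x:          allowed slowdown factor (x >= 1), so that Td = x * T.
    """
    high = set()  # every block starts at VDDL

    def delay(path):
        # Path delay under the current voltage assignment.
        return sum(delay_high[b] if b in high else delay_low[b] for b in path)

    # T is the longest path delay with all blocks at VDDH.
    td = x * max(sum(delay_high[b] for b in p) for p in paths)

    critical = [p for p in paths if delay(p) > td]

    # Criticality of a block: number of critical paths passing through it.
    criticality = {}
    for p in critical:
        for b in dict.fromkeys(p):  # de-duplicate while keeping order
            criticality[b] = criticality.get(b, 0) + 1

    while critical:
        pk = max(critical, key=delay)  # critical path with maximum delay
        # Most critical blocks first; ties broken by name for determinism.
        order = sorted(dict.fromkeys(pk), key=lambda b: (-criticality[b], b))
        for b in order:
            if delay(pk) <= td:
                break
            high.add(b)  # raise the block (and the muxes it drives) to VDDH
        critical.remove(pk)
    return high


# Toy example with arbitrary integer delay units.
example_paths = [["A", "B", "C", "D"], ["A", "B", "E"], ["E", "F", "G"]]
example_high = {"A": 10, "B": 10, "C": 10, "D": 10, "E": 10, "F": 10, "G": 18}
example_low = {"A": 13, "B": 13, "C": 13, "D": 13, "E": 13, "F": 13, "G": 20}
raised = low_to_high(example_paths, example_low, example_high)
```

In this toy example the longest path forces A, B, C and D to VDDH, and the third path is fixed by raising E and F, so only G remains at VDDL.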
We experimented with a variant of the above algorithm (High-to-Low) too, in which
all the CLBs are initially kept at high voltage and then some of them are changed to low
VDD (see figure 7). Before changing a CLB to low-VDD , we need to make sure that this
will not increase the delay of some other path in the circuit above the desired clock period.
Footnote 6: Branch slack is defined as the decrease in path delay if a particular branch point is used to
generate a new path.
Assign VDDH to all CLBs and routing muxes
P = list of all paths in the design
T = longest delay path when all circuit blocks operate at VDDH
Td = x · T, where x ≥ 1 is a user-defined performance metric
vddl_delay(Pi) = delay(Pi) when all blocks in Pi are at VDDL
critical_path = {Pi ∈ P | vddl_delay(Pi) > Td}
for each CLB
    criticality(CLB) = number of paths passing through it
while (critical_path not empty) {
    Pk = path ∈ critical_path with maximum delay
    N = all blocks through which Pk flows
    Sort N based on criticality (last entry has most paths)
    while ((delay(Pk) < Td) & (N not empty)) {
        Ni = first(N)
        N = N - Ni
        Assign VDDL to Ni and all the routing muxes driven by Ni
        calculate delays of all paths flowing through Ni
        if any of the delays > Td
            reset Ni and all routing muxes driven by Ni to VDDH
        else
            update delays of all paths flowing through Ni
    }
    critical_path = critical_path - {Pk}
}
Fig. 7. Algorithm for VDD assignment: High-to-Low (assuming LCs at CLB input pins)
The number of low-VDD blocks using both versions, for VDDH of 1.1V and VDDL of 0.9V
(for 65nm technology), is shown in Table 1. For 10 out of 15 designs, the High-to-Low (h2l)
version performs better than Low-to-High (l2h). This happens because in case of h2l, when
the CLBs on a particular path are being analyzed whether they can be run on low-VDD ,
the algorithm continues to look at all the other CLBs on the path even after it failed to
change the VDD of some CLB. In contrast, in the l2h case, the algorithm keeps changing
CLBs on a path to high VDD (in decreasing order of criticality), till the delay of the path is
less than the required clock period. This sometimes causes the path’s delay to be reduced
more than what was necessary.
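The High-to-Low variant can be sketched in Python in the same spirit. Again this is a simplified illustration rather than the implementation used in the experiments: paths are assumed to be pre-enumerated lists of block names, path delay is the sum of per-block delays, and blocks that lie on no critical path are lowered directly (an assumption of this sketch, justified because every path through such a block meets Td even when fully at VDDL). The names `high_to_low`, `delay_low` and `delay_high` are hypothetical.

```python
def high_to_low(paths, delay_low, delay_high, x=1.0):
    """Return the set of blocks that can be lowered to VDDL.

    paths:      list of paths, each a list of block names.
    delay_low:  dict block -> delay when the block runs at VDDL.
    delay_high: dict block -> delay when the block runs at VDDH.
    x:          allowed slowdown factor (x >= 1), so that Td = x * T.
    """
    low = set()  # every block starts at VDDH

    def delay(path):
        # Path delay under the current voltage assignment.
        return sum(delay_low[b] if b in low else delay_high[b] for b in path)

    def vddl_delay(path):
        # Path delay if every block on the path were at VDDL.
        return sum(delay_low[b] for b in path)

    td = x * max(sum(delay_high[b] for b in p) for p in paths)
    critical = [p for p in paths if vddl_delay(p) > td]

    criticality = {}
    for p in critical:
        for b in dict.fromkeys(p):
            criticality[b] = criticality.get(b, 0) + 1

    # Blocks on no critical path can always run at VDDL.
    for p in paths:
        for b in p:
            if b not in criticality:
                low.add(b)

    for pk in sorted(critical, key=vddl_delay, reverse=True):
        # Least critical blocks first; ties broken by name for determinism.
        for b in sorted(dict.fromkeys(pk), key=lambda b: (criticality[b], b)):
            if b in low:
                continue
            low.add(b)  # tentatively lower the block
            if any(delay(p) > td for p in paths if b in p):
                low.discard(b)  # a path through b now misses timing: revert
            # Either way, keep examining the remaining blocks on the path.
    return low


# Same toy example as a Low-to-High run would use (arbitrary integer delay units).
h2l_paths = [["A", "B", "C", "D"], ["A", "B", "E"], ["E", "F", "G"]]
h2l_high = {"A": 10, "B": 10, "C": 10, "D": 10, "E": 10, "F": 10, "G": 18}
h2l_low = {"A": 13, "B": 13, "C": 13, "D": 13, "E": 13, "F": 13, "G": 20}
lowered = high_to_low(h2l_paths, h2l_low, h2l_high)
```

The key difference from the Low-to-High sketch is that a failed attempt on one block does not stop the scan of the path, which mirrors the explanation given above for why High-to-Low often finds more low-VDD blocks.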
4.2 Power estimation
After all logic blocks have been assigned appropriate supply voltages, we estimate power
consumption of the entire FPGA. We concentrate only on the power consumption in
the core of the FPGA, and do not try to optimize or estimate IO power consumption.
Furthermore, we did not estimate the power consumption in the global routing grid used
for clock distribution.
In order to estimate dynamic power, VPR’s power model calculates transition densities
at all internal nodes of the FPGA, assuming that all inputs to the FPGA have the same
static probability (default: 0.5). Capacitances are estimated from the capacitance values
of a MOSFET, and that of wires and switches, all of which need to be provided in the
architecture file taken by VPR as an input. We used Berkeley Predictive 65nm technology
parameters for our experimentation.
We modified VPR’s dynamic power model to include dual supply voltages, and the
power consumption of level converters. Due to the quadratic dependence of dynamic power on
supply voltage, the dynamic power of a circuit element scales by a factor of $(V_{DDL}/V_{DDH})^2$ when its voltage is
reduced from VDDH to VDDL. Dynamic power of a level converter (obtained from SPICE
simulations) was added wherever a level converter was used (using the transition density
at that node).
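As a worked example of the quadratic dependence, the following Python sketch evaluates the scaling factor for the voltages used in this paper. The function name `dynamic_power_scale` is an assumption made for illustration; the sketch covers only the supply-voltage term and does not model level-converter power.

```python
def dynamic_power_scale(vddl: float, vddh: float) -> float:
    """Factor by which the dynamic power of a circuit element is multiplied
    when its supply is lowered from vddh to vddl (quadratic dependence)."""
    if vddh <= 0 or vddl <= 0:
        raise ValueError("supply voltages must be positive")
    return (vddl / vddh) ** 2


# For VDDH = 1.1V and VDDL = 0.9V the factor is about 0.67, i.e. the element's
# dynamic power drops by roughly one third.
scale = dynamic_power_scale(0.9, 1.1)
```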
VPR has a basic leakage model, which calculates sub-threshold leakage due to
weak inversion. But in a 65nm technology, two more effects, namely, DIBL and gate leakage
become significant, and need to be included in the leakage estimation. We also modified the
leakage model to take into account multiple supply voltages, and sleep modes. Specifically,
the following modifications were made to VPR’s leakage estimation.
1. Gate leakage and sub-threshold leakage due to DIBL were included in the leakage estimation. In order to estimate leakage of a single MOSFET, we used results from SPICE
simulations. 65nm BSIM4 device models were used. Simulations were performed for
various supply voltages to get leakage numbers for different voltages. These numbers
were incorporated into the power model of VPR to estimate gate leakage of the entire
FPGA.
2. We estimated average leakage in a routing multiplexer by halving the worst case leakage, as discussed in [14]. To verify, we simulated multiplexers of various sizes and
structures and found our leakage estimate to be very close to the SPICE results.
3. In the dual-VDD FPGA, unused logic blocks and routing muxes are kept in a sleep
state, by switching off both the supply transistors. Circuit simulations in SPICE
showed that in sleep mode, leakage of a circuit block reduces to 10% of the original (high VDD ) leakage.
4. To estimate level converter leakage, we obtained the leakage number for one level converter from SPICE simulations, and multiplied this by the number of level converters
in the FPGA.
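The four modifications above can be combined into a small Python sketch of the leakage bookkeeping. It is illustrative only and does not reproduce VPR's code: the function names, the `(kind, state)` table layout and all numeric values in the example are hypothetical, while the halving of worst-case mux leakage and the 10% sleep factor follow the text.

```python
# From the text: in sleep mode a block leaks 10% of its high-VDD leakage.
SLEEP_FACTOR = 0.10


def mux_average_leakage(worst_case: float) -> float:
    """Average routing-mux leakage, estimated as half the worst case."""
    return worst_case / 2


def total_leakage(blocks, leak_at, num_lcs, lc_leak):
    """Sum the leakage of all circuit blocks plus all level converters.

    blocks:  iterable of (kind, state), state in {"VDDH", "VDDL", "OFF"}.
    leak_at: dict (kind, "VDDH" or "VDDL") -> per-block leakage, e.g. taken
             from SPICE simulations at each supply voltage.
    num_lcs: number of level converters in the FPGA.
    lc_leak: leakage of one level converter.
    """
    total = 0.0
    for kind, state in blocks:
        if state == "OFF":
            total += SLEEP_FACTOR * leak_at[(kind, "VDDH")]
        else:
            total += leak_at[(kind, state)]
    return total + num_lcs * lc_leak


# Hypothetical numbers in arbitrary units, for illustration only.
example_leak = {
    ("clb", "VDDH"): 8.0,
    ("clb", "VDDL"): 4.0,
    ("mux", "VDDH"): mux_average_leakage(4.0),
    ("mux", "VDDL"): 1.0,
}
example_blocks = [("clb", "VDDH"), ("clb", "VDDL"), ("mux", "OFF"), ("mux", "VDDL")]
example_total = total_leakage(example_blocks, example_leak, num_lcs=4, lc_leak=0.25)
```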
5 Results and Analysis
Power reduction due to the dual-VDD architecture strongly depends on the voltage values
of VDDH and VDDL . In order to understand this dependence, and to come up with a good
voltage choice, we fixed the high-VDD at 1.1V and varied VDDL from 0.8V to 1.0V. Figure 8
shows the power consumption for different VDDL values (using the High-to-Low algorithm
with LCs at CLB inputs). Note that for 11 (out of 15) designs, a VDDL value of 0.9V results in
maximum power savings. When VDDL is increased to 1.0V, although the number of CLBs
on low VDD increases, the total power consumption increases. This happens because power
consumption of the circuit elements at 1.0V is significantly higher than at 0.9V. On the
other side, when we reduce VDDL to 0.8V, power consumption again increases because the
number of CLBs and routing muxes on low VDD becomes too low. Therefore, for all other
results in this section, we use a VDDL of 0.9V. For this case, on an average, we get close
to 61% power saving.
Figure 9 shows the power consumption of the designs for the two algorithms, High-to-Low (h2l) and Low-to-High (l2h), and the two level converter placements, at CLB outputs (LCo)
or inputs (LCi). (h2lLCi denotes High-to-Low algorithm with LC at CLB Inputs.) Note
that for most designs, the High-to-Low algorithm outperforms the Low-to-High algorithm.
This is expected because we showed in section 4 that the High-to-Low algorithm resulted
in larger number of low-VDD CLBs. Further, the placement of LCs at CLB inputs saves
more power (average: 61%) than their placement at outputs (average: 57%). This happens
because LC leakage is not large enough to overshadow the gains we get in routing power
by placing LCs at CLB inputs. But note that placing the LCs at CLB inputs increases
the area of the FPGA.
Figure 10 shows the static and dynamic power consumption in both logic and routing
resources for the different algorithms and LC placements. An important observation is that
not all components of power are reduced by the same factor. The reduction in dynamic
power is much less than that in leakage. For example, using High-to-Low algorithm and
placing LC at CLB inputs saves 24% dynamic power and 76% leakage power. This can be
attributed to two factors. First, in an FPGA since there exist a large number of unused
circuit elements, it is possible to reduce the leakage in them by switching them off. And
second, leakage varies exponentially with supply voltage, but dynamic power varies only
quadratically with supply voltage. Note that leakage in routing resources reduces to less
than 17% of the original, because in most designs it is possible to put a large number of
routing muxes in sleep state, as they are sparsely used. Another trend to note is that the
logic portion of leakage is larger when LCs are placed at CLB inputs (LCi) than when
they are placed at CLB outputs (LCo). This implies that the larger overall power saving
for the LCi case comes entirely from the routing resources.
Fig. 8. Power consumption for different VDDL's. VDDH = 1.1V.
Fig. 9. Power consumption for different architectures and algorithms.
Fig. 10. Average power breakdown between logic and routing resources.
Fig. 11. Average power consumption for different critical path delay tolerances.
Finally, figure 11 shows what happens when we modify the VDD assignment algorithm
to allow some degradation in the performance of the design. In the figure, a delay value of
110% denotes 10% performance penalty. Note that these delay values may increase after
circuit implementation due to the use of supply transistors, and due to a possible increase of
wire lengths (since total CLB area and consequently inter-CLB distances increase). Using
h2lLCi, a 10% decrease in performance increases the average power saving by around 4%.
But beyond 20%, power saving remains almost constant.
6 Conclusion and Future Work
We have presented a dual-VDD FPGA architecture that provides significant power savings
with minimal performance penalty. Variations of the VDD assignment algorithm and level
converter placement were explored. It was observed that High-to-Low Algorithm coupled
with placement of level converters at the input pins of CLBs resulted in maximum power
savings. An average power saving of 61% was observed for this case. The dynamic power
was reduced by 24%, while the reduction in static power was close to 76%. But placing
the level converters at CLB output pins reduces the area penalty by about 2% and still
saves about 57% of total power.
In the present work, the router in VPR is essentially unaware of the multiple supply voltages available for every logic block and routing switch. This could be improved by performing dual-VDD-aware routing. We plan to work on this in the future.
7 Acknowledgements
This work was supported in part by NSF CAREER Award 0093082, NSF Awards 0130143,
0202007, and a MARCO/DARPA GSRC Grant.
References
1. J. H. Anderson, F. Najm, and T. Tuan. “Active Leakage Power Optimization for FPGAs”.
In Proceedings of ACM/SIGDA International Symposium on Field-programmable gate arrays,
2004.
2. V. Betz and J. Rose. “VPR: A New Packing, Placement and Routing Tool for FPGA Research”. In International Workshop on Field-programmable Logic and Applications, 1997.
3. B. Calhoun, F. Honore, and A. Chandrakasan. “Design Methodology for Fine-grained Leakage
Control in MTCMOS”. In Proceedings of International Symposium on Low Power Electronics
and Design, 2003.
4. A. Chandrakasan, W. Bowhill, and F. Fox. “Design of High-Performance Microprocessor
Circuits”. IEEE Press, 2001.
5. D. Chen, J. Cong, F. Li, and L. He. “Low-Power Technology Mapping for FPGA Architectures with Dual Supply Voltages”. In Proceedings of International Symposium on Field-programmable gate arrays, 2004.
6. A. Gayasen, Y. Tsai, N. Vijaykrishnan, M. Kandemir, M. J. Irwin, and T. Tuan. “Reducing
leakage energy in FPGAs using region-constrained placement”. In Proceedings of International Symposium on Field-programmable gate arrays, 2004.
7. V. George, H. Zhang, and J. Rabaey. “The design of a low energy FPGA”. In Proceedings of
International Symposium on Low Power Electronics and Design, 1999.
8. Y-C. Ju and R. A. Saleh. “Incremental Techniques for the Identification of Statically Sensitizable Critical Paths”. In Design Automation Conference, 1991.
9. E. Kusse and J. Rabaey. “Low-Energy Embedded FPGA Structures”. In Proceedings of
International Symposium on Low Power Electronics and Design, 1998.
10. F. Li, D. Chen, L. He, and J. Cong. “Architecture Evaluation for Power-Efficient FPGAs”.
In Proceedings of ACM/SIGDA International Symposium on Field-programmable gate arrays,
2003.
11. F. Li, Y. Lin, L. He, and J. Cong. “Low-power FPGA using Pre-Defined Dual-Vdd/Dual-Vt
Fabrics”. In Proceedings of ACM/SIGDA International Symposium on Field-programmable
gate arrays, 2004.
12. K. Poon, A. Yan, and S. Wilton. “A flexible Power Model for FPGAs”. In Proceedings of
International Conference on Field Programmable Logic and Applications, 2002.
13. R. Puri, L. Stok, J. Cohn, D. Kung, D. Pan, D. Sylvester, A. Srivastava, and S. Kulkarni.
“Pushing ASIC performance in a power envelope”. In Design Automation Conference, 2003.
14. A. Rahman and V. Polavarapuv. “Evaluation of Low-Leakage Design Techniques for Field
Programmable Gate Arrays”. In Proceedings of ACM/SIGDA International Symposium on
Field-programmable gate arrays, 2004.
15. L. Shang, A. S. Kaviani, and K. Bathala. “Dynamic power consumption in Virtex™-II FPGA family”. In Proceedings of ACM/SIGDA International Symposium on Field-programmable gate arrays, 2002.
16. A. Singh and M. Marek-Sadowska. “Efficient Circuit Clustering for Area and Power Reduction
in FPGAs”. In Proceedings of ACM/SIGDA International Symposium on Field-programmable
gate arrays, 2002.
17. M. Takahashi et al. “A 60-mW MPEG4 Video Codec Using Clustered Voltage Scaling with
Variable Supply-Voltage Scheme”. In IEEE Journal of Solid-State Circuits, Vol. 33, No. 11,
Nov 1998.
18. T. Tuan and B. Lai. “Leakage Power Analysis of a 90nm FPGA”. In Custom Integrated
Circuits Conference, 2003.
19. K. Usami and M. Horowitz. “Clustered voltage scaling technique for low-power design”. In
Proceedings of International Symposium on Low Power Electronics and Design, 1995.