Multi Purpose and Efficient Data Transferring FPGA Based Applications

International Journal of Engineering Trends and Technology (IJETT) – Volume 13 Number 8 – Jul 2014
M. Sudhakar Reddy (1), K. S. N. Vittal (2)
(1) M.Tech (VLSI & ES), QIS Institute of Technology, A.P., India
(2) Assistant Professor, Dept. of ECE, QIS Institute of Technology, A.P., India
ABSTRACT:
This paper presents the design of asynchronous FPGA blocks with power optimization techniques. It concentrates on standby and dynamic power consumption and studies several gating techniques. In the existing design, standby power is reduced by autonomous fine-grain power gating, and dynamic power is reduced by the level-encoded dual-rail (LEDR) architecture. The proposed design adds a low-power delay buffer that uses several techniques to reduce the power consumed by the lookup table in the FPGA. Since delay buffers are accessed sequentially, the buffer adopts a ring-counter addressing scheme. In the ring counter, double-edge-triggered (DET) flip-flops are used to halve the operating frequency, and a C-element gated-clock strategy is proposed. The gated-driver-tree idea is also applied to the input and output ports of the memory block to decrease their loading, saving further power.
KEYWORDS: Level-encoded dual rail (LEDR), fine-grain power gating, gated driver tree, C-element, delay buffer, gated clock, ring counter.
I. INTRODUCTION
Due to the dramatic increase in portable and battery-operated applications, lower power consumption has become a necessity in order to prolong battery life. Power consumption is an important part of the equation determining the end product's size, weight, and efficiency. Selecting an appropriate FPGA architecture is critical to achieving the best static and dynamic power consumption. Flash-based FPGAs from Microsemi are among the low-power leaders in the industry. In addition to exploiting the low-power attributes of flash-based FPGAs, a designer can deploy several design techniques to further reduce overall power. The important FPGA power components to consider are the following:
• Power-up (inrush power): Inrush power is the amount of power drawn by the device during power-up.
• Configuration power: Configuration power is the amount of power required during the loading of the FPGA upon power-up (specific to SRAM-based programmable logic devices).
• Static (standby) power: Static power is the amount of power the device consumes when it is powered up but not actively performing any operation.
• Dynamic (active) power: Dynamic power is the amount of power the device consumes when it is actively operating.
• Sleep power (low-power mode): Some FPGA devices offer low-power or sleep modes. In some cases, this may be different from static power.
This paper focuses on reducing the standby and dynamic power.
In FPGA design, clock gating and power gating are important techniques. To implement clock gating, circulation is employed: the contents of the flip-flop are retained in the sleep state. Circulation can reduce the dynamic power consumption of the registers and of the gates in their fan-out. However, it cannot reduce the standby power consumption of the clock network. Standby power is a serious problem in an FPGA because the device needs an enormous number of transistors to achieve its programmability. Even low-cost FPGAs consume up to hundreds of milliwatts. Power gating has emerged as the most effective design technique for achieving low standby power. Power gating techniques selectively set functional units into a low-leakage mode when they are inactive.
Currently, most circuits adopt static random access memory (SRAM) plus some control/addressing logic to implement delay buffers. For shorter delay buffers, a shift register can be used instead. The former approach is convenient since SRAM compilers are readily available and are optimized to generate memory modules with low power consumption, high operating speed, and a compact cell size. Previously, a simplified and thus lower-power sequential addressing scheme for SRAM-based delay buffers was proposed. In this work, double-edge-triggered (DET) flip-flops are used instead of traditional D flip-flops (DFFs) in the ring counter to halve the operating clock frequency. C-elements are adopted instead of R–S flip-flops in the control logic that generates the clock-gating signals, to avoid increasing the loading of the global clock signal. The drivers in the clock tree are also gated. This greatly decreases the loading on the clock distribution network of the ring counter and thus the overall power consumption.
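The effect of DET flip-flops on the clock can be illustrated with a small behavioural model. This is only a sketch, not the authors' circuit: `clock_transitions` and `det_flip_flop_trace` are hypothetical helper names, and the model assumes that a single-edge-triggered flip-flop updates once per full clock cycle (two clock transitions) while a DET flip-flop updates on every clock transition.

```python
def clock_transitions(shifts: int, double_edge: bool) -> int:
    """Clock transitions needed to advance a ring counter by `shifts` positions.

    Assumption: a single-edge-triggered flip-flop needs a full clock cycle
    (rising plus falling transition) per shift, a DET flip-flop needs one
    transition per shift.
    """
    if shifts < 0:
        raise ValueError("shifts must be non-negative")
    return shifts if double_edge else 2 * shifts


def det_flip_flop_trace(data, clock):
    """Return the output of a DET flip-flop for sampled data/clock sequences.

    The output takes the current data value whenever the clock level differs
    from the previous sample, i.e. on a rising or a falling edge.
    """
    q = 0
    previous_clock = clock[0]
    outputs = []
    for d, c in zip(data, clock):
        if c != previous_clock:  # rising or falling edge
            q = d
        previous_clock = c
        outputs.append(q)
    return outputs


if __name__ == "__main__":
    print(clock_transitions(8, double_edge=False))
    print(clock_transitions(8, double_edge=True))
    print(det_flip_flop_trace([1, 0, 1, 1, 0], [0, 1, 0, 1, 0]))
```

For the same number of ring-counter shifts the DET version needs half as many clock transitions, which is why the clock can run at half the frequency.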
II. FIELD PROGRAMMABLE GATE ARRAYS
The ability to reconfigure the functionality implemented on a chip gives a unique advantage to a designer who builds a system on an FPGA. It reduces the time to market and significantly reduces the cost of production.
Field Programmable Gate Arrays are two-dimensional arrays of logic blocks and flip-flops with electrically programmable interconnections between the logic blocks. The interconnections consist of electrically programmable switches. This is how an FPGA differs from a custom IC, in which the metal interconnections between logic blocks are formed during integrated circuit fabrication.
In an FPGA, logic blocks are implemented using multi-level, low fan-in gates, which gives a more compact design than an implementation with two-level AND-OR logic.
A logic block of an FPGA can be configured to provide functionality as simple as that of a transistor or as complex as that of a microprocessor. It can be used to implement different combinations of combinational and sequential logic functions. Logic blocks of an FPGA can be implemented by any of the following:
• Transistor pairs
• Combinational gates such as basic NAND gates or XOR gates
• N-input lookup tables
• Multiplexers
• Wide fan-in AND-OR structures
Routing in FPGAs consists of wire segments of varying lengths that can be interconnected via electrically programmable switches. The density of logic blocks in an FPGA depends on the length and number of wire segments used for routing. The number of segments used for interconnection is typically a trade-off between the density of logic blocks and the area used up for routing.
Fig1: FPGA Architecture
III. ASYNCHRONOUS ARCHITECTURE DESIGN
Most digital circuits designed and fabricated
today are “synchronous”. In essence, they are based
on two fundamental assumptions that greatly
simplify their design: (1) All signals are binary, and
(2) All components share a common and discrete
notion of time, as defined by a clock signal
distributed throughout the circuit.
Asynchronous circuits are fundamentally
different; they also assume binary signals, but there
is no common and discrete time. Instead the circuits
use handshaking between their components in order
to perform the necessary synchronization,
communication, and sequencing of operations.
Expressed in ‘synchronous terms’ this results in a
behavior that is similar to systematic fine-grain
clock gating and local clocks that are not in phase
and whose period is determined by actual circuit
delays – registers are only clocked where and when
needed.
This difference gives asynchronous circuits
inherent properties that can be (and have been)
exploited to advantage in the areas listed and
motivated below. The interested reader may find further introduction to the mechanisms behind the advantages.
• Low power consumption, due to fine-grain clock gating and zero standby power consumption.
• High operating speed: operating speed is determined by actual local latencies rather than global worst-case latency.
• Less emission of electromagnetic noise: the local clocks tend to tick at random points in time.
• Robustness towards variations in supply voltage, temperature, and fabrication process parameters: timing is based on matched delays (and can even be insensitive to circuit and wire delays).
• Better composability and modularity, because of the simple handshake interfaces and the local timing.
• No clock distribution and clock skew problems: there is no global signal that needs to be distributed with minimal phase skew across the circuit.
The asynchronous architecture detects the activity of a power-gated domain. Its functions are:
• It determines when a logic block is in the standby, sleep, or active state.
• It compares the phase of the input data with that of the output data.
• It determines the function of the lookup table.
To reduce dynamic power, dual-rail encoding [2] and the level-encoded dual-rail architecture [3] were introduced. To reduce standby power, the autonomous fine-grain power gating technique was introduced [3]. The registers store the data value and pass the output to the switch block. The sleep controller monitors the data and wakes up the succeeding block when data arrives. The switch block consists of pass-transistor switches. In a switch block, a wire-set consists of four blocks. In the switch block there are four signals.
IV. AUTONOMOUS FINE GRAIN POWER GATING
Fig2: Control Strategy of the Power Gating Method
An efficient control strategy is used for the autonomous fine-grain power gating. The standby state is used to do the following:
• Wake up the logic block (LB) before the data arrives
• Power OFF the LB only when no data has arrived for quite a while
The use of the standby state has two major advantages. First, the wake-up time can be hidden, since the LB has already been woken up when the data arrives. Second, dynamic power can be saved, since the number of unnecessary switchings of the sleep transistor is reduced [3].
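The control strategy above can be sketched as a three-state machine. The following behavioural model is an illustration under stated assumptions, not the circuit of [3]: the class name `SleepController`, the `wakeup` and `data_arrived` inputs, and the `idle_limit` threshold (standing in for "quite a while") are all hypothetical.

```python
from enum import Enum


class State(Enum):
    SLEEP = "sleep"      # sleep transistor off, LB powered down
    STANDBY = "standby"  # LB powered and waiting for data
    ACTIVE = "active"    # LB processing data


class SleepController:
    """Behavioural sketch of a sleep/standby/active controller for one LB."""

    def __init__(self, idle_limit: int = 3):
        self.idle_limit = idle_limit  # assumed idle steps before power-off
        self.state = State.SLEEP
        self.idle_steps = 0
        self.power_switches = 0       # toggles of the sleep transistor

    def step(self, wakeup: bool, data_arrived: bool) -> State:
        if data_arrived or wakeup:
            if self.state is State.SLEEP:
                self.power_switches += 1  # power the LB on
            self.state = State.ACTIVE if data_arrived else State.STANDBY
            self.idle_steps = 0
        elif self.state is State.ACTIVE:
            self.state = State.STANDBY    # done, but stay powered for now
            self.idle_steps = 0
        elif self.state is State.STANDBY:
            self.idle_steps += 1
            if self.idle_steps >= self.idle_limit:
                self.state = State.SLEEP  # no data for a while: power off
                self.power_switches += 1
        return self.state


if __name__ == "__main__":
    ctrl = SleepController(idle_limit=2)
    inputs = [(True, False), (False, True), (False, False),
              (False, False), (False, False)]
    for wakeup, data in inputs:
        print(ctrl.step(wakeup, data).value)
    print("sleep transistor toggles:", ctrl.power_switches)
```

Because the wake-up signal arrives one step before the data in this example, the block is already in the standby state when the data arrives, and the sleep transistor toggles only twice over the whole sequence.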
V. LEDR ENCODING
Fig3: LEDR encoding Data Transmission
• Data signal (first bit)
• Data signal (second bit)
• Acknowledgement signal Logic
• Data arrival signal
The asynchronous FPGA is based on LEDR encoding. LEDR is one of several two-phase dual-rail encodings. In LEDR encoding no spacer is required, as shown in Fig3. This results in high throughput and low dynamic power consumption because the number of signal transitions is reduced by half [3].
Of the above four signals, the acknowledgement signal and the data arrival signal are connected to the previous logic block. Two pass switches are used for the four wires of the wire-set: the Va, Ra, ack, and wake-up signal wires, respectively. The pass switches are controlled by the same memory bits.
Table 1 shows the code table of LEDR encoding. In LEDR encoding, each data value has two code words with different phases. Consider an example in which the data values "0," "0," and "1" are transferred. The main feature is that the sender sends data values alternately in phase 0 and phase 1. Because no spacer is required, the number of signal transitions is half that of four-phase dual-rail encoding. As a result, the throughput is high and the power consumption is small. Based on this observation, LEDR encoding is employed in the FPGA to implement the asynchronous architecture and reduce the dynamic power.
Fig5: Block Diagram of the Lookup Table
Table 1: Code table of LEDR encoding
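The alternating-phase behaviour can be sketched in a few lines. The contents of Table 1 are not reproduced in this text, so the sketch assumes the commonly used LEDR convention: a code word is a pair (V, R), V carries the data value, and the phase is V XOR R. The function names and the assumed idle wire state are hypothetical. The four-phase count assumes one wire rises for each code word and falls again for the spacer.

```python
def ledr_encode(values):
    """Encode bits as LEDR code words (V, R), alternating phase 0, 1, 0, ...

    Assumed convention: phase = V XOR R, so R = V XOR phase.
    """
    return [(v, v ^ (i % 2)) for i, v in enumerate(values)]


def ledr_decode(codeword):
    """Return (value, phase) for one LEDR code word."""
    v, r = codeword
    return v, v ^ r


def ledr_transitions(values, idle=(0, 1)):
    """Count wire transitions when sending `values`.

    `idle` is the assumed state of the two wires before the first code word
    (a phase-1 code word, so that the first phase-0 word is a new event).
    """
    previous = idle
    total = 0
    for codeword in ledr_encode(values):
        total += (previous[0] != codeword[0]) + (previous[1] != codeword[1])
        previous = codeword
    return total


def four_phase_transitions(values):
    """Four-phase dual-rail: one wire rises and falls again for every bit."""
    return 2 * len(values)


if __name__ == "__main__":
    data = [0, 0, 1]  # the example values used in the text
    print(ledr_encode(data))
    print(ledr_transitions(data), four_phase_transitions(data))
```

Under this convention exactly one wire changes per transmitted value, which matches the statement that LEDR needs half the signal transitions of four-phase dual-rail encoding.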
VI. LOGIC BLOCK DESIGN
The detailed structure of the decoder- and multiplexer-based lookup table is shown in the figure below. In that diagram, AND gates are used for the decoder and pass transistors are used for the multiplexer logic.
Fig6: The Detail Structure of Look up Table
Fig4: Logic block design
The logic block contains a lookup table, a sleep controller, registers and programmable delay elements, a C-element with ring counter, and a gated driver tree. The C-element with ring counter is described below.
A. Look up table design
This architecture contains four sub-modules. Each sub-module consists of a decoder, a multiplexer, and memory bits. The decoder is built from two four-input AND gates, and its output is fed to the multiplexer. The multiplexer is built from four pass transistors and one inverter. The decoders exclude invalid input patterns, i.e., those whose inputs have different phases, so only valid data are fed to the multiplexer. As a result, the number of multiplexers and the transistor count are reduced compared to a multiplexer-type lookup table. If the combination of inputs is invalid (i.e., the two inputs have different phases), all pass transistors turn OFF according to the output of the decoder, and the previous outputs remain stored in the latches. If the input pattern is valid (i.e., the two inputs have the same phase), the corresponding pass transistors turn ON, the value of the memory bit is selected as the output, and the output is stored in the latches.
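The valid/invalid behaviour of the lookup table can be sketched behaviourally. This is a simplified two-input model, not the transistor-level design: the class name `LedrLut2`, the reset value of the latch, and the (V, R) code-word convention with phase = V XOR R are assumptions made for illustration.

```python
class LedrLut2:
    """Behavioural sketch of a 2-input lookup table with LEDR inputs."""

    def __init__(self, memory_bits, reset=(0, 1)):
        if len(memory_bits) != 4:
            raise ValueError("a 2-input LUT needs exactly 4 memory bits")
        self.memory_bits = list(memory_bits)
        self.latch = reset  # assumed reset state: a phase-1 code word

    def evaluate(self, a, b):
        """Return the latched output code word for inputs a=(V, R), b=(V, R)."""
        phase_a = a[0] ^ a[1]
        phase_b = b[0] ^ b[1]
        if phase_a != phase_b:
            # Invalid pattern: all pass transistors are off, the latch holds.
            return self.latch
        # Valid pattern: the decoder selects one memory bit.
        value = self.memory_bits[(a[0] << 1) | b[0]]
        self.latch = (value, value ^ phase_a)  # output keeps the input phase
        return self.latch


if __name__ == "__main__":
    and_lut = LedrLut2([0, 0, 0, 1])          # memory bits of a 2-input AND
    print(and_lut.evaluate((1, 1), (1, 1)))   # both inputs in phase 0
    print(and_lut.evaluate((1, 0), (1, 1)))   # phases differ: output held
    print(and_lut.evaluate((1, 0), (0, 1)))   # both inputs in phase 1
```

The output only changes once both inputs have moved to the same phase, which is how the lookup table waits for a complete input set without a clock.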
VII. PROPOSED DESIGN
A. Delay buffers
In the proposed delay buffer, several power reduction techniques are adopted. These circuit techniques are designed mainly to decrease the loading on high fan-out nets, e.g., the clock and the read/write ports. The R–S flip-flop is replaced by a C-element. In addition, the operating frequency is halved by using DET flip-flops.
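Ring-counter addressing of a delay buffer can be sketched as follows. This is a behavioural illustration only: the class name `RingCounterDelayBuffer` and its interface are hypothetical, and the model ignores the DET flip-flops and clock gating of the actual design. A one-hot token circulates through the ring and selects the memory word that is read and then overwritten.

```python
class RingCounterDelayBuffer:
    """Delay buffer whose memory is addressed by a one-hot ring counter."""

    def __init__(self, length: int, fill: int = 0):
        if length < 1:
            raise ValueError("length must be at least 1")
        self.memory = [fill] * length
        self.ring = [1] + [0] * (length - 1)  # exactly one active cell

    def step(self, sample):
        """Read the oldest sample, store the new one, advance the token."""
        address = self.ring.index(1)     # the active cell selects the word
        delayed = self.memory[address]
        self.memory[address] = sample
        self.ring = [self.ring[-1]] + self.ring[:-1]  # rotate token by one
        return delayed


if __name__ == "__main__":
    buffer = RingCounterDelayBuffer(length=3)
    print([buffer.step(x) for x in [1, 2, 3, 4, 5, 6]])
```

Because the words are always accessed in the same sequential order, no address decoder is needed; the token position alone identifies the word.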
The major advantage of the C-element is that its output is free of glitches, which is essential for a clock-gating signal. Since the DFFs are replaced by DET flip-flops to run the ring counter at half speed, the gating on–off condition needs to be revised. When the input of the last DET flip-flop in the previous block makes a transition from ‘0’ to ‘1’, the clock signal in the current block is enabled. When the output of the first DET flip-flop in the next block rises from ‘0’ to ‘1’, both inputs of the C-element go to ‘0’ and the clock is turned off in the current block.
The ‘gate’ signals for those drivers can reuse the clock-gating signals of the blocks they drive. Thus, the ‘gate’ signal of a driver should be asserted when the active cell (whose output is ‘1’) in the ring counter is one of its descendants in the quaternary driver tree. Given M blocks, each having DET flip-flops, only the drivers on such paths are activated instead of all drivers.
C. C-Element for dynamic power reduction
The Muller C-element, or Muller C-gate, is a
commonly used asynchronous logic component
originally designed by David E. Muller. It applies
logical operations on the inputs and has hysteresis.
The output of the C-element reflects the inputs when
the states of all inputs match. The output then
remains in this state until the inputs all transition to
the other state.
Fig7: Diagram of ring counter with clock gated by C-Elements
B. Gated driver tree
We propose applying gating to the driver tree network that delivers the global clock signal to all blocks. Since at most two blocks need the global clock signal at any time, only the drivers along the path from the clock source to the blocks that need to be driven by the global clock are activated, as shown in Figure 8.
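The saving from gating the driver tree can be estimated by counting drivers. The sketch below assumes a complete quaternary tree (fan-out 4) with one driver per internal node; the function name `active_drivers` and the block numbering are hypothetical and are not taken from the original design.

```python
def active_drivers(num_blocks, active_blocks, fanout=4):
    """Return (active, total) driver counts for a gated clock driver tree.

    A driver is active when at least one block that needs the clock is
    among its descendants. Blocks are numbered 0 .. num_blocks - 1.
    """
    if num_blocks < 1 or fanout < 2:
        raise ValueError("need at least one block and a fan-out of 2 or more")
    marked = set(active_blocks)
    if any(b < 0 or b >= num_blocks for b in marked):
        raise ValueError("active block index out of range")
    active = 0
    total = 0
    nodes = num_blocks
    while nodes > 1:
        nodes = -(-nodes // fanout)                # drivers on this level
        marked = {index // fanout for index in marked}
        total += nodes
        active += len(marked)
    return active, total


if __name__ == "__main__":
    print(active_drivers(64, {5}))        # one block needs the clock
    print(active_drivers(64, {15, 16}))   # two neighbouring blocks
```

With 64 blocks the tree has 21 drivers, but only the few drivers on the paths to the one or two clocked blocks switch; the rest stay gated.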
Fig9: C- Element logic diagram
This model can be extended to the asymmetric C-element, where some inputs only affect the operation in one of the transitions (positive or negative).
A    B    Q
0    0    0
0    1    Q(t-1)
1    0    Q(t-1)
1    1    1
Table2: Truth Table for C-Element
If both inputs are 0, then the pull-up network
changes the latch's state, and the C-element outputs
a 0. If both inputs are 1, then the pull-down network
changes the latch's state, making the C-element
output a 1. Otherwise, the input of the latch is not
connected to either Vdd or ground, and so the weak
inverter (drawn smaller in the diagram) dominates
and the latch outputs its previous state.
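The truth table of the C-element translates directly into a behavioural model. This is a minimal sketch for a two-input C-element; the class name `CElement` and the initial output value are assumptions made for illustration.

```python
class CElement:
    """Two-input Muller C-element: the output follows the inputs only
    when they agree, otherwise it holds its previous value."""

    def __init__(self, initial: int = 0):
        self.q = initial  # assumed initial output

    def update(self, a: int, b: int) -> int:
        if a == b:
            self.q = a    # both 0 -> output 0, both 1 -> output 1
        return self.q     # inputs differ -> Q(t-1) is kept


if __name__ == "__main__":
    c = CElement()
    for a, b in [(0, 0), (1, 0), (1, 1), (0, 1), (0, 0)]:
        print(a, b, c.update(a, b))
```

The hold behaviour when the inputs differ is what keeps the gating signal free of glitches while only one of the two flip-flop signals has changed.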
Fig8: Clock driver tree and gating signal
ISSN: 2231-5381
http://www.ijettjournal.org
VIII. RESULTS
A. Simulations
Fig10: Input simulation for logic block in FPGA
Fig11: Output simulation for logic block in FPGA
B. Synthesis Report
Metric                         Existing method    Proposed method
Power                          702 mW             260 mW
Total equivalent gate count    12,052 cells       10,215 cells
Latency                        31.80 ns           29.00 ns
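The relative improvements implied by the synthesis figures can be worked out with a short calculation. The sketch assumes that 702 mW belongs to the existing method and 260 mW to the proposed method, which is inferred from the order of the labels in the report; the helper name `percent_reduction` is hypothetical.

```python
def percent_reduction(existing: float, proposed: float) -> float:
    """Reduction of the proposed value relative to the existing value, in %."""
    if existing <= 0:
        raise ValueError("existing value must be positive")
    return 100.0 * (existing - proposed) / existing


# (existing, proposed) pairs taken from the synthesis report above.
RESULTS = {
    "power (mW)": (702.0, 260.0),       # assignment to methods is assumed
    "gate count (cells)": (12052, 10215),
    "latency (ns)": (31.80, 29.00),
}

if __name__ == "__main__":
    for metric, (existing, proposed) in RESULTS.items():
        print(f"{metric}: {percent_reduction(existing, proposed):.1f}% lower")
```

Under that assumption the proposed design uses roughly 63% less power, about 15% fewer equivalent gates, and about 9% less latency than the existing design.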
IX. CONCLUSION
In this paper, we presented a low-power
asynchronous FPGA architecture which adopts
several novel techniques to reduce power
consumption. The ring counter with clock gated by
the C-elements can effectively eliminate the
excessive data transition without increasing loading
on the global clock signal. The gated-driver tree
technique used for the clock distribution networks
can eliminate the power wasted on drivers that need
not be activated. Another gated-demultiplexer tree
and a gated-multiplexer tree are used for the input
and output driving circuitry to decrease the loading
of the input and output data bus. All gating signals
are easily generated by a C-element taking inputs
from some DET flip-flop outputs of the ring
counter.
REFERENCES
[1] W. Eberle et al., “80-Mb/s QPSK and 72-Mb/s 64-QAM flexible and scalable digital OFDM transceiver ASICs for wireless local area networks in the 5-GHz band,” IEEE J. Solid-State Circuits, vol. 36, no. 11, pp. 1829–1838, Nov. 2001.
[2] M. L. Liou, P. H. Lin, C. J. Jan, S. C. Lin, and T. D. Chiueh, “Design of an OFDM baseband receiver with space diversity,” IEE Proc. Commun., vol. 153, no. 6, pp. 894–900, Dec. 2006.
[3] N. Rajagopala Krishnan and K. Sivasuparamanyan, “A Reconfigurable Low Power FPGA Design with Autonomous Power Gating and LEDR Encoding,” 978-1-4673-4603-0/12/$31.00 ©2012 IEEE.
[4] G. Pastuszak, “A high-performance architecture for embedded block coding in JPEG 2000,” IEEE Trans. Circuits Syst. Video Technol., vol. 15, no. 9, pp. 1182–1191, Sep. 2005.
[5] W. Li and L. Wanhammar, “A pipeline FFT processor,” in Proc. Workshop Signal Process. Syst. Design Implement., 1999, pp. 654–662.
[6] E. K. Tsern and T. H. Meng, “A low-power video-rate pyramid VQ decoder,” IEEE J. Solid-State Circuits, vol. 31, no. 11, pp. 1789–1794, Nov. 1996.
[7] N. Shibata, M. Watanabe, and Y. Tanabe, “A current-sensed high-speed and low-power first-in-first-out memory using a wordline/bitline-swapped dual-port SRAM cell,” IEEE J. Solid-State Circuits, vol. 37, no. 6, pp. 735–750, Jun. 2002.
[8] I. E. Sutherland, “Micropipelines,” Commun. ACM, vol. 32, no. 6, pp. 720–738, Jun. 1989.
[9] R. Hossain, L. D. Wronski, and A. Albicki, “Low power design using double edge triggered flip-flop,” IEEE Trans. Very Large Scale Integr. (VLSI) Syst., vol. 2, no. 2, pp. 261–265, Jun. 1994.
[10] K. Zhang, U. Bhattacharya, Z. Chen, F. Hamzaoglu, D. Murray, N. Vallepalli, Y. Wang, B. Zheng, and M. Bohr, “SRAM design on 65-nm CMOS technology with dynamic sleep transistor for leakage reduction,” IEEE J. Solid-State Circuits, vol. 40, no. 4, pp. 895–901, Apr. 2005.
[11] Masanori Hariyama, Shota Ishihara, Chang Chia Wei, and Michitaka Kameyama, “A field-programmable VLSI based on an asynchronous bit-serial architecture,” A-SSCC, pp. 380–383, 200
M Sudhakar Reddy was born in A.P., India. He received the B.Tech degree in Electronics & Communications Engineering from Jawaharlal Nehru Technological University in 2012. Presently he is pursuing the M.Tech in VLSI & Embedded Systems at QIS Institute of Technology. His research interests include low power design.
K. S. N. Vittal was born in A.P., India. He received the B.Tech degree in Electronics & Communications Engineering from Jawaharlal Nehru Technological University in 2008. He received the M.Tech degree from K L University in 2012. His research interests include analog design and low power design. Presently he is working as Assistant Professor, Department of E.C.E, at QIS Institute of Technology.