Studies in Parallel & Distributed Systems – 159.735
Parallel Computing Using FPGA (Field Programmable Gate Arrays)
Sohaib Ahmed
15th May, 2009
Outline

FPGAs and their internal structures

Why use FPGAs for parallel computing?

Types of FPGAs

Application Examples and Processing in Applications

FPGAs in Parallel Computing

FPGA Limitations

Design Methods for FPGAs

Conclusion
FPGAs - Introduction

Ross Freeman, one of the founders of Xilinx (www.xilinx.com), invented the FPGA in the mid-1980s

Other vendors include Altera, Actel, Lattice Semiconductor and Atmel

Support the notion of reconfigurable computing

Reconfigurable Computing

Use of multiple reconfigurable devices (such as FPGAs) together with multiple microprocessors

Processor(s) execute the sequential and non-critical code, while the reconfigurable fabric (FPGAs) runs the code that can be mapped efficiently to hardware (a minimal sketch of this split follows)
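
A minimal, purely illustrative Python sketch of this host/fabric split. The function fpga_offload is a hypothetical placeholder, not a real vendor API; the fallback path only exists to keep the sketch runnable without hardware.

def fpga_offload(kernel_name, data):
    """Placeholder for dispatching a hardware-mapped kernel to the FPGA fabric."""
    raise NotImplementedError("replace with the vendor-specific offload call")

def run_application(samples):
    # Sequential, non-critical code stays on the microprocessor.
    cleaned = [s for s in samples if s is not None]
    # The regular, compute-intensive part is what maps efficiently to hardware,
    # so it is handed to the reconfigurable fabric.
    try:
        return fpga_offload("correlation_kernel", cleaned)
    except NotImplementedError:
        # Software fallback so the sketch runs even without an FPGA board.
        return sum(x * x for x in cleaned)

print(run_application([1.0, None, 2.0, 3.0]))   # falls back to software here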
FPGAs Internal Structure
A semiconductor device consisting of:

Configurable Logic Blocks (CLBs)

Input/Output (I/O) Blocks (IOBs)

Static RAM (SRAM) Blocks

Digital Signal Processing Blocks (DSPBs)
Why use FPGAs?


Speed up

Hardware is faster than software [1]:

Technology            Clock Speed    Time Taken
XV2V6000 FPGA         66 MHz         0.36 ms
Optimized software    2.6 GHz        196.71 ms

FPGAs can support thousand-fold parallelism, especially for low-precision computations (a back-of-envelope check on the table above follows)
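
A quick back-of-envelope check on the numbers in the table (Python, purely illustrative): the FPGA wins on wall-clock time through spatial parallelism even though its clock is roughly 40 times slower.

# Derived directly from the table above ([1]); no new data is introduced.
fpga_ms, sw_ms = 0.36, 196.71              # times taken
fpga_mhz, sw_mhz = 66, 2600                # clock speeds

speedup = sw_ms / fpga_ms                  # ~546x faster in wall-clock time
clock_handicap = sw_mhz / fpga_mhz         # FPGA clock is ~39x slower
work_per_cycle = speedup * clock_handicap  # ~21,500x more useful work per cycle

print(f"speedup ~{speedup:.0f}x, clock handicap ~{clock_handicap:.0f}x, "
      f"work per cycle ~{work_per_cycle:.0f}x")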

Cost

Development cost is much less than that of an ASIC (Application-Specific Integrated Circuit) at lower volumes

Flexibility

FPGAs are more flexible than ASICs because they can be reprogrammed
Types of FPGAs
CPLDs (Complex Programmable Logic Devices)
Require voltage levels that are not usually present on computer systems
Anti-fuse based devices
Can be programmed only once
Static-RAM-based devices
Can be reprogrammed while the device is running
Application Examples

Xilinx devices: Virtex-II Pro, Virtex-4

Recent success of FPGAs in the Tsubame cluster in Tokyo

Improved performance by an additional 25%
Processing in Applications
(figure from [2])
FPGAs in Parallel Computing

Dynamic matching of a node to the computational requirements of an application

Application-specific computers become more flexible

Enables support for multiple modes of parallel computing: MIMD, SIMD, etc.

Partial reconfiguration can allow better hardware resource utilization

Can extend a dynamic task allocation scheme to allow dynamic hardware allocation (a toy sketch follows this slide)

Support for variable grain size
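
A toy Python sketch of the dynamic hardware allocation idea above; the region names, task fields, and scheduling policy are all invented for illustration and do not come from any specific system.

from collections import deque

class ReconfigurableNode:
    """Toy model: a node with a small pool of partially reconfigurable regions."""
    def __init__(self, regions=("pr_region_0", "pr_region_1")):
        self.free_regions = deque(regions)   # idle partial-reconfiguration slots

    def submit(self, task):
        # Hardware-mappable tasks grab a free region; everything else
        # (and any overflow) runs on the host processor.
        if task["hw_mappable"] and self.free_regions:
            region = self.free_regions.popleft()
            return f"{task['name']}: loaded into {region}"
        return f"{task['name']}: scheduled on the host CPU"

node = ReconfigurableNode()
for t in ({"name": "fft", "hw_mappable": True},
          {"name": "logging", "hw_mappable": False},
          {"name": "filter", "hw_mappable": True},
          {"name": "matmul", "hw_mappable": True}):
    print(node.submit(t))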
FPGA Limitations

Capacity

Logic blocks do not represent computation as densely as processor instructions do

The conventional processor runs the 90% of the code that accounts for only 10% of the execution time

Reconfigurable logic runs the 10% of the code that accounts for 90% of the execution time

Tools

Compilers for reconfigurable logic are not very good

Some operations, such as random access and pointer-based data structures, are hard to implement on FPGAs
Design Methods for FPGA
[3]

Use an algorithm optimal for FPGAs

Systolic arrays for correlation are efficient

Use a computing mode appropriate for FPGAs

Streaming, systolic, and arrays of fine-grained automata are preferable

Example: searching biomedical databases for similar sequences

Use appropriate FPGA structures

Example: analyzing DNA or protein sequences with a straightforward systolic array (a software model of such an array follows this slide)
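
A small Python model of a straightforward systolic correlation array. This is an illustrative software sketch, not code from reference [3]: coefficients stay resident in the processing elements (PEs), each new sample is broadcast, and partial sums move one PE per clock.

def systolic_correlation(signal, coeffs):
    n_pe = len(coeffs)
    partial = [0] * n_pe                  # partial sum held by each PE
    out = []
    for x in signal:                      # one sample enters per clock cycle
        new_partial = [0] * n_pe
        for i in range(n_pe):
            acc_in = 0 if i == 0 else partial[i - 1]   # sum arriving from the left PE
            new_partial[i] = acc_in + coeffs[i] * x    # one multiply-accumulate per PE
        partial = new_partial
        out.append(partial[-1])           # the last PE emits one result per cycle
    # The first len(coeffs) - 1 outputs are pipeline fill; after that each output
    # is the dot product of the coefficient window with the most recent samples.
    return out

print(systolic_correlation([1, 2, 3, 4], [1, 1, 1]))   # -> [1, 3, 6, 9]

Every PE performs the same small multiply-accumulate each cycle, which is why such arrays map so naturally onto the FPGA's regular grid of logic blocks.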

Design Methods for FPGA
[3]

Living with Amdahl’s Law


Speeding up an application significantly through an enhancement requires most of the application to be enhanced (a worked example follows this slide)
The NAMD and ProtoMol frameworks were designed for computational experimentation

Hide latency of independent functions


Latency hiding is a basic technique for achieving high performance in parallel applications
Functions on the same chip can operate in parallel

Use rate-matching to remove bottlenecks

Function-level parallelism is built in
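
A worked Amdahl's Law example (Python, illustrative numbers tied to the 90/10 split from the limitations slide): even an enormous FPGA speedup on the hot 90% of the runtime is capped by the 10% that stays on the processor.

def amdahl(accelerated_fraction, kernel_speedup):
    # Overall speedup when only part of the runtime is accelerated.
    return 1.0 / ((1.0 - accelerated_fraction) + accelerated_fraction / kernel_speedup)

for s in (10, 100, 1000):
    print(f"kernel speedup {s:>4}x -> overall {amdahl(0.9, s):.2f}x")
# kernel speedup   10x -> overall 5.26x
# kernel speedup  100x -> overall 9.17x
# kernel speedup 1000x -> overall 9.91x   (the limit is 10x)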
Design Methods for FPGA
[3]

Take advantage of FPGA-specific hardware

Hard-wired components such as integer multipliers and independently accessible
BRAMs (Block RAMs)

The Xilinx VP100 has 400 independently accessible, 32-bit, quad-ported BRAMs, which can help achieve 20 terabytes per second of bandwidth at capacity

Use appropriate arithmetic precision (see the fixed-point sketch after this slide)

Use appropriate arithmetic mode

Minimize use of high-cost arithmetic operations
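
An illustrative Python sketch of the arithmetic-precision point: quantizing to a narrow fixed-point format (a hypothetical 16-bit word with 8 fraction bits here) is far cheaper in FPGA logic than floating point, at the cost of a small, bounded error.

def to_fixed(x, frac_bits=8, word_bits=16):
    """Quantize x to a signed fixed-point integer with frac_bits fraction bits."""
    scaled = round(x * (1 << frac_bits))
    lo, hi = -(1 << (word_bits - 1)), (1 << (word_bits - 1)) - 1
    return max(lo, min(hi, scaled))       # saturate rather than wrap on overflow

def from_fixed(q, frac_bits=8):
    return q / (1 << frac_bits)

x = 3.14159
q = to_fixed(x)
print(q, from_fixed(q), abs(x - from_fixed(q)))   # 804 3.140625 ~0.00097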
Current Progress in Hardware & Software

SRC-6 and SRC-7 are parallel architectures built around a crossbar switch that can be stacked for scalability

High-performance computing vendors such as Silicon Graphics Inc. (SGI), Cray, and Linux Networx have incorporated FPGAs in their parallel architectures [4]

VHDL and Verilog are used to create hardware kernels

Higher-level languages such as Carte C, Carte Fortran, Impulse C, Mitrion-C, and Handel-C are also used

Annapolis Micro Systems' CoreFire, Starbridge Systems' Viva, Xilinx System Generator, and DSPlogic's reconfigurable computing toolbox are high-level graphical development tools [5]
Conclusion
Using FPGAs in parallel computing offers the following benefits:

Application acceleration

Flexibility in terms of application domain

Potential cost benefits over ASICs

The ability to exploit variable levels and modes of parallelism

More effective use of hardware resources

References
[1] Todman, T.J., Constantinides, G.A., Wilton, S.J.E., Mencer, O., Luk, W. & Cheung, P.Y.K. (2005). Reconfigurable computing: architectures and design methods.
[2] Altera Corporation White Paper (2007). Accelerating high-performance computing with FPGAs. October 2007.
[3] Herbordt, M.C., VanCourt, T., Gu, Y., Sukhwani, B., Conti, A., Model, J. & DiSabello, D. (2007). Achieving high performance with FPGA-based computing.
[4] Buell, D., El-Ghazawi, T., Gaj, K. & Kindratenko, V. (2007). High-performance reconfigurable computing. IEEE Computer Society, March 2007.
[5] El-Ghazawi, T., El-Araby, E., Huang, M., Gaj, K., Kindratenko, V. & Buell, D. (2008). The promise of high-performance reconfigurable computing. IEEE Computer Society, February 2008, pp. 69–76.
Any Questions?
Thank You