Studies in Parallel & Distributed Systems – 159.735
Parallel Computing Using FPGAs (Field-Programmable Gate Arrays)
Sohaib Ahmed
15th May, 2009

Outline
- FPGAs and their internal structure
- Why use FPGAs for parallel computing?
- Types of FPGAs
- Application examples and processing in applications
- FPGAs in parallel computing
- FPGA limitations
- Design methods for FPGAs
- Conclusion

FPGAs – Introduction
- Ross Freeman, one of the founders of Xilinx (www.xilinx.com), invented the FPGA in the mid-1980s
- Other vendors include Altera, Actel, Lattice Semiconductor and Atmel
- FPGAs support the notion of reconfigurable computing

Reconfigurable Computing
- Combines multiple reconfigurable devices (such as FPGAs) with multiple microprocessors
- The processor(s) execute the sequential and non-critical code, while the reconfigurable fabric (the FPGAs) executes the code that can be mapped efficiently to hardware

FPGA Internal Structure
A semiconductor device consisting of:
- Configurable Logic Blocks (CLBs)
- Input/Output (I/O) Blocks (IOBs)
- Static RAM (SRAM) blocks
- Digital Signal Processing Blocks (DSPBs)

Why Use FPGAs?
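One of the reported comparisons [1] measures 0.36 ms on an FPGA clocked at 66 MHz against 196.71 ms for optimized software on a 2.6 GHz processor. As a quick sanity check on what such numbers imply, a minimal sketch (the timings and clock speeds come from that comparison; the derived ratios are simple arithmetic, not figures from the source):

```python
# Timings reported in the FPGA-vs-software comparison [1]
fpga_ms = 0.36        # XV2V6000 FPGA at 66 MHz
software_ms = 196.71  # optimized software at 2.6 GHz

# Speedup is the ratio of the execution times
speedup = software_ms / fpga_ms
print(f"FPGA speedup: {speedup:.0f}x")  # roughly 546x

# The FPGA wins despite a large clock-speed disadvantage,
# because it exploits massive spatial parallelism
clock_ratio = 2.6e9 / 66e6
print(f"Clock-speed disadvantage: {clock_ratio:.1f}x")
```

The point of the arithmetic: even though the FPGA's clock is almost 40 times slower, the overall result is a roughly 546-fold speedup, which is only possible because many operations execute per cycle in parallel.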
Speed
Hardware is faster than software [1]:

Technology          Clock Speed   Time Taken
XV2V6000 FPGA       66 MHz        0.36 ms
Optimized software  2.6 GHz       196.71 ms

- FPGAs can support thousand-fold parallelism, especially for low-precision computations

Cost
- Development cost is much less than for an ASIC (application-specific integrated circuit) at lower volumes

Flexibility
- FPGAs are flexible compared to ASICs because they can be reprogrammed

Types of FPGAs
- CPLDs (Complex Programmable Logic Devices): require voltage levels that are not usually present in computer systems
- Anti-fuse-based devices: can be programmed only once
- Static-RAM-based devices: can be reprogrammed while the device is running

Application Examples
- Xilinx devices: Virtex-II Pro, Virtex-4
- Recent success of FPGAs in the Tsubame cluster in Tokyo: performance improved by an additional 25%

Processing in Applications [2]

FPGAs in Parallel Computing
- Dynamic matching of a node to the computational requirements of an application
- Application-specific computers become more flexible
- Enables support for multiple modes of parallel computing: MIMD, SIMD, etc.
- Partial reconfiguration can allow better hardware resource utilization
- Dynamic task allocation schemes can be extended to allow for dynamic hardware allocation
- Support for variable grain size

FPGA Limitations
Capacity
- Logic blocks do not represent computation as densely as instructions do
- A conventional processor runs the 90% of the code that takes 10% of the execution time; the reconfigurable logic takes the 10% of the code that consumes 90% of the execution time
Tools
- Compilers for reconfigurable logic are not very good
- Some operations are hard to implement on FPGAs, such as random access and pointer-based data structures

Design Methods for FPGAs [3]
- Use an algorithm optimal for FPGAs: systolic arrays for correlation are efficient
- Use a computing mode appropriate for FPGAs: streaming, systolic, and arrays of fine-grained automata are preferable (e.g. searching biomedical databases for similar sequences)
- Use appropriate FPGA structures: analyzing DNA or protein sequences maps to a straightforward systolic array

Design Methods for FPGAs [3] (continued)
- Living with Amdahl's Law: speeding up an application significantly through an enhancement requires most of the application to be enhanced (the NAMD & ProtoMol framework was designed for computational experimentation)
- Hide the latency of independent functions: latency hiding is a basic technique for achieving high performance in parallel applications; functions on the same chip operate in parallel
- Use rate-matching to remove bottlenecks: function-level parallelism is built in

Design Methods for FPGAs [3] (continued)
- Take advantage of FPGA-specific hardware: hard-wired components such as integer multipliers and independently accessible BRAMs (Block RAMs); the Xilinx VP100 has 400 independently accessible, 32-bit quad-ported BRAMs, which can help achieve 20 terabytes per second at capacity
- Use appropriate arithmetic precision
- Use an appropriate arithmetic mode
- Minimize use of high-cost arithmetic operations

Current Progress in Hardware & Software
- SRC-6 and SRC-7 are parallel architectures built around a crossbar switch that can be stacked for scalability
- High-performance computing vendors such as Silicon Graphics Inc. (SGI), Cray and Linux Networx have incorporated FPGAs into their parallel architectures [4]
- VHDL and Verilog are used to create hardware kernels; other hardware description languages such as Carte C, Carte Fortran, Impulse C, Mitrion-C and Handel-C are also used
- Annapolis Micro Systems' CoreFire, Starbridge Systems' Viva, Xilinx System Generator and DSPlogic's Reconfigurable Computing Toolbox are high-level graphical programming development tools [5]

Conclusion
Using FPGAs in parallel computing offers the following benefits:
- Application acceleration
- Flexibility in terms of application domain
- Potential cost benefits over ASICs
- The ability to exploit variable levels and modes of parallelism
- More effective use of hardware resources

References
[1] Todman, T.J., Constantinides, G.A., Wilton, S.J.E., Mencer, O., Luk, W.
& Cheung, P.Y.K. (2005). Reconfigurable computing: architectures and design methods.
[2] Altera Corporation (2007). Accelerating high performance computing with FPGAs. White paper, October 2007.
[3] Herbordt, M.C., VanCourt, T., Gu, Y., Sukhwani, B., Conti, A., Model, J. & DiSabello, D. (2007). Achieving high performance with FPGA-based computing.
[4] Buell, D., El-Ghazawi, T., Gaj, K. & Kindratenko, V. (2007). High-performance reconfigurable computing. IEEE Computer Society, March 2007.
[5] El-Ghazawi, T., El-Araby, E., Huang, M., Gaj, K., Kindratenko, V. & Buell, D. (2008). The promise of high-performance reconfigurable computing. IEEE Computer Society, February 2008, pp. 69–76.

Any Questions?

Thank You