Studies in Parallel & Distributed Systems – 159.735
Parallel Computing Using FPGAs (Field-Programmable Gate Arrays)
Sohaib Ahmed
15th May, 2009

Outline
- FPGAs and their internal structure
- Why use FPGAs for parallel computing?
- Types of FPGAs
- Application examples and processing in applications
- FPGAs in parallel computing
- FPGA limitations
- Design methods for FPGAs
- Conclusion

FPGAs – Introduction
- Ross Freeman, one of the founders of Xilinx (www.xilinx.com), invented the FPGA in the mid-1980s
- Other vendors include Altera, Actel, Lattice Semiconductor and Atmel
- FPGAs support the notion of reconfigurable computing

Reconfigurable Computing
- Combines multiple reconfigurable devices (such as FPGAs) with multiple microprocessors
- The processor(s) execute the sequential and non-critical code, while the reconfigurable fabric (the FPGAs) executes the code that can be mapped efficiently to hardware

FPGA Internal Structure
A semiconductor device consisting of:
- Configurable Logic Blocks (CLBs)
- Input/Output (I/O) Blocks (IOBs)
- Static RAM (SRAM) blocks
- Digital Signal Processing Blocks (DSPBs)

Why Use FPGAs?
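One of the reported comparisons [1] measures 0.36 ms on an FPGA clocked at 66 MHz against 196.71 ms for optimized software on a 2.6 GHz processor. As a quick sanity check on what such numbers imply, a minimal sketch (the timings and clock speeds come from that comparison; the derived ratios are simple arithmetic, not figures from the source):

```python
# Timings reported in the FPGA-vs-software comparison [1]
fpga_ms = 0.36        # XV2V6000 FPGA at 66 MHz
software_ms = 196.71  # optimized software at 2.6 GHz

# Speedup is the ratio of the execution times
speedup = software_ms / fpga_ms
print(f"FPGA speedup: {speedup:.0f}x")  # roughly 546x

# The FPGA wins despite a large clock-speed disadvantage,
# because it exploits massive spatial parallelism
clock_ratio = 2.6e9 / 66e6
print(f"Clock-speed disadvantage: {clock_ratio:.1f}x")
```

The point of the arithmetic: even though the FPGA's clock is almost 40 times slower, the overall result is a roughly 546-fold speedup, which is only possible because many operations execute per cycle in parallel.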
Speed
Hardware is faster than software [1]:

Technology          Clock Speed   Time Taken
XV2V6000 FPGA       66 MHz        0.36 ms
Optimized software  2.6 GHz       196.71 ms

- FPGAs can support thousand-fold parallelism, especially for low-precision computations

Cost
- Development cost is much less than for an ASIC (application-specific integrated circuit) at lower volumes

Flexibility
- FPGAs are flexible compared to ASICs because they can be reprogrammed

Types of FPGAs
- CPLDs (Complex Programmable Logic Devices): require voltage levels that are not usually present in computer systems
- Anti-fuse-based devices: can be programmed only once
- Static-RAM-based devices: can be reprogrammed while the device is running

Application Examples
- Xilinx devices: Virtex-II Pro, Virtex-4
- Recent success of FPGAs in the Tsubame cluster in Tokyo: performance improved by an additional 25%

Processing in Applications [2]

FPGAs in Parallel Computing
- Dynamic matching of a node to the computational requirements of an application
- Application-specific computers become more flexible
- Enables support for multiple modes of parallel computing: MIMD, SIMD, etc.
- Partial reconfiguration can allow better hardware resource utilization
- Dynamic task allocation schemes can be extended to allow for dynamic hardware allocation
- Support for variable grain size

FPGA Limitations
Capacity
- Logic blocks do not represent computation as densely as instructions do
- A conventional processor runs the 90% of the code that takes 10% of the execution time; the reconfigurable logic takes the 10% of the code that consumes 90% of the execution time
Tools
- Compilers for reconfigurable logic are not very good
- Some operations are hard to implement on FPGAs, such as random access and pointer-based data structures

Design Methods for FPGAs [3]
- Use an algorithm optimal for FPGAs: systolic arrays for correlation are efficient
- Use a computing mode appropriate for FPGAs: streaming, systolic, and arrays of fine-grained automata are preferable (e.g. searching biomedical databases for similar sequences)
- Use appropriate FPGA structures: analyzing DNA or protein sequences maps to a straightforward systolic array

Design Methods for FPGAs [3] (continued)
- Living with Amdahl's Law: speeding up an application significantly through an enhancement requires most of the application to be enhanced (the NAMD & ProtoMol framework was designed for computational experimentation)
- Hide the latency of independent functions: latency hiding is a basic technique for achieving high performance in parallel applications; functions on the same chip operate in parallel
- Use rate-matching to remove bottlenecks: function-level parallelism is built in

Design Methods for FPGAs [3] (continued)
- Take advantage of FPGA-specific hardware: hard-wired components such as integer multipliers and independently accessible BRAMs (Block RAMs); the Xilinx VP100 has 400 independently accessible, 32-bit quad-ported BRAMs, which can help achieve 20 terabytes per second at capacity
- Use appropriate arithmetic precision
- Use an appropriate arithmetic mode
- Minimize use of high-cost arithmetic operations

Current Progress in Hardware & Software
- SRC-6 and SRC-7 are parallel architectures built around a crossbar switch that can be stacked for scalability
- High-performance computing vendors such as Silicon Graphics Inc. (SGI), Cray and Linux Networx have incorporated FPGAs into their parallel architectures [4]
- VHDL and Verilog are used to create hardware kernels; other hardware description languages such as Carte C, Carte Fortran, Impulse C, Mitrion-C and Handel-C are also used
- Annapolis Micro Systems' CoreFire, Starbridge Systems' Viva, Xilinx System Generator and DSPlogic's Reconfigurable Computing Toolbox are high-level graphical programming development tools [5]

Conclusion
Using FPGAs in parallel computing offers the following benefits:
- Application acceleration
- Flexibility in terms of application domain
- Potential cost benefits over ASICs
- The ability to exploit variable levels and modes of parallelism
- More effective use of hardware resources

References
[1] Todman, T.J., Constantinides, G.A., Wilton, S.J.E., Mencer, O., Luk, W.
& Cheung, P.Y.K. (2005). Reconfigurable computing: architectures and design methods.
[2] Altera Corporation (2007). Accelerating high performance computing with FPGAs. White paper, October 2007.
[3] Herbordt, M.C., VanCourt, T., Gu, Y., Sukhwani, B., Conti, A., Model, J. & DiSabello, D. (2007). Achieving high performance with FPGA-based computing.
[4] Buell, D., El-Ghazawi, T., Gaj, K. & Kindratenko, V. (2007). High-performance reconfigurable computing. IEEE Computer Society, March 2007.
[5] El-Ghazawi, T., El-Araby, E., Huang, M., Gaj, K., Kindratenko, V. & Buell, D. (2008). The promise of high-performance reconfigurable computing. IEEE Computer Society, February 2008, pp. 69–76.

Any Questions?

Thank You