Reconfigurable Computing (EN2911X)
Lecture 01: Introduction
Prof. Sherief Reda
Division of Engineering, Brown University
Spring 2007

S. Reda EN2911X FALL’07

Methods for executing algorithms

Hardware (Application-Specific Integrated Circuits)
Advantages:
• very high performance and efficiency
Disadvantages:
• not flexible (can’t be altered after fabrication)
• expensive

Reconfigurable computing
Advantages:
• fills the gap between hardware and software
• much higher performance than software
• higher level of flexibility than hardware

Software-programmed processors
Advantages:
• software is very flexible to change
Disadvantages:
• performance can suffer if the clock is not fast
• instruction set is fixed by the hardware

Temporal vs. spatial based computing
• Temporal-based execution (software)
• Spatial-based execution (reconfigurable computing)
The ability to extract parallelism (or concurrency) from algorithm descriptions is the key to acceleration using reconfigurable computing.

Reconfigurable devices
[Figure labels: programmable interconnect, programmable logic blocks]
• Field-Programmable Gate Arrays (FPGAs) are one example of reconfigurable devices
• An FPGA consists of an array of programmable logic blocks whose functionality is determined by programmable configuration bits
• The logic blocks are connected by a set of routing resources that are also programmable
Custom logic circuits can be mapped to the reconfigurable fabric.

Configuring FPGAs [Maxfield’04]
FPGAs can be dynamically reprogrammed before runtime or during runtime (virtual hardware). Reconfiguration can be:
• full
• partial

Uses of reconfigurable devices
1. Low/medium-volume IC production
2. Early prototyping and logic emulation
3. Accelerating algorithms in reconfigurable computing environments
   i. Reconfigurable functional units within a host processor (custom instructions)
   ii. Reconfigurable units used as coprocessors
   iii. Reconfigurable units that are accessed through external I/O or a network
[Compton’02]

Current problems with conventional computing
“If scaling continues at present pace, by 2005, high speed processors would have power density of nuclear reactor, by 2010, a rocket nozzle, and by 2015, surface of sun.”
Intel VP Patrick Gelsinger (ISSCC 2001)
• Technology scaling doubled the number of devices in an IC (processors, FPGAs, etc.) every 2-3 years
• Scaling also provided devices with reduced delay → frequency doubling (with aggressive pipelining) → increased power density
• Increases in clock frequency have slowed down (or stopped); the available devices are instead used to build multi-processor (multi-core) chips

Why is reconfigurable computing more relevant these days?
• Demand for high-performance computation is soaring:
  – large-scale optimization problems, physics and earth simulation, bioinformatics, signal processing (e.g. HDTV), etc.
• Why are software-programmed processors no longer attractive?
  – Temporal execution of instructions is no longer getting faster
  – General-purpose multi-core processors require coarse-grain thread-level parallelism
• Why are reconfigurable fabrics currently attractive?
  – Increased integration densities allow large FPGAs that can implement substantial functions
  – They provide the spatial computational resources required to implement massively parallel computations directly in hardware

Topics that will be covered in this class… (entry survey time)

Topic 01: Programmable logic technology overview

Required function: y = (a & b) | !c

Truth table (the y column is also the content of the LUT's SRAM cells):

    a b c | y
    0 0 0 | 1
    0 0 1 | 0
    0 1 0 | 1
    0 1 1 | 0
    1 0 0 | 1
    1 0 1 | 0
    1 1 0 | 1
    1 1 1 | 1

Programmed LUT: eight SRAM cells (addresses 000 to 111) feed an 8:1 multiplexer whose select inputs are a, b, c; the selected cell drives y.
• Programming information could be stored in SRAM
• A 4-input Look-Up Table (LUT) is the typical size

Topic 01: Programmable logic technology overview
[Figure: logic block. Labels: inputs a, b, c, d, e; 4-input LUT (output y); mux; flip-flop (output q); clock; switch box]

Topic 02: Reconfigurable computing methodologies
[Figure: design flow]
• System specification → partitioning into software and hardware
• Software: compile for target processor
• Hardware: design entry as a top-level block-level schematic, with blocks given as
  – Textual HDL, e.g.
        When clock rises
          If (s == 0) then y = (a & b) | c;
          else y = c & !(d ^ e);
  – Graphical state diagram
  – Graphical flowchart
  – Block-level schematic
• Synthesis (compilation) → mapping (placement & routing) → configuration data

Topic 03: Hardware programming languages (Verilog)
• Verilog is a hardware description language used to model digital systems
• Similar in syntax to C
• Differs from conventional programming languages in that the execution of statements is not strictly linear: statements can execute sequentially or concurrently
• The language can be synthesized into logic circuits

    // 2-to-1 multiplexer: y follows a when select is 1, otherwise b
    module mux(a, b, select, y);
      input a, b, select;
      output y;
      reg y;  // assigned inside an always block, so declared as reg

      always @(a or b or select)
        if (select) y = a;
        else y = b;
    endmodule

Topic 04: Rapid prototyping with Altera DE2 board
No need to design our own board; we will use Altera’s DE2 board and Quartus II software.
Features:
• Cyclone II FPGA, 35K LUTs
• 10/100 Ethernet
• RS232
• Video out (VGA 10-bit DAC)
• Video in (NTSC/PAL/multi-format)
• USB 2.0 (type A and type B)
• PS/2 mouse or keyboard port
• Line in/out, microphone in (24-bit Audio CODEC)
• Expansion headers (76 signal pins)
• Infrared port
• Memory: 8-MBytes SDRAM, 512K SRAM, 4-MBytes flash
• SD memory card slot
• Displays: 16 x 2 LCD display, eight 7-segment displays
• Switches and LEDs

Topic 05: High-level synthesis languages (SystemC)
• SystemC is a system description language for hardware/software systems
• SystemC is a set of libraries and macros implemented in C++ to allow specification and simulation of concurrent processes
• Allows high-level description of hardware modules
• A subset of the language can be synthesized into logic circuits. We will use the Celoxica Agility compiler as our synthesis tool

    #include "systemc.h"

    SC_MODULE(adder)
    {
        sc_in<int> a, b;      // input ports
        sc_out<int> sum;      // output port

        // Recompute the sum whenever an input changes
        void do_add() {
            sum = a + b;
        }

        SC_CTOR(adder) {
            SC_METHOD(do_add);
            sensitive << a << b;
        }
    };

Topic 06: Algorithm acceleration using reconfigurable computing
• Learn how to use FPGAs and reconfigurable computing principles to accelerate algorithms: sorting, dynamic programming, NP-hard problems, etc.
• Accelerating applications in various fields
  – Signal and image processing
  – Cryptology
  – Bioinformatics
  – Pattern recognition
  – etc.

Topic 07: Soft multi-core computing environments
[Figure labels: Nios processor Core 1, Nios processor Core 2, bus, accelerator, memory]
• Learn about hard and soft processors
• Design multi-core-based reconfigurable computing systems
• Design of on-chip networks for multi-core systems
• Design of custom instructions
• Design of pluggable acceleration function units

Goals of this class
• Learn principles of reconfigurable computing with minimum hardware background
• Acquire hands-on experience and useful implementation skills
  – Verilog / SystemC / Quartus II
• Develop/strengthen research skills

Class organization
• HW assignments (paper reviews + mini labs): 20%
• Class participation: 10%
• Midterm: 20%
• Class project (progress/final reports and presentation): 50%
• Sources: papers, lecture slides, manuals and book chapters.
• Class website: http://ic.engin.brown.edu/classes/EN2911F07
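
Supplementary sketch for Topic 01: the programmed LUT can be modeled in a few lines of software. The following is a minimal Python illustration, not taken from the slides; the names program_lut and read_lut are hypothetical. It assumes the input a is the most significant address bit, so the stored bits are ordered 000 to 111 as on the LUT slide. "Programming" fills the SRAM cells with the truth table of the required function, and "reading" plays the role of the 8:1 multiplexer.

```python
from itertools import product


def program_lut(func, num_inputs=3):
    """Return the configuration bits: one stored bit per input combination.

    Combinations are enumerated 00..0 to 11..1 with the first input as the
    most significant bit (an assumption of this sketch).
    """
    return [int(bool(func(*bits))) for bits in product((0, 1), repeat=num_inputs)]


def read_lut(sram, *inputs):
    """Model the multiplexer: the inputs form the address of one stored bit."""
    address = 0
    for bit in inputs:
        address = (address << 1) | bit
    return sram[address]


# Configure the LUT for the function y = (a & b) | !c
sram = program_lut(lambda a, b, c: (a & b) | (1 - c))

# Evaluate the "hardware" for a = 1, b = 1, c = 1
y = read_lut(sram, 1, 1, 1)
```

Calling program_lut again with a different function changes the behavior without changing read_lut, which mirrors how reloading configuration bits changes the logic implemented by the same fabric.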