ECE 506 Reconfigurable Computing
http://www.ece.arizona.edu/~ece506
Lecture 1: Course Introduction
Ali Akoglu

Background needed for this course
° You should be familiar with:
  • Digital design
  • Architecture
    - Controller + datapath
    - Memory hierarchy
    - Pipelining
  • More listed in the syllabus
° Assumes no knowledge of reconfigurable computing: the topic is self-contained!
° Reconfigurable computing is a lot more than just devices.

Goals
° Understanding of issues related to RC (reconfigurable computing):
  • Architectures
  • Tools
  • Design methodologies
° Detailed investigation of a specific problem
  • Research project

Course Organization
• 3-5 homework assignments (30%)
• Project (45%)
• Exam (15%)
• Class participation/attendance (10%)
• No required text: readings will be assigned from research papers

Project
° Groups
  • Size to be determined based on enrollment
    - Likely 3-4 per group
° Topic subject to instructor approval
  • Examples will be given
° Phase 1: Literature review (no page limit). Due Feb 19, in-class hard copy!
° Phase 2: Project plan. Due Feb 26, in-class hard copy
° Phase 3: Presentations, starting March 17
° Phase 4: Final report and demo (last class, in-class hard copy)

Reading List
° The reading list will be posted ahead of time
  • Collection of papers and/or tutorials
° Discussion oriented
  • Participation

General Purpose Computing?
° In 1945, John von Neumann demonstrated that a computer could execute any kind of computation, given a properly programmed control, without the need for hardware modification.
° This model quickly became the foundation of later generations of high-speed digital computers.
  • One reason is its simplicity of programming, which follows the sequential way humans think.

General Purpose Computing?
° Because all algorithms must be programmed sequentially to run on a von Neumann (VN) computer, many algorithms cannot be executed at their best potential performance.
General Purpose Computing
° Advantage
  • Flexibility: any well-coded program can be executed
° Drawbacks
  • Speed: not optimal, due to sequential program execution (temporal resource sharing)
  • Resource efficiency: only part of the hardware resources is needed to execute an instruction; the rest remains idle
  • Memory access: memories are about 10 times slower than the processor
° These drawbacks are compensated for by high clock speeds, pipelining, caches, instruction prefetching, etc.

Domain Specific Processors
° The datapath is tailored to execute optimally the common set of operations that characterizes most algorithms of a given domain.
  • Digital signal processors (DSPs) are among the most widely used domain-specific processors, in telecommunications, multimedia, automotive, radar, sonar, seismic, and image-processing applications.
  • Ability to perform one or more multiply-accumulate (MAC) operations in a single cycle.
  • Special support for efficient looping.
    - A special loop or repeat instruction implements a loop without spending any instruction cycles on updating and testing the loop counter or branching back to the top of the loop.
  • Customized for the data width of the application domain. For example, in image processing, pixels are represented in the red-green-blue (RGB) system with one byte per color, so an image-processing DSP does not need more than an 8-bit datapath.
  • Specialization increases performance and improves device utilization.
  • Flexibility is reduced: the processor is poorly suited to applications other than those for which it was optimized.

Application Specific Processors
° Although DSPs incorporate some application-specific features, such as MAC units and data-width optimization, they remain sequential machines.
° If a processor is to be used for only one application, its processing unit can be designed and optimized for that particular application.
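To make the MAC and looping features of DSPs concrete, the sketch below models a finite impulse response (FIR) filter, a typical DSP kernel, in Python. It is an illustration under stated assumptions, not code from the lecture: the function name `fir`, the treatment of samples before the start of the input as zero, and the MAC counter are all hypothetical. Each inner-loop iteration is one multiply-accumulate. A DSP with a single-cycle MAC unit and a hardware repeat instruction could retire roughly one such iteration per cycle, but, being a sequential machine, it still performs them one after another.

```python
def fir(samples, coeffs):
    """Direct-form FIR filter: y[n] = sum over k of coeffs[k] * samples[n - k].

    Samples before the start of the input are treated as zero.
    Returns (outputs, mac_count), where mac_count is the number of
    multiply-accumulate steps performed.
    """
    taps = len(coeffs)
    outputs = []
    mac_count = 0
    for n in range(len(samples)):
        acc = 0  # plays the role of the DSP's accumulator register
        # Only taps that reach back to real (non-padded) samples are computed.
        for k in range(min(taps, n + 1)):
            acc += coeffs[k] * samples[n - k]  # one MAC operation
            mac_count += 1
        outputs.append(acc)
    return outputs, mac_count


# Two-tap moving sum: each output adds the current and previous sample.
print(fir([1, 2, 3, 4], [1, 1]))  # ([1, 3, 5, 7], 7)
```

On a DSP, the inner loop would map to a repeat instruction with no counter-update or branch overhead; in this Python model, that loop control is ordinary interpreter work, so the sketch shows the operation count, not DSP timing.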
Application Specific Processors
° In multimedia processing, for example, processors are usually designed to compress video frames according to a video compression standard.
° Example: the code fragment
      if (a < b) {
          d = a + b;
          c = a * b;
      } else {
          d = b + 1;
          c = a - 1;
      }
  • On a VN processor: at least 3 instructions (a compare-and-branch plus two arithmetic operations), so exec. time >= 3 * t_instruction
  • On an application-specific processor: the complete computation is done in parallel in a single clock cycle, so exec. time = t_clock = delay of the longest path from input to output
  • Assuming one instruction per clock cycle, the VN computer therefore needs to be clocked at least 3 times faster to reach the same execution time

Implementation Spectrum (Hardware vs. Software)
[Figure: implementation spectrum from microprocessor (software) through reconfigurable hardware to ASIC (hardware)]
• Computer hardware, such as application-specific integrated circuits (ASICs):
  - provides highly optimized resources for quickly performing critical tasks,
  - but is permanently configured for only one application, via a multimillion-dollar design and fabrication effort.
• Computer software provides the flexibility to change applications and perform a huge number of different tasks,
  - but is orders of magnitude worse than ASIC implementations in performance, silicon-area efficiency, and power usage.
• Reconfigurable hardware blends the benefits of both hardware and software.
  - It implements circuits just like hardware, yet can be reprogrammed cheaply and easily to implement a wide range of tasks.

Processing Approaches, Need for Reconfigurable Computing
[Figure: taxonomy of processing approaches. Labels: general purpose (CISC, RISC; with or without media-extended ISA), instruction-level parallelism (VLIW/superscalar), data-level parallelism (SIMD, MIMD), programmable special purpose, application-specific processors, reconfigurable, ASIC]

What is Reconfigurable Computing?
• Computation using hardware that can adapt at the logic level (post-fabrication) to solve specific problems
• A way of implementing circuits without fabricating a device
• The spatial structure of the device is modified to match the new application

Reconfigurable Computing: What is it?
° Compute by building a circuit rather than by executing instructions.
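The if/else example under Application Specific Processors shows what "computing by building a circuit" means, and a small Python model makes the comparison concrete. This is a sketch under assumptions: the function names and the per-operator delays in `DELAY` are hypothetical numbers chosen for illustration, not data for any real device. `sequential` follows the processor's view (one branch taken, instructions counted), while `spatial` mimics a circuit that evaluates all four operators at once and uses 2:1 multiplexers, driven by the comparator, to select d and c. `critical_path` computes the longest input-to-output delay, which sets the circuit's clock period.

```python
# Hypothetical operator delays in nanoseconds (illustrative only).
DELAY = {"cmp": 2.0, "add": 2.0, "sub": 2.0, "mul": 5.0, "mux": 1.0}


def sequential(a, b):
    """Processor view: returns (d, c, instructions_executed)."""
    instructions = 1  # compare-and-branch
    if a < b:
        d = a + b
        c = a * b
    else:
        d = b + 1
        c = a - 1
    instructions += 2  # two arithmetic instructions on either path
    return d, c, instructions


def spatial(a, b):
    """Circuit view: every operator computes at once, then muxes select."""
    sum_ab, prod_ab, inc_b, dec_a = a + b, a * b, b + 1, a - 1  # parallel units
    select = a < b                      # comparator output drives both muxes
    d = sum_ab if select else inc_b     # 2:1 multiplexer for d
    c = prod_ab if select else dec_a    # 2:1 multiplexer for c
    return d, c


def critical_path(delay=DELAY):
    """Longest input-to-output delay: slowest parallel unit, then a mux."""
    slowest = max(delay["cmp"], delay["add"], delay["sub"], delay["mul"])
    return slowest + delay["mux"]


t_clock = critical_path()   # 6.0 ns with the delays above
t_vn_max = t_clock / 3      # VN cycle time needed to match the circuit
print(sequential(1, 2), spatial(1, 2), t_clock, t_vn_max)
```

Both functions return the same d and c for every input. The difference is that the circuit's latency is set by a single path (here, multiplier plus mux), while the processor pays for three instructions; assuming one instruction per clock, the VN clock period must be at most t_clock / 3.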
Reconfigurable Computing: What is it?
° Efficient for long-running computations:
  • Video and image processing
  • DSP
  • Network processing
° Example: Z[i] = a*X[i] + b*Y[i]
  • Program (one element):
        Load  rx, X
        Mpy   r1, rx, ra
        Load  ry, Y
        Mpy   r2, ry, rb
        Add   r3, r1, r2
        Store r3, Z
  • Circuit: [Figure: dataflow graph in which X is multiplied by a, Y is multiplied by b, and the two products are added to produce Z]
° Reconfigurable hardware implements computations spatially, simultaneously computing millions of operations in resources distributed across a silicon chip.

Reconfigurable Computing?
° Can be hundreds of times faster than microprocessor-based designs.
° Unlike in ASICs, computations are programmed into the chip, not permanently frozen by the manufacturing process.
° An FPGA-based system can be programmed and reprogrammed many times, for example to:
  • apply a bug fix that corrects faulty behavior, or add a new feature
  • reconfigure a generic computation engine for a new task
  • reconfigure the device during operation, allowing a single piece of silicon to simultaneously do the work of numerous special-purpose chips

Reconfigurable Computing?
° Delivering the best of hardware and software? Not quite!
  • Creating efficient programs for them is more complex.
  • They are useful only for operations that process large streams of data, such as signal processing, networking, and the like.
  • Compared to ASICs, they may be 5 to 25 times worse in area, delay, and performance.
° An ASIC design may take months to years to develop and carry a multimillion-dollar price tag.
° An RC design might take only days to create and cost tens to hundreds of dollars.
° For systems that do not require the absolute highest achievable performance or power efficiency, RC is a compelling design alternative.

Reconfigurable Computing?
° Current devices can compute functions
  • on the order of millions of basic gates,
  • running at speeds in the hundreds of megahertz.
° To boost speed and capacity, additional special elements can be embedded,
  • such as large memories, multipliers, fast-carry logic for arithmetic and logic functions, and even complete microprocessors.
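The Z[i] = a*X[i] + b*Y[i] example under "Reconfigurable Computing: What is it?" can be modeled in Python to compare the instruction view with the circuit view. This is a simplified sketch, assuming a two-stage pipeline (a pair of multipliers feeding an adder, each stage followed by a register) and ignoring memory bandwidth; the function names and the cycle-counting convention are hypothetical. `z_program` counts the six instructions per element from the program listing, while `z_pipeline` advances all pipeline registers once per simulated clock cycle.

```python
def z_program(a, b, X, Y):
    """Processor view: Load, Mpy, Load, Mpy, Add, Store for every element."""
    Z, instructions = [], 0
    for x, y in zip(X, Y):
        r1 = x * a          # Mpy r1, rx, ra
        r2 = y * b          # Mpy r2, ry, rb
        Z.append(r1 + r2)   # Add r3, r1, r2 ; Store r3, Z
        instructions += 6   # two loads, two multiplies, one add, one store
    return Z, instructions


def z_pipeline(a, b, X, Y):
    """Circuit view: two multipliers -> register -> adder -> register.

    Each loop iteration is one clock cycle. Stages are updated from the
    output end backwards so that every register sees the value its
    predecessor held at the start of the cycle, as in real hardware.
    Returns (Z, cycles).
    """
    n = len(X)
    mul_reg = None   # holds (a*X[i], b*Y[i]) after the multiplier stage
    add_reg = None   # holds a*X[i] + b*Y[i] after the adder stage
    Z, cycles, i = [], 0, 0
    while len(Z) < n:
        if add_reg is not None:
            Z.append(add_reg)                      # result leaves the pipeline
        add_reg = sum(mul_reg) if mul_reg is not None else None
        if i < n:
            mul_reg = (a * X[i], b * Y[i])         # both multipliers at once
            i += 1
        else:
            mul_reg = None                         # pipeline drains
        cycles += 1
    return Z, cycles


print(z_program(2, 3, [1, 2, 3], [4, 5, 6]))   # ([14, 19, 24], 18)
print(z_pipeline(2, 3, [1, 2, 3], [4, 5, 6]))  # ([14, 19, 24], 5)
```

In this model the pipeline needs n + 2 cycles for n elements (two cycles to fill, then one result per cycle), against 6n instructions on the processor. For long data streams the circuit approaches one result per clock, which is why long-running computations are the good fit.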
Reconfigurable Computing?
° Today’s reconfigurable devices are capable of implementing complete systems.

Reconfigurable Computing Devices: FPGA
° Field-Programmable Gate Arrays
  • Logic blocks embedded in a general routing structure.
  • This array of logic gates is the "G" and "A" in FPGA.
    - Logic blocks perform simple combinational logic, as well as sequential logic.
  • FPGAs can implement very complex circuits.
  • The logic and routing elements in an FPGA are controlled by programming points.
  • By way of a configuration file, or bitstream, an FPGA can be configured to implement the user’s desired function,
    - allowing customization at the user’s electronics bench, or even in the final end product.
    - This is why FPGAs are "field programmable."

FPGA
° Customizing an FPGA merely involves storing values in memory locations. Similar to compiling a program and loading it onto a computer, creating an FPGA-based circuit is a simple process of generating a bitstream and loading it into the device.

Reconfigurable Computing, Function Level Programming?
° Because of the FPGA’s dual nature, combining the flexibility of software with the performance of hardware, an FPGA designer must think differently from designers who use other devices.
° Software developers typically write sequential programs that exploit a microprocessor’s ability to rapidly step through a series of instructions.
° In contrast, a high-quality FPGA design requires thinking about spatial parallelism, that is, simultaneously using multiple resources spread across a chip to yield a huge amount of computation.

Reconfigurable Computing, Function Level Design?
° The flexibility of FPGAs gives architects new opportunities generally not available with ASICs:
  • designs can be rapidly developed and deployed, and even reprogrammed in the field with new functionality;
  • they do not demand the huge design teams and validation efforts required for ASICs.
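The point that customizing an FPGA "merely involves storing values in memory locations" can be illustrated with a lookup table (LUT), the memory-based logic element inside the logic blocks of most commercial SRAM-based FPGAs. The Python sketch below is a simplified model, not any vendor's format: the class `LUT` and the helper `truth_table` are hypothetical names, and a real bitstream also sets routing switches and other programming points. A k-input LUT stores 2**k configuration bits; the inputs form an address, and the bit stored at that address is the output. Changing the function means rewriting those bits.

```python
from itertools import product


class LUT:
    """A k-input lookup table: 2**k configuration bits addressed by the inputs."""

    def __init__(self, k):
        self.k = k
        self.bits = [0] * (2 ** k)  # configuration memory, initially all zeros

    def configure(self, bits):
        """Store a new configuration: bits[i] is the output for input pattern i."""
        if len(bits) != 2 ** self.k:
            raise ValueError(f"expected {2 ** self.k} configuration bits")
        self.bits = [int(bit) & 1 for bit in bits]

    def __call__(self, *inputs):
        """Evaluate: the inputs (first one most significant) form a memory address."""
        if len(inputs) != self.k:
            raise ValueError(f"expected {self.k} inputs")
        address = 0
        for bit in inputs:
            address = (address << 1) | (bit & 1)
        return self.bits[address]


def truth_table(func, k):
    """'Compile' a Boolean function of k inputs into LUT configuration bits."""
    # product() enumerates input patterns in binary counting order,
    # matching the address computed in LUT.__call__.
    return [int(func(*pattern)) & 1 for pattern in product((0, 1), repeat=k)]


lut = LUT(2)
lut.configure(truth_table(lambda a, b: a ^ b, 2))  # program an XOR gate
print(lut.bits, lut(1, 0), lut(1, 1))              # [0, 1, 1, 0] 1 0
lut.configure(truth_table(lambda a, b: a & b, 2))  # reprogram as an AND gate
print(lut.bits, lut(1, 1))                         # [0, 0, 0, 1] 1
```

Nothing about the "hardware" object changes between the two configurations; only the stored bits do, which is the sense in which configuring an FPGA resembles loading a program.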
Reconfigurable Computing, Function Level Design?
° FPGAs also offer the ability to change the configuration, even while the device is running.
° However, because FPGAs are noticeably slower and have lower capacity than ASICs, designers must carefully optimize their designs for the target device.

Fields of Application
° Rapid prototyping: testing hardware before fabrication
  • Software simulation
    - Relatively inexpensive
    - Slow
    - Accuracy?
  • Hardware emulation
    - Hardware testing under real operating conditions
    - Fast
    - (Relatively) accurate
    - Allows for several iterations

Fields of Application
° Post-fabrication customization
  • Time-to-market advantage
  • Ship the first version of a product
  • Remote upgrading with new product versions
  • Remote repairing

Fields of Application
° Multi-modal computation: reconfigurable vehicles, mobile phones, etc.
[Figure: one reconfigurable platform hosting many functions: built-in digital camera, video phone service, games, Internet, navigation system, emergency, diagnostics, monitoring, entertainment, different standards and protocols]

Fields of Application
° Adaptive computing systems
  • Computing systems that can adapt their behavior and structure to changing operating and environmental conditions, time-varying optimization objectives, and physical constraints such as changing protocols, new standards, or dynamically changing operating conditions of technical systems.
Fields of Application
° Fault tolerance
  • Autonomous fault detection on communication lines
  • Detection of defective nodes
  • Task migration on node failure
  • Computation load balancing

Conclusion
° 10 years of Moore’s-law progress led to the microprocessor
  • Raised engineers’ productivity
  • Problem-solving became programming
  • Grew to billions of units per year
° Further speed gains are no longer expected, due to the decreasing reliability and increasing variability of transistors
° Progress in design methods has stalled for thirty years
° Multi-core designs, the path for future processors, are already available but have major problems:
  • The shared-memory model does not scale to hundreds of processors on a chip
  • The distributed-memory model is difficult to program
  • Power consumption and temperature are further problems
° Reconfigurable processors, networks, and memories on a chip may be the solution…

Hot Reconfigurable Computing Research Areas
• Developing power-efficient architectures and CAD techniques for FPGAs
• Important new applications for reconfigurable devices (especially embedded applications and security)
• Better understanding of the roles of standard microprocessors and reconfigurable hardware
  - Multiple types of parallelism
• Coarse-grained reconfigurable architectures
• 3D reconfigurable architectures
• Autonomous systems
  - Self-healing