Benefits of Partial Reconfiguration
• Reducing the size of the FPGA device required to implement a given function, with consequent reductions in cost and power consumption
• Providing flexibility in the choice of algorithms or protocols available to an application
• Enabling new techniques in design security
• Improving FPGA fault tolerance
• Accelerating configurable computing

Definitions
Partial Reconfiguration (PR)
Partial Reconfiguration is the modification of a subset of the logic in an operating FPGA design by downloading a partial configuration file.

Partition
A logical section of the design, defined by the user, to be considered for design reuse. In a PR design, this is the section of the FPGA designated for reconfiguration.

Configuration Frame
The smallest addressable segment of the FPGA configuration memory space. Reconfigurable frames are built from discrete numbers of these lowest-level elements.

Reconfigurable Frame
The smallest reconfigurable region within an FPGA device:
• Virtex-6: 40 CLBs high by 1 CLB wide
• Virtex-5: 20 CLBs high by 1 CLB wide
• Virtex-4: 16 CLBs high by 1 CLB wide

Partition Pins
• Logical and physical connection between static logic and reconfigurable logic
• Formerly known as “Bus Macros”
• Now created automatically for all Reconfigurable Partition ports; no manual instantiation is required from the user

Partition Pins – Proxy Logic
• Each partition pin is connected to a LUT; these LUTs are called “Proxy Logic”
• Logically located in the static region
• Required to be at a fixed, known point to act as an interface between static and reconfigurable partitions

Partition Pins
Bus Macro Implementation
PR Region Selection

Internal Configuration Access Port (ICAP)
• Port to read and write the FPGA configuration at run time
• Enables a user to write software programs for an embedded processor that modify the circuit structure and functionality during the circuit’s operation
• Allows for automated runtime reconfiguration

Hardware ICAP Core
Example Self-Reconfiguration System

Two Ways to Perform Partial Reconfiguration
• Modular Partial Reconfiguration:
– Large changes
– A full module is swapped between two configurations
• Difference-Based Partial Reconfiguration:
– Small changes
– Modify a LUT function or memory contents

Modular Layout

Constraints of the Modular Method
• The size and position of a module cannot be changed.
• Input-output blocks (IOBs) are accessible only by contiguous modules.
• Reconfigurable modules can communicate only with neighboring modules, and only through bus macros.
• No global signals are allowed (e.g., global reset), with the exception of clocks, which use a different bitstream and routing channels.

Difference-Based Partial Reconfiguration
• Small changes to the FPGA configuration
• Done manually using FPGA Editor
• What can be modified?
– LUT equations
– BRAM contents and BRAM write modes
– I/O standards and pull-ups or pull-downs on external pins
– Multiplexers that invert polarity
– Flip-flop initialization and reset values
• What cannot be modified?
– Routing: very dangerous, as it can cause internal contention

Problems with the Difference-Based Method
• Lack of automation
– Changes must be made manually
• In complex designs, it is difficult to find the component you want to modify
• Current Xilinx support is mostly for the modular method

The Early-Access Partial Reconfiguration Design Flow (The Old Way, used by the paper)
This design flow does not work with newer Xilinx tools.

The New Way – A Little Easier
1. Synthesize the base system with a reconfigurable module that has input and output pins (XPS)
2. Create software files to manage the PR region (SDK)
3. Use PlanAhead to manage the PR region and create bitstreams (many steps here)
4. Merge the bitstreams into download.bit to run on the FPGA

Customizing Virtual Networks with Partial FPGA Reconfiguration
• Use reconfiguration to create/modify a virtual network
• Heuristic to assign networks to HW/SW
• Allows for migration of virtual networks between HW and SW
• Workstation handles SW while FPGA handles HW

Motivation
• Network applications have diverse performance requirements for the underlying network infrastructure
• Desire flexibility, performance, and isolation
• Numerous hardware network substrates would work, but the hardware implementation overhead is too high
• Need for a straightforward method of reconfiguration over time

Limitations of Current State of the Art
• Software virtual networks on microprocessors
– Pros: High flexibility and isolation
– Cons: Low performance
• Network processors / dedicated hardware
– Pros: High performance
– Cons: Hardware shuts down during reconfiguration (low isolation)
• Migrate networks to software when reconfiguring
– Pros: Flexible, high performance
– Cons: Low performance when reconfiguring (low isolation)

Related Work
• Supercharging PlanetLab (SPL)
– Intel IXP network processors
– Dedicated ternary content-addressable memory (TCAM) for each forwarding table
• FPGA implementation
– 4 concurrent virtual networks with dedicated IP routers
– Additional networks run in software
– Switch to software when reconfiguring

Related Work
• Field Programmable Port Extender (FPX)
– High-speed switch
– PR packet-processing modules using APIs
– Specialized software (PARBIT) for bitstream management and generation
– PR network processor
– Reconfigurable accelerator for packet-processing functions

Proposed Approach
• Use reconfiguration to create/modify a virtual network
• Heuristic to assign networks to HW/SW
• Allows for migration of virtual networks between HW and SW
• Workstation handles SW while FPGA handles HW

System Design: Partially Reconfigurable FPGA System
[Figure: block diagram with a static region and a PR region; blocks include the arbiter, header verification, packet classifier, checksum verification, output queues, IP lookup, CPU transceiver, ARP lookup, and time-to-live (TTL) lookup]

System Design: Software-Based System
• OpenVZ: a lightweight virtualization approach
• The FPGA receives packets and sends them to HW or SW
• The CPU transceiver sends the packet to the workstation
• OpenVZ processes the packet in a virtual network
• All processed packets go to the output queues on the FPGA

System Design: Dynamic Virtual Network Allocation
• Virtual network removal
• Virtual network addition
• Virtual network bandwidth adjustment
– Reduction -> no effect on other networks
– Increase -> greedy rebalancing algorithm: migrate the lowest-bandwidth network to software or the highest-bandwidth network to hardware to make room for the adjustment

System Implementation

Experimental Approach
• Used the EAPR design flow to generate bitstreams
• Used bus macros
• Separate Verilog files for all modules
• Downloaded to the FPGA using JTAG

Experiment Testbed

Throughput of Routers

Static Reconfiguration (SR) Method
Why is there such a drop in performance with this method?

Partial Reconfiguration (PR) Method
Why is A unaffected by the reconfiguration, unlike in the previous method?
There isn’t much of a performance gain between the highest-performing PR and SR methods. Why?
What causes the SR method to outperform the PR method as the number of virtual networks increases?

Hot-Spot Mitigation
• Use partial reconfiguration to swap hot and cool modules around the chip based on a thermal trigger
• Set a threshold temperature; when a module reaches that threshold, swap its position with a cool module using reconfiguration
[Figures: module the size of ¼ of a MicroBlaze; module the size of ½ of a MicroBlaze; module the size of one MicroBlaze]

Pixel Video Processor
• PR modules can be reconfigured to perform arbitrary single-pixel functions
• Tested to see how the reconfiguration time overhead affects the system

Pixel Processor

My Project: Data Adaptable Reconfiguration
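Returning to the Dynamic Virtual Network Allocation slide, the greedy rebalancing rule for a bandwidth increase can be sketched as below. This is a minimal illustration under a hypothetical capacity model, not the paper's implementation: it assumes the FPGA offers a fixed number of PR slots (`hw_slots`) and the workstation can carry a fixed aggregate bandwidth (`sw_capacity`). The names `Allocation` and `adjust_bandwidth` are illustrative only.

```python
from dataclasses import dataclass, field


@dataclass
class Allocation:
    # Hypothetical capacity model (an assumption, not taken from the paper):
    hw_slots: int          # number of PR regions available on the FPGA
    sw_capacity: float     # aggregate bandwidth the workstation can carry
    hw: dict = field(default_factory=dict)  # network name -> bandwidth, in HW
    sw: dict = field(default_factory=dict)  # network name -> bandwidth, in SW


def adjust_bandwidth(alloc, name, new_bw):
    """Set one network's bandwidth, then greedily rebalance HW/SW placement.

    Returns a list of (network, direction) migrations on success, or None
    if no placement fits; on failure `alloc` is left unchanged.
    """
    hw, sw = dict(alloc.hw), dict(alloc.sw)  # work on copies, commit at the end
    side = hw if name in hw else sw
    old_bw = side[name]
    side[name] = new_bw
    moves = []
    if new_bw > old_bw:  # a reduction never forces other networks to move
        while sum(sw.values()) > alloc.sw_capacity:
            top_sw = max(sw, key=sw.get)  # highest-bandwidth software network
            if len(hw) < alloc.hw_slots:
                # A PR slot is free: move the heaviest SW network into hardware.
                hw[top_sw] = sw.pop(top_sw)
                moves.append((top_sw, "sw->hw"))
                continue
            low_hw = min(hw, key=hw.get) if hw else None  # lightest HW network
            if low_hw is None or hw[low_hw] >= sw[top_sw]:
                return None  # a swap would not reduce the software load
            # Swap: lightest HW network goes to SW, heaviest SW network to HW.
            sw[low_hw] = hw.pop(low_hw)
            hw[top_sw] = sw.pop(top_sw)
            moves += [(low_hw, "hw->sw"), (top_sw, "sw->hw")]
    alloc.hw, alloc.sw = hw, sw
    return moves
```

A swap is made only when it strictly lowers the software load, so the loop always terminates. Working on copies and committing only on success keeps a failed adjustment from leaving networks half-migrated.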