EE446 Embedded Architectures
3/16/2016

An FPGA (Field Programmable Gate Array) is a reprogrammable chip that contains hundreds of thousands of logic gates, which are connected internally to build complex digital circuitry.
• It is primarily a semiconductor device that the user (customer or designer) can configure after the manufacturing process has been completed
• The term "field-programmable" means the device is programmed by the customer, not the manufacturer
• It can be programmed using a logic circuit diagram or source code in VHDL or Verilog
• It supports partial reconfiguration of a portion of the design

• Real-time analysis of high-rate data streams (Performance)
• Deterministic hardware dedicated to every task (Reliability)
• Lower non-recurring engineering (NRE) expenses (Reconfigurability)
• Radiation hardening and program integrity (Durability)
• Flexible and rapid prototyping (Development)

• FPGAs excel at computing non-data-dependent algorithms in parallel
• A customizable data path and ALU allow very large amounts of data to be transferred and processed within a few clock cycles
• Despite lower clock frequencies, FPGAs can outperform conventional CPUs on certain data processing tasks

• Cheap/fast fuse connections (one-time programmable)
  ◦ small area (can fit lots of them)
  ◦ low-resistance wires (fast even if in multiple segments)
  ◦ very high resistance when not connected
  ◦ small capacitance (wires can be longer)
• Antifuse: one-time programmable
• Pass transistors (switches)
  ◦ used to connect wires
  ◦ bi-directional
• EEPROM
• SRAM
• Multiplexers
  ◦ used to connect one of a set of possible sources to an input
  ◦ can be used to implement logic functions

• FPGAs have historically been slower, consumed more energy, and offered less functionality than ASICs
• Thanks to fabrication improvements and greater R&D investment, the performance gap between FPGAs and ASICs has narrowed considerably
• Advantages of FPGAs over ASICs:
  ◦ Shorter time to market
  ◦ Can be re-programmed in the field to fix bugs, which lowers engineering costs
  ◦ Hardware can be developed on ordinary FPGAs and then migrated to a finalized version that can no longer be modified once the design has been decided

• Power consumption: FPGAs fundamentally use much more power than ASICs
• Price: they also fundamentally cost more
• Speed: ASICs are still considerably faster than any FPGA, although design techniques can help with this issue
• Density: ASICs can still pack much more logic into a single chip than an FPGA
• IP: modern, complex IP (a complete PCI Express or HyperTransport core, for example) may take up most or all of an FPGA but only 10% of an ASIC

Common FPGA architecture involves:
  ◦ Configurable Logic Blocks (CLBs)
  ◦ I/O pads
  ◦ Routing paths, usually of the same width (number of wires)

Standard Logic Block
Logic Block Pin Assignment

Logic optimization. Performs two-level or multi-level minimization of the Boolean equations to optimize area, delay, or a combination of both.
Technology mapping. Transforms the Boolean equations into a circuit of FPGA logic blocks.
This step also optimizes the total number of logic blocks required (area optimization) or the number of logic blocks in time-critical paths (delay optimization).
Placement. Selects the specific location for each logic block in the FPGA while trying to minimize the total length of interconnect required.
Routing. Connects the logic blocks positioned by the placement tool using the FPGA's available routing resources, carrying signals from where they are generated to where they are used.

• Fuse and anti-fuse
  ◦ fuse makes or breaks the link between two wires
  ◦ one-time programmable
• Flash
  ◦ high density
  ◦ process issues
• RAM-based
  ◦ memory bit controls a switch that connects/disconnects two wires
  ◦ can be programmed and re-programmed easily (tested at factory)

• Logic block: how are functions implemented, as fixed functions (manipulate inputs) or programmable?
  ◦ support complex functions: fewer blocks are needed, but they are bigger, so fewer of them fit on the chip
  ◦ support simple functions: more blocks are needed, but they are smaller, so more of them fit on the chip
• Interconnect
  ◦ how are logic blocks arranged?
  ◦ how many wires will be needed between them?
  ◦ are wires evenly distributed across the chip?
  ◦ programmability slows wires down; are some wires specialized for long distances?
  ◦ how many inputs/outputs must be routed to/from each logic block?
  ◦ what utilization are we willing to accept? 50%? 20%? 90%?
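The placement and routing steps described above can be sketched in a few lines of Python. This is only an illustrative model under stated assumptions, not how any vendor tool works: logic blocks sit on integer grid coordinates, interconnect length is estimated with the half-perimeter of each net's bounding box, and a single two-pin connection is routed with a breadth-first (Lee-style) maze search over free grid cells. The names `wirelength` and `maze_route` are hypothetical.

```python
from collections import deque

def wirelength(placement, nets):
    """Estimate total interconnect as the sum of each net's half-perimeter bounding box."""
    total = 0
    for net in nets:
        xs = [placement[block][0] for block in net]
        ys = [placement[block][1] for block in net]
        total += (max(xs) - min(xs)) + (max(ys) - min(ys))
    return total

def maze_route(grid, src, dst):
    """Breadth-first search from src to dst over free cells (0 = free, 1 = blocked).

    Returns a shortest list of (row, col) cells, or None if no route exists.
    """
    rows, cols = len(grid), len(grid[0])
    parent = {src: None}          # also serves as the visited set
    queue = deque([src])
    while queue:
        cell = queue.popleft()
        if cell == dst:
            path = []
            while cell is not None:   # walk back to the source
                path.append(cell)
                cell = parent[cell]
            return path[::-1]
        r, c = cell
        for nr, nc in ((r + 1, c), (r - 1, c), (r, c + 1), (r, c - 1)):
            if 0 <= nr < rows and 0 <= nc < cols \
                    and grid[nr][nc] == 0 and (nr, nc) not in parent:
                parent[(nr, nc)] = cell
                queue.append((nr, nc))
    return None

# Two candidate placements of the same three blocks and two nets.
nets = [("a", "b"), ("b", "c")]
compact = {"a": (0, 0), "b": (1, 0), "c": (2, 0)}
spread = {"a": (0, 0), "b": (3, 3), "c": (0, 1)}
print(wirelength(compact, nets), wirelength(spread, nets))

# Route around a row of blocked cells.
grid = [
    [0, 0, 0, 0],
    [1, 1, 1, 0],
    [0, 0, 0, 0],
]
print(maze_route(grid, (0, 0), (2, 0)))
```

A placement tool would search over many placements for a low wirelength estimate, and a router would repeat the search for every net while marking used cells as blocked; both loops are left out here to keep the sketch short.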
• CLB - Configurable Logic Block
  ◦ 5-input, 1-output function
  ◦ or two 4-input, 1-output functions
  ◦ optional register on outputs
• Built-in fast carry logic
• Can be used as memory
• Three types of routing
  ◦ direct
  ◦ general-purpose
  ◦ long lines of various lengths
• RAM-programmable
  ◦ can be reconfigured
[Figure: array of CLBs surrounded by IOBs, with wiring channels between the blocks]

The Virtex CLB

Details of One Virtex Slice
Each slice contains two sets of the following:
  ◦ Four-input LUT: any 4-input logic function, or 16-bit x 1 synchronous RAM (SLICEM only), or 16-bit shift register (SLICEM only)
  ◦ Carry & Control: fast arithmetic logic, multiplier logic, multiplexer logic
  ◦ Storage element: latch or flip-flop, set and reset, true or inverted inputs, synchronous or asynchronous control
[Figure: example configurations: 4-input function; 3-input function, registered; e.g. 9-input parity]

• Look-up tables are the primary elements for logic implementation
• Each LUT can implement any function of 4 inputs

LUT example 1 (the output depends only on x1 and x2):
x1  0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
x2  0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
x3  0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
x4  0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
y   1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0

LUT example 2:
x1  0 0 0 0 0 0 0 0 1 1 1 1 1 1 1 1
x2  0 0 0 0 1 1 1 1 0 0 0 0 1 1 1 1
x3  0 0 1 1 0 0 1 1 0 0 1 1 0 0 1 1
x4  0 1 0 1 0 1 0 1 0 1 0 1 0 1 0 1
y   0 1 0 0 0 1 0 1 0 1 0 0 1 1 0 0

[Figure: Virtex slice with two look-up tables (inputs G1-G4 and F1-F4), carry and control logic, and two storage elements; carry chain from CIN to COUT]

Carry & Control Logic in Xilinx FPGAs
x  y  COUT
0  0  y
0  1  CIN
1  0  CIN
1  1  y

Propagate = x ⊕ y
Generate = y
Sum = Propagate ⊕ CIN = x ⊕ y ⊕ CIN
[Figure: LUT feeding the carry and control logic; the carry path is hardwired (fast) logic]

Critical Path for an Adder Implemented Using Xilinx Spartan 3 FPGAs

The general architecture of Xilinx FPGAs consists of a two-dimensional array of programmable blocks, called Configurable Logic Blocks (CLBs), with horizontal and vertical routing channels between the rows and columns of CLBs.
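The LUT and carry-logic slides above can be illustrated with a small software model. The sketch below assumes a LUT is simply a stored truth table indexed by its inputs (x1 as the most significant index bit, matching the column order of the truth tables), and that each adder bit uses Propagate = x ⊕ y from a LUT plus a carry multiplexer that passes CIN when Propagate is 1 and Generate = y otherwise. The function names are hypothetical, and the model ignores timing and the real slice wiring.

```python
def make_lut(func, n_inputs=4):
    """Precompute one stored bit per input combination (2**n_inputs entries)."""
    table = []
    for index in range(2 ** n_inputs):
        # Unpack the index into input bits, first input = most significant bit.
        bits = [(index >> (n_inputs - 1 - i)) & 1 for i in range(n_inputs)]
        table.append(func(*bits) & 1)
    return table

def lut_read(table, *inputs):
    """Evaluate the LUT: the inputs form the address of the stored bit."""
    index = 0
    for bit in inputs:
        index = (index << 1) | bit
    return table[index]

def carry_chain_add(a, b, width, cin=0):
    """Ripple-carry adder built from one LUT and one carry mux per bit."""
    xor_lut = make_lut(lambda x, y: x ^ y, n_inputs=2)
    total = 0
    carry = cin
    for i in range(width):
        x = (a >> i) & 1
        y = (b >> i) & 1
        propagate = lut_read(xor_lut, x, y)      # Propagate = x XOR y
        generate = y                             # Generate = y
        total |= (propagate ^ carry) << i        # Sum = Propagate XOR CIN
        carry = carry if propagate else generate # carry mux selects CIN or y
    return total, carry

# A 4-input LUT whose output ignores x3 and x4: y = NOT (x1 AND x2).
nand_lut = make_lut(lambda x1, x2, x3, x4: 1 - (x1 & x2))
print(nand_lut)
print(lut_read(nand_lut, 1, 1, 0, 1))

# 4-bit addition through the modeled carry chain.
print(carry_chain_add(5, 3, 4))
print(carry_chain_add(15, 1, 4))
```

The stored bits printed for `nand_lut` are twelve 1s followed by four 0s, which is the same output column as the first truth table. Reprogramming the device amounts to loading a different table, which is why one LUT can realize any function of its inputs.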
Connection boxes
• Flexibility of connection, Fc: the number of wiring segments each logic block pin can connect to
• Example with Fc = 2: can A connect to B?

Switch boxes
• Flexibility Fs defines, for a wiring segment entering the S block, the number of other wiring segments it can be connected to

Routing using C and S boxes
Maze Router
A* Search Routing
The Pathfinder

In Comparison to the Virtex 2
                                        Virtex 5    Virtex 2
CLB array (rows x columns)              160 x 54    80 x 46
Slices                                  17,280      13,969
Max distributed RAM (Kb)                1,120       428
Block RAM blocks, 18 Kb                 296         (not given)
Block RAM blocks, 36 Kb                 148         (not given)
Block RAM max (Kb)                      5,328       2,448
DSP48E slices                           64          (not given)
CMTs                                    6           (not given)
PowerPC processor blocks                0           (not given)

• I/O blocks provide the interface between package pins and the internal configurable logic; most popular and leading-edge I/O standards are supported by programmable I/O blocks (IOBs)
• CLBs are the basic logic elements for Xilinx® FPGAs, providing combinatorial and synchronous sequential logic as well as distributed memory and shift register capability; Virtex-5 FPGA CLBs are based on real 6-input look-up table technology and provide superior capabilities and performance
• Block RAM modules provide flexible 36 Kbit true dual-port RAM blocks that are cascadable, which allows larger memory blocks to be formed
  ◦ Virtex-5 FPGA block RAMs contain programmable FIFO logic for increased device utilization
  ◦ Each block RAM can also be configured as two independent 18 Kbit true dual-port RAM blocks for designs that need smaller RAM blocks
• Cascadable embedded DSP48E slices with 25 x 18 two's complement multipliers and a 48-bit adder/subtractor/accumulator provide massively parallel DSP algorithm support
• Clock Management Tile (CMT) blocks provide the most flexible, highest-performance clocking for FPGAs

• 16-character x 2-line LCD
• 256 MB SODIMM
• Compact Flash card: the Xilinx System ACE Compact Flash (CF) configuration controller allows a Type I Compact Flash card to program the FPGA through the JTAG port
• Eight general-purpose (active-High) DIP switches are connected to the user I/O pins of the FPGA
• 15 LEDs are controllable by the FPGA: 8 green general-purpose LEDs arranged in a row, 5 green LEDs positioned next to the pushbuttons, and 2 red LEDs intended for error conditions but not limited to that purpose
• Ethernet port: 10/100/1000 Mb/s
• Audio jacks for microphone, line in, line out, and headphone; supports stereo 16-bit audio with up to 48 kHz sampling
• The USB controller provides USB connectivity for the board and supports host and peripheral modes of operation
  ◦ The USB controller has an internal microprocessor to assist in handling USB commands
  ◦ The firmware for this processor can be stored in its own dedicated IIC EEPROM or downloaded from a host computer via a peripheral connector
  ◦ The USB controller's serial port is connected to J30 through an RS-232 transceiver to assist with debugging
• The JTAG configuration port allows the FPGA to be programmed and provides debugging support
• ISE Foundation (Project Navigator) is the starting point of the FPGA design process
• It runs in the background to maintain the operation and flow of the design by managing the chain of tools involved, including but not limited to the Embedded Development Kit (EDK), ChipScope Pro, and AccelDSP
• The EDK includes XPS, which, as mentioned before, can be run independently to begin a project; however, using Project Navigator provides a more organized design process for an embedded system
• XPS (Xilinx Platform Studio) and the XPS SDK (Software Development Kit) are the main components of the EDK
• XPS allows the use of the Base System Builder (BSB), if required, when developing for an existing board, including its layout and pin connections
• Given a supported embedded processor development board available from Xilinx, the BSB lets you pick from the peripherals available on that board, automatically matches the FPGA pinout to the board, and creates a completed platform and test application ready to download and run on the board
The Base System Builder allows for the selection of the following system attributes:
• Processor type (MicroBlaze or PowerPC, depending on your selected target FPGA device)
• Reference and processor-bus clock frequency (BSB automatically infers and configures a Digital Clock Manager (DCM) primitive when needed)
• Standard processor buses (all peripherals are automatically connected via appropriate buses)
• Debug interface
• Cache configuration
• Memory size and type (both on-chip block RAM and controllers for off-chip memory devices)
• Common peripherals (such as general-purpose I/O, Universal Asynchronous Receiver-Transmitter (UART), and timer)
• Automatic selection of the on-board FPGA
• Selection of clock rates supported by the on-board oscillators
• Automatic setting of reset polarity
• Automatic generation of the FPGA pinout to match the board connections for the selected set of peripherals

• Upon completion of the BSB, a Microprocessor Hardware Specification (MHS) file is created and loaded into the XPS project
• XPS can then be used to develop the embedded subsystem that was established through the BSB, which acts as a wizard/template for the overall board capabilities
• The next step is to define the constraints and remaining details of the system
• When starting from ISE Project Navigator, add the embedded system as a submodule to a top-level Xilinx® ISE® project; then declare, instantiate, and interconnect the embedded submodule in your top-level FPGA design