Lesson 1 (Part 2)
FPGA Architectures
Sept. 2005 EE37E Adv. Digital Electronics

Two competing implementation approaches

ASIC (Application Specific Integrated Circuit)
• designs must be sent for expensive and time-consuming fabrication in a semiconductor foundry
• designed all the way from behavioral description to physical layout

FPGA (Field Programmable Gate Array)
• bought off the shelf and reconfigured by designers themselves
• no physical layout design; design ends with a bitstream used to configure a device

Which Way to Go?

ASICs
• High performance
• Low power
• Low cost in high volumes

FPGAs
• Off-the-shelf
• Low development cost
• Short time to market
• Reconfigurability

Major FPGA Vendors

SRAM-based FPGAs
• Xilinx, Inc.
• Altera Corp.
• Atmel
• Lattice Semiconductor

Flash & antifuse FPGAs
• Actel Corp.
• QuickLogic Corp.

Other FPGA Advantages
• The manufacturing cycle for an ASIC is very costly and lengthy, and engages lots of manpower
  – Mistakes not detected at design time have a large impact on development time and cost
  – FPGAs are perfect for rapid prototyping of digital circuits
• Easy upgrades, as in the case of software
• Unique applications
  – reconfigurable computing

FPGA fabrics
• We analyze here the basic structures of FPGAs, known as fabrics.
• There are different ways to build an FPGA.
• The two major styles of FPGAs are SRAM-based and antifuse-based.
• The features of the I/O pins are fairly similar between these two types of FPGAs.

Characteristics of FPGA Programming Technologies

Feature | SRAM | Antifuse | E2PROM / FLASH
Technology node | State-of-the-art | One or more generations behind | One or more generations behind
Reprogrammable | Yes (in system) | No | Yes (in-system or offline)
Reprogramming speed (inc. erasing) | Fast | ---- | 3x slower than SRAM
Volatile (must be programmed on power-up) | Yes | No | No (but can be if required)
Requires external configuration file | Yes | No | No
Good for prototyping | Yes (very good) | No | Yes (reasonable)
Instant-on | No | Yes | Yes
IP Security | Acceptable (especially when using bitstream encryption) | Very Good | Very Good
Size of configuration cell | Large (six transistors) | Very small | Medium-small (two transistors)
Power consumption | Medium | Low | Medium
Rad Hard | No | Yes | Not really

1-bit Static RAM
[Figure: 1-bit static RAM cell]

Elements of an FPGA fabric
• Logic element (LE) or CLB.
• Interconnect.
• I/O pins.
[Figure: an array of LEs embedded in interconnect, with IOBs along the edge]

Terminology
• Configuration: bits that determine logic function + interconnect.
• CLB: Configurable Logic Block (the correct meaning; sometimes expanded as "combinational logic block") = logic element (LE).
• LUT: lookup table = SRAM used for a truth table.
• I/O block (IOB): I/O pin + associated logic and electronics.

Fine-, Medium-, and Coarse-grained Architectures
• It's common to categorize an FPGA by the size and complexity of its internal logic elements.
• In a fine-grained architecture, each logic block can be used to implement only a very simple function. For example, it might be possible to configure the block to act as any 3-input function, such as:
  – a primitive logic gate (AND, OR, NAND, etc.),
  – a storage element (D flip-flop, D latch, etc.)
• Today fine-grained architectures are being replaced by medium- and coarse-grained ones, where each logic block contains a relatively large amount of logic.

Fine-, Medium-, and Coarse-grained Architectures (continued)
• As the granularity of the blocks increases to medium-grained and higher, the number of connections into the blocks decreases compared to the amount of functionality they can support.
• Today's FPGAs are devices that have:
  – Embedded RAMs
  – Embedded multipliers, adders, and MACs (multiply-accumulators)
  – Embedded hard processor cores
  – Embedded clock trees and clock managers

MUX- versus LUT-based Logic Blocks
• There are two fundamental incarnations of the programmable logic blocks used to form medium-grained architectures: MUX-based and LUT-based.

MUX-based
Required function: y = (a & b) | c
[Figure: the function drawn with an AND gate and an OR gate, and the same function built from a network of 2:1 multiplexers with inputs a, b, c and constants 0 and 1]
• MUX-based architectures have an advantage when it comes to implementing control logic along the lines of if ... else.
• QuickLogic supports MUX-based architectures (www.quicklogic.com)

LUT-based
Required function: y = (a & b) | !c

Truth table (the y column is the content of the programmed LUT):

a b c | y
0 0 0 | 1
0 0 1 | 0
0 1 0 | 1
0 1 1 | 0
1 0 0 | 1
1 0 1 | 0
1 1 0 | 1
1 1 1 | 1

[Figure: eight SRAM cells at addresses 000 to 111 feeding an 8:1 multiplexer whose select inputs are a, b, c and whose output is y]
• LUT-based architectures are the leaders in anything to do with arithmetic processing.

LUT versus distributed RAM versus SR
• The fact that the core of a LUT in an SRAM-based device comprises a number of RAM cells offers a number of interesting possibilities:
  – Configuration as a lookup table
  – Configuration as a small RAM block
• This is referred to as distributed RAM because (a) the LUTs are strewn (distributed) across the surface of the chip, and (b) this differentiates it from larger chunks of block RAM.
• Each LUT may be considered to be multifaceted.
[Figure: a multifaceted LUT, usable as a 4-input LUT, a 16 x 1 RAM, or a 16-bit shift register (SR)]

CLBs (Xilinx) and LABs (Altera)
• The core building block in a modern FPGA from Xilinx is called a logic cell (LC).
• An LC comprises:
  – a 4-input LUT, which can also act as a 16 x 1 RAM or a 16-bit shift register,
  – a multiplexer,
  – and a register.
• The equivalent core building block in an FPGA from Altera is called a logic element (LE).
[Figure: a simplified view of a Xilinx LC: a 4-input LUT (also 16x1 RAM or 16-bit SR) with inputs a, b, c, d and output y; a mux with a second input e; a flip-flop with clock, clock enable and set/reset inputs and output q]

Altera's Logic Element
Each logic element (LE) contains the following:
• A 16-bit SRAM lookup table (LUT), which can implement an arbitrary 4-input logic function (as a truth table).
• Circuitry that forms a fast carry chain and a fast cascade chain (see later).
• A D register that can be bypassed.
• Various preset/reset logic for the register.

Slice
The next step up the hierarchy is what Xilinx calls a slice.
[Figure: a slice containing two logic cells (LC), each with a 4-input LUT (also 16x1 RAM or 16-bit SR), a MUX and a REG]

CLB and LAB
Moving one more level up the hierarchy, we come to what Xilinx calls a configurable logic block (CLB) and what Altera refers to as a logic array block (LAB).
[Figure: an array of CLBs; a CLB containing four slices of two logic cells each (the number of slices depends on the FPGA family)]

Programmable wiring
• Organized into channels.
  – Many wires per channel.
• Connections between wires are made at programmable interconnection points.
• Must choose:
  – Channels from source to destination.
  – Wires within the channels.

Programmable interconnection point
Logic elements must be interconnected to implement complex machines. An SRAM-based FPGA uses SRAM to hold the information used to program the interconnect.
When the transistor's gate is high, the transistor conducts and connects the two wires.
[Figure: an interconnection point controlled by an SRAM cell (D, Q) that drives the gate of a MOS transistor]

Programmable wiring paths
[Figure: programmable wiring paths]

Choosing a path
[Figure: candidate paths between two LEs]

Routing problems
• Global routing:
  – Which combination of channels?
• Local routing:
  – Which wire in each channel?
• Routing metrics:
  – Net length.
  – Delay.

I/O
• Fundamental selection: input, output, three-state?
• Additional features:
  – Register.
  – Voltage levels.
  – Slew rate.

Configuration
• Must set control bits for:
  – LE.
  – Interconnect.
  – I/O blocks.
• Usually configured off-line.
  – Separate burn-in step (antifuse).
  – At power-up (SRAM).

Configuration vs. programming
• FPGA configuration:
  – Bits stay at the device they program.
  – A configuration bit controls a switch or a logic bit.
• CPU programming:
  – Instructions are fetched from a memory.
  – Instructions select complex operations.
[Figure: a memory holding the instruction "add r1, r2", which is fetched into the instruction register (IR) of the CPU]

Xilinx
• Primary products: FPGAs and the associated CAD software
  – Programmable Logic Devices
  – ISE Alliance and Foundation Series Design Software
• Main headquarters in San Jose, CA
• Fabless* semiconductor and software company
  – UMC (Taiwan) {*Xilinx acquired an equity stake in UMC in 1996}
  – Seiko Epson (Japan)
  – TSMC (Taiwan)

Xilinx FPGA Families
• Old families
  – XC3000, XC4000, XC5200
  – Old 0.5µm, 0.35µm and 0.25µm technology. Not recommended for modern designs.
• High-performance families
  – Virtex (0.22µm)
  – Virtex-E, Virtex-EM (0.18µm)
  – Virtex-II, Virtex-II PRO (0.13µm)
• Low-cost family
  – Spartan/XL: derived from XC4000
  – Spartan-II: derived from Virtex
  – Spartan-IIE: derived from Virtex-E
  – Spartan-3
(Adapted from EE449, George Mason University)

Basic Spartan-II FPGA Block Diagram
[Figure: basic Spartan-II FPGA block diagram]

CLB Structure
[Figure: a CLB made of two slices. Each slice has two look-up tables (inputs G4..G1 and F4..F1, output O), two carry & control logic blocks chained from CIN to COUT, and two storage elements (D, S, R, CK, EC, Q). Other slice signals: F5IN, BY, SR, CLK, CE; outputs YB, Y, XB, X]
• Each slice has 2 LUT-FF pairs with associated carry logic
• Two 3-state buffers (BUFT) associated with each CLB, accessible by all CLB outputs

CLB Slice Structure
• Each slice contains two sets of the following:
  – Four-input LUT
    • Any 4-input logic function,
    • or 16-bit x 1 sync RAM
    • or 16-bit shift register
  – Carry & Control
    • Fast arithmetic logic
    • Multiplier logic
    • Multiplexer logic
  – Storage element
    • Latch or flip-flop
    • Set and reset
    • True or inverted inputs
    • Sync. or async. control

Distributed RAM
• CLB LUT configurable as distributed RAM
  – A LUT equals a 16x1 RAM
  – Implements single and dual ports
  – Cascade LUTs to increase RAM size
• Synchronous write
• Synchronous/asynchronous read
  – Accompanying flip-flops used for synchronous read
[Figure: one LUT used as RAM16X1S (D, WE, WCLK, A0..A3, O); two LUTs used as RAM32X1S (D, WE, WCLK, A0..A4, O), RAM16X2S (D0, D1, WE, WCLK, A0..A3, O0, O1) or RAM16X1D (D, WE, WCLK, A0..A3, DPRA0..DPRA3, SPO, DPO)]

Shift Register
• Each LUT can be configured as a shift register
  – Serial in, serial out
• Dynamically addressable delay up to 16 cycles
• For programmable pipeline
• Cascade for greater cycle delays
• Use CLB flip-flops to add depth
[Figure: a LUT used as a shift register with inputs IN, CE, CLK and DEPTH[3:0]]
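The three uses of a slice LUT listed above (logic function, 16x1 RAM, 16-bit shift register) can be sketched as one behavioural model. This is a minimal Python sketch, not Xilinx's implementation: the class name Lut4, its method names and the input ordering (a as the most significant address bit) are assumptions made for illustration.

```python
class Lut4:
    """Behavioural model of one 4-input LUT: 16 storage bits, three uses."""

    def __init__(self):
        # bits[i] is the storage cell selected by address i (0..15).
        self.bits = [0] * 16

    @classmethod
    def from_function(cls, fn):
        """Logic use: fill the 16 cells with the truth table of fn(a, b, c, d)."""
        lut = cls()
        for addr in range(16):
            # Assumed ordering: a is address bit 3, d is address bit 0.
            a, b, c, d = (addr >> 3) & 1, (addr >> 2) & 1, (addr >> 1) & 1, addr & 1
            lut.bits[addr] = fn(a, b, c, d) & 1
        return lut

    def read(self, addr):
        """The four address lines select one cell (used by all three modes)."""
        return self.bits[addr & 0xF]

    def write(self, addr, value):
        """Distributed-RAM use: store one bit (models one synchronous write)."""
        self.bits[addr & 0xF] = value & 1

    def shift(self, serial_in):
        """Shift-register use: one clock edge moves every bit one place."""
        self.bits = [serial_in & 1] + self.bits[:-1]


def delay_line(stream, tap):
    """Shift a bit stream through a LUT and read tap `tap` after each edge."""
    srl = Lut4()
    out = []
    for bit in stream:
        srl.shift(bit)
        out.append(srl.read(tap))
    return out


if __name__ == "__main__":
    logic = Lut4.from_function(lambda a, b, c, d: (a & b) | (1 - c))
    print(logic.read(0b1110), logic.read(0b0010))
    print(delay_line([1, 0, 1, 1, 0, 0, 1, 0], tap=2))
```

In this model the DEPTH[3:0] inputs of the shift-register mode are simply the read address, which is why the delay can be changed while the circuit runs.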
[Figure (shift register, continued): cascaded D flip-flops with clock enable (D, CE, Q) driving OUT]

Shift Register
[Figure: a 64-bit data path through Operation A (4 cycles) and Operation B (8 cycles), 12 cycles in total, in parallel with Operation C (3 cycles), which leaves a 9-cycle imbalance]
• Register-rich FPGA
  – Allows for addition of pipeline stages to increase throughput
• Data paths must be balanced to keep desired functionality

Carry & Control Logic
[Figure: the slice diagram (look-up tables, carry & control logic, storage elements) with the carry & control logic between CIN and COUT highlighted]

Fast Carry Logic
• Each CLB contains separate logic and routing for the fast generation of sum & carry signals
  – Increases efficiency and performance of adders, subtractors, accumulators, comparators, and counters
• Carry logic is independent of normal logic and routing resources
[Figure: carry logic routing running from LSB to MSB]

Accessing Carry Logic
• All major synthesis tools can infer carry logic for arithmetic functions
  – Addition (SUM <= A + B)
  – Subtraction (DIFF <= A - B)
  – Comparators (if A < B then…)
  – Counters (count <= count + 1)

Block RAM
• Most efficient memory implementation
  – Dedicated blocks of memory
• Ideal for most memory requirements
  – 4 to 14 memory blocks
    • 4096 bits per block
  – Use multiple blocks for larger memories
• Builds both single and true dual-port RAMs
[Figure: Spartan-II true dual-port block RAM with Port A and Port B]

Spartan-II Block RAM Amounts
[Table: Spartan-II block RAM amounts per device]

Block RAM Port Aspect Ratios

Width | Organization | Address range
1 | 4k x 1 | 0 to 4095
2 | 2k x 2 | 0 to 2047
4 | 1k x 4 | 0 to 1023
8 | 512 x 8 | 0 to 511
16 | 256 x 16 | 0 to 255

Basic I/O Block Structure
[Figure: an IOB with three flip-flops (D, Q, EC, SR): a three-state FF on the three-state control path, an output FF on the output path, and an input FF on the input path, which offers a direct or a registered input; each FF has its own enable, and clock and set/reset are shared]
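The port aspect ratios of a Spartan-II block RAM all follow from the fixed 4096 bits per block. The sketch below is a minimal Python calculation under that single fact; the function names are hypothetical, and min_blocks_needed is only a lower bound because it ignores how a memory has to be split across port widths.

```python
BLOCK_BITS = 4096  # bits per Spartan-II block RAM, as stated on the Block RAM slide


def aspect_ratios(block_bits=BLOCK_BITS, widths=(1, 2, 4, 8, 16)):
    """Return (width, depth, address_bits, last_address) for each port width."""
    rows = []
    for width in widths:
        depth = block_bits // width               # words available at this width
        address_bits = (depth - 1).bit_length()   # bits needed to address them
        rows.append((width, depth, address_bits, depth - 1))
    return rows


def min_blocks_needed(depth, width, block_bits=BLOCK_BITS):
    """Lower bound on the number of blocks for a depth x width memory."""
    total_bits = depth * width
    return -(-total_bits // block_bits)           # ceiling division


if __name__ == "__main__":
    for width, depth, address_bits, last in aspect_ratios():
        print(f"{depth} x {width}: {address_bits} address bits, addresses 0 to {last}")
    print(min_blocks_needed(1024, 16))
```

For example, a 1024 x 16 memory needs at least four blocks, which matches building it from four blocks in the 1k x 4 configuration.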
IOB Functionality
• The IOB provides the interface between the package pins and the CLBs
• Each IOB can work as uni- or bi-directional I/O
• Outputs can be forced into high impedance
• Inputs and outputs can be registered
  – advised for high-performance I/O
• Inputs can be delayed

Routing Resources
[Figure: an array of CLBs interleaved with programmable switch matrices (PSM)]

Spartan-II FPGA Family Members
[Table: Spartan-II FPGA family members]

Virtex-II 1.5V Architecture
[Figure: columns of configurable logic blocks, block RAMs and 18 x 18 multipliers, surrounded by I/O blocks]

Virtex-II 1.5V

Device | CLB Array | Slices | Maximum I/O | BlockRAM (18kb) | Multiplier Blocks | Distributed RAM bits
XC2V40 | 8x8 | 256 | 88 | 4 | 4 | 8,192
XC2V80 | 16x8 | 512 | 120 | 8 | 8 | 16,384
XC2V250 | 24x16 | 1,536 | 200 | 24 | 24 | 49,152
XC2V500 | 32x24 | 3,072 | 264 | 32 | 32 | 98,304
XC2V1000 | 40x32 | 5,120 | 432 | 40 | 40 | 163,840
XC2V1500 | 48x40 | 7,680 | 528 | 48 | 48 | 245,760
XC2V2000 | 56x48 | 10,752 | 624 | 56 | 56 | 344,064
XC2V3000 | 64x56 | 14,336 | 720 | 96 | 96 | 458,752
XC2V4000 | 80x72 | 23,040 | 912 | 120 | 120 | 737,280
XC2V6000 | 96x88 | 33,792 | 1,104 | 144 | 144 | 1,081,344
XC2V8000 | 112x104 | 46,592 | 1,108 | 168 | 168 | 1,490,944

Virtex-II Block SelectRAM
• Virtex-II BRAM is 18 kbits
  – Additional "parity" bits available in selected configurations

Width | Depth | Address | Data | Parity
1 | 16,384 | [13:0] | [0] | N/A
2 | 8,192 | [12:0] | [1:0] | N/A
4 | 4,096 | [11:0] | [3:0] | N/A
9 | 2,048 | [10:0] | [7:0] | [0]
18 | 1,024 | [9:0] | [15:0] | [1:0]
36 | 512 | [8:0] | [31:0] | [3:0]

[Figure: dual-port block symbol. Port A: WEA, ENA, SSRA, CLKA, ADDRA[#:0], DIA[#:0], DIPA[#:0], DOA[#:0], DOPA[#:0]. Port B: WEB, ENB, RSTB, CLKB, ADDRB[#:0], DIB[#:0], DIPB[#:0], DOB[#:0], DOPB[#:0]]

FPGA Nomenclature
[Figure: FPGA part-number nomenclature]

Design Methods and Tools
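The columns of the Virtex-II device table above are related by simple arithmetic, which gives a quick sanity check of any row. The Python sketch below assumes four slices per CLB, two LUTs per slice and 16 bits per LUT; these three constants are inferred from the table and from the slice description earlier in the lesson, and are not quoted from a data sheet. Only a few rows are copied in to keep the example short.

```python
SLICES_PER_CLB = 4   # assumption: consistent with every row of the table
LUTS_PER_SLICE = 2   # assumption: two LUT-FF pairs per slice
BITS_PER_LUT = 16    # a 4-input LUT holds 16 bits

# (CLB rows, CLB columns, slices, distributed RAM bits) copied from the table.
DEVICES = {
    "XC2V40": (8, 8, 256, 8_192),
    "XC2V250": (24, 16, 1_536, 49_152),
    "XC2V1000": (40, 32, 5_120, 163_840),
    "XC2V3000": (64, 56, 14_336, 458_752),
    "XC2V8000": (112, 104, 46_592, 1_490_944),
}


def derived_resources(rows, cols):
    """Compute slices and distributed RAM bits from the CLB array size."""
    slices = rows * cols * SLICES_PER_CLB
    distributed_ram_bits = slices * LUTS_PER_SLICE * BITS_PER_LUT
    return slices, distributed_ram_bits


def inconsistent_rows(devices=DEVICES):
    """Return the names of devices whose table entries disagree with the model."""
    return [
        name
        for name, (rows, cols, slices, ram_bits) in devices.items()
        if derived_resources(rows, cols) != (slices, ram_bits)
    ]


if __name__ == "__main__":
    print(derived_resources(8, 8))
    print(inconsistent_rows())
```

An empty list from inconsistent_rows means the copied rows agree with the assumed structure of a CLB.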
Design process (1)

Specification (Lab Experiments)
    Design and implement a simple unit that speeds up encryption with an RC5-like cipher with a fixed key set on an 8031 microcontroller. Unlike in experiment 5, this time your unit has to be able to perform the encryption algorithm by itself, executing 32 rounds…..

VHDL description (Your Source Files)

    library IEEE;
    use ieee.std_logic_1164.all;
    use ieee.std_logic_unsigned.all;

    entity RC5_core is
        port(
            clock, reset, encr_decr: in std_logic;
            data_input: in std_logic_vector(31 downto 0);
            data_output: out std_logic_vector(31 downto 0);
            out_full: in std_logic;
            key_input: in std_logic_vector(31 downto 0);
            key_read: out std_logic
        );
    end RC5_core;

Functional simulation

Synthesis

Post-synthesis simulation

Design process (2)

Implementation

Timing simulation

Configuration

On chip testing

Design Process control from Active-HDL
[Figure: design process control from Active-HDL]

Simulation Tools
[Figure: simulation tool vendors]
Many others…

Synthesis Tools
[Figure: synthesis tool vendors]
… and others

Logic Synthesis

VHDL description

    architecture MLU_DATAFLOW of MLU is
        signal A1: STD_LOGIC;
        signal B1: STD_LOGIC;
        signal Y1: STD_LOGIC;
        signal MUX_0, MUX_1, MUX_2, MUX_3: STD_LOGIC;
    begin
        -- optional inversion of the inputs and of the output
        A1 <= A when (NEG_A = '0') else not A;
        B1 <= B when (NEG_B = '0') else not B;
        Y  <= Y1 when (NEG_Y = '0') else not Y1;

        -- the four candidate logic operations
        MUX_0 <= A1 and B1;
        MUX_1 <= A1 or B1;
        MUX_2 <= A1 xor B1;
        MUX_3 <= A1 xnor B1;

        -- L1 and L0 select the operation
        with (L1 & L0) select
            Y1 <= MUX_0 when "00",
                  MUX_1 when "01",
                  MUX_2 when "10",
                  MUX_3 when others;
    end MLU_DATAFLOW;
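Before synthesis, the MLU dataflow description above can be cross-checked against a small software model. The Python sketch below mirrors the VHDL signal by signal for single-bit inputs (each argument is 0 or 1); the function name mlu and the use of integers for std_logic values are assumptions made for illustration.

```python
def mlu(a, b, neg_a, neg_b, neg_y, l1, l0):
    """Bit-level model of MLU_DATAFLOW; every argument is 0 or 1."""
    a1 = a ^ neg_a                 # A1 <= A when NEG_A = '0' else not A
    b1 = b ^ neg_b                 # B1 <= B when NEG_B = '0' else not B
    candidates = (
        a1 & b1,                   # MUX_0: and,  selected by "00"
        a1 | b1,                   # MUX_1: or,   selected by "01"
        a1 ^ b1,                   # MUX_2: xor,  selected by "10"
        1 - (a1 ^ b1),             # MUX_3: xnor, selected by others
    )
    y1 = candidates[(l1 << 1) | l0]
    return y1 ^ neg_y              # Y <= Y1 when NEG_Y = '0' else not Y1


def truth_table():
    """Enumerate all 128 input combinations of the seven 1-bit inputs."""
    table = {}
    for code in range(128):
        bits = tuple((code >> shift) & 1 for shift in range(6, -1, -1))
        table[bits] = mlu(*bits)
    return table


if __name__ == "__main__":
    print(mlu(1, 1, 0, 0, 1, 0, 0))   # NAND of 1 and 1
    print(len(truth_table()))
```

The unit is a function of seven inputs, so it cannot fit into a single 4-input LUT; the mapping step has to spread it over several LUTs.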
[Figure: the circuit netlist produced by logic synthesis]

Features of synthesis tools
• Interpret RTL code
• Produce synthesized circuit netlist in a standard EDIF format
• Give preliminary performance estimates
• Some can display circuit schematics corresponding to the EDIF netlist

Implementation
• After synthesis the entire implementation process is performed by FPGA vendor tools
[Figure: implementation flow]

Translation
[Figure: Synthesis produces the circuit netlist in EDIF (Electronic Design Interchange Format) and timing constraints in an NCF (Native Constraint File); the Constraint Editor produces a UCF (User Constraint File); Translation combines them into an NGD (Native Generic Database) file]

Sample UCF File

    #
    # Constraints generated by Synplify Pro 7.3.3, Build 039R
    #
    # Period Constraints
    #Begin clock constraints
    #End clock constraints
    # Output Constraints
    # Input Constraints
    # Location Constraints
    # End of generated constraints
    NET "clock" LOC = "P88";
    NET "control(0)" LOC = "P50";
    NET "control(1)" LOC = "P48";
    NET "control(2)" LOC = "P42";
    NET "reset" LOC = "P93";
    NET "segments(0)" LOC = "P67";
    NET "segments(1)" LOC = "P39";
    NET "segments(2)" LOC = "P62";
    NET "segments(3)" LOC = "P60";
    NET "segments(4)" LOC = "P46";
    NET "segments(5)" LOC = "P57";
    NET "segments(6)" LOC = "P49";

Pin Assignment
[Figure: the FPGA on the lab board with CLOCK, CONTROL(0..2), RESET and SEGMENTS(0..6) connected to the pins listed in the UCF file]

Parallel Port Interface
[Figure: parallel port interface]

Constraints Editor
[Figure: constraints editor]

Circuit netlist
[Figure: circuit netlist]

Mapping
[Figure: netlist elements LUT1 to LUT5 and FF1, FF2 grouped during mapping]

Placing
[Figure: mapped logic placed into the CLB slices of the FPGA]
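The location constraints in the sample UCF file shown earlier can be read back into a net-to-pin map, for instance to check that no pin is assigned twice before running the implementation tools. The Python sketch below handles only the NET "name" LOC = "pin"; form that appears in the sample; it is not a parser for the full UCF syntax, and the function names are hypothetical.

```python
import re

SAMPLE_UCF = '''
# Location Constraints
NET "clock" LOC = "P88";
NET "control(0)" LOC = "P50";
NET "control(1)" LOC = "P48";
NET "control(2)" LOC = "P42";
NET "reset" LOC = "P93";
NET "segments(0)" LOC = "P67";
NET "segments(1)" LOC = "P39";
NET "segments(2)" LOC = "P62";
NET "segments(3)" LOC = "P60";
NET "segments(4)" LOC = "P46";
NET "segments(5)" LOC = "P57";
NET "segments(6)" LOC = "P49";
'''

# Matches only the simple form used in the sample file.
LOC_RE = re.compile(r'^\s*NET\s+"([^"]+)"\s+LOC\s*=\s*"([^"]+)"\s*;')


def parse_loc_constraints(text):
    """Return {net name: pin} for every NET ... LOC line in the text."""
    pins = {}
    for line in text.splitlines():
        line = line.split("#", 1)[0]      # drop comments
        match = LOC_RE.match(line)
        if match:
            pins[match.group(1)] = match.group(2)
    return pins


def shared_pins(pins):
    """Return {pin: [nets]} for pins that are assigned to more than one net."""
    by_pin = {}
    for net, pin in pins.items():
        by_pin.setdefault(pin, []).append(net)
    return {pin: nets for pin, nets in by_pin.items() if len(nets) > 1}


if __name__ == "__main__":
    pins = parse_loc_constraints(SAMPLE_UCF)
    print(pins["clock"], len(pins), shared_pins(pins))
```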
Routing
[Figure: the FPGA with programmable connections routed between the placed logic]

Static Timing Analyzer
• Performs static analysis of the circuit performance
• Reports critical paths with all sources of delays
• Determines maximum clock frequency

Static Timing Analysis
• Critical path: the longest path from outputs of registers to inputs of registers
[Figure: in, a D/Q register, a block of logic with delay t_P(logic), a second D/Q register, out; both registers are clocked by clk]

    t_Critical = t_P(FF) + t_P(logic) + t_S(FF)

Static Timing Analysis
• Min. clock period = length of the critical path
• Max. clock frequency = 1 / min. clock period

Configuration
• Once a design is implemented, you must create a file that the FPGA can understand
  – This file is called a bitstream: a BIT file (.bit extension)
• The BIT file can be downloaded directly to the FPGA, or can be converted into a PROM file which stores the programming information

Resources & Required Reading

Spartan FPGA devices
Xilinx Spartan-II 2.5V FPGA Family: Complete Data Sheet
• Module 1: Introduction & Ordering Information
• Module 2: Functional Description
http://direct.xilinx.com/bvdocs/publications/ds001.pdf

FPGA Tools
Integrated Interfaces: Active-HDL with Synplify®
http://www.aldec.com/Previews/active_synplify.htm
Integrated Synthesis and Implementation
http://www.aldec.com/Previews/synthesis_implementation.htm
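Worked example for the static timing relations given earlier (critical path = flip-flop propagation delay + logic delay + flip-flop setup time; maximum clock frequency = 1 / minimum clock period). This is a minimal Python sketch: the delay values are hypothetical numbers chosen for easy arithmetic, not figures from any data sheet, and the function names are assumptions.

```python
def critical_path_ns(t_p_ff, logic_delays, t_s_ff):
    """Length of the critical path in ns.

    t_p_ff       : flip-flop propagation (clock-to-output) delay
    logic_delays : logic delay of every register-to-register path
    t_s_ff       : flip-flop setup time
    """
    # The critical path is the one with the largest logic delay.
    return t_p_ff + max(logic_delays) + t_s_ff


def max_clock_mhz(min_period_ns):
    """Maximum clock frequency in MHz for a minimum clock period in ns."""
    return 1000.0 / min_period_ns


if __name__ == "__main__":
    # Hypothetical delays in nanoseconds.
    period = critical_path_ns(t_p_ff=1.0, logic_delays=[3.0, 6.5, 4.25], t_s_ff=0.5)
    print(period, max_clock_mhz(period))
```

With these assumed delays the minimum clock period is 8 ns, so the design could be clocked at up to 125 MHz.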