Evolution of Implementation Technologies Discrete devices: relays, transistors (1940s-50s) Discrete logic gates (1950s-60s) trend toward Integrated circuits (1960s-70s) of integration higher levels e.g. TTL packages: Data Book for 100’s of different parts Map your circuit to the Data Book parts Gate Arrays (IBM 1970s) “Custom” integrated circuit chips Design using a library (like TTL) Transistors are already on the chip Place and route software puts the chip together automatically + Large circuits on a chip + Automatic design tools (no tedious custom layout) - Only good if you want 1000’s of parts Xilinx FPGAs - 1 Gate Array Technology (IBM - 1970s) Simple logic gates Use transistors to implement combinational and sequential logic Interconnect Wires to connect inputs and outputs to logic blocks I/O blocks Special blocks at periphery for external connections Add wires to make connections Done when chip is fabed “mask-programmable” Construct any circuit Xilinx FPGAs - 2 Programmable Logic Disadvantages of the Data Book method Constrained to parts in the Data Book Parts are necessarily small and standard Need to stock many different parts Programmable logic Use a single chip (or a small number of chips) Program it for the circuit you want No reason for the circuit to be small Xilinx FPGAs - 3 Programmable Logic Technologies Fuse and anti-fuse Fuse makes or breaks link between two wires Typical connections are 50-300 ohm One-time programmable (testing before programming?) Very high density EPROM and EEPROM High power consumption Typical connections are 2K-4K ohm Fairly high density RAM-based Memory bit controls a switch that connects/disconnects two wires Typical connections are .5K-1K ohm Can be programmed and re-programmed in the circuit Xilinx FPGAs - 4 Low density Programmable Logic Program a connection Connect two wires Set a bit to 0 or 1 Regular structures for two-level logic (1960s-70s) All rely on two-level logic minimization PROM connections - permanent EPROM connections - erase with UV light EEPROM connections - erase electrically PROMs Program connections in the _____________ plane PLAs Program the connections in the ____________ plane PALs Program the connections in the ____________ plane Xilinx FPGAs - 5 Making Large Programmable Logic Circuits Alternative 1 : “CPLD” Put a lot of PLDS on a chip Add wires between them whose connections can be programmed Use fuse/EEPROM technology Alternative 2: “FPGA” Emulate gate array technology Hence Field Programmable Gate Array You need: A way to implement logic gates A way to connect them together Xilinx FPGAs - 6 Field-Programmable Gate Arrays PALs, PLAs = 10 - 100 Gate Equivalents Field Programmable Gate Arrays = FPGAs Altera MAX Family Actel Programmable Gate Array Xilinx Logical Cell Array 100 - 1000(s) of Gate Equivalents! Xilinx FPGAs - 7 Field-Programmable Gate Arrays Logic blocks To implement combinational and sequential logic Interconnect Wires to connect inputs and outputs to logic blocks I/O blocks Special logic blocks at periphery of device for external connections Key questions: How to make logic blocks programmable? How to connect the wires? After the chip has been fabbed Xilinx FPGAs - 8 Tradeoffs in FPGAs Logic block - how are functions implemented: fixed functions (manipulate inputs) or programmable? Support complex functions, need fewer blocks, but they are bigger so less of them on chip Support simple functions, need more blocks, but they are smaller so more of them on chip Interconnect How are logic blocks arranged? How many wires will be needed between them? Are wires evenly distributed across chip? Programmability slows wires down – are some wires specialized to long distances? How many inputs/outputs must be routed to/from each logic block? What utilization are we willing to accept? 50%? 20%? 90%? Xilinx FPGAs - 9 Altera EPLD (Erasable Programmable Logic Devices) Historical Perspective PALs: same technology as programmed once bipolar PROM EPLDs: CMOS erasable programmable ROM (EPROM) erased by UV light Altera building block = MACROCELL CLK 8 Product Term AND-OR Array + Programmable MUX's Clk MUX AND ARRAY Output MUX Q I/O Pin Inv ert Control F/B MUX Programmable polarity pad Xilinx FPGAs - 10 Seq. Logic Block Programmable feedback Altera EPLD Altera EPLDs contain 8 to 48 independently programmed macrocells Global CLK Personalized by EPROM bits: Clk MUX Synchronous Mode 1 Flipflop controlled by global clock signal OE/Local CLK Q EPROM Cell Global CLK Clk MUX local signal computes output enable Asynchronous Mode 1 OE/Local CLK Q EPROM Cell Flipflop controlled by locally generated clock signal + Seq Logic: could be D, T positive or negative edge triggered + product term to implement clear function Xilinx FPGAs - 11 Altera Multiple Array Matrix (MAX) AND-OR structures are relatively limited Cannot share signals/product terms among macrocells Logic Array Blocks (similar to macrocells) LAB A LAB H LAB B LAB C LAB G P I A LAB F LAB D LAB E Xilinx FPGAs - 12 Global Routing: Programmable Interconnect Array EPM5128: 8 Fixed Inputs 52 I/O Pins 8 LABs 16 Macrocells/LAB 32 Expanders/LAB LAB Architecture I/O Pad Macrocell ARRAY I N P U T S I/O Block I/O Pad P I A Expander Product Term ARRAY Macrocell P-Terms Expander P-Terms Expander Terms shared among all macrocells within the LAB Xilinx FPGAs - 13 P22V10 PAL INCREMENT 2904 1 0 0 FIRST FUSE NUMBERS 4 8 12 16 20 24 28 32 36 2948 2992 3036 3080 3124 3168 3212 3256 3300 3344 3388 3432 3476 3520 3564 3608 40 ASYNCHRONOUS RESET (TO ALL REGISTERS) 44 88 132 176 220 264 308 352 396 1 1 1 0 AR D Q 23 0 0 0 1 Q 5808 SP P R 1 0 5809 OUTPUT LOGIC MACROCEL L 18 P - 5818 R - 5819 6 440 3652 484 528 572 616 660 704 748 792 836 880 OUTPUT LOGIC MACROCELL 3696 3740 3784 3828 3872 3916 3960 4004 4048 4092 4136 4180 4224 4268 22 P - 5810 R - 5811 2 924 968 1012 1056 1100 1144 1188 1232 1276 1320 1364 1408 1452 P - 5820 R - 5821 4312 OUTPUT LOGIC MACROCELL 4356 4400 4444 4488 4532 4576 4620 4664 4708 4752 4796 4840 21 P - 5812 R - 5813 1496 OUTPUT LOGIC MACROCEL L 16 P - 5822 R - 5823 8 4884 OUTPUT LOGIC MACROCELL 4928 4972 5016 5060 5104 5148 5192 5236 5280 5324 20 P - 5814 R - 5815 4 OUTPUT LOGIC MACROCEL L 15 P - 5824 R - 5825 9 5368 2156 2200 2244 2288 2332 2376 2420 2464 2508 2552 2596 2640 2684 2728 2772 2816 2860 5 17 7 3 1540 1584 1628 1672 1716 1760 1804 1848 1892 1936 1980 2024 2068 2112 OUTPUT LOGIC MACROCEL L OUTPUT LOGIC MACROCELL 5412 5456 5500 5544 5588 5632 5676 5720 OUTPUT LOGIC MACROCEL L 14 P - 5826 R - 5827 19 10 P - 5816 R - 5817 SYNCHRONOUS PRESET (TO ALL REGISTERS) 5764 11 INCREMEN T 13 0 4 8 12 16 20 24 28 32 36 40 Supports large number of product terms per output Xilinxassociated FPGAs - 14 Latches and muxes with output pins + rows of interconnect Anti-fuse Technology: Program Once Use Anti-fuses to build up long wiring runs from short segments I/O Buffers, Programming and Test Logic I/O Buffers, Programming and Test Logic Logic Module I/O Buffers, Programming and Test Logic Rows of programmable logic building blocks I/O Buffers, Programming and Test Logic Actel Programmable Gate Arrays Wiring Tracks 8 input, single output combinational logic blocks Xilinx FPGAs - 15 FFs constructed from discrete cross coupled gates Actel Logic Module SOA S0 Basic Module is a Modified 4:1 Multiplexer S1 D0 2:1 MUX D1 2:1 MUX Y D2 2:1 MUX D3 R "0" SOB Example: Implementation of S-R Latch 2:1 MUX "0" 2:1 MUX "1" 2:1 MUX Xilinx FPGAs - 16 S Q Actel Interconnect Logic Module Horizontal Track Anti-fuse Vertical Track Interconnection Fabric Xilinx FPGAs - 17 Actel Routing Example Logic Module Input Logic Module Logic Module Output Input Jogs cross an anti-fuse minimize the # of jogs for speed critical circuits 2 - 3 hops for most interconnections Xilinx FPGAs - 18 Xilinx Programmable Gate Arrays CLB - Configurable Logic Block Three types of routing RAM-programmable can be reconfigured CLB CLB Wiring Channels CLB IOB direct general-purpose long lines of various lengths IOB IOB Can be used as memory IOB IOB Built-in fast carry logic IOB IOB IOB 5-input, 1 output function or 2 4-input, 1 output functions optional register on outputs Xilinx FPGAs - 19 CLB CLB Slew Rate Control CLB D Q Passive Pull-Up, Pull-Down Output Buffer Switch Matrix Vcc Pad Input Buffer CLB Q CLB Programmable Interconnect C1 C2 C3 C4 S/R Control DIN G Func. Gen. SD F' H' EC RD 1 F4 F3 F2 F1 H Func. Gen. F Func. Gen. Y G' H' S/R Control DIN SD F' D G' Q H' 1 H' K Q D G' F' EC RD X Delay I/O Blocks (IOBs) H1 DIN S/R EC G4 G3 G2 G1 D Configurable Logic Blocks (CLBs) The Xilinx 4000 CLB Xilinx FPGAs - 21 Two 4-input functions, registered output Xilinx FPGAs - 22 5-input function, combinational output Xilinx FPGAs - 23 CLB Used as RAM Xilinx FPGAs - 24 Fast Carry Logic Xilinx FPGAs - 25 Xilinx 4000 Interconnect Xilinx FPGAs - 26 Switch Matrix Xilinx FPGAs - 27 Xilinx 4000 Interconnect Details Xilinx FPGAs - 28 Global Signals - Clock, Reset, Control Xilinx FPGAs - 29 Xilinx 4000 IOB Xilinx FPGAs - 30 Xilinx FPGA Combinational Logic Examples Key: General functions are limited to 5 inputs (4 even better - 1/2 CLB) No limitation on function complexity Example 2-bit comparator: A B = C D and A B > C D implemented with 1 CLB (GT) F = A C' + A B D' + B C' D' (EQ) G = A'B'C'D'+ A'B C'D + A B'C D'+ A B C D Can implement some functions of > 5 input Xilinx FPGAs - 31 Xilinx FPGA Combinational Logic Examples N-input majority function: 1 whenever n/2 or more inputs are 1 N-input parity functions: 5 input/1 CLB; 2 levels yield 25 inputs! 5-input Majority Circuit 9 Input Parity Logic CLB CLB 7-input Majority Circuit CLB CLB CLB CLB Xilinx FPGAs - 32 Xilinx FPGA Adder Example Example 2-bit binary adder - inputs: A1, A0, B1, B0, CIN outputs: S0, S1, Cout A3 B3 A2 CLB Cout B2 A1 CLB S3 A3 B3 A2 B2 CLB A0 CLB S2 C2 B1 C1 Full Adder, 4 CLB delays to final carry out CLB S1 C0 S0 A1 B1 A0 B0 Cin S2 2 x Two-bit Adders (3 CLBs each) yields 2 CLBs to final carry out CLB S0 S3 Cout B0 Cin S1 C2 Xilinx FPGAs - 33 Computer-Aided Design Can't design FPGAs by hand Way too much logic to manage, hard to make changes Hardware description languages Specify functionality of logic at a high level Validation: high-level simulation to catch specification errors Verify pin-outs and connections to other system components Low-level to verify mapping and check performance Logic synthesis Process of compiling HDL program into logic gates and flip-flops Technology mapping Map the logic onto elements available in the implementation technology (LUTs for Xilinx FPGAs) Xilinx FPGAs - 34 CAD Tool Path (cont’d) Placement and routing Assign logic blocks to functions Make wiring connections Timing analysis - verify paths Determine delays as routed Look at critical paths and ways to improve Partitioning and constraining If design does not fit or is unroutable as placed split into multiple chips If design it too slow prioritize critical paths, fix placement of cells, etc. Few tools to help with these tasks exist today Generate programming files - bits to be loaded into chip for configuration Xilinx FPGAs - 35 Xilinx CAD Tools Verilog (or VHDL) use to specify logic at a high-level Combine with schematics, library components Synopsys Compiles Verilog to logic Maps logic to the FPGA cells Optimizes logic Xilinx APR - automatic place and route (simulated annealing) Provides controllability through constraints Handles global signals Xilinx Xdelay - measure delay properties of mapping and aid in iteration Xilinx XACT - design editor to view final mapping results Xilinx FPGAs - 36 Applications of FPGAs Implementation of random logic Easier changes at system-level (one device is modified) Can eliminate need for full-custom chips Prototyping Ensemble of gate arrays used to emulate a circuit to be manufactured Get more/better/faster debugging done than with simulation Reconfigurable hardware One hardware block used to implement more than one function Functions must be mutually-exclusive in time Can greatly reduce cost while enhancing flexibility RAM-based only option Special-purpose computation engines Hardware dedicated to solving one problem (or class of problems) Accelerators attached to general-purpose computers Xilinx FPGAs - 37 Implementation Strategies ROM-based Design Example: BCD to Excess 3 Serial Converter Conversion Process Bits are presented in bit serial fashion starting with the least significant bit Single input X, single output Z Xilinx FPGAs - 38 BCD 0000 0001 0010 0011 0100 0101 0110 0111 1000 1001 Excess 3 Code 0011 0100 0101 0110 0111 1000 1001 1010 1011 1100 Implementation Strategies Present State S0 S1 S2 S3 S4 S5 S6 Next X=0 S1 S3 S4 S5 S5 S0 S0 State X=1 S2 S4 S4 S5 S6 S0 -- Output X=0 X=1 1 0 1 0 0 1 0 1 1 0 0 1 1 -- State Transition Table Reset 0/1 S1 0/1 S0 S2 1/0 0/0, 1/1 0/1 0/0, 1/1 S5 Derived State Diagram S4 S3 0/0, 1/1 1/0 1/0 S6 0/1 Xilinx FPGAs - 39 Implementation Strategies ROM-based Implementation ROM Address X Q2 Q1 Q0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 1 1 0 1 0 0 0 1 0 1 0 1 1 0 0 1 1 1 1 0 0 0 1 0 0 1 1 0 1 0 1 0 1 1 1 1 0 0 1 1 0 1 1 1 1 0 1 1 1 1 ROM Outputs Z D2 D1 D0 1 0 0 1 1 0 1 1 0 1 0 0 0 1 0 1 1 1 0 1 0 0 0 0 1 0 0 0 X X X X 0 0 1 0 0 1 0 0 1 1 0 0 1 1 0 1 0 1 1 0 1 0 0 0 X X X X X X X X 1 CLK 1 0 X conv erter ROM Z X D2 Q2 D1 Q1 D0 Q0 1 0 9 13 12 5 4 CLK D C B A QD 175 QD QC QC QB QB 1 CLR \Reset 15 14 10 11 7 6 2 QA 3 QA Circuit Level Realization 74175 = 4 x positive edge triggered D FFs Truth Table/ROM I/Os In ROM-based designs, no need to consider state assignment Xilinx FPGAs - 40 Z Implementation Strategies LSB MSB Timing Behavior for input strings 0 0 0 0 (0) and 1 1 1 0 (7) 0000 1100 LSB 1110 LSB Xilinx FPGAs - 41 0101 Implementation Strategies PLA-based Design State Assignment with NOVA 0 1 0 1 0 1 0 1 0 1 0 1 0 S0 S0 S1 S1 S2 S2 S3 S3 S4 S4 S5 S5 S6 S1 S2 S3 S4 S4 S4 S5 S5 S5 S6 S0 S0 S0 S0 S1 S2 S3 S4 S5 S6 1 0 1 0 0 1 0 1 1 0 0 1 1 = = = = = = = 000 001 011 110 100 111 101 NOVA derived state assignment 9 product term implementation NOVA input file Xilinx FPGAs - 42 Implementation Strategies .i 4 .o 4 .ilb x q2 .ob d2 d1 .p 16 0 000 001 1 000 011 0 001 110 1 001 100 0 011 100 1 011 100 0 110 111 1 110 111 0 100 111 1 100 101 0 111 000 1 111 000 0 101 000 1 101 --0 010 --1 010 --.e Espresso Inputs q1 q0 d0 z 1 0 1 0 0 1 0 1 1 0 0 1 1 - Espresso Outputs Xilinx FPGAs - 43 .i 4 .o 4 .ilb x q2 q1 q0 .ob d2 d1 d0 z .p 9 0001 0100 10-0 0100 01-0 0100 1-1- 0001 -0-1 1000 0-0- 0001 -1-0 1000 --10 0100 ---0 0010 .e Implementation Strategies D2 = Q2 • Q0 + Q2 • Q0 D1 = X • Q2 • Q1 • Q0 + X • Q2 • Q0 + X • Q2 • Q0 + Q1 • Q0 D0 = Q0 Z = X• Q1 + X • Q1 1 CLK 9 1 0 X conv erter PLA X Q2 Q1 Q0 Z D2 D1 D0 1 0 13 12 5 4 CLK 175 D C B A 1 CLR \Reset Xilinx FPGAs - 44 15 QD 14 QD 10 QC 11 QC 7 QB 6 QB 2 QA 3 QA Z Implementation Strategies 10H8 PAL: 10 inputs, 8 outputs, 2 product terms per OR gate D1 = D11 + D12 D11 = X • Q2 • Q1 • Q0 + X • Q2 • Q0 D12 = X • Q2 • Q0 + Q1 • Q0 0 1 2 3 0. Q2 • Q0 1. Q2 • Q0 8. X • Q2 • Q1 • Q0 9. X • Q2 • Q0 16. X • Q2 • Q0 17. Q1 • Q0 24. D11 25. D12 32. Q0 33. not used 40. X • Q1 41. X • Q1 45 89 12 13 16 17 20 21 24 25 28 29 30 31 X 0 1 D2 8 9 D11 16 17 D12 24 25 D1 32 33 D0 40 41 Z Q2 Q1 Q0 D11 D12 Xilinx FPGAs - 45 Implementation Strategies 0 1 2 3 45 89 12 13 16 17 20 21 24 25 28 29 30 31 X 0 1 D2 8 9 D11 16 17 D12 24 25 D1 32 33 D0 40 41 Z Q2 Q1 Q0 D11 D12 Xilinx FPGAs - 46 Implementation Strategies Buffered Input or product term Registered PAL Architecture CLK OE Q2 • Q0 + Q2 • Q0 Q2 • Q0 Q2 • Q0 D2 Q2+ Q2+ DQ Q Q2+ Q2 • Q0 + Q2 • Q0 X Q2 Q2 Q0 Q0 Negative Logic Feedback D1 = X • Q2 • Q1 • Q0 + X • Q2 + X • Q0 + Q2 • Q0 + Q1 • Q0 D2 = Q2 • Q0 + Q2 • Q0 D0 = Q0 Z = X • Q1 + X • Q1 Xilinx FPGAs - 47 Implementation Strategies Programmable Output Polarity/XOR PALs CLK OE Buried Registers: decouple FF from the output pin DQ Q Advantage of XOR PALs: Parity and Arithmetic Operations AB AB AB AB AB AB AB AB C C C C C C C C D D D D D D D D A B C D AB AB CD CD Xilinx FPGAs - 48 A B C D Implementation Strategies Example of XOR PAL Example of Registered PAL INCREMEN T INCREMEN T 1 1 0 FIRST FUSE NUMBER 4 8 12 16 20 24 28 32 0 36 0 40 D Q 23 80 120 FIRST FUSE NUMBER S Q 2 4 8 12 16 20 24 28 0 32 64 96 128 160 192 224 19 2 160 200 D 240 280 Q 22 256 288 320 352 384 416 448 480 Q 3 320 360 D 400 440 Q 21 512 544 576 608 640 672 704 736 4 D 560 600 Q Q 18 Q 3 Q 480 520 D 20 D Q 17 Q 4 Q 768 800 832 864 896 928 960 992 5 640 680 D 720 760 Q 19 Q D Q 16 Q 5 6 800 840 D 880 920 Q 1024 1056 1088 1120 1152 1184 1216 1248 18 Q 7 D Q 15 Q 6 960 1000 D Q 17 1280 1312 1344 1376 1408 1440 1472 1504 1040 1080 Q 8 1120 1160 D Q 16 1536 1568 1600 1632 1664 1696 1728 1760 Q 9 D Q 15 1360 1400 Q 14 Q 7 1200 1240 1280 1320 D D Q 13 Q 8 Q 1792 1824 1856 1888 1920 1952 1984 2016 10 1440 1480 D Q 14 1520 1560 Q 13 11 INCREMEN T 0 4 8 12 16 20 24 28 NOTE: FUSE NUMBER = FIRST FUSE NUMBER + INCREMENT 32 36 Xilinx FPGAs - 49 9 12 11 Specifying PALs with ABEL P10H8 PAL module bcd2excess3 title 'BCD to Excess 3 Code Converter State Machine' u1 device 'p10h8'; "Input Pins X,Q2,Q1,Q0,D11i,D12i pin 1,2,3,4,5,6; "Output Pins D2,D11o,D12o,D1,D0,Z pin 19,18,17,16,15,14; INSTATE = [Q2, Q1, Q0]; S0 = [0, 0, 0]; S1 = [0, 0, 1]; S2 = [0, 1, 1]; S3 = [1, 1, 0]; S4 = [1, 0, 0]; S5 = [1, 1, 1]; S6 = [1, 0, 1]; Explicit equations for partitioned output functions equations D2 = (!Q2 & Q0) # (Q2 & !Q0); D1 = D11i # D12i; D11o = (!X & !Q2 & !Q1 & Q0) # (X & !Q2 & !Q0); D12o = (!X & Q2 & !Q0) # (Q1 & !Q0); D0 = !Q0; Z = (X & Q1) # (!X & !Q1); end bcd2excess3; Xilinx FPGAs - 50 Specifying PALs with ABEL P12H6 PAL module bcd2excess3 title 'BCD to Excess 3 Code Converter State Machine' u1 device 'p12h6'; "Input Pins X, Q2, Q1, Q0 pin 1, 2, 3, 4; "Output Pins D2, D1, D0, Z pin 17, 18, 16, 15; INSTATE = [Q2, Q1, S0in = [0, 0, 0]; S1in = [0, 0, 1]; S2in = [0, 1, 1]; S3in = [1, 1, 0]; S4in = [1, 0, 0]; S5in = [1, 1, 1]; S6in = [1, 0, 1]; Simpler equations Q0]; OUTSTATE = [D2, D1, D0]; S0out = [0, 0, 0]; S1out = [0, 0, 1]; S2out = [0, 1, 1]; S3out = [1, 1, 0]; S4out = [1, 0, 0]; S5out = [1, 1, 1]; S6out = [1, 0, 1]; equations D2 = (!Q2 & Q0) # (Q2 & !Q0); D1 = (!X & !Q2 & !Q1 & Q0) # (X & !Q2 & !Q0) # (!X & Q2 & !Q0) # (Q1 & !Q0); D0 = !Q0; Z = (X & Q1) # (!X & !Q1); end bcd2excess3; Xilinx FPGAs - 51 Specifying PALs with ABEL P16R4 PAL module bcd2excess3 title 'BCD to Excess 3 Code Converter' u1 device 'p16r4'; "Input Pins Clk, Reset, X, !OE "Output Pins D2, D1, D0, Z SREG S0 = S1 = S2 = S3 = S4 = S5 = S6 = = [D2, [0, 0, [0, 0, [0, 1, [1, 1, [1, 0, [1, 1, [1, 0, pin 1, 2, 3, 11; pin 14, 15, 16, 13; D1, D0]; 0]; 1]; 1]; 0]; 0]; 1]; 1]; state_diagram SREG state S0: if Reset then S0 else if X then S2 with Z = 0 else S1 with Z = 1 state S1: if Reset then S0 else if X then S4 with Z = 0 else S3 with Z = 1 state S2: if Reset then S0 else if X then S4 with Z = 1 else S4 with Z = 0 state S3: if Reset then S0 else if X then S5 with Z = 1 else S5 with Z = 0 state S4: if Reset then S0 else if X then S6 with Z = 0 else S5 with Z = 1 state S5: if Reset then S0 else if X then S0 with Z = 1 else S0 with Z = 0 state S6: if Reset then S0 else if !X then S0 with Z = 1 end bcd2excess3; Xilinx FPGAs - 52 FSM Design with Counters Synchronous Counters: CLR, LD, CNT 0 Four kinds of transitions for each state: (1) to State 0 (CLR) (2) to next state in sequence (CNT) (3) to arbitrary next state (LD) (4) loop in current state CLR CNT n+1 n no signals asserted LD m Careful state assignment is needed to reflect basic sequencing of the counter Xilinx FPGAs - 53 FSM Design with Counters Excess 3 Converter Revisited Reset 0/1 1 0/1 0 4 1/0 0/0, 1/1 5 2 0/0, 1/1 0/1 3 0/0, 1/1 1/0 1/0 6 0/1 Xilinx FPGAs - 54 Note the sequential nature of the state assignments FSM Design with Counters Excess 3 Converter Inputs/Current Next State State X Q2 Q1 Q0 Q2+ Q1+ 0 0 0 0 0 0 0 0 0 1 0 1 0 0 1 0 0 1 0 0 1 1 0 0 0 1 0 0 1 0 0 1 0 1 0 1 0 1 1 0 0 0 0 1 1 1 X X 1 0 0 0 1 0 1 0 0 1 1 0 1 0 1 0 0 1 1 0 1 1 0 0 1 1 0 0 1 0 1 1 0 1 1 1 1 1 1 0 X X 1 1 1 1 X X Outputs Q0+ 1 0 1 0 1 1 0 X 0 1 1 0 1 0 X X Z CLR LD 1 1 1 1 1 1 0 1 1 0 0 X 1 1 1 0 1 0 1 0 X X X X 0 1 0 0 1 0 1 1 1 1 0 X 0 1 1 1 1 1 X X X X X X EN 1 1 1 X 1 X X X X X 1 X 1 1 X X C X X X X X 0 X X 1 1 X X X X X X B X X X X X 1 X X 0 0 X X X X X X A X X X X X 0 X X 0 1 X X X X X X CLR signal dominates LD which dominates Count Xilinx FPGAs - 55 Implementing FSMs with Counters .i 5 Espresso Input File .i 5 .o 7 .o 7 .ilb res x q2 q1 q0 .ilb res x q2 q1 q0 .ob z clr ld en c b a .ob z clr ld en c b a .p 17 .p 10 1---- -0----0-001 0101101 00000 1111---0-01 1000000 00001 1111---11-0 1000000 00010 0111--0-0-0 0101100 00011 00----Excess 3 Converter -000- 1010000 00100 0111---0--0 0010000 00101 110-011 0-10- 0101011 00110 10------11- 1000000 00111 -------11-- 0010000 01000 010-100 -1-1- 1010000 01001 010-101 .e 01010 1111--Espresso Output File 01011 10----01100 1111--01101 0111--01110 ------01111 ------Xilinx FPGAs - 56 .e FSM Implementation with Counters CLK 1 0 1 0 7 P 10 163 T RCO15 2 CLK 6 D QD 11 5 C QC 12 4 B QB 13 3 A 14 QA 9 LOAD excess 3 PLA X Rese t X Q2 Q1 Q0 Z \CLR \L D EN C B A 1 CLR Excess 3 Converter Schematic Synchronous Output Register Xilinx FPGAs - 57 D Q C Q Z Implementation Strategies Xilinx LCA Architecture Implementing the BCD to Excess 3 FSM Q2+ = Q2 • Q0 + Q2 • Q0 Q1+ = X • Q2 • Q1 • Q0 + X • Q2 • Q0 + X • Q2 • Q0 + Q1 • Q0 Q0+ = Q0 Z = Z • Q1 + X • Q1 No function more complex than 4 variables 4 FFs implies 2 CLBs Synchronous Mealy Machine Global Reset to be used Place Q2+, Q0+ in once CLB Q1, Z in second CLB maximize use of direct & general purpose interconnections Xilinx FPGAs - 58 Implementing the BCD to Excess 3 FSM Clk Clk X CE CE A CE DI B X Q2 Q0 FG DI Q2 B C C Y K Q0 Q0 FG E K X Q2 Q1 Q0 X Q1 A X FG Y FG E D RES D RES CLB2 CLB1 Xilinx FPGAs - 59 Q1 Z Design Case Study Traffic Light Controller Decomposition into primitive subsystems • Controller FSM next state/output functions state register • Short time/long time interval counter • Car Sensor • Output Decoders and Traffic Lights Xilinx FPGAs - 60 Design Case Study Traffic Light Controller Block Diagram Reset Clk TS short time/ long time counter TL ST C (async) F controller fsm Reset Next State Output Logic Car Sensor C (sync) Clk 2 2 2 2 State Register Xilinx FPGAs - 61 Encoded Light Light Decoders Signals 3 3 H Design Case Study Subsystem Logic 0 + Cin Pres ent D C Q Light Decoders F0 2 A 3 B F1 1 G Y0 Y1 Y2 Y3 139a 4 5 6 7 1 \Res et H0 CLK Interval Timer CLK Reset H1 + 7 P 10 T 163 1 2 CLK RCO 5 6 D QD 11 5 C QC 12 4 B QB 13 3 A QA 14 9 LOAD CLR 1 CLR FPGAs - 62 Xilinx ST 0 0 HG HY HR + Car Detector 1 FG FY FR RQ \Pre sent 0 1 A Y0 1 4 B Y1 3 Y2 1 G Y3 5 139b TL TS 1 1 2 1 09 Design Case Study State Assignment: HG = 00, HY = 10, FG = 01, FY = 11 P1 = C TL Q1 + TS Q1 Q0 + C Q1 Q0 + TS Q1 Q0 P0 = TS Q1 Q0 + Q1 Q0 + TS Q1 Q0 ST = C TL Q1 + C Q1 Q0 + TS Q1 Q0 + TS Q1 Q0 HL[1] = TS Q1 Q0 + Q1 Q0 + TS Q1 Q0 HL[0] = TS Q1 Q0 + TS Q1 Q0 Next State Logic FL[1] = Q0 FL[0] = TS Q1 Q0 + TS Q1 Q0 PAL/PLA Implementation: 5 inputs, 7 outputs, 8 product terms PAL 22V10 -- 11 inputs, 10 prog. IOs, 8 to 14 prod terms per OR ROM Implementation: 32 word by 8-bit ROM (256 bits) Xilinxsize FPGAs - 63 Reset may double ROM Design Case Study Counter-based Implementation HG TL•C / ST HY TS / ST FG TL+C / ST FY TS / ST ST = Count TS TL \C TL C 1 GA 3 A3 4 A2 5 A1 6 A0 13 12 11 10 153 YA 7 B3 B2 B1 B0 15 GB 2 x 4:1 MUX YB 9 S1 SO 2 14 ST + 7 10 P 163 T 15 2 CLK RCO 6 5 4 3 D C B A QD QC QB QA 11 12 13 14 9 LOAD \Reset 1 CLR TTL Implementation with MUX and Counter Can we reduce package count by using an 8:1 MUX? Xilinx FPGAs - 64 Q1 Q0 Design Case Study Counter-based Implementation Dispense with direct output functions for the traffic lights Why not simply decode from the current state? HG HY HR 1 1 G Q1 Q0 3B 2A Y3 Y2 Y1 Y0 0 7 6 5 4 139a ST is a Synchronous Mealy Output Light Controllers are Moore Outputs Xilinx FPGAs - 65 0 FG FY FR 0 0 1 Design Case Study LCA-Based Implementation Discrete Gate Method: None of the functions exceed 5 variables P1, ST are 5 variable (1 CLB each) P0, HL1, HL0, FL0 are 3 variable (1/2 CLB each) FL1 is 1 variable (1/2 CLB) 4 1/2 CLBs total! Xilinx FPGAs - 66 Design Case Study TL C TS TS TS LCA-Based Implementation Q0 Placement of functions selected to maximize the use of direct connections DI CE A B X C F0 K Y E D R TL DI CE A B X C K Y E D R TS C Q0 Xilinx FPGAs - 67 Q0 DI CE A B X C Q1 K Y E D R Q0 Q1 TS DI CE A B X C K Y E D R C Q1 TL F1 Q1 Q0 TS Q1 DI CE A B X C ST K Y E D R DI CE A B X C K Y E D R H1 H0 Design Case Study LCA-Based Implementation Counter/Multiplexer Method: 4:1 MUX, 2 Bit Upcounter MUX: six variables (4 data, 2 control) but this is the kind of 6 variable function that can be implemented in 1 CLB! 2nd CLB to implement TL • C and TL + C' But note that ST/Cnt is really a function of TL, C, TS, Q1, Q0 1 CLB to implement this function of 5 variables! 2 Bit Counter: 2 functions of 3 variables (2 bit state + count) Also implemented in one CLB Traffic light decoders: functions of 2 variables (Q1, Q0) 2 per CLB = 3 CLB for the six lights Total count = 5 CLBs Xilinx FPGAs - 68