FPLDs: Introduction

What is Programmable Logic?

Circa 1970 -- TTL Design
Design a logic circuit that implements the function Y of inputs A, B, and C shown in the schematic, using 74HC04, 74HC08, and 74HC32 gates.
[Figure: gate-level schematic combining A with the product BC to form Y]
Design is done “by hand” using a TTL data book. Verification is performed using a “breadboard.”

TTL Design
[Figure: breadboard wiring of inputs A, B, C through 74HC04, 74HC08, and 74HC32 DIP packages to output Y, with +5V and GND connections]
We need three separate Dual Inline Package (DIP) TTL packages to implement this design in hardware. Because of the multiple components, this design consumes power and board space, is costly, and is hard to debug and manufacture.

FPLD Design
In field programmable logic device (FPLD) design, we use a computer aided design (CAD) software tool (e.g. Quartus II) to perform “design entry.” We can also use the same package for “design verification” and to “download” the design into hardware (i.e. the PLD). Our design now becomes:
[Figure: a single EPM7032 package with inputs A, B, C, output Y, +5V and GND]
This single-chip design requires less power and less board space, should cost less on a per-gate basis, is easier to debug (in software), and is easier to manufacture. Also, Intellectual Property (IP) can be protected and exploited using an FPLD.

Benefits of FPLD Design
1. Increased System Performance (Speed)
This is due to the reduced interconnect distances between gates. In a TTL design we have large RC delays as we propagate signals from one chip to another. In FPLD designs, these distances are in the µm range.
[Figure: TTL design with 74HC04, 74HC08, and 74HC32 packages; the chip-to-chip net is labeled “Large delay on this net”]

FPLD Design
[Figure: EPM7032 implementation with inputs A, B, C and output Y; the same net is now internal to the FPLD]

Benefits of FPLD Design (cont.)
2. Increased Gate Density
More logic gates on each FPLD implies that you can have more functionality per unit area of board space. A single FPLD/FPGA can hold the equivalent of over 1 million TTL logic gates.

3. Reduced Development Time
CAD tools significantly reduce the development time for new designs. This not only cuts down the “time to market,” but also reduces the size of the team needed to complete a design.

4. Rapid Hardware Prototyping
Hardware prototyping is greatly simplified using FPLDs because it is relatively easy to change the design. One major concern, however, is I/O pin assignments.

5. Reduced “Time to Market”
Since FPLDs are already “complete,” there is no need to wait for fabrication.

6. Future Modifications
Since FPLDs can be “reconfigured” in the field, it is possible to have the end user perform system “upgrades.”

7. Reduced Inventory Risk
The same type of FPLD can be used in multiple designs, so the inventory risk is significantly reduced.

8. Reduced Development Costs
The development costs for FPLDs tend to be lower than for Application Specific Integrated Circuits (ASICs); however, the per-unit cost of an FPLD is higher than that of an ASIC for large volumes.

Shorthand Notation
[Figure: inputs A through E, each available in true and complemented form, feeding a gate with output Y]
Programmable interconnect at each node. A blue dot means a connection has been made.
[Figure: shorthand drawing of the same gate with inputs A through E and output Y]

Shorthand Notation (cont.)
[Figure: shorthand symbols for input A and its complement]

Programmable Array Logic (PAL)
AND-OR architecture: AND plane (programmable), OR plane (fixed).
[Figure: inputs A, B, C feed the programmable AND plane; the fixed OR plane drives outputs Z1, Z2, Z3]

PAL Example
[Figure: inputs A and B with their complements; programmable interconnect in the AND plane forms the product terms P1, P2, P3; fixed interconnect in the OR plane forms the sum terms Z1, Z2]

We can use a PAL to implement Sum-of-Products (SOP) logic.
Example: use a PAL to design a logic circuit which implements
$Z_1 = \bar{A}B + A\bar{B} = A \oplus B$
together with a second two-term output $Z_2$.
Note: in our PAL, we have the “fixed” logic
$Z_1 = P_1 + P_2$ ; $Z_2 = P_2 + P_3$
Let's “program” the AND array (or AND plane) so that $P_1$ and $P_2$ are the two product terms of $Z_1$ and $P_3$ is the remaining product term of $Z_2$. Since $Z_1 = P_1 + P_2$ and $Z_2 = P_2 + P_3$, both outputs are realized, with $P_2$ shared between them.
[Figure: the PAL with programmed interconnects marked in the AND plane]

PAL Example (cont.)
We can use the same type of device to “program”
$Z_1 = \bar{A}B + A\bar{B} = A \oplus B$
$Z_2 = A\bar{B} + AB = A$
Let $P_1 = \bar{A}B$ ; $P_2 = A\bar{B}$ ; $P_3 = AB$
[Figure: the PAL programmed with P1, P2, P3 as above]

However, what if I want
$Z_1 = \bar{A}B + A\bar{B} = A \oplus B$
$Z_2 = AB + \bar{A}\bar{B} = A \odot B$
Let $P_1 = \bar{A}B$ ; $P_2 = A\bar{B}$ ; $P_3 = AB$
What about the $\bar{A}\bar{B}$ term? I've run out of pterms! We need to pick a bigger PAL.

Survey of FPLDs: PALs
AND plane (programmable), OR plane (fixed).
Ex: 16V8. Circa: 1978.
[Figure: inputs A, B, C; outputs Z1, Z2, Z3]

Survey of FPLDs: Simple PLDs
Add programmable I/O “macrocells” to the PAL architecture. I/O macrocells contain registers.
Ex: 22V10. Circa: 1980.
[Figure: programmable AND plane and fixed OR plane surrounded by I/O macrocells, with a clock input; signals A, B, C and Z1, Z2, Z3]

Survey of FPLDs: Complex PLDs
“Mini” PALs with programmable registers, called Logic Array Blocks (LABs), are interconnected using a Programmable Interconnect Array (PIA).
Ex: Altera's MAX-5032, MAX-7032. Circa: 1985.
[Figure: dedicated inputs and I/O connected to four LABs through a central PIA. LAB = Logic Array Block (programmable); PIA = Programmable Interconnect Array]

Survey of FPLDs: Field Programmable Gate Arrays (FPGAs)
An array of “small” blocks of programmable logic within an ...
Vendors: Xilinx (Actel). Circa: 1990.
[Figure: grid of logic cells (LC) separated by routing channels and surrounded by I/O cells. LC = Logic Cell; I/O = Input/Output Cell. Programmable interconnects connect the LCs to the routing channels.]

Survey of FPLDs: System-on-a-Programmable-Chip (SOPC)
Combines programmable logic with embedded Static Random Access Memory (SRAM) on the same Integrated Circuit (IC).
Vendors: Altera and Xilinx. Circa: 2000 to now.
[Figure: SOPC containing programmable logic (FPLD/FPGA) and SRAM]

Programming Elements (PEs)
PEs are used to physically “program” the interconnects.
[Figure: Field Effect Transistor (FET) between nodes A and B, controlled by the gate voltage Vgate]
The FET acts like a “switch.” If Vgate is ONE, the switch is closed, connecting A and B; otherwise A and B are isolated.

Programming Elements: Example
Vgate = ONE: switch closed, A is connected to B.
Vgate = ZERO: switch open, open circuit between A and B.

So, we'll have one FET at every programmable interconnect, but we need a method or technique to “program” Vgate to be ONE or ZERO. Before we look at our options, some definitions.

Two Types:
1. Volatile: the “program” is lost when power is removed.
2. Non-volatile: the “program” is retained when power is removed.

Two Classes:
1. Re-programmable: the PE can be “erased” and “re-programmed.”
2. One-time-programmable (OTP): the PE can only be programmed one time (not really used anymore).

Programming Technologies: EPROM
EPROM: Erasable Programmable Read Only Memory.
Reprogrammable and non-volatile. Ex: MAX-5000.
It is possible to physically program an EPROM cell to always be ONE when power is applied. Also, we can use ultraviolet (UV) light to reset or “erase” the EPROM cell back to ZERO.
[Figure: EPROM cell driving the gate of the FET between A and B; UV light is used to erase]
We can, therefore, erase all the cells of the EPROM and then program the PEs that we want to be ONEs.
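As a rough illustration of the erase-then-program flow described above, the following Python sketch models each PE as a stored bit that drives the gate of a pass FET. The class name `EpromPEArray` and its methods are hypothetical and only mirror the slide's description; they do not correspond to any vendor tool or device interface.

```python
class EpromPEArray:
    """Toy model: one stored bit per programmable interconnect (assumed layout)."""

    def __init__(self, num_cells):
        # The erased state of every cell is ZERO
        self.cells = [0] * num_cells

    def erase(self):
        # UV erase resets every cell back to ZERO at once
        self.cells = [0] * len(self.cells)

    def program(self, ones):
        # Program only the PEs that should be ONE
        for index in ones:
            self.cells[index] = 1

    def is_connected(self, index):
        # Vgate = ONE closes the FET switch, connecting A and B
        return self.cells[index] == 1


pes = EpromPEArray(4)
pes.erase()
pes.program([1, 3])
connections = [pes.is_connected(i) for i in range(4)]
```

Here `connections` records which of the four hypothetical interconnects are closed after programming cells 1 and 3.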
Programming Technologies: EEPROM
EEPROM (E²PROM): Electrically Erasable Programmable Read Only Memory.
Reprogrammable and non-volatile.
Similar to an EPROM, except that the cell can be “erased” electrically. Ex: MAX-7000 family.

Programming Technologies: SRAM
SRAM: Static Random Access Memory.
Volatile and reprogrammable (electrically).
[Figure: SRAM cell driving Vgate]
Store the value of Vgate within an SRAM cell. We lose the program whenever the power is removed. Therefore, we'll need the ability to “reload” the design upon power-up.

SRAM Cell: Write
[Figure: writing 0 and writing 1 through the bit line (BL) with the word line (WL) at 1; the cell output goes to Vgate]
WL = 1 turns “ON” the FET, connecting BL to the cell.

SRAM Cell: Read
[Figure: cell holding data with WL = 0 and BL = X; the cell output goes to Vgate]
WL = 0 turns “OFF” the FET, isolating the bit line from the cell. However, due to “positive” feedback, the data is retained in the memory cell until power is removed.

[Figure: SRAM cell driving the gate of the FET between A and B]
Use an SRAM cell to store Vgate. The “program” is lost when power is removed.

Programming Technologies: Anti-Fuse
Non-volatile and OTP.
Normally, an anti-fuse behaves like an “open” circuit; however, you can “destroy” the fuse electrically so that it behaves like a short circuit.
[Figure: anti-fuse between A and B]
The anti-fuse is very small compared to the other PEs.

Summary: FPLD Benefits
1. Increased Performance
2. Increased Gate Density
3. Reduced Development Time
4. Rapid Hardware Prototyping
5. Reduced “Time to Market”
6. Future Modifications
7. Reduced Inventory Risks
8. Reduced Development Costs

Summary: FPLD Types
1. PALs
2. Simple PLDs
3. Complex PLDs (FPLDs)
4. FPGAs
5. SOPC

Summary: Programming Elements
Types:
1. Volatile
2. Non-volatile
Classes:
1. Reprogrammable
2. OTP
Technologies:
1. EPROM (obsolete)
2. EEPROM
3. Anti-Fuse
4. SRAM

Summary: Programming Elements
Technology   Volatile   Reprogrammable         Relative Size   Relative Cost   Relative Importance
SRAM         yes        yes, in circuit        Very Large      Low             Strong
EEPROM       no         yes, in circuit        Large           High            Strong
EPROM        no         yes, out of circuit    Small           Very High       Weak
Antifuse     no         no                     Very Small      High            Moderate

Generic FPLD Design
At a minimum, every FPLD needs
1. Programmable Logic (L)
2. Programmable Interconnects (I)
3. Input/Output Logic (I/O)
[Figure: FPLD area divided into L, I, and I/O regions]

1/3 Logic, 1/3 Interconnects, 1/3 Input/Output: do I have enough logic?
1/2 Logic, 1/4 Interconnects, 1/4 Input/Output: logic is good, but now do I have enough interconnects for my logic?
1/4 Logic, 1/2 Interconnects, 1/4 Input/Output: OK, I have enough interconnects for my logic. Do I have enough I/O?
Different vendors use different approaches. Let's examine Altera MAX and Altera Flex.

Altera MAX-7000

Altera MAX-7000 Device Family
• EEPROM used as PE
• Non-volatile and re-programmable
[Table: MAX-7000 device family parameters]

Definitions
• Usable gates: number of equivalent TTL NAND gates
• Macrocells: number of unique mini PALs
• Maximum user I/O pins
• Tpd = input to non-registered output delay
• Tsu = external global clock register setup time
• Tfsu = external fast input register setup time
• Tco1 = global clock to output delay
• Fcnt (MHz) = maximum 16-bit up/down counter frequency

MAX-7000S Block Diagram
[Figure: MAX-7000S block diagram]

Block Diagram Notes
• Global clocks
• Global reset
• Global output enable
• Global inputs
• PIA: Programmable Interconnect Array
• LABs: Logic Array Blocks
• Macrocells are contained in LABs

MAX-7000 Device Features
[Table: MAX-7000 device features]
BST = Boundary-Scan Test (JTAG); ISP = In-System Programmability

MAX-7000S Macrocell
[Figure: MAX-7000S macrocell]

Macrocell Notes
• The macrocell (MC) is customizable
• Local and global clocks; the global clock is used if no logic is added to the clock line
• Register bypass for combinational logic designs
• One programmable register per MC: D, T, JK or SR operation; enable function; preset and reset functions
• Sharable expanders allow an extra pterm to be “shared” with another macrocell

Sharable Expanders
[Figure: sharable expanders]

Parallel Expanders
[Figure: parallel expanders]
Similar to sharable expanders. Up to 20 pterms in one MC.

Altera Flex 10K

Altera Flex 10KE Device Features
[Table: Flex 10KE device features and performance metrics]
SRAM as PEs: reprogrammable and volatile.

Acronyms
• SOPC: System on a Programmable Chip
• LE: Logic Element, the core logic block
• LAB: Logic Array Block
• EAB: Embedded Array Block, on-chip SRAM

Flex 10KE Block Diagram
[Figure: Flex 10KE block diagram]

Block Diagram Notes
• Still have LABs, but the MC is replaced with the LE
• Each LAB has eight (8) LEs
• Embedded memory is stored in EABs, with asynchronous and synchronous modes

Flex 10KE Logic Element
[Figure: Flex 10KE logic element]

Logic Element (LE) Notes
• The LUT (Look Up Table) has replaced the MC; 4 inputs: 16 x 1 SRAM array
• Register bypass for combinational logic designs
• Register packing: the LUT and the register can be used for different functions
• One programmable register per LE: D, T, JK or SR operation; enable function; preset and reset functions
• High-speed carry and cascade chains

Look Up Tables (LUTs)
Why use a LUT for logic implementation?
Example: 2-bit multiplier, Y = A x B

              A1     A0
        x     B1     B0
        ---------------
            A1B0   A0B0
     A1B1   A0B1      0
     ------------------
  S3   S2     S1     S0

Where
S0 = A0B0
S1 = A1B0 + A0B1
S2 = A1B1 + Carry_S1
S3 = Carry_S2

Example: Y = 11 x 11

        1 1
    x   1 1
    -------
        1 1
  +   1 1 0
    -------
    1 0 0 1      (3 x 3 = 9)

Let's implement this using logic gates.

Full Adder
[Figure: full adder schematic with inputs A, B, Cin; two XOR gates produce sum; two AND2 gates and an OR2 gate produce cout; shown with its block symbol]

2x2 Bit Multiplier
[Figure: four AND2 gates form the partial products from A0, B0, A1, B1; two fulladder blocks combine them; outputs S0, S1, S2, S3]

Same Design Using a LUT
LUT = Look Up Table
[Figure: 16 x 4 LUT with inputs A1, A0, B1, B0 and outputs S3, S2, S1, S0]
The LUT contains the “truth table” of the design.

LUT Design
Inputs A[1..0], B[1..0]; outputs S[3..0].

A1 A0 B1 B0 | S3 S2 S1 S0
 0  0  0  0 |  0  0  0  0
 0  0  0  1 |  0  0  0  0
 0  0  1  0 |  0  0  0  0
 0  0  1  1 |  0  0  0  0
 0  1  0  0 |  0  0  0  0
 0  1  0  1 |  0  0  0  1
 0  1  1  0 |  0  0  1  0
 0  1  1  1 |  0  0  1  1
 1  0  0  0 |  0  0  0  0
 1  0  0  1 |  0  0  1  0
 1  0  1  0 |  0  1  0  0
 1  0  1  1 |  0  1  1  0
 1  1  0  0 |  0  0  0  0
 1  1  0  1 |  0  0  1  1
 1  1  1  0 |  0  1  1  0
 1  1  1  1 |  1  0  0  1

LUT Example
Let A = 11 and B = 11. With A[1..0] = 11 and B[1..0] = 11, the last row of the table is selected, so S[3..0] = 1001.

2x2 Bit Multiplier Delay Calculation
[Figure: the gate-level multiplier, with the longest path running through one AND2 gate and both fulladder blocks]
Worst case delay = tgate + 2*tfa

LUT Delay Calculation
[Figure: 16 x 4 SRAM LUT with inputs A1, A0, B1, B0 and outputs S3, S2, S1, S0]
Tdelay = t_LUT_access

Memory Arrays
We can combine memory cells into memory arrays.
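As one concrete memory array, the 16 x 4 LUT from the 2x2 multiplier example can be sketched in Python. This is a minimal model, assuming the address is formed as A1 A0 B1 B0 (A in the upper two bits); the function names `build_multiplier_lut` and `lut_multiply` are hypothetical and chosen only for illustration.

```python
def build_multiplier_lut():
    """Precompute the 16 x 4 truth table for a 2-bit by 2-bit multiply."""
    lut = []
    for address in range(16):
        a = (address >> 2) & 0b11  # A1 A0 are the upper address bits
        b = address & 0b11         # B1 B0 are the lower address bits
        lut.append(a * b)          # 4-bit product S3 S2 S1 S0
    return lut


def lut_multiply(lut, a, b):
    # A single table access replaces the AND gates and full adders
    return lut[(a << 2) | b]


LUT = build_multiplier_lut()
product = lut_multiply(LUT, 0b11, 0b11)
```

The lookup cost is one memory access regardless of the function stored, which is the point of the Tdelay = t_LUT_access comparison.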
Memory arrays are used to store
• State information
• Data information
• Startup information

Library of Parameterized Modules (LPMs)
Quartus allows access to EABs through the use of functions from the Library of Parameterized Modules (LPM). Let's look at the various memory arrays available through Quartus.

Memory Arrays
• Read-Only Memory (ROM)
• Single Port SRAM
• Dual Port SRAM
• First-in First-out (FIFO) SRAM
• Last-in First-out (LIFO) “Stack”
• Content Addressable Memory (CAM)

ROMs

Read Only Memory (ROM)
A ROM has “pre-loaded” data that is not intended to change over time. Traditionally, ROMs are used to store program code in a computer system. However, we can also use a ROM as a giant look-up table (LUT) to perform functional translations on our data.
There are two basic access modes we will need to examine:
1. Asynchronous mode
2. Synchronous mode

LPM_ROM Implementation: Asynchronous Mode
[Figure: LPM_ROM symbol with address input X and data output Y]

Timing Diagram: Read Mode, Asynchronous Access
[Figure: Address (x) changes; Q (y) becomes valid tacc later]
Note: apply X to the address bus; tacc seconds later the value of Y appears on Q.
tacc = data access time

LPM_ROM Implementation: Synchronous Access
[Figure: LPM_ROM symbol with address input X, data output Y, and a clock input]
We can add a clock to make the ROM access synchronous.

Timing Diagram: Read Mode, Synchronous Access
[Figure: Address (x), Clock, and Q (y), with tsu before each clock edge and tacc after it]
Apply X to the address bus tsu (setup) seconds before the clock edge. The value of Y will appear on Q tacc seconds after the clock edge.

ROM Example
Example: let's implement the function y = 2x + 5.
We could design a circuit to perform this calculation, but it may be more efficient to “pre-load” a ROM with the answer to every possible input value of x.
Note: we'll need a $2^n \times W$ ROM, where n is the number of bits in x and W is the number of bits needed to represent the maximum value of y.

ROM Example: y = 2x + 5
Let x = 3 bits. Range of x is 0 to 7.
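The sizing of this ROM can be checked with a short Python sketch. It is only an illustration of the pre-load idea, not the Quartus LPM_ROM flow; `build_rom` is a hypothetical helper name, and the word width is taken as the bit length of the largest stored value.

```python
def build_rom(n_bits, func):
    """Return (contents, word_width) for a ROM with 2**n_bits entries holding func(x)."""
    contents = [func(x) for x in range(2 ** n_bits)]
    # W = number of bits needed to represent the maximum value of y
    word_width = max(contents).bit_length()
    return contents, word_width


rom, width = build_rom(3, lambda x: 2 * x + 5)
y = rom[3]  # a read is just an indexed lookup at address x
```

With a 3-bit address the table has 2**3 entries, one per possible value of x.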
Max Y = 2(7) + 5 = 19, so W = 5 bits.
So, we will need a $2^3 \times 5$ = 8 x 5 bit ROM.

ROM Table
X | 0  1  2  3   4   5   6   7
Y | 5  7  9  11  13  15  17  19

LPM_ROM Quartus II Design
[Figure: LPM_ROM schematic entry in Quartus II]

Simulation
[Figure: simulation waveforms for the LPM_ROM design]

Single Port SRAM
A single port SRAM allows data to be read from and written into the memory array by the user. In general, a single port SRAM has the following input/output lines:
1. Data input bus
2. Data output bus
3. R/W control line
4. Address bus
The control line is needed to select the access mode, which can be read mode or write mode.
Single port means we use a single port to interface to the RAM.

Static Random Access Memory (SRAM) Array Symbol
[Figure: LPM_RAM_DQ symbol. Data input bus Din[n-1..0] drives data[]; address bus Add[n-1..0] drives address[]; the read/write control line R/W drives we; q[] drives the data output bus Dout[n-1..0]]
R/W = 1: read mode
R/W = 0: write mode
Let's look at an internal block diagram of the SRAM.

Block Diagram of SRAM (4x4)
[Figure: 4 x 4 array of memory cells (MC). The 2-bit address ADD[1..0] feeds an address decoder that drives the word lines W[3..0] (W0 to W3); the bit lines B[3..0] (B3 to B0) connect through the R/W and enable (ENB) logic to Din[3..0] and Dout[3..0]]

Timing Diagram: Read Mode Access, Asynchronous
[Figure: R/W, Address, and Data Out; Data Out becomes valid tacc after the address changes]
tacc = access time

Block Diagram of SRAM: Read Mode
[Figure: the 4 x 4 array with Add = 10 and R/W = 1]

Timing Diagram: Write Mode Access, Asynchronous
[Figure: R/W pulse of width twp; Address steps through A0, A1, A2 with tasu and tahd around the pulse; Data In holds Din1 with tdsu and tdhd around the pulse]
tasu, tahd = address setup and hold times
tdsu, tdhd = data setup and hold times
twp = write pulse time

Block Diagram of SRAM: Write Mode
[Figure: the 4 x 4 array with Add = 10 and R/W = 0]

LPM_RAM_DQ
[Figure: LPM_RAM_DQ design]

Dual Port SRAM
A dual port SRAM allows data to be read from one port and written to another port.
[Figure: memory array with a read port and a write port]
This is not a “true” dual port. You can use both ports simultaneously (reading on one and writing on the other) as long as they are not using the same address.
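The read-port/write-port behavior described above can be sketched as a small Python model. This is a behavioral sketch under stated assumptions: the class name `SimpleDualPortSram` and its `cycle` method are hypothetical, and raising an error on a same-address access is just one way to model the restriction, not how any particular device responds.

```python
class SimpleDualPortSram:
    """Toy model of a memory with one read port and one write port."""

    def __init__(self, words=4, width=4):
        self.width = width
        self.mem = [0] * words

    def cycle(self, read_address, write_address, data_in):
        # Assumption: simultaneous access to the same address is not allowed
        if read_address == write_address:
            raise ValueError("read and write ports use the same address")
        data_out = self.mem[read_address]
        # Keep only `width` bits of the written word
        self.mem[write_address] = data_in & ((1 << self.width) - 1)
        return data_out


ram = SimpleDualPortSram()
first = ram.cycle(read_address=0, write_address=2, data_in=0b1011)
value = ram.cycle(read_address=2, write_address=3, data_in=0b0001)
```

The second cycle reads back the word written in the first cycle while a different address is being written.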
FIFO: First-in First-out Buffer

FIFO Buffer
A first-in first-out (FIFO) buffer is used to synchronize two data streams that are processing data at different rates.
Note: the “average” data rates of both sides have to be equivalent.
As the name implies, the first data byte written on the input side (first-in) is the first data byte read on the output side (first-out).

LPM_FIFO
• Input data port
• Write request port
• Read request port
• Clock
• Asynchronous reset
• Output data port
• Buffer empty signal

Content Addressable Memory (CAM)

CAMs
A CAM is an “inverse” RAM. That is, you provide input data and the CAM provides the address location of the data.
[Figure: CAM block with DATA as the input and Address as the output]
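The “inverse RAM” idea can be illustrated with a minimal Python sketch. The class `Cam` and its methods are hypothetical names for illustration; a hardware CAM compares all locations in parallel, which the list scan below only imitates.

```python
class Cam:
    """Toy content addressable memory: search by data, get addresses back."""

    def __init__(self, depth):
        self.words = [None] * depth  # None marks an empty location

    def write(self, address, data):
        self.words[address] = data

    def match(self, data):
        # Return every address whose stored word equals the search data
        return [addr for addr, word in enumerate(self.words) if word == data]


cam = Cam(4)
cam.write(2, 0xA5)
hits = cam.match(0xA5)
misses = cam.match(0x00)
```

Returning a list of addresses leaves room for the same data being stored at more than one location.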