Jeffrey Dwoskin & Kevin Green
VLSI Design Project Report
Fall 2001 – Spring 2002

Project Description

We have designed a microcontroller chip based on the RTL design and instruction set of the AMD Am2910. Our chip is the sequencer component of the microcontroller, which processes instructions from the microprogram ROM and determines the address of the next microprogram instruction.

The controller has a 4-input multiplexor that selects the next instruction address from the direct address input, the microprogram counter, the register/counter, or the 4-word stack.

The direct address is an input to the chip. It is used to initialize the controller, to start execution of a new instruction with an address from the mapping PROM, or to carry the target of a branching instruction from the microprogram ROM.

The microprogram counter consists of an incrementer followed by a 12-bit register. The incrementer takes the current microprogram address as input and adds one to advance to the next instruction. This new address is then stored in the register.

The address register/counter is used for looping over a set of instructions. It does this by first loading a count from the direct input, and then decrementing after each loop iteration. When the count reaches zero, a signal is sent that can be used to stop executing the loop. The register can alternatively be used to store an address for a conditional jump instruction.

The 4-word, 12-bit stack is used to execute subroutines and stores the return addresses. Since it holds 4 words, it supports subroutine calls nested 4 levels deep.

During each instruction, the microcode ROM also provides the control signals for the rest of the microprogrammed CPU. The combination of our design, the microprogram ROM, and the mapping PROM makes up the control unit for the CPU.
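The next-address selection described above can be sketched behaviorally in Python. This is only an illustration: the source labels ("D", "PC", "R", "STACK") and the function names are made up for the example and are not the chip's control encoding, and the wrap-around at 0xFFF is an assumption that follows from the counter being 12 bits wide.

```python
# Behavioral sketch of the next-address selection described above.
# The source labels and function names are hypothetical, chosen for
# this example; they are not the chip's actual control encoding.

ADDR_MASK = 0xFFF  # addresses are 12 bits wide


def next_address(select, direct, pc, reg, stack_top):
    """Model the 4-input multiplexor: pick one of the four address sources."""
    sources = {"D": direct, "PC": pc, "R": reg, "STACK": stack_top}
    return sources[select] & ADDR_MASK


def advance_pc(current_address):
    """Model the incrementer that feeds the 12-bit microprogram counter.

    Wrapping from 0xFFF to 0x000 is an assumption of this sketch.
    """
    return (current_address + 1) & ADDR_MASK


# Jump to a direct address, then continue sequentially from it.
addr = next_address("D", direct=0x100, pc=0x005, reg=0x000, stack_top=0x000)
pc = advance_pc(addr)
sequential = next_address("PC", direct=0x000, pc=pc, reg=0x000, stack_top=0x000)
wrapped = advance_pc(0xFFF)
```

In this model a jump through the direct input leaves the microprogram counter holding the address that follows the jump target, which is what lets execution continue sequentially on the next cycle.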
The advantage of a microprogrammed control unit over a standard state machine design is that, by changing the program in the microprogram ROM and mapping PROM, it can be made to emulate the instruction set of any standard microprocessor.

We also designed a bit-sliced 8-bit ALU which resides on a separate chip. Multiple ALU chips can be cascaded to produce as large a bus width as desired. The ALU implements addition and subtraction and all logic functions, including AND, OR, XOR, and their complements. The second input can also be inverted to allow for useful logic functions such as A AND (NOT B). The result can simultaneously be shifted left or right by 1 bit.

Testing Plan

We plan to use a sequential ATPG tool, SEST, to find all the test vectors for our chip. The fault coverage should be very high because there is a global reset line that initializes all flip-flops, and our chip is designed in such a way as to give access through the pins to all of the components directly by setting the correct control signals. Moreover, there are no cycles among the flip-flops, which will allow the sequential ATPG to fully test the circuit. We could then use an ATE to apply the test vectors.

The first step would be to netlist our design into the rutmod format, which is the input format for SEST. SEST would then be able to generate all of the test vectors. However, our design is in Cadence, and Cadence does not support the rutmod format, so we would have to write the netlist by hand. That is not feasible for a design with thousands of transistors. Therefore, we will have to wait for a tool that converts our netlist from Cadence into rutmod. As stated above, we believe that our design is well suited to testing because all components have high observability and controllability.
Criticism of the CAD Tools

Overall we found the CAD tools straightforward and easy to use, but we found a few things that need improvement.

At first we designed one portion of our project in Synopsys, but we were unable to convert the resulting hardware from Synopsys into Cadence. Synopsys also does not provide enough detail about the interconnection between components for us to convert the design manually. For example, when it chose to use a JK flip-flop, it showed a box with 4 or more inputs and the wires going into it, but it did not tell us which wires went to which inputs of the flip-flop. More importantly, we could not get Synopsys to restrict its design to the components that we chose. For example, we wanted it to use inverting logic instead of noninverting logic and only D flip-flops instead of JK flip-flops, but we were unable to build a new library or restrict the standard GTECH library to accomplish this. We instead had to design the component by hand using K-maps.

As for Cadence, we were unable to simulate extracted layouts. For most of the fall semester, many of the components of Cadence did not work for the AMI process. This included the simulator, the extractor, and LVS, which made it difficult to test and verify our design as we went along. The design rule checker (DRC) was also not working for most of the time we were laying out our design. This made the layout time consuming and more error prone and set us back a couple of months. It also meant that we didn't know to follow some of the more obscure rules from the printed design rules.

We had some other problems with the Affirma analog simulator. First, it is very slow and difficult to work with. Many of the settings have to be set to the same values repeatedly, when the tool should remember them. It could also use a much simpler interface, especially for entering the stimuli. For any circuit with more than 2 or 3 input pins, setting the stimuli correctly is very tedious.
We've had problems simulating designs with a hierarchy, especially when the simulator doesn't identify global sources deeper in the hierarchy. There is a problem with the model libraries as well: although we're using the AMI06 tech library, the netlist still uses tsmc25, and we can't determine where it's getting this information.

Timing/Critical Path Analysis

Data from AMI C5N Process
Sampled from: http://www.mosis.org/Technical/Testdata/ami-c5n-prm.html

Sheet resistance:
  metal1/2:      0.09 ohm/sq
  metal3:        0.06 ohm/sq
  poly:          22 ohm/sq
  m1/2 contact:  0.7 - 0.85 ohm

Capacitance (area in aF/µm^2, fringe in aF/µm; 1 aF = 10^-18 F):

        area (sub)  area (m1)  area (m2)  fringe (sub)  fringe (m1)  fringe (m2)
  m1    31                                76
  m2    17          32                    59            56
  m3    10          13         36         39            35           51

Wire Delay:
Longest path (worst case, assuming all metal 1): PC output to next addr mux, 1550 µm plus 3 contacts
Minimum width = 0.9 µm
  1550 µm / 0.9 µm = 1722.2 squares long
  Resistance = 0.09 ohm/sq * 1722.2 sq = 155 ohm
  155 ohm + 3 * 0.85 ohm = 157.55 ohm
Capacitance:
  area (sub) = 31 aF/µm^2 * 1550 µm * 0.9 µm = 43245 aF
  fringe (sub) = 76 aF/µm * 1550 µm = 117800 aF
  total cap = 161045 aF = 0.161045 pF
RC = Tdelay = 157.55 ohm * 0.161045 pF = 25.373 ps = 0.025373 ns
Conclusion: wire delay is insignificant.
Rough approximation (min width wire, assuming 1 µm wide): RC = 25.373 ps / 1550 µm = 0.01637 ps/µm

Component Delay:
Simulated to find delays:
  DFF w/ clr:                         0.5 ns
  2to1 mux:                           0.675 ns
  4to1 mux:                           0.900 ns
  inc/dec 12-bit:                     2.55 ns
  condition mux (4to1 + 2to1 mux):    1.575 ns
Calculated from components that were simulated:
  bus enable:                         0.325 ns
  control unit:                       1.775 ns
  stack control (longest path, through two 2-to-1 muxes):  1.35 ns

Critical Path Analysis:
There are 8 major paths in our chip that we are considering. They are the 4 inputs into the next address mux, and the paths that drive the components in each of those paths.
Stack path output:
  675 µm from control unit to stack control: ignore
  delay through mux: 0.900 ns
  1150 µm from stack DFF output to next addr mux input: ignore
  Total: 0.900 ns

Stack input:
  delay through stack control: 0.900 ns
  or 620 µm from PC to stack inputs: ignore
  delay to load registers in stack: 0.5 ns
  Total: 1.4 ns

Program counter input:
  700 µm from next addr mux output to incrementer: ignore
  delay through incrementer: 2.55 ns
  delay to set register: 0.5 ns
  Total: 3.05 ns

Program counter output:
  1550 µm from PC output to next addr mux: 0.025373 ns
  delay through next addr mux: 0.9 ns
  Total: 0.925 ns

Addr/reg load input:
  1100 µm from input reg to addr/reg mux: ignore
  or 750 µm control signals from control unit to load/dec: ignore
  delay through 2to1 mux: 0.675 ns
  400 µm to other 2to1 mux: ignore
  delay through 2to1 mux: 0.675 ns
  delay to set register: 0.5 ns
  Total: 1.85 ns

Addr/reg decrement input:
  750 µm control signals from control unit to load/dec: ignore
  delay through decrementer: 2.55 ns
  delay through 2to1 mux: 0.675 ns
  400 µm to other 2to1 mux: ignore
  delay through 2to1 mux: 0.675 ns
  delay to set register: 0.5 ns
  Total: 4.4 ns

Addr/reg output:
  825 µm from register output to next addr mux: ignore
  delay through next addr mux: 0.9 ns
  Total: 0.9 ns

Control unit input:
  575 µm from pads into condition mux: ignore
  delay through condition mux (8to1): 1.575 ns
  650 µm from condition mux to control unit: ignore
  delay through control unit: 1.775 ns
  Total: 3.35 ns

Conclusion:
The address register's decrementer input results in the longest delay, 4.4 ns. This must occur during half of the clock cycle, which makes our minimum clock period 2 * 4.4 ns = 8.8 ns and our maximum clock rate 1 / 8.8 ns = 113.6 MHz. In order to be safe, we'll say our maximum clock rate is 100 MHz.
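The wire-delay and critical-path arithmetic above can be rechecked with a short Python sketch. The constants are the figures quoted in this section; the function and variable names are made up for the example, and the stack input path uses the 0.9 ns stack control figure from the path list.

```python
# Recompute the wire delay and the path totals from the figures quoted above.
# Names are hypothetical; the numbers come from the report.

R_SHEET_M1 = 0.09      # ohm/sq, metal 1
R_CONTACT = 0.85       # ohm, worst-case m1/2 contact
C_AREA_M1 = 31e-18     # F/um^2, metal 1 area capacitance to substrate
C_FRINGE_M1 = 76e-18   # F/um, metal 1 fringe capacitance to substrate


def wire_delay_ns(length_um, width_um=0.9, contacts=3):
    """Lumped RC delay of a metal 1 wire, in nanoseconds."""
    resistance = R_SHEET_M1 * length_um / width_um + contacts * R_CONTACT
    capacitance = C_AREA_M1 * length_um * width_um + C_FRINGE_M1 * length_um
    return resistance * capacitance * 1e9


# Component delays in ns, as simulated or calculated in the report.
DFF, MUX2, MUX4, INC_DEC = 0.5, 0.675, 0.9, 2.55
COND_MUX, CONTROL = 1.575, 1.775

paths_ns = {
    "stack output": MUX4,
    "stack input": 0.9 + DFF,
    "program counter input": INC_DEC + DFF,
    "program counter output": wire_delay_ns(1550) + MUX4,
    "addr/reg load input": MUX2 + MUX2 + DFF,
    "addr/reg decrement input": INC_DEC + MUX2 + MUX2 + DFF,
    "addr/reg output": MUX4,
    "control unit input": COND_MUX + CONTROL,
}

critical_path = max(paths_ns, key=paths_ns.get)
period_ns = 2 * paths_ns[critical_path]  # the path must fit in half a cycle
f_max_mhz = 1e3 / period_ns

for name, delay in paths_ns.items():
    print(f"{name:26s} {delay:6.3f} ns")
print(f"critical path: {critical_path}, f_max = {f_max_mhz:.1f} MHz")
```

Summing the listed incrementer and register delays gives 3.05 ns for the program counter input, so the decrement path at 4.4 ns remains the longest of the eight.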
Transistor Count

  Control Unit:             336
  8to1 Mux:                  36
  4to1 Mux, 12-bit:         192
  Addr Reg/Dec, 12-bit:     496
  Stack:                   1358
  2 x 12-bit registers:     504
  2 x bus enable:            48
  5 x clock inverters:       24
  Incrementer:              108
  Total:                   3102

Power Dissipation

Estimate the chip as 3102 transistors / 2 = 1551 inverters.

Gate Capacitance:
  Gate cap on an inverter: 1008.1 aF
  Total gate capacitance Cg = 1008.1 aF x 1551 = 1.56 pF

Diffusion Capacitance:
  Cd = Cja x a x b + Cjp x (2a + 2b), with a = 1.5 µm, b = 1.2 µm
  P-trans: 5 x 10^-4 pF x 1.8 + 4 x 10^-4 pF x 5.4 = 0.00306 pF
  N-trans: 3 x 10^-4 pF x 1.8 + 4 x 10^-4 pF x 5.4 = 0.0027 pF
  Total Cd = (0.00306 + 0.0027) pF x 1551 inverters = 8.93 pF

Interconnect Capacitance:
  Total length of major interconnects in Metal 1: 21,505 µm
  Total length of major interconnects in Metal 2: 53,660 µm
  Metal 1: 21505 µm x 0.9 µm x 31 aF/µm^2 + 21505 µm x 76 aF/µm = 2.2 pF
  Metal 2: 53660 µm x 0.9 µm x 17 aF/µm^2 + 53660 µm x 59 aF/µm = 3.98 pF
  Total Interconnect Capacitance = 6.18 pF

Total Load Capacitance:
  CL = 1.56 pF + 8.93 pF + 6.18 pF = 16.67 pF

Power = CL x VDD^2 x f = 16.67 pF x (5 V)^2 x 100 MHz = 41.675 mW

Metal Migration:
  I = 41.675 mW / 5 V = 8.3 mA
  Width = 8.3 mA / (0.5 mA/µm) = 16.67 µm in total, or 8.33 µm per power line. We made them 12 µm wide to be safe.

Address Register/Decrementer

Usage
The address register/decrementer is used primarily for execution of loops. First the register is loaded with an initial count from the direct input. On each clock period the stored value is decremented and compared to zero. When the value reaches zero, a signal is sent to the condition mux so that execution of the loop can be completed. The address register/decrementer can also be used to store an address to jump back to during conditional branches.

Components
- 12-bit register w/ clear
- 12-bit decrementer
- 2 x 12-bit 2 to 1 muxes
- zero detector

The address register/decrementer is composed of the four components listed above. The register is loaded from a mux which selects from either its previous value or the other mux.
The other mux selects between the output of the decrementer and the direct input. The decrementer takes its input from the current value held in the register and subtracts one from this value. All the components are 12 bits wide. The zero detector signals when the current value of the register is all zeros. The register also has a 12-bit output to the next address mux, which is used for the branching operations.

Bit-sliced 8-bit ALU

Usage
Multiple ALU chips can be cascaded to produce as large a bus width as desired. The ALU implements addition and subtraction and all logic functions, including AND, OR, XOR, and their complements. The second input can also be inverted to allow for useful logic functions such as A AND (NOT B). The result can simultaneously be shifted left or right by 1 bit.

Components
The 8-bit ALU is a ripple carry configuration of eight 1-bit ALUs.

Bus Enable

Usage
The bus enable connects the chip to the I/O bus and selects between the direct address input and the next address output. It is controlled by the level of the clock, so that when the clock is high we read the direct input and when the clock is low the next address output lines are set.

Components
- three-state buffer

The bus enable is a series of twelve three-state buffers.

Clock Inverter

Usage
The clock inverter is used whenever we need a signal and its complement as control signals. It is not used only for the clock; we use it all over the chip. It generates two signals that are complements of each other, with no overlap between them.

Components
We used a transmission gate designed with the same delay as a strong inverter. The two are placed in parallel and given the same input, so the two outputs are 180 degrees out of phase.

Condition Multiplexer

Usage
The 8 to 1 condition mux selects between various external signals which are used to determine the way an instruction is executed, for example which path of a branch is taken.
By using the mux, the decision can be made based on:
- The sign of the ALU output
- Whether the ALU output equals zero
- Whether the ALU output overflowed
- The shift out bit from the ALU
- The carry out bit from the ALU
- The interrupt signal
- Always true (1)
- Always false (0)

Control Unit

Usage
The control unit decodes the instruction given as input to the chip, along with the address zero and condition signals, to generate control signals for all the components on the chip. The address zero signal comes from the address register/counter and is used for loop control. The condition signal comes from the condition mux, which selects from external signals coming from other parts of the CPU.

Components
The control unit is made up of random logic. We used the instruction set from Mick & Brick and used K-maps to design the logic schematic.

12-bit Decrementer

Usage
The 12-bit decrementer is used in the address register/counter to decrement the current value by one.

Components
- dec first bit
- dec last bit
- dec two bit

The decrementer is a ripple carry decrementer. It starts with a one-bit decrementer, which is basically an inverter, followed by 5 dec two bit components. The two-bit component uses alternating logic for speed. The decrementer is completed with a one-bit decrementer at its tail, which is a single XOR.

12-bit Incrementer

Usage
The 12-bit incrementer is used in the microprogram counter to increment the current address by one.

Components
- inc first bit
- inc last bit
- inc two bit

The incrementer is a ripple carry incrementer. It starts with a one-bit incrementer, which is basically an inverter, followed by 5 inc two bit components. The two-bit component uses alternating logic for speed. The incrementer is completed with a one-bit incrementer at its tail, which is a single XOR.

Next Address Multiplexor

Usage
The next address multiplexor is a 12-bit, 4 to 1 multiplexor which selects the source of the next address output.
It selects from:
- Direct Input
- Stack
- Address Register/Counter
- Program Counter

12-bit Register w/ clear

Usage
The 12-bit register is used as a component of the stack and the address register, and also to hold the input/output signals.

Components
- 12 DFF w/ clear

The 12-bit register is composed of 12 master/slave DFFs that have a clear line. The clock and clear lines are shared among the twelve bits, but the bits are otherwise independent.

Stack 4 x 12

Usage
This LIFO (last-in, first-out) stack holds four twelve-bit words. It is used to hold return addresses while making a subroutine call or conditional jump.

Components
- stack 4x1
- stack control
- reg2 w/ clear
- decoder

The stack is composed of twelve 4-word, 1-bit units and a component that generates the correct address to load or read when given a push or pop instruction. The two registers always hold the next write address and the read address, which are two bits each. They are set by a stack control unit whenever a push or pop instruction has been issued. The decoder then activates the correct word position in each 4x1 unit. The 4x1 unit has a mux to activate the correct word position for output based on the read address.
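The read and write addressing described above can be illustrated with a small behavioral model in Python. This is a sketch under stated assumptions, not the gate-level stack control: the class and method names are made up for the example, the reset values of the two pointers are assumed, and the pointers simply wrap modulo 4 because overflow and underflow handling is not described here. Subroutine returns need the most recently pushed address first, so the model returns words in that order.

```python
# Behavioral sketch of the 4 x 12 stack with 2-bit read and write addresses.
# Names, reset values, and the modulo-4 wrap-around are assumptions.

class Stack4x12:
    DEPTH = 4          # four words, so each address is two bits
    WORD_MASK = 0xFFF  # each word is twelve bits

    def __init__(self):
        self.words = [0] * self.DEPTH
        self.write_addr = 0              # next word position to load
        self.read_addr = self.DEPTH - 1  # word position selected for output

    def push(self, address):
        """Load the word at the write address, then advance both addresses."""
        self.words[self.write_addr] = address & self.WORD_MASK
        self.read_addr = self.write_addr
        self.write_addr = (self.write_addr + 1) % self.DEPTH

    def pop(self):
        """Return the word at the read address, then step both addresses back."""
        value = self.words[self.read_addr]
        self.write_addr = self.read_addr
        self.read_addr = (self.read_addr - 1) % self.DEPTH
        return value


# Two nested subroutine calls return in the reverse order of the calls.
stack = Stack4x12()
stack.push(0x123)
stack.push(0x456)
first = stack.pop()
second = stack.pop()
```

Keeping a separate read address that always trails the write address by one means the output mux can present the top of the stack without any arithmetic on the write address during a read.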