Jeffrey Dwoskin & Kevin Green
VLSI Design Project Report
Fall 2001 – Spring 2002
Project Description
We have designed a microcontroller chip based on the RTL design and instruction set of an
AMD 2910. Our chip is the sequencer component of the microcontroller, which processes
instructions from the microprogram ROM and determines the address of the next microprogram
instruction. The controller has a 4-input multiplexor that selects the next instruction address
from the direct address input, the microprogram counter, the register/counter, or the 4-word
stack.
The direct address is an input to the chip which is used to initialize the controller, start execution
of a new instruction with an address from the mapping PROM, or for branching instructions
from the microprogram ROM.
The microprogram counter consists of an incrementer followed by a 12-bit register. The
incrementer takes the current microprogram address as input and adds one to advance to the next
instruction. This new address is then stored in the register.
The address register/counter is used for looping over a set of instructions. It does this by first
loading a count from the direct input, and then decrementing after each loop iteration. When the
count reaches zero, a signal is sent that can be used to stop executing the loop. It can
alternatively be used to store an address for a conditional jump instruction.
The 4-word, 12-bit stack stores return addresses for subroutine calls. Since it holds
4 words, it supports subroutine calls nested up to 4 levels deep.
During each instruction, the microcode ROM also provides the control signals for the rest of the
micro-controlled CPU. The combination of our design, the microprogram ROM, and the
mapping PROM make up the control unit for the CPU. The advantage of the microprogrammed
control unit over a standard state machine design of a control unit is that by changing the
program in the microprogram ROM and mapping PROM, it can be made to emulate the
instruction set of any standard microprocessor.
We also designed a bit-sliced 8-bit ALU which resides on a separate chip. Multiple ALU chips
can be cascaded to produce as large a bus width as desired. The ALU implements
addition/subtraction and all logic functions, including AND, OR, XOR, and their complements.
Also, the second input can be inverted to allow for useful logic functions such as A AND (NOT
B). The result can simultaneously be shifted left or right by 1 bit.
Testing Plan
We plan to use a sequential ATPG to find all the test vectors for our chip. We plan to use
SEST for this purpose. The fault coverage should be very high because there is a global reset
line that will initialize all flip-flops, and our chip is designed in such a way as to give access
through the pins to all of the components directly by setting the correct control signals.
Moreover, there are no cycles among the flip-flops, which will allow the sequential
ATPG to fully test the circuit. We could then use an ATE to apply the test vectors.
The first step would be to netlist our design into the rutmod format suitable for input
into SEST. Then SEST will be able to generate all of the test vectors. However, because
our design is in Cadence and this software does not support the rutmod format, we would
have to write the netlist by hand, which is not feasible considering we have thousands of
transistors. Therefore, we will have to wait for a tool to convert our netlist from Cadence into
rutmod. However, as stated above, we believe that our design is well suited for testability because
all components have high observability and controllability.
Criticism of the CAD Tools
Overall we found the CAD tools straightforward and easy to use, but a few things need
improvement. At first we designed one portion of our project in Synopsys, but we were unable
to convert the resulting hardware from Synopsys into Cadence. Synopsys also does not provide
enough detail about the interconnections between components for us to convert the design
manually. For example, when it chose to use a JK flip flop, it showed a box with 4 or more
inputs and the wires going into it, but it did not tell us which wires went to which inputs of the
flip flop. More importantly, we could not get Synopsys to restrict its design to only the
components that we chose. For example, we wanted it to use inverting logic instead of
non-inverting logic and only D flip flops instead of JK flip flops, but we were unable to build a new
library or restrict the standard GTECH library to accomplish this. We instead had to design the
component by hand using K-maps.
As for Cadence, we were unable to simulate extracted layouts. For most of the fall semester,
many of the components of Cadence did not work for the AMI process. This included the
simulator, extractor, and LVS. This made it difficult to test and verify our design as we went
along. Also, the design rule checker (DRC) was not working for most of the time we were laying
out our design. This made the layout time consuming and more error prone and set us back a
couple of months. It also meant that we did not know we had to follow some of the more obscure
rules in the printed design rules.
We had some other problems with the Affirma analog simulator. First, it is very slow and
difficult to work with. We have to set many of the settings to the same values repeatedly, when
the tool should remember them. It could also use a much simpler interface, especially for
entering the stimuli. For any circuit with more than 2 or 3 input pins, setting the stimuli
correctly is very tedious. We’ve had problems simulating designs with a hierarchy, especially
when the simulator doesn’t identify global sources deeper in the hierarchy. There is a problem
with the model libraries as well: although we're using the AMI06 tech library, the netlist still
uses tsmc25, and we can't determine where it's getting this information.
Timing/Critical Path Analysis
Data from AMI C5N Process:
Sampled from: http://www.mosis.org/Technical/Testdata/ami-c5n-prm.html
Sheet resistance:
  metal1/2:       0.09 ohm/sq
  metal3:         0.06 ohm/sq
  poly:           22 ohm/sq
  m1/2 contact:   0.7 - 0.85 ohm
Capacitance: (aF = 10^-18 F; area values per µm^2, fringe values per µm)
                  m1    m2    m3
  area (sub)      31    17    10
  area (m1)        -    32    13
  area (m2)        -     -    36
  fringe (sub)    76    59    39
  fringe (m1)      -    56    35
  fringe (m2)      -     -    51
Wire Delay:
Longest path: assume worst case all metal 1
pc output to next addr mux: 1550µm + 3 contacts
at minimum width = 0.9µm
1550µm/0.9µm = 1722.2 squares long
Resistance = 0.09ohm/sq * 1722.2 sq + 3 * 0.85ohm = 155 ohm + 2.55 ohm = 157.55 ohm
Capacitance:
area (sub) = 31aF/µm^2 * 1550µm * 0.9µm = 43245 aF
fringe (sub) = 76aF/µm * 1550µm = 117800 aF
total cap: 161045 aF = 0.161045 pF
RC = Tdelay = 157.55 ohm * 0.161045 pF = 25.373 ps = 0.025373 ns
Conclusion: wire delay is insignificant
Rough approximation for a minimum-width wire (treating the 0.9µm width as about 1µm):
RC = 25.373ps / 1550µm = 0.01637 ps / µm
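The wire delay arithmetic above can be re-run as a short script. This is only a sketch: the function and constant names are invented for illustration, and it uses the same simplified lumped RC model as the report (one area term and one fringe term per unit length, worst-case contact resistance).

```python
# Process values copied from the report; all names here are illustrative.
SHEET_RES_M1 = 0.09     # ohm/square, metal 1
CONTACT_RES = 0.85      # ohm, worst-case m1/m2 contact
AREA_CAP_M1 = 31.0      # aF/um^2, metal 1 to substrate
FRINGE_CAP_M1 = 76.0    # aF/um, metal 1 to substrate


def wire_rc_delay_ps(length_um, width_um, contacts):
    """Lumped RC delay of a metal 1 wire, in picoseconds."""
    squares = length_um / width_um
    resistance_ohm = SHEET_RES_M1 * squares + contacts * CONTACT_RES
    cap_af = AREA_CAP_M1 * length_um * width_um + FRINGE_CAP_M1 * length_um
    # ohm * aF = 1e-18 s, which is 1e-6 ps
    return resistance_ohm * cap_af * 1e-6


# Longest wire: PC output to next address mux, 1550 um at 0.9 um width.
delay_ps = wire_rc_delay_ps(1550.0, 0.9, 3)
print(f"wire delay: {delay_ps:.3f} ps")
```

Because the delay is a product of resistance and capacitance that both grow with length, the per-µm figure is only a rough guide for wires of similar length.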
Component Delay:
Simulated to find delays:
DFF w/clr - .5 ns
2to1 mux - .675 ns
4to1 mux - .900 ns
inc/dec 12 bit - 2.55ns
condition mux (4to1 + 2to1 mux) 1.575 ns
Calculated from components that were simulated:
bus enable - .325 ns
control unit - 1.775 ns
stack control -- longest path through two 2-to-1 muxes - 1.35 ns
Critical Path Analysis:
There are 8 major paths in our chip that we are considering. They are the 4 inputs into the next
address mux, and the paths that drive the components in each of those paths.
Stack path output:
675um from control unit to stack control – ignore
delay for signal thru mux -- .900 ns
1150um from stack dff output to next addr mux input -- ignore
Total: 0.900 ns
Stack input:
delay thru stack control -- .900 ns
or 620um from pc to stack inputs -- ignore
delay to load registers in stack -- .5 ns
Total: 1.4 ns
Program counter input:
700um from next addr mux output to incrementer -- ignore
delay thru incrementer -- 2.55 ns
delay to set register -- .5 ns
Total: 3.05ns
Program counter output:
1550um from PC output to next addr mux -- 0.025373ns
delay thru next addr mux -- .9ns
Total: 0.925ns
Addr/reg load input:
1100um from input reg to addr/reg mux -- ignore
or 750um control signals from control unit to load/dec -- ignore
delay thru 2to1 mux – 0.675ns
400um to other 2to1 mux -- ignore
delay thru 2to1 mux – 0.675ns
delay to set register – 0.5ns
Total: 1.85ns
Addr/reg decrement input:
750um control signals from control unit to load/dec --ignore
delay thru decrementer -- 2.55ns
delay thru 2to1 mux -- .675ns
400um to other 2to1 mux -- ignore
delay thru 2to1 mux -- .675ns
delay to set register -- .5ns
Total: 4.4ns
Addr/reg output
825um from register output to next addr mux -- ignore
delay thru next addr mux -- .9ns
Total: 0.9ns
Control unit input:
575um from pads into condition mux -- ignore
delay thru condition mux (8to1) -- 1.575 ns
650um from condition mux to control unit -- ignore
delay thru control unit -- 1.775 ns
Total: 3.35ns
Conclusion: The address register’s decrementer input path has the longest delay, 4.4ns. This
must complete within half of the clock cycle, so the minimum clock period is 2 x 4.4ns = 8.8ns
and the maximum clock frequency is:
1/8.8ns = 113.6 MHz
In order to be safe, we’ll say our maximum clock rate is 100 MHz.
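As a cross-check, the path totals and the resulting clock rate can be recomputed by summing the component delays listed for each path. The sketch below is illustrative only; the dictionary layout and names are assumptions, and wire segments marked "ignore" are left out.

```python
# Component delays (ns) per path, copied from the lists above.
PATHS_NS = {
    "stack output": [0.900],
    "stack input": [0.900, 0.5],
    "program counter input": [2.55, 0.5],
    "program counter output": [0.025373, 0.9],
    "addr/reg load input": [0.675, 0.675, 0.5],
    "addr/reg decrement input": [2.55, 0.675, 0.675, 0.5],
    "addr/reg output": [0.9],
    "control unit input": [1.575, 1.775],
}

totals_ns = {name: sum(delays) for name, delays in PATHS_NS.items()}
critical_name = max(totals_ns, key=totals_ns.get)
critical_ns = totals_ns[critical_name]

# The critical path must fit in half a clock cycle.
period_ns = 2 * critical_ns
f_max_mhz = 1000.0 / period_ns

for name, total in totals_ns.items():
    print(f"{name:26s} {total:6.3f} ns")
print(f"critical path: {critical_name}, f_max = {f_max_mhz:.1f} MHz")
```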
Transistor Count
Control Unit:              336
8to1 Mux:                   36
4to1 Mux - 12-bit:         192
Addr Reg/Dec - 12-bit:     496
Stack:                    1358
2 x 12-bit registers:      504
2 x bus enable:             48
5 x clock inverters:        24
Incrementor:               108
Total:                    3102
Power Dissipation
Estimate chip as 3102 transistors/2 = 1551 inverters
Gate Capacitance:
Gate cap on an inverter: 1008.1 aF
Total gate capacitance Cg = 1008.1 * 1551 = 1.56 pF
Diffusion Capacitance:
Cd = Cja x a x b + Cjp x (2a + 2b)
a = 1.5µm, b = 1.2µm
P-trans: 5x10^-4 pF x 1.8 + 4x10^-4 pF x 5.4 = 0.00306pF
N-trans: 3x10^-4 pF x 1.8 + 4x10^-4 pF x 5.4 = 0.0027pF
Total Cd = (0.00306 + 0.0027) x 1551 inverters = 8.93pF
Interconnect Capacitance:
Total length of major interconnects in Metal 1: 21,505µm
Total length of major interconnects in Metal 2: 53,660µm
Metal 1: 21505µm x 0.9µm x 31 aF/µm^2 + 21505µm x 76aF/µm = 2.2pF
Metal 2: 53660µm x 0.9µm x 17 aF/µm^2 + 53660µm x 59aF/µm = 3.98pF
Total Interconnect Capacitance = 6.18pF
Total Load Capacitance: CL = 1.56pF + 8.93pF + 6.18pF = 16.67pF
Power = CL x VDD^2 x f = 16.67pF x (5V)^2 x 100MHz = 41.675mW
Metal Migration: I = 41.675mW / 5V = 8.335mA
Width = 8.335mA / (0.5mA/µm) = 16.67µm total, or 8.33µm per power line when split across two
lines. We made them 12µm wide to be safe.
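The power estimate can be reproduced with a few lines of arithmetic. The sketch below follows the same steps (gate, diffusion, and interconnect capacitance, then C x V^2 x f); the helper names are invented for illustration. Since it carries unrounded intermediate values, it lands slightly above the hand calculation, at roughly 41.8 mW.

```python
VDD = 5.0                 # volts
F_HZ = 100e6              # 100 MHz clock
N_INVERTERS = 3102 // 2   # chip approximated as 1551 inverters
AF_TO_PF = 1e-6

gate_cap_pf = 1008.1 * AF_TO_PF * N_INVERTERS


def diffusion_cap_pf(cja, cjp, a_um, b_um):
    """Cd = Cja*a*b + Cjp*(2a + 2b), with coefficients given in pF."""
    return cja * a_um * b_um + cjp * (2 * a_um + 2 * b_um)


cd_p = diffusion_cap_pf(5e-4, 4e-4, 1.5, 1.2)
cd_n = diffusion_cap_pf(3e-4, 4e-4, 1.5, 1.2)
diff_cap_pf = (cd_p + cd_n) * N_INVERTERS


def wire_cap_pf(length_um, width_um, area_af, fringe_af):
    """Area plus fringe capacitance of a wire, in pF."""
    return (length_um * width_um * area_af + length_um * fringe_af) * AF_TO_PF


wire_pf = wire_cap_pf(21505, 0.9, 31, 76) + wire_cap_pf(53660, 0.9, 17, 59)

total_cap_pf = gate_cap_pf + diff_cap_pf + wire_pf
power_mw = total_cap_pf * 1e-12 * VDD**2 * F_HZ * 1e3
current_ma = power_mw / VDD
width_um = current_ma / 0.5   # 0.5 mA per um of metal width

print(f"C = {total_cap_pf:.2f} pF, P = {power_mw:.1f} mW, "
      f"I = {current_ma:.2f} mA, total width = {width_um:.2f} um")
```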
Address Register/Decrementer
Usage
The address register/decrementer is used primarily for execution of loops. First, the register is
loaded with an initial count from the direct input. On each clock period, the stored value is
decremented and compared to zero. When the value reaches zero, a signal is sent to the condition
mux so that execution of the loop can end. The address register/decrementer can also be used to
store an address to jump back to during conditional branches.
Components
12-bit register w/ clear
12-bit decrementer
2x12 2 to 1 muxes
zero detector
The address register/decrementer is composed of the four components listed above. The register
is loaded from a mux which selects either the register's previous value or the output of the other
mux. The other mux selects between the output of the decrementer and the direct input. The
decrementer takes its input from the current value held in the register and subtracts one from
this value. All the components are 12 bits wide. The zero detector signals when the current value
of the register is all zeros. The register also has a 12-bit output to the next address mux, which
is used for the branching operations.
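A small behavioral model may help make the two-mux arrangement concrete. This is a sketch under assumptions: the control signal names (load_enable, select_direct) are hypothetical, and the model ignores timing and only captures the value the register holds after each clock.

```python
MASK12 = 0xFFF  # all components are 12 bits wide


class AddrRegDecrementer:
    """Behavioral sketch; control signal names are hypothetical."""

    def __init__(self):
        self.value = 0  # state after the clear line is asserted

    def clock(self, load_enable, select_direct, direct_input=0):
        decremented = (self.value - 1) & MASK12
        # Inner mux: direct input or decrementer output.
        inner = (direct_input & MASK12) if select_direct else decremented
        # Outer mux: new value or hold the previous value.
        self.value = inner if load_enable else self.value

    @property
    def zero(self):
        # Zero detector: high when the register is all zeros.
        return self.value == 0


# Load a loop count of 3, then decrement until the zero detector fires.
reg = AddrRegDecrementer()
reg.clock(load_enable=True, select_direct=True, direct_input=3)
iterations = 0
while not reg.zero:
    reg.clock(load_enable=True, select_direct=False)
    iterations += 1
print(iterations)
```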
Bit-sliced 8-bit ALU
Usage
Multiple ALU chips can be cascaded to produce as large a bus width as desired. The ALU
implements addition/subtraction and all logic functions, including AND, OR, XOR, and their
complements. Also, the second input can be inverted to allow for useful logic functions such as
A AND (NOT B). The result can simultaneously be shifted left or right by 1 bit.
Components
The 8-bit ALU is a ripple carry configuration of 8 1-bit ALUs.
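The ripple carry arrangement can be sketched in a few lines. The model below is illustrative only: the operation names are invented, the shifter and complemented outputs are omitted, and subtraction is assumed to be done the usual way, by inverting the second input and setting the carry in.

```python
def alu_bit(a, b, carry_in, op, invert_b):
    """One 1-bit slice; returns (result_bit, carry_out)."""
    b ^= invert_b
    if op == "add":
        total = a + b + carry_in
        return total & 1, total >> 1
    result = {"and": a & b, "or": a | b, "xor": a ^ b}[op]
    return result, carry_in  # assumption: logic ops pass the carry along


def alu8(a, b, op, invert_b=0, carry_in=0):
    """Eight 1-bit slices chained through their carries."""
    result = 0
    carry = carry_in
    for i in range(8):
        bit, carry = alu_bit((a >> i) & 1, (b >> i) & 1, carry, op, invert_b)
        result |= bit << i
    return result, carry


print(alu8(200, 100, "add"))                           # sum wraps past 8 bits
print(alu8(100, 58, "add", invert_b=1, carry_in=1))    # 100 - 58
print(alu8(0b1100, 0b1010, "and", invert_b=1))         # A AND (NOT B)

# Cascading two 8-bit chips into a 16-bit adder via the carry.
low, carry = alu8(0x01FF & 0xFF, 0x0001 & 0xFF, "add")
high, _ = alu8(0x01FF >> 8, 0x0001 >> 8, "add", carry_in=carry)
wide_sum = (high << 8) | low
```

The cascade at the end shows why the carry out of one chip feeds the carry in of the next: the same chaining that links the eight slices also links the chips.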
Bus Enable
Usage
Used to connect to the I/O bus for selecting between the direct address input and the next address
output. It is controlled by the level of the clock so that when the clock is high we read the direct
input and when the clock is low the next address output lines are set.
Components
Three-state buffer
The bus enable is a series of twelve three-state buffers.
Clock Inverter
Usage
The clock inverter is used whenever we need a signal and its complement as control signals. It is
not used only for the clock; we use it all over the chip. It generates two signals that are
complements of each other with no overlap of the signals.
Components
We used a transmission gate designed to have the same delay as a strong inverter. The two are
placed in parallel and given the same input, so the two outputs are 180 degrees out of phase.
Condition Multiplexer
Usage
The 8 to 1 condition mux selects between various external signals which are used to determine
the way an instruction is executed, for example which path of a branch is taken. By using the
mux, the decision can be made based on:
- The sign of the ALU output
- Whether the ALU output equals zero
- Whether the ALU output overflowed
- The shift out bit from the ALU
- The carry out bit from the ALU
- The interrupt signal
- Always true (1)
- Always false (0)
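The selection above can be modeled as a simple lookup. In this sketch the signal names and the order of the select codes are assumptions made for illustration; the report does not give the actual select encoding.

```python
# Hypothetical select ordering, following the list above.
CONDITION_SOURCES = (
    "alu_sign", "alu_zero", "alu_overflow", "alu_shift_out",
    "alu_carry_out", "interrupt", "always_true", "always_false",
)


def condition_mux(select, external):
    """Return the condition bit chosen by a 3-bit select value."""
    levels = dict(external)
    levels["always_true"] = 1    # constant inputs are tied inside the mux
    levels["always_false"] = 0
    return levels[CONDITION_SOURCES[select & 0b111]]


flags = {"alu_sign": 0, "alu_zero": 1, "alu_overflow": 0,
         "alu_shift_out": 0, "alu_carry_out": 1, "interrupt": 0}
branch_taken = condition_mux(1, flags)   # branch on "ALU output equals zero"
print(branch_taken)
```

Tying two inputs to constant 1 and 0 lets the same conditional instruction act as an unconditional branch or as a no-branch without extra opcodes.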
Control Unit
Usage
Decodes the instruction given as input to the chip, along with signals address zero and condition
to generate control signals for all the components on chip. The address zero signal comes from
the address register/counter and is used for loop control. The condition signal comes from the
condition mux, which selects from external signals coming from other parts of the CPU.
Components
The control unit is made up of random logic. We took the instruction set from Mick & Brick and
used K-maps to design the logic schematic.
12-bit Decrementer
Usage
The 12-bit decrementer is used in the address register/counter to decrement the current value by
one.
Components
dec first bit
dec last bit
dec two bit
The decrementer is a ripple carry decrementer. It begins with a one bit decrementer (dec first
bit), which is basically an inverter, followed by 5 dec two bit components. The two-bit component
uses alternating logic for speed. The decrementer is completed with a one bit decrementer at its
tail (dec last bit), which is a single xor.
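At the logic level, the ripple structure can be sketched bit by bit. This model is an illustration rather than the actual gate netlist: it treats every middle bit identically and does not reproduce the alternating-polarity trick used inside the two-bit components.

```python
def ripple_decrement(value, width=12):
    """Subtract one by rippling a borrow from bit 0 upward."""
    bits = [(value >> i) & 1 for i in range(width)]
    # First bit: subtracting 1 always flips bit 0 (an inverter).
    out = [bits[0] ^ 1]
    # A borrow ripples onward only when bit 0 was 0.
    borrow = bits[0] ^ 1
    for i in range(1, width):
        # Each later bit flips if a borrow arrives (an XOR).
        out.append(bits[i] ^ borrow)
        # The borrow continues only through bits that were 0.
        borrow &= bits[i] ^ 1
    return sum(bit << i for i, bit in enumerate(out))


print(hex(ripple_decrement(0x100)))
print(hex(ripple_decrement(0x000)))
```

The last bit needs only the XOR, since no borrow has to be passed beyond it, which matches the single xor at the tail.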
12-bit Incrementer
Usage
The 12-bit incrementer is used in the microprogram counter to increment the current
microprogram address by one.
Components
inc first bit
inc last bit
inc two bit
The incrementer is a ripple carry incrementer. It begins with a one bit incrementer (inc first
bit), which is basically an inverter, followed by 5 inc two bit components. The two-bit component
uses alternating logic for speed. The incrementer is completed with a one bit incrementer at its
tail (inc last bit), which is a single xor.
Next Address Multiplexor
Usage
The next address multiplexor is a 12-bit, 4 to 1 multiplexor which selects the source of the next
address output. It selects from:
- Direct Input
- Stack
- Address Register/Counter
- Program Counter
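One sequencer step can be sketched as a lookup followed by an increment. The source names used as select values below are hypothetical, since the report does not give the select encoding; the increment reflects the microprogram counter described earlier, which stores the selected address plus one.

```python
MASK12 = 0xFFF


def next_address(select, direct, stack_top, addr_reg, pc):
    """4 to 1 mux: choose the next microprogram address (12 bits)."""
    sources = {
        "direct": direct,
        "stack": stack_top,
        "addr_reg": addr_reg,
        "pc": pc,
    }
    return sources[select] & MASK12


def step(select, direct, stack_top, addr_reg, pc):
    """Return (address output, new program counter value)."""
    address = next_address(select, direct, stack_top, addr_reg, pc)
    new_pc = (address + 1) & MASK12   # incrementer feeds the PC register
    return address, new_pc


print(step("pc", 0x000, 0x000, 0x000, 0x005))       # sequential execution
print(step("direct", 0x123, 0x000, 0x000, 0x005))   # branch to direct input
```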
12-bit Register w/ clear
Usage
The 12-bit register is used as a component of the stack, address register, and also to hold the
input/output signals.
Components
12 DFF w/ clear
The 12-bit register is composed of 12 master/slave DFFs that have a clear line. The clock and
clear lines are shared among the twelve bits but they are otherwise independent.
Stack 4 x 12
Usage
This LIFO stack holds four twelve bit words. It is used to hold return addresses while making a
subroutine call or conditional jump.
Components
stack 4x1
stack control
reg2 w/ clear
decoder
The stack is composed of twelve 4-word, 1-bit units and a component that generates the correct
address to load or read when given a push or pop instruction. The two registers always hold the
next write address and read address, which are two bits each. They are set by a stack control unit
whenever a push or pop instruction has been issued. The decoder then activates the correct word
position in each 4x1 unit. The 4x1 unit has a mux to activate the correct word position for output
based on the read address.
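The read and write address registers can be modeled as two 2-bit pointers. The pointer update rule below is an assumption chosen so that return addresses come back in last-in, first-out order; the actual stack control logic may differ, and overflow past four entries simply wraps in this sketch.

```python
class ReturnStack:
    """Behavioral sketch of the 4 x 12 stack; update rule is assumed."""

    DEPTH = 4

    def __init__(self):
        self.words = [0] * self.DEPTH
        self.write_addr = 0   # both 2-bit registers start cleared
        self.read_addr = 0

    def push(self, address):
        self.words[self.write_addr] = address & 0xFFF
        self.read_addr = self.write_addr              # newest word is read next
        self.write_addr = (self.write_addr + 1) % self.DEPTH

    def pop(self):
        address = self.words[self.read_addr]
        self.write_addr = self.read_addr              # slot can be reused
        self.read_addr = (self.read_addr - 1) % self.DEPTH
        return address


stack = ReturnStack()
for return_address in (0x010, 0x020, 0x030):
    stack.push(return_address)
popped = [stack.pop() for _ in range(3)]
print([hex(a) for a in popped])
```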