A Case For Transistor Level Design

Eitan Rosen, MSIL, Dec 2006
It’s a question of where and when…
• Admittedly, it’s more difficult
– More effort, longer time
– Not well covered in flows
– No tools to tell you it’s OK
• But rewarding, when done right
– Build the right cells for blocks
– Invest in whole blocks that cannot be designed using cells
• Cells to be used in Cell based designs
– Within synthesis blocks
– Within custom blocks
• Complete functional blocks
– Register files, shifters, caches, adders
At MSIL: a Fast Embedded Controller
• Custom blocks where cell based not feasible
– Neither synthesis, nor manual design using standard cells worked
– Not meeting speed or area spec.
– Going beyond plain CMOS methodology
– Small signal, Domino, OPL, pulsed
• Custom cells help meet spec in control blocks
– Faster cells of same functionality
– Or, offering more functionality at improved performance
– Typically dynamic, often pulsed
– Suitable for use by automated tools
– Self contained (no special clocks required)
– Meeting layout, noise and drive constraints
Circuit Challenges
• The Design itself
– A large design space
– Directed optimization. Can’t try out everything
– No concrete flow / tool to do the design
– That’s what circuit designers are for
• Timing verification and modeling
– Using HSIM heavily, with home made tool for results analysis
– Cell characterization + static timing analysis (PT)
• Noise
– Avoidance and over design. No dedicated tools
– Limit routing distance, non minimal width and space
– Time window filtering
– Over design where filtering doesn’t apply
• Layout
– Layout is where “the rubber meets the road”
– Requires engineering attention from the start for “low cap” / “low noise”
– A realistic parasitic estimation flow/tool is mandatory
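The time window filtering listed under Noise can be illustrated with a minimal sketch. It assumes that each aggressor has a known switching window and that the victim is only sensitive during a known window; the function names, window values and net names below are hypothetical and not taken from the MSIL flow.

```python
def overlaps(a, b):
    """True if the closed intervals a=(start, end) and b=(start, end) intersect."""
    return a[0] <= b[1] and b[0] <= a[1]


def active_aggressors(victim_window, aggressor_windows):
    """Names of aggressors whose switching window overlaps the victim's
    sensitive window. Aggressors outside it are filtered out."""
    return [name for name, window in aggressor_windows.items()
            if overlaps(victim_window, window)]


# Hypothetical windows in ps.
victim = (100, 200)
aggressors = {"a": (50, 120), "b": (250, 300), "c": (200, 210)}
remaining = active_aggressors(victim, aggressors)
```

Where no such window can be established for a net, the slide's fallback applies: over design.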
More Challenges
• Logic Verification
– Device level exhaustive simulation for small cells
– Device level simulation and formal verification for blocks
• Power Supplies
– IR, current density, thermal
• Interconnects
– Space, width, number of vias, variation
• Memory Redundancy
– SER and ECC – will need to handle in L2 cache
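Exhaustive verification of a small cell amounts to applying every input vector and comparing against a reference. A minimal sketch follows; plain Python functions stand in for the device level simulation results, and all names are hypothetical.

```python
from itertools import product


def exhaustive_check(n_inputs, reference, candidate):
    """Apply all 2**n_inputs binary vectors and return those on which
    the candidate disagrees with the reference."""
    mismatches = []
    for vec in product((0, 1), repeat=n_inputs):
        if reference(*vec) != candidate(*vec):
            mismatches.append(vec)
    return mismatches


# Reference: 2:1 mux, inputs ordered (s, a, b).
def mux_ref(s, a, b):
    return b if s else a


# Candidate: gate level expression of the same mux.
def mux_gates(s, a, b):
    return (a & (1 - s)) | (b & s)


# A deliberately wrong candidate, to show what a failure looks like.
def mux_broken(s, a, b):
    return a | b


good = exhaustive_check(3, mux_ref, mux_gates)
bad = exhaustive_check(3, mux_ref, mux_broken)
```

The vector count doubles with every added input, which is why this approach suits small cells while blocks need directed simulation and formal verification.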
Static (and cell based) Tools Do Not Handle
• Simultaneous transitions
– What if several flops (or Domino gates) drive a gate?
• Pulsed circuits, Timers
– Loops. Extra margins needed
• Noise
– Supply noise
– Simultaneous noise at several inputs
– Time filtering, logical filtering
• Complementary and OPL circuits
– Keeper “kicks in” when other node flips
– Input discharging while evaluation in progress, resulting in a glitch
• => SPICE (HSIM) is the primary design tool
– HSIM output evaluation tool critically required. Built one
Consider: Flop, Adder, tools
• A fast flop
– Reduces overhead time
– Pack in some functionality to alleviate pressure on control paths
– Usable by automatic P&R tools
• A fast 32 bit adder
– Critical speed bottleneck in many designs
– Requires subtract operation, as well
• Tools
– An RC and size Excel calculator
– HSIM traces wave analyzer, event based
– CRC and Sizer (circuit rule checker)
Flop
• A pulsed Domino structure
– Takes a standard clock
– Data and enable equally fast
– May perform some functionality as well
– Scan almost for free (no serial mux)
• Followed by an SPC latch (RS)
– Also known as an SDL (Set Dominant Latch)
• Fast. But burns more power
– And area
• Involves regeneration
– And a “generation” Race
– Cost of robust design is increased Hold requirement
Flop Essentials
• Enable as critical as data
• Scan required (at no delay cost)
• Topology: Pulsed Dynamic + RS latch
[Figure: flop schematic. Labeled signals: clk, en, d, chop]
Adder
• Getting stable complementary signals
– In addition to “op code” {a+b, -a+b, a-b}
• 1st stage performs 3:1 mux+ logic
– And, OR + their complement
– Xor and complement
– Plain complementary Domino
• 2nd stage - Carry propagation chain
– Is OPL. Calculates P and G and complements of up to 4 bits
• 3rd stage is OPL, still
– With delayed clock
• 4th stage is plain 2 stack CMOS
• 5th stage is a mux, 6th – an SDL driver
• Zero and Carry flags included
Tunable clocks
• Glitches controlled by clock delay configuration signals
[Figure: tunable clock circuit. Labels: clock1, clock2, “From complement”]
Cells Layout
• Conflicting local and global resources
– Using lowest metals only increases cap, resistance, Xcap
– Allowing use of higher metals denies resources from router
– Causing higher capacitance, resistance, Xcap
• Route planning
– Figure out who drives which signal, at what metal
– Use those tracks in the internals of the driving/receiving cells
• Use larger devices
– Gate RC limits effective device size
– Breaking devices up costs more routing overhead. Compromise
• Estimate local parasitics up front
– Metals associated with gate and diffusion strapping
Clock Trees
• Long clock lines introduce overhead
– Need to account for latest as well as earliest point
• It helps to design a balanced drive
Excel Calculator
• Built a VB two-dimensional interpolation function
• Use it on RC components of manufacturer’s data
– To supply R, Cs, Cx
– Layer and corner dependent
• Built simple, normalized delay/slope tables
– Vs load and input slope
– Per stacks of 1,2,3
– Provide delay and slope calculator as well
• Build and optimize paths
– Converge through several iterations of a computation
– Apply Excel solver on top of that
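A two-dimensional table lookup of this kind can be sketched with bilinear interpolation. The original is a VB function inside Excel; the Python version below is only an illustration, and the grid axes and table values are made up rather than taken from manufacturer data.

```python
from bisect import bisect_right


def interp2d(xs, ys, table, x, y):
    """Bilinear interpolation. table[i][j] is the value at (xs[i], ys[j]).
    xs and ys must be sorted ascending with at least two points each.
    Points outside the grid are extrapolated linearly from the edge cell."""
    def lower_index(axis, v):
        # Index of the grid point at or below v, clamped so i+1 is valid.
        i = bisect_right(axis, v) - 1
        return max(0, min(i, len(axis) - 2))

    i = lower_index(xs, x)
    j = lower_index(ys, y)
    tx = (x - xs[i]) / (xs[i + 1] - xs[i])
    ty = (y - ys[j]) / (ys[j + 1] - ys[j])
    at_y_lo = table[i][j] * (1 - tx) + table[i + 1][j] * tx
    at_y_hi = table[i][j + 1] * (1 - tx) + table[i + 1][j + 1] * tx
    return at_y_lo * (1 - ty) + at_y_hi * ty


# Made-up example grid: value as a function of two parameters.
xs = [1.0, 2.0]
ys = [10.0, 20.0]
table = [[1.0, 2.0],
         [3.0, 4.0]]
mid = interp2d(xs, ys, table, 1.5, 15.0)
```

The same function serves both uses named on the slide: looking up R, Cs and Cx against wire width and spacing, and looking up normalized delay and slope against load and input slope.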
RC Calc example
Inputs: Metal Layer = m5, w = 1.20u, sp = 0.30u, L = 1000.00u, mcf = 1.5

corner:      typB       rc_bcB     rc_wcB     c_bcB      c_wcB
Resistance   99.17ohm   74.17ohm   155.80ohm  118.41ohm  97.58ohm
Cx           26.6fF     37.3fF     17.0fF     23.4fF     30.6fF
Ctot         533.6fF    501.1fF    575.7fF    473.4fF    606.1fF
Cx/Ctot      5%         7%         3%         5%         5%
RC           25.7ps     19.1ps     40.9ps     26.9ps     28.5ps
HSIM Wave Analyzer
• Scans huge fsdb files
– Builds and stores events data-base
– Of switches, glitches, slopes
– Possibly on a subset of the signals
• Reports
– Delays, switching times (negative delays, too)
– Data setup and hold issues (domino checks, device based)
– Glitches (nets that crossed stable threshold, not making the other)
– More, per designers’ wishes
– Max RC on each net (“missing vias tool”)
• Suitable for post layout analysis
– Queries, reports net oriented (a net is a superset of all its nodes)
– Queries based on device name and pin, rather than on net name
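The glitch report (a net that crosses its stable threshold without reaching the other one) can be sketched on a sampled waveform. This is an illustration only, not the in-house tool: it works on a list of (time, voltage) samples instead of an fsdb file, and the threshold values and function name are assumptions.

```python
def find_glitches(samples, v_lo, v_hi):
    """samples: list of (time, voltage) in time order.
    A net is stable low at or below v_lo and stable high at or above v_hi.
    Returns (t_leave, t_return, level) for each excursion that leaves a
    stable level and comes back without reaching the opposite one."""
    glitches = []
    state = None      # last stable level seen: "low" or "high"
    leave_t = None    # time the net left that level, if it is in between
    for t, v in samples:
        if v <= v_lo:
            level = "low"
        elif v >= v_hi:
            level = "high"
        else:
            level = None
        if level is None:
            if state is not None and leave_t is None:
                leave_t = t
        else:
            if leave_t is not None and level == state:
                glitches.append((leave_t, t, state))
            # Reaching the opposite level is a full switch, not a glitch.
            state = level
            leave_t = None
    return glitches


# Hypothetical waveform with thresholds at 0.3 V and 0.8 V.
wave = [(0, 0.0), (1, 0.5), (2, 0.0), (3, 0.6), (4, 1.0), (5, 0.7), (6, 1.0)]
events = find_glitches(wave, v_lo=0.3, v_hi=0.8)
```

Storing such events once, as the slide describes, lets later delay, setup/hold and glitch queries run against the event database rather than the raw trace.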
CRC – circuit rule checker (Sizer)
• A Netlist Parser
– Reading netlist (spice or Verilog)
• Circuit Checker
– Identify topologies
– Assign attributes and check rules
– Check sizing rules
• A static Sizing tool
– Estimate load on each net
– Size driving devices, iteratively
– Using tables or spice runs
– Calculate load on driving net (once load is known): outputs to inputs
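The outputs-to-inputs sizing loop can be sketched for a simple chain of drivers. Everything numeric here is an assumption made for illustration: the linear model (a driver of width w can drive cap_per_width * w), the gate and diffusion capacitance coefficients, and the function name. The actual tool uses tables or SPICE runs instead of this model.

```python
def size_chain(wire_caps, final_load, cap_per_width=4.0,
               c_gate=1.0, c_diff=0.5, tol=1e-6, max_iter=100):
    """Size a chain of drivers, walking from the output back to the input.
    wire_caps[i] is the wire cap on the net driven by stage i.
    Returns the list of driver widths (arbitrary, consistent units)."""
    widths = [0.0] * len(wire_caps)
    load_from_next = final_load
    for i in reversed(range(len(wire_caps))):
        w = 1.0
        # The driver's own diffusion loads its output net, so the width
        # and the load depend on each other: iterate to a fixed point.
        for _ in range(max_iter):
            load = wire_caps[i] + load_from_next + c_diff * w
            new_w = load / cap_per_width
            converged = abs(new_w - w) < tol
            w = new_w
            if converged:
                break
        widths[i] = w
        # This stage's gate cap becomes load on the net driving it.
        load_from_next = c_gate * w
    return widths


# Hypothetical two-stage chain driving a load of 30 units.
widths = size_chain(wire_caps=[2.0, 5.0], final_load=30.0)
```

With these assumed coefficients the last stage settles at a width of 10 and the first at about 3.43; the point is the ordering, where each stage can only be sized once the load it drives is known.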
It’s Much Fun…
• Exploring design options
– At a powerful device level
• Forming methodologies
– Not on the beaten ASIC path
• Evaluating and Defining tools
– Often tools are not commercially available
• It is effective
– Pushing the envelope of speed/power
– Higher effort compensated by results and re-use