ECE 5745 Complex Digital ASIC Design Topic 4: Full-Custom Design Methodology Christopher Batten School of Electrical and Computer Engineering Cornell University http://www.csl.cornell.edu/courses/ece5745 Design Domains, Abstractions, and Principles Full-Custom Design Part 1: ASIC Design Overview P P M M Topic 4 Full-Custom Design Methodology Topic 6 Closing the Gap Topic 1 Hardware Description Languages Topic 5 Automated Design Methodologies Topic 8 Testing and Verification Topic 3 CMOS Circuits Topic 2 CMOS Devices ECE 5745 T04: Full-Custom Design Methodology Topic 7 Clocking, Power Distribution, Packaging, and I/O 2 / 53 • Design Domains, Abstractions, and Principles • Full-Custom Design Agenda Design Domains, Abstractions, and Principles Modularity Hierarchy Encapsulation Regularity Extensibility Full-Custom Design ECE 5745 T04: Full-Custom Design Methodology 3 / 53 • Design Domains, Abstractions, and Principles • Full-Custom Design Behavioral, Structural, and Physical Abstractions Adapted from [Ellervee’04] ECE 5745 T04: Full-Custom Design Methodology 4 / 53 • Design Domains, Abstractions, and Principles • Full-Custom Design Behavioral, Structural, and Physical Abstractions Adapted from [Ellervee’04] ECE 5745 T04: Full-Custom Design Methodology 5 / 53 • Design Domains, Abstractions, and Principles • Full-Custom Design Computer Engineering Stack Abstractions Application Algorithm Programming Language Operating System Instruction Set Architecture Microarchitecture Register-Transfer Level Gate Level Circuits Devices Technology ECE 5745 Processors Memories Control Datapath Logic State Transistors T04: Full-Custom Design Methodology Networks Memories Interconnect Wires 6 / 53 • Design Domains, Abstractions, and Principles • Full-Custom Design Design Principles in VLSI Design I Modularity – Decompose into components with well-defined interfaces I Hierarchy – Recursively apply modularity principle I Encapsulation – Hide implementation details from interfaces I Regularity – Leverage structure at various levels of abstraction I Extensibility – Include mechanisms/hooks to simplify future changes ECE 5745 T04: Full-Custom Design Methodology 7 / 53 • Design Domains, Abstractions, and Principles • Full-Custom Design ontain all functionality Design Principle: Modularity modules b_out; mux_out; A A sel en x B B sel en B=0 A<B zero? lt Network s_bits_A), ), el), ut) A I$ I$ I$ I$ P P P P sub B _pf out), I Separate design into components Network w/ well-defined interfaces 6.375 Spring 2006 • L03 Verilog 2 - Design Examples • 21 I Reason, design, and test D$ D$ D$ D$ components in isolation I Interface may or may not Network encapsulate implementation ECE 5745 T04: Full-Custom Design Methodology 8 / 53 • Design Domains, Abstractions, and Principles • Full-Custom Design Design Principle: Modularity Modularity can also impact electrical and physical characteristics Physical Modularity Electrical Modularity What happens if we cascade many of these tranmission gate multiplexers? ECE 5745 pwr/gnd rails & wells in fixed locations so they connect via abutment Adapted from [Weste’11] T04: Full-Custom Design Methodology 9 / 53 • Design Domains, Abstractions, and Principles • Full-Custom Design Design Principle: Hierarchy GCD Control Unit Datapath Unit Multi-Bit Register Multi-Bit Multiplexer Multi-Bit Subtracter Single-Bit Multiplexer Single-Bit Flip-Flop Full-Adder Recursively apply modularity principle until complexity of submodules is manageable ECE 5745 T04: Full-Custom Design Methodology 10 / 53 • Design Domains, Abstractions, and Principles • Full-Custom Design Design Principle: Hierarchy System Network I$ I$ I$ I$ Processor P P P D$ D$ Network ECE 5745 Data Cache Bank P Processor Control Network D$ Instruction Cache Proc-to-D$ Network Processor Datapath Router Cache Control Cache Datapath D$ Router Control T04: Full-Custom Design Methodology Router Datapath 11 / 53 • Design Domains, Abstractions, and Principles • Full-Custom Design Design Principle: Encapsulation I Modularity requires well-defined interfaces, but these interfaces might still expose significant implementation details (e.g., interface in control/datapath split reveals many details of the implementation) I Choose interfaces that hide implementation details where possible to enable more robost composition I Lab 1 multipliers all use a latency-insenstive val/rdy message interface to hide timing details, any one of these can be swaped into a processor and should work without modification . Fixed-latency iterative multiplier . Variable-latency iterative multplier . Pipelined multiplier ECE 5745 T04: Full-Custom Design Methodology 12 / 53 • Design Domains, Abstractions, and Principles • Full-Custom Design Design Principle: Regularity I Modularity, hierarchy, and Network I$ I$ I$ I$ P P P P Network D$ D$ D$ D$ encapsulation can still lead to many different kinds of modules which can increase design complexity I Choose a hierarchical decomposition to leverage structure and thus faciliate reuse and reduce complexity I Both structural and physical regularity can be exploited Network ECE 5745 T04: Full-Custom Design Methodology 13 / 53 • Design Domains, Abstractions, and Principles • Full-Custom Design Design Principle: Regularity Common Cache Design Network Structural Regularity in Ripple-Carry Adder I$ I$ I$ I$ P P P P Network D$ D$ D$ D$ Structural Regularity in Network Physical Regularity in Datapaths Physical Regularity in Memories Network Common Network Design ECE 5745 T04: Full-Custom Design Methodology 14 / 53 • Design Domains, Abstractions, and Principles • Full-Custom Design ntain all functionality Design Principle: Extensibility odules out; x_out; Network A A sel en B B sel en B=0 A<B bits_A), zero? A ), ) I$ I$ I$ I$ P P P P lt sub B Network f t), D$ Simple form of polymorphism enables varying bitwidth of operands 6.375 Spring 2006 • L03 Verilog 2 - Design Examples • 21 Difficult with full-custom design methodology! ECE 5745 D$ D$ D$ Network Parameterization of network and caches enables reuse; static elaboration could enable varying the number of cores and the types of components T04: Full-Custom Design Methodology 15 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Agenda Design Domains, Abstractions, and Principles Full-Custom Design Cells Datapaths Memories Control ECE 5745 T04: Full-Custom Design Methodology 16 / 53 devices (Intel microprocessors, RF power amps for Design Domains, Abstractions, and Principles • Full-Custom Design • Requires complete customization of all layers of waf Full Custom Design Piece of full-custom multiplier array, 1.0 m 2-metal Key is that all circuits and transistors are optimized for specific context 2 Feb 2005 6.884 – Spring 2005 Intel 4004 ECE 5745 T04: Full-Custom Design Methodology 17 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Overview of Full Custom Design Methodology Adapted from [Weste’11] ECE 5745 T04: Full-Custom Design Methodology 18 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Custom Cells: Lambda-Based Design Rules Lambda-based design rules One lambda = one half of the “minimum” mask dimension, typically the length of a transistor channel. Usually all edges must be “on grid”, e.g., in the MOSIS scalable rules, all edges must be on a lambda grid. 2x2 1 3 2 2 2 3 3 1 diffusion (active) poly 1 2 2 3 2x2 3 metal1 contact Adapted from [Terman’02] More info at: http://www.mosis.org/Technical/Designrules/scmos/scmos-main.html 6.371 – Fall 2002 ECE 5745 10/16/02 T04: Full-Custom Design Methodology L12 – CMOS Layout 2 19 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Custom Cells: Sample “Lambda” Layout Sample vdd A Y vss Adapted from [Terman’02] 6.371 – Fall 2002 ECE 5745 10/16/02 T04: Full-Custom Design Methodology L12 – CMOS Layout 3 20 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Custom Cells: Lambda vs. Micron Rules Lamba vs. Micron rules Lambda-based design rules are based on the assumption that one can scale a design to the appropriate size before manufacture. The assumuption is that all manufacturing dimensions scale equally, an assumption that “works” only over some modest span of time. For example: if a design is completed with a poly width of 2 and a metal width of 3 then minimum width metal wires will always be 50% wider than minimum width poly wires. Consider the following data from Weste, contacted metal pitch Table 3.2: 1/2 * contact size contact surround metal-to-metal spacing contact surround 1/2 * contact size lambda lambda rule = 0.5u 1 0.5 1 0.5 3 1.5 1 0.5 1 0.5 8 4 micron rule 0.375 0.5 1.0 0.5 0.375 2.75 Scaled design is legal but much larger than it needs to be! Adapted from [Terman’02] 6.371 – Fall 2002 ECE 5745 10/16/02 T04: Full-Custom Design Methodology L12 – CMOS Layout 5 21 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Custom Cells:and Sticks and Compaction Sticks Compaction Stick diagram Compact X then Y Horizontal constraints for compaction in X Compact Y then X Compact X with jog insertion, then Y Adapted from [Terman’02] 6.3715745 – Fall 2002 ECE 10/16/02 T04: Full-Custom Design Methodology L12 – CMOS Layout227 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Custom Cells: Example Stick Diagram ECE 5745 T04: Full-Custom Design Methodology 23 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Custom Cells: Example Stick Diagram ECE 5745 T04: Full-Custom Design Methodology 24 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Custom Cells: Cell “Styles” Choosing a “style” What about routing signals between gates? Note that both layouts block metal/poly routing inside the cell. Choices: metal2 routing over the cell or routing above/below the cell. avoid long (> 50 squares) poly runs Vertical Gates Good for circuits where fets sizes are similar and each gate has limited fanout. Best choice for multiple input static gates and for datapaths. Horizontal Gates Good for circuits where long and short fets are needed or where nodes must control many fets. Often used in multiple-output complex gates (e.g, sum/carry circuits). don’t “capture” white space in a cell don’t obsess over the layout, instead make a second pass, optimizing where it counts Adapted from [Terman’02] 6.371 – Fall 2002 ECE 5745 T04: Full-Custom10/16/02 Design Methodology L12 – CMOS Layout258/ 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Custom Cells: Optimizing Connections Optimizing connections Which does this gate do? Which is the better gate layout? Which is better considering node capacitances? ECE 5745 T04: Full-Custom Design Methodology considering node capacitances? Adapted from [Terman’02] 26 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Custom Cells: Optimizing Large Transistors area = 928 area = 1008 area = 1080 is better considering Which isWhich the better gate layout? wire resistances? Which is better considering node capacitances? ECE 5745 considering node capacitances? T04: Full-Custom Design Methodology Adapted from [Terman’02] 27 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Custom Cells: Optimizing Diffusion Sharing Eliminating Gaps A B C D C E B A A C D E ECE 5745 D C A D D E B D A C 6.371 – Fall 2002 E B B B A E C 10/16/02 T04: Full-Custom Design Methodology E L12 – CMOS Layout 11 Adapted from [Terman’02] 28 / 53 Replicating Cells Design Domains, Abstractions, and Principles • Full-Custom Design • Custom Cells: Optimizing Across Cells Whatdoes does this this cell What celldo? do? What if we want to replicate What if we want to replicate th this cell vertically to process vertically, i.e., make a stack of many bits in parallel? cells, to process many bits in parallel? what nodes are shared among the cells? what nodes aren’t share how should we arrange t cells vertically? Adapted from [Terman’02] ECE 5745 T04: Full-Custom Design Methodology 29 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Vertical Replication Custom Cells: Optimizing Across Cells Place shared geometry symmetrically about shared boundary. Place items that aren’t to be shared 1/2 minspacing rule from shared boundary. Reflect cell about X axis so that Pfets are next to each other: this avoids large ndiff/pdiff spacing. Run shared control signals vertically -they’ll wire themselves up automatically? 6.371 – Fall 2002 ECE 5745 10/16/02 T04: Full-Custom Design Methodology L12from – CMOS Layout 13 Adapted [Terman’02] 30 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Custom Cells: Merging Simple Cells Two latch implementations: Left implementation composes primitive gates, while right implementation uses single tightly integrated gate Adapted from [Weste’11] ECE 5745 T04: Full-Custom Design Methodology 31 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Custom Datapaths, Memories, Control decode br_targ jr j_targ pc_plus4 br_targ_X pc_F pc_plus4_D +4 ir[25:0] ir[15:0] rst_vect ir[10:6] ir_D addr pc_mux _sel_P rdata j_targ br_targ shamt ir[25:21] ir[20:16] imem ir[15:0] ir[15:0] branch _cond_zero/neg_X op0_mux _sel_D regfile (read) op0_mux _out_X alu_fn_X rf_waddr_W branch _cond_eq_X wb_mux_sel_M result_W alu_out_M 16 0 rf_wen_W regfile (write) alu proc2cop_data_W zext op1_mux _out_X sext execute _mux_sel_X muldivreq_msg _fn_X muldiv _mux_sel_X md_mux op1_mux _sel_D dmemresp _mux_sel_M subword imuldiv wdata_X addr wdata dmem_msg _rw_M Fetch (F) ECE 5745 Decode (D) Execute (X) T04: Full-Custom Design Methodology rdata dmem Memory (M) Writeback (W) 32 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Custom Datapaths, Memories, Control Adapted from [Weste’11] ECE 5745 T04: Full-Custom Design Methodology 33 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Custom Datapaths: PC Generation Unit MIPS PC Generation Unit 32 32 PC for Link Register Jump Seq + + 4 Reset Exc. Branch 32 32 32 32 Jump Register PC Instruction Register PC to Instruction Memory Routing every path as a separate 32-bit bus would take a large area Adapted from [Terman’02] ECE 5745 T04: Full-Custom Design Methodology 34 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Custom Datapaths: Bitslices Datapath Bitslices Implement datapath as single bit slices, contain one bit from each functional unit Route each bus bit position within bitslice Carry chain connections 6.371 – Fall 2002 ECE 5745 + + Bit 2 + + Bit 1 + + Bit 0 9/27/02 T04: Full-Custom Design Methodology Adapted from [Terman’02] L08 – Logical Effort and ASIC Design Styles 17 35 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Custom Datapaths: Pitch Datapath Bit Bit Pitch Height of each bit slice depends on: – height of tallest cell in entire bitslice – maximum number of buses running through any cell in bitslice Two layers metal, buses run by side of cells Bit Pitch Three+ layers metal, buses run over cells Adapted from [Terman’02] ECE 5745 6.371 – Fall 2002 T04: Full-Custom Design Methodology 9/27/02 36 / 53 L08 – Logical Effort and ASIC Design Styles 18 Design Domains, Abstractions, and Principles • Full-Custom Design • Custom Datapaths: Gen Example DatapathPCExample Bus rip out at right angles Buses routed by side of cells In 1.0 m, 2-metal CMOS process Adapted from [Terman’02] 6.371 – Fall 2002 ECE 5745 9/27/02 T04: Full-Custom Design Methodology L08 – Logical Effort and ASIC Design37 Styles / 53 19 Design Domains, Abstractions, and Principles • Full-Custom Design • CustomDatapath Datapaths: Library Datapath Library Cells Cells Have to choose maximum datapath cell height – Too high, wastes area in simple cells – Too small, squeezes complex cells. Grow superlinearly in length dimension, so also wastes area. Compromise, around 8-12 metal tracks works OK Power run vertically in M2 VDD GND 12 track pitch in M3 (buses fly over cell horizontally) Control lines run vertically in M2, e.g., mux selects, clocks 6.371 – Fall 2002 ECE 5745 Width varies according (example metal assignment, to cell type others possible) 9/27/02 T04: Full-Custom Design Methodology Adapted from [Terman’02] L08 – Logical Effort and ASIC Design Styles 20 38 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Custom Datapaths: Optimizing Datapath Layout + mux + mux Reduce congestion by rearranging datapath components to minimize required number of vertical tracks per bitslice + + Count number of wires between each component Datapath Input or Output 1 2 4 5 Bus Writer Bus Reader 3 1 Bit cells must be >5 tracks high 2 4 4 3 Bit cells must be >4 tracks high Count number of wires crossing over each cell Carefully account for buses going in opposite directions 2 ECE 5745 4 5 5 2 T04: Full-Custom Design Methodology 4 4 4 39 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Custom Datapaths: MIPS Datapath Track Allocation Adapted from [Weste’11] ECE 5745 T04: Full-Custom Design Methodology 40 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Custom Datapaths: MIPS Datapath Example (1) Adapted from [Weste’11] ECE 5745 T04: Full-Custom Design Methodology 41 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Custom Datapaths: MIPS Datapath Example (2) MIPS Datapath Example Bypass network for six-stage Pipe Adder Memory address leaves here Fetch address leaves here 32 bits MIPS register file 6.371 – Fall 2002 ECE 5745 Load data arrives here Log shifter layout Multiply/Divide Unit Instruction bits arrive here (also in bypass) 9/27/02 T04: Full-Custom Design Methodology Program counter generation Adapted from [Terman’02] L08 – Logical Effort and ASIC Design Styles 22 42 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Custom Memories: Array Structure Adapted from [Weste’11] ECE 5745 T04: Full-Custom Design Methodology 43 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Custom Memories: Register File Circuits Adapted from [Weste’11] ECE 5745 T04: Full-Custom Design Methodology 44 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Custom Memories: Register File Circuits Adapted from [Weste’11] ECE 5745 T04: Full-Custom Design Methodology 45 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Custom Memories: SRAM Circuits Adapted from [Weste’11] ECE 5745 T04: Full-Custom Design Methodology 46 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Custom Memories: SRAM Layut Adapted from [Weste’11] ECE 5745 T04: Full-Custom Design Methodology 47 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Custom Memories: SRAM Memory Layout Regular arrays built with cells that abut in two dimensions – Have to pitch match in both dimensions Word Line Drivers SRAM Cell Final Decode Power Buses Address Predecoders Sense Amps Adapted from [Terman’02] 6.371 – Fall 2002 ECE 5745 23/ 53 T04: Full-Custom9/27/02 Design Methodology L08 – Logical Effort and ASIC Design Styles 48 Design Domains, Abstractions, and Principles • Full-Custom Design • Finite-State-Machine Control Unit Adapted from [Weste’11] ECE 5745 T04: Full-Custom Design Methodology 49 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Full Custom Control Logic with PLA Adapted from [Weste’11] ECE 5745 T04: Full-Custom Design Methodology 50 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Top-Level Chip Floorplan Adapted from [Weste’11] ECE 5745 T04: Full-Custom Design Methodology 51 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Final Full-Custom MIPS Processor Adapted from [Weste’11] ECE 5745 T04: Full-Custom Design Methodology 52 / 53 Design Domains, Abstractions, and Principles • Full-Custom Design • Acknowledgments I [Weste’11] N. Weste and D. Harris, “CMOS VLSI Design: A Circuits and Systems Perspective,” 4th ed, Addison Wesley, 2011. I [Terman’02] C. Terman and K. Asanović, MIT 6.371 Introduction to VLSI Systems, Lecture Slides, 2002. I [Ellervee’04] P. Ellervee, IAY3714 VLSI Synthesis and HDLs, Lecture Slides, 2004. ECE 5745 T04: Full-Custom Design Methodology 53 / 53