VLSI for Architects .UVWH$VDQRYLü Computer Architecture Group MIT Laboratory for Computer Science krste@lcs.mit.edu http://www.cag.lcs.mit.edu/6.893-f2000/ 6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 1. © .UVWH$VDQRYLü Future Computing Infrastructure µWatt Wireless Sensor Networks Base Stations MegaWatt Data Centers Wireless Internet Internet PDAs, Cameras, Cellphones, Laptops, GPS, Set-tops, 0.1-10 Watt Clients Routers 6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 2. © .UVWH$VDQRYLü Semiconductor Trends n Non-Recurring Engineering (NRE) costs are increasing rapidly for new designs o >$1M for masks to spin a new design o Engineers cost ~$200K/year (salary+benefits+overhead) o Pentium Pro design verification took around 350 engineer years or ~$70M => Tremendous economies of scale (Can’t sell <1,000,000 parts for <$100 each) n CMOS following Moore’s Law until (at least) 2011-2014 o ITRS’99* roadmap 2011, 50nm technology l 64 Gb DRAMs (8 GB/chip) l 7 billion transistor CPUs l 10 GHz clocks (100 ps cycle time) => Smallest viable chips have huge capacity (~10 million transistors/mm2, 10 million transistors per person per day) [*International Technology Roadmap for Semiconductors] 6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 3. © .UVWH$VDQRYLü Programmable Silicon Replaces Custom Hardware Stereo Video Universal Wireless Flash Storage Stereo Audio I/O Programmable Silicon Position Sensor/ Accelerometer Other Sensors/Effectors DRAM Display + Touchscreen Programmable silicon replaces ASICs, or collections of DSPs, microprocessors and glue logic 6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 4. © .UVWH$VDQRYLü Benchmarks & Metrics n Application space wider than desktop processors o Benchmark as many applications as possible o Include apps done with special hardware now (graphics, audio, crypto) o Whole system measures o Real-time important n Primary metrics o Cost (related to die area but also whole system cost) o Execution Time (latency and throughput, average and worst-case) o Energy (also peak power and peak switching current) n Compare against best possible solution for each application o How much worse than application-specific circuitry? o Moore’s law perhaps makes area the most forgiving dimension l try to keep energy and delay competitive, possibly at expense of area 6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 5. © .UVWH$VDQRYLü VLSI for Architects Two types of question architects ask: n How will this change affect area/delay/energy in current technology? n How will this design scale to future technologies? n For next 10-15 years, the technology is CMOS 6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 6. © .UVWH$VDQRYLü Transistors Gate Drain Source Drain Gate Width Source a) Circuit Symbol Length Minimum Length=2λ b) Physical Realization Drain Gate Ron Drain Width=4λ Cdrain Gate Cgate Csource Source c) Layout View d) Simple RC Model 6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 7. © .UVWH$VDQRYLü Transistors ©IBM Source Bulk IBM SOI Technology 6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 8. © .UVWH$VDQRYLü Method of Logical Effort (Sutherland and Sproul) n Easy way to estimate delays in CMOS process n Indicates correct number of logic stages to use and transistor sizes n Characterize process speed with single delay parameter: τ τ, delay of inverter driving same-sized inverter (no parasitics) τ in range 10-15ps for 0.18µm processes 6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 9. © .UVWH$VDQRYLü Gate Delay Components Cin Logic Gate Cout n Split delay of logic gate into three components Delay = Logical Effort x Electrical Effort + Parasitic Delay n Logical Effort o Complexity of logic function (Invert, NAND, NOR, etc) o Define inverter has logical effort = 1 o Depends only on topology not transistor sizing n Electrical Effort o Ratio of output capacitance to input capacitance Cout/Cin n Parasitic Delay o Intrinsic self-loading of gate o Independent of transistor sizes and output load 6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 10. © .UVWH$VDQRYLü Logical Effort for Simple Gates n Define Logical Effort of Inverter = 1 n For other gates, size to give same current drive as inverter n Logical Effort is ratio of logic gate’s input cap. to inverter’s input cap. Relative Transistor Widths 2 2 2 4 4 2 1 1 2 1 Inverter NAND NOR Input Cap = 3 units Input Cap = 4 units Input Cap = 5 units L.E.=1 (definition) L.E.=4/3 L.E.=5/3 6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 11. © .UVWH$VDQRYLü Electrical Effort Cin Logic Gate Cout n Ratio of output load capacitance over input capacitance: E.E. = Cout/Cin n Usually, transistors have minimum length n Input and output capacitances can be measured in units of transistor gate widths 6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 12. © .UVWH$VDQRYLü Parasitic Delay n Main cause is drain capacitances n These scale with transistor width CgateP RonP CdrainP so P.D. independent of transistor sizes n Useful approximation: Cgate ~= Cdrain n For inverter: RonN CdrainN Parasitic Delay ~= 1.0 τ CgateN 6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 13. © .UVWH$VDQRYLü Inverter Chain Delay n For each stage: Delay = Logical Effort x Electrical Effort + Parasitic Delay = 1.0 (definition) x 1.0 (in = out) + 1.0 (drain caps) = 2.0 units 6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 14. © .UVWH$VDQRYLü Optimizing Circuit Paths Cin Cout n Path logical effort, G = Π gi (gi = L.E. stage i) n Path electrical effort, H = Cout/Cin (hi = E.E. stage i) n Parasitic delay, P = Σ pi (pi = P.D. stage i) n Path effort, F = GH n Minimum delay when each of N stages has equal effort Min. D = NF1/N + P i.e. gi hi = F1/N 6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 15. © .UVWH$VDQRYLü Optimal Number of Stages Cin Cout n Minimum delay when: stage effort = logical effort x electrical effort ~= 3.4-3.8 o Some derivations have e = 2.718.. as best stage effort – this ignores parasitics o Broad optimum, stage efforts of 2.4-6.0 within 15-20% of minimum n Fan-out-of-four (FO4) is convenient design size (~5τ) FO4 delay: Delay of inverter driving four copies of itself 6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 16. © .UVWH$VDQRYLü Wires © IBM © IBM © IBM IBM CMOS7 process 6 layers of copper wiring 6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 17. © .UVWH$VDQRYLü Wires Pitch Height Length Width n Resistance fixed by (length*resistivity) / (height*width) o bulk aluminum 2.8 µΩ-cm, bulk copper 1.7 µΩ-cm n Capacitance depends on geometry of surrounding wires and relative permittivity, εr,of dielectric o silicon dioxide εr = 3.9, new low-k dielectrics in range 1.2-3.1 6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 18. © .UVWH$VDQRYLü Current Interconnect Densities n Intel Pentium-III, 0.18µm, 6 aluminum layers, SiOF dielectric (εr = 3.1) Metal Layer Pitch (µm) Aspect Ratio M1 (Height/Width) 0.50 1.9 M2 0.64 2.2 M3 0.64 2.2 M4 1.08 2.0 M5 1.60 2.0 M6 1.76 2.0 6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 19. © .UVWH$VDQRYLü Wire Delays n Resistance, R, increases per unit length o in 0.25µm CMOS, ~1000λ thin M1 wire = minimum inverter resistance n Capacitance, C, increases per unit length o in 0.25µm CMOS, ~1000λ thin M1 wire = minimum inverter capacitance n Wire delay increases as RC, quadratic in length o in 0.25µm, ~1000λ thin M1 wire = 30ps (~τ) n Inserting repeaters makes delay linear with length Rw Cw Rw/2 Rw/2 Cw/2 Cw/2 6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 20. © .UVWH$VDQRYLü Scaling n Scale linear dimensions by factor S (around 0.7 / generation) n Chip size also increases 6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 21. © .UVWH$VDQRYLü Scaling Slides from Horowitz DAC’2000 Talk http://www.dac.com/37slides/05_2.ppt (link on class web page) 6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 22. © .UVWH$VDQRYLü