VLSI for Architects Future Computing Infrastructure

advertisement
VLSI for Architects
.UVWH$VDQRYLü
Computer Architecture Group
MIT Laboratory for Computer Science
krste@lcs.mit.edu
http://www.cag.lcs.mit.edu/6.893-f2000/
6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 1. © .UVWH$VDQRYLü
Future Computing Infrastructure
µWatt Wireless
Sensor Networks
Base Stations
MegaWatt
Data Centers
Wireless
Internet
Internet
PDAs, Cameras,
Cellphones,
Laptops, GPS,
Set-tops,
0.1-10 Watt Clients
Routers
6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 2. © .UVWH$VDQRYLü
Semiconductor Trends
n Non-Recurring Engineering (NRE) costs are increasing
rapidly for new designs
o >$1M for masks to spin a new design
o Engineers cost ~$200K/year (salary+benefits+overhead)
o Pentium Pro design verification took around 350 engineer
years or ~$70M
=> Tremendous economies of scale
(Can’t sell <1,000,000 parts for <$100 each)
n CMOS following Moore’s Law until (at least) 2011-2014
o ITRS’99* roadmap 2011, 50nm technology
l 64 Gb DRAMs (8 GB/chip)
l 7 billion transistor CPUs
l 10 GHz clocks (100 ps cycle time)
=> Smallest viable chips have huge capacity
(~10 million transistors/mm2,
10 million transistors per person per day)
[*International Technology Roadmap for Semiconductors]
6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 3. © .UVWH$VDQRYLü
Programmable Silicon Replaces Custom Hardware
Stereo Video
Universal
Wireless
Flash
Storage
Stereo Audio I/O
Programmable
Silicon
Position Sensor/
Accelerometer
Other
Sensors/Effectors
DRAM
Display +
Touchscreen
Programmable silicon replaces ASICs, or collections of
DSPs, microprocessors and glue logic
6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 4. © .UVWH$VDQRYLü
Benchmarks & Metrics
n Application space wider than desktop processors
o Benchmark as many applications as possible
o Include apps done with special hardware now (graphics, audio, crypto)
o Whole system measures
o Real-time important
n Primary metrics
o Cost (related to die area but also whole system cost)
o Execution Time (latency and throughput, average and worst-case)
o Energy (also peak power and peak switching current)
n Compare against best possible solution for each application
o How much worse than application-specific circuitry?
o Moore’s law perhaps makes area the most forgiving dimension
l try to keep energy and delay competitive, possibly at expense of area
6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 5. © .UVWH$VDQRYLü
VLSI for Architects
Two types of question architects ask:
n How will this change affect area/delay/energy in current
technology?
n How will this design scale to future technologies?
n For next 10-15 years, the technology is CMOS
6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 6. © .UVWH$VDQRYLü
Transistors
Gate Drain
Source
Drain
Gate
Width
Source
a) Circuit Symbol
Length
Minimum Length=2λ
b) Physical Realization
Drain
Gate
Ron
Drain Width=4λ
Cdrain
Gate
Cgate
Csource
Source
c) Layout View
d) Simple RC Model
6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 7. © .UVWH$VDQRYLü
Transistors
©IBM
Source
Bulk
IBM SOI Technology
6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 8. © .UVWH$VDQRYLü
Method of Logical Effort
(Sutherland and Sproul)
n Easy way to estimate delays in CMOS process
n Indicates correct number of logic stages to use and
transistor sizes
n Characterize process speed with single delay parameter: τ
τ, delay of inverter driving same-sized inverter (no parasitics)
τ in range 10-15ps for 0.18µm processes
6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 9. © .UVWH$VDQRYLü
Gate Delay Components
Cin
Logic Gate
Cout
n Split delay of logic gate into three components
Delay = Logical Effort x Electrical Effort + Parasitic Delay
n Logical Effort
o Complexity of logic function (Invert, NAND, NOR, etc)
o Define inverter has logical effort = 1
o Depends only on topology not transistor sizing
n Electrical Effort
o Ratio of output capacitance to input capacitance Cout/Cin
n Parasitic Delay
o Intrinsic self-loading of gate
o Independent of transistor sizes and output load
6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 10. © .UVWH$VDQRYLü
Logical Effort for Simple Gates
n Define Logical Effort of Inverter = 1
n For other gates, size to give same current drive as inverter
n Logical Effort is ratio of logic gate’s input cap. to inverter’s
input cap.
Relative
Transistor
Widths
2
2
2
4
4
2
1
1
2
1
Inverter
NAND
NOR
Input Cap = 3 units
Input Cap = 4 units
Input Cap = 5 units
L.E.=1 (definition)
L.E.=4/3
L.E.=5/3
6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 11. © .UVWH$VDQRYLü
Electrical Effort
Cin
Logic Gate
Cout
n Ratio of output load capacitance over input
capacitance:
E.E. = Cout/Cin
n Usually, transistors have minimum length
n Input and output capacitances can be measured in
units of transistor gate widths
6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 12. © .UVWH$VDQRYLü
Parasitic Delay
n Main cause is drain capacitances
n These scale with transistor width
CgateP
RonP
CdrainP
so P.D. independent of transistor
sizes
n Useful approximation:
Cgate ~= Cdrain
n For inverter:
RonN
CdrainN
Parasitic Delay ~= 1.0 τ
CgateN
6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 13. © .UVWH$VDQRYLü
Inverter Chain Delay
n For each stage:
Delay = Logical Effort x Electrical Effort + Parasitic Delay
= 1.0 (definition) x 1.0 (in = out) + 1.0 (drain caps)
= 2.0 units
6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 14. © .UVWH$VDQRYLü
Optimizing Circuit Paths
Cin
Cout
n Path logical effort, G = Π gi
(gi = L.E. stage i)
n Path electrical effort, H = Cout/Cin
(hi = E.E. stage i)
n Parasitic delay, P = Σ pi
(pi = P.D. stage i)
n Path effort, F = GH
n Minimum delay when each of N stages has equal effort
Min. D = NF1/N + P
i.e.
gi hi = F1/N
6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 15. © .UVWH$VDQRYLü
Optimal Number of Stages
Cin
Cout
n Minimum delay when:
stage effort = logical effort x electrical effort ~= 3.4-3.8
o Some derivations have e = 2.718.. as best stage effort – this ignores
parasitics
o Broad optimum, stage efforts of 2.4-6.0 within 15-20% of minimum
n Fan-out-of-four (FO4) is convenient design size (~5τ)
FO4 delay: Delay of
inverter driving four
copies of itself
6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 16. © .UVWH$VDQRYLü
Wires
© IBM
© IBM
© IBM
IBM CMOS7
process
6 layers of
copper wiring
6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 17. © .UVWH$VDQRYLü
Wires
Pitch
Height
Length
Width
n Resistance fixed by (length*resistivity) / (height*width)
o bulk aluminum 2.8 µΩ-cm, bulk copper 1.7 µΩ-cm
n Capacitance depends on geometry of surrounding
wires and relative permittivity, εr,of dielectric
o silicon dioxide εr = 3.9, new low-k dielectrics in range 1.2-3.1
6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 18. © .UVWH$VDQRYLü
Current Interconnect Densities
n Intel Pentium-III, 0.18µm, 6 aluminum layers, SiOF
dielectric (εr = 3.1)
Metal Layer Pitch (µm)
Aspect Ratio
M1
(Height/Width)
0.50
1.9
M2
0.64
2.2
M3
0.64
2.2
M4
1.08
2.0
M5
1.60
2.0
M6
1.76
2.0
6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 19. © .UVWH$VDQRYLü
Wire Delays
n Resistance, R, increases per unit length
o in 0.25µm CMOS, ~1000λ thin M1 wire = minimum inverter resistance
n Capacitance, C, increases per unit length
o in 0.25µm CMOS, ~1000λ thin M1 wire = minimum inverter capacitance
n Wire delay increases as RC, quadratic in length
o in 0.25µm, ~1000λ thin M1 wire = 30ps (~τ)
n Inserting repeaters makes delay linear with length
Rw
Cw
Rw/2
Rw/2
Cw/2
Cw/2
6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 20. © .UVWH$VDQRYLü
Scaling
n Scale linear dimensions by factor S (around 0.7 / generation)
n Chip size also increases
6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 21. © .UVWH$VDQRYLü
Scaling Slides from Horowitz DAC’2000 Talk
http://www.dac.com/37slides/05_2.ppt
(link on class web page)
6.893: Advanced VLSI Computer Architecture, September 12, 2000, Lecture 2, Slide 22. © .UVWH$VDQRYLü
Download