Technology Trends

CS5100 Advanced Computer Architecture
Prof. Chung-Ta King
Department of Computer Science
National Tsing Hua University, Taiwan

Outline
• Goal of this class:
− To understand the trends of IC technology and be able to relate the trends to computer architecture designs
− Why do we need to know the trends?
• Learn from history
• Understand the possible future and know how to adapt now
• Class outline:
− Trends in technology (Sec. 1.4)
− Trends in power and energy (Sec. 1.5)
− Trends in cost (Sec. 1.6)

IC Technology and Processor Performance
• Number of transistors on ICs doubles (x2) every 2 years → exponential growth
[Figure. Source: Intel Corp.]

Review of Transistors (MOSFET) on IC

Technology Scaling
• Feature size:
− Minimum size of a transistor or wire in the x or y dimension
− From 10 microns in 1971 to 22 nm in 2012
− New technology node every 2 years or so
− Each generation shrinks the feature size to ~70% of the previous one (scaling factor S ≈ 0.7), i.e., a ~30% reduction
• Process nodes by year:
10 µm – 1971
6 µm – 1974
3 µm – 1977
1.5 µm – 1982
1 µm – 1985
800 nm – 1989
600 nm – 1994
350 nm – 1995
250 nm – 1997
180 nm – 1999
130 nm – 2001
90 nm – 2004
65 nm – 2006
45 nm – 2008
32 nm – 2010
22 nm – 2012
14 nm – 2014
10 nm – 2016
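A quick way to check the ~0.7x-per-generation rule against the process-node list above is to compute the ratio between successive nodes. The sketch below copies the list into Python; starting the average at 350 nm is an assumption of this sketch, made because early steps such as 6 µm → 3 µm (0.5x) were larger than 0.7x.

```python
# Feature sizes (nm) and years copied from the process-node list above.
NODES = [
    (1971, 10_000), (1974, 6_000), (1977, 3_000), (1982, 1_500),
    (1985, 1_000), (1989, 800), (1994, 600), (1995, 350),
    (1997, 250), (1999, 180), (2001, 130), (2004, 90),
    (2006, 65), (2008, 45), (2010, 32), (2012, 22),
    (2014, 14), (2016, 10),
]


def step_ratios(nodes):
    """Ratio new/old feature size for each consecutive pair of nodes."""
    return [new / old for (_, old), (_, new) in zip(nodes, nodes[1:])]


def mean_scaling_factor(nodes):
    """Geometric mean of the step ratios, i.e. (last / first) ** (1 / steps)."""
    steps = len(nodes) - 1
    return (nodes[-1][1] / nodes[0][1]) ** (1 / steps)


if __name__ == "__main__":
    recent = NODES[7:]  # from 350 nm (1995) onward
    for ratio, (year, size) in zip(step_ratios(recent), recent[1:]):
        print(f"{year}: {size} nm, {ratio:.2f}x the previous node")
    s = mean_scaling_factor(recent)
    print(f"mean linear step S = {s:.3f}, area step S^2 = {s * s:.2f}")
```

From 350 nm down to 10 nm the geometric-mean step works out to about 0.70, so the area per transistor roughly halves each generation.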

Effects of Scaling
• More transistors per unit area
− Feature size reduced by 0.7 (S) → area of a transistor reduced by 0.5 (S²)
− 2X transistors per unit area
− Fixed cost per wafer → lower cost per transistor
• Faster transistors
− Time to switch transistors on/off reduced by S → speed improved by 1/S (≈1.4X) → exponential increase in clock rate
• Lower supply voltage and power
− Power to switch a transistor reduced, but not power density
− Voltage to drive transistors reduced
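These bullets follow from an idealized constant-field (Dennard) scaling model, which this sketch assumes: transistor dimensions, supply voltage, capacitance, and switching delay all shrink by S per generation. Plugging those assumptions into the dynamic-power relation P ∝ C × V² × f reproduces the claims above: density doubles, switching power per transistor halves, and power density stays unchanged. Real processes deviate from this idealization.

```python
def dennard_generation(s=0.7):
    """Relative change (new / old) of key quantities after one generation,
    under idealized constant-field (Dennard) scaling by factor s."""
    area = s * s                        # transistor area scales in both x and y
    density = 1 / area                  # transistors per unit area
    delay = s                           # switching time
    frequency = 1 / delay               # achievable clock rate
    voltage = s                         # supply voltage
    capacitance = s                     # capacitance per transistor
    power_per_transistor = capacitance * voltage ** 2 * frequency  # C * V^2 * f
    power_density = power_per_transistor * density
    return {
        "area": area,
        "density": density,
        "frequency": frequency,
        "voltage": voltage,
        "power_per_transistor": power_per_transistor,
        "power_density": power_density,
    }


if __name__ == "__main__":
    for name, factor in dennard_generation().items():
        print(f"{name:>22}: {factor:.2f}x")
```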

Effects of Scaling
• Local wires are getting faster
• Global wires are getting slower, i.e., they scale poorly
− No longer possible to cross the chip in one cycle
− Computer architects need to plan around this
[Figure: chip size and scaling of the reachable radius; 3D stacking; distributed mechanisms]

Summary: Technology Trends
• Integrated circuit technology
− Transistor density: 35%/year
− Die size: 10-20%/year
− Integration overall: 40-55%/year (slowing down after 2003!)
• DRAM capacity: 25-40%/year (slowing)
• Flash capacity: 50-60%/year
− 15-20X cheaper/bit than DRAM
• Magnetic disk capacity: 40%/year
− 15-25X cheaper/bit than Flash
− 300-500X cheaper/bit than DRAM
− Capacity improves, but not speed
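The annual growth rates in this summary and the "x2 every 2 years" rule on the IC Technology and Processor Performance slide describe the same compounding. The helper functions below (their names are illustrative) convert between an annual growth rate r and a doubling time T = ln 2 / ln(1 + r).

```python
import math


def doubling_time_years(annual_growth):
    """Years for a quantity growing by `annual_growth` per year (0.35 = 35%) to double."""
    return math.log(2) / math.log(1 + annual_growth)


def annual_rate_from_doubling(period_years):
    """Compound annual growth implied by doubling every `period_years` years."""
    return 2 ** (1 / period_years) - 1


# (low, high) annual growth rates from the summary slide.
TRENDS = {
    "Transistor density": (0.35, 0.35),
    "Integration overall": (0.40, 0.55),
    "DRAM capacity": (0.25, 0.40),
    "Flash capacity": (0.50, 0.60),
    "Magnetic disk capacity": (0.40, 0.40),
}

if __name__ == "__main__":
    print(f"x2 every 2 years = {annual_rate_from_doubling(2):.1%} per year")
    for name, (low, high) in TRENDS.items():
        fastest, slowest = doubling_time_years(high), doubling_time_years(low)
        if low == high:
            print(f"{name}: doubles every {fastest:.1f} years")
        else:
            print(f"{name}: doubles every {fastest:.1f}-{slowest:.1f} years")
```

Doubling every 2 years corresponds to about 41% per year, which falls inside the 40-55%/year range quoted for overall integration.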

Implications for Computer Architecture
• High rate of density improvements
− Used to bring 4-bit, 8-bit, through 64-bit microprocessors in the early days of microprocessors
− Used for multiple processors per chip, wider SIMD, …, in recent years
• Quantitative changes lead to qualitative changes
− 25K to 30K transistors per chip in the early 1980s → possible to build a single-chip 32-bit microprocessor
− By the mid 1980s, the FP unit could be integrated
− By the late 1980s, the L1 cache could fit on the same chip
→ Performance improvements often come in discrete steps
• Work with signal propagation delay on wires

Bandwidth versus Latency
• Bandwidth or throughput
− Total work done in a given time
− 10,000-25,000X improvement for processors
− 300-1200X improvement for memory and disks
• Latency or response time
− Time between start and completion of an event
− 30-80X improvement for processors
− 6-8X improvement for memory and disks

Bandwidth and Latency
[Figure: log-log plot of bandwidth and latency milestones]

Summary: Bandwidth and Latency
• For disk, LAN, memory & microprocessor, bandwidth improves by at least the square of the latency improvement
− In the time that bandwidth doubles, latency improves by no more than 1.2X to 1.4X
• The lag is probably even larger in real systems, as BW gains are multiplied by replicated components:
− Multiple processors in a cluster or in a chip
− Multiple disks in a disk array
− Multiple memory modules in a large memory
− Simultaneous communication in a switched LAN
• HW and SW developers should innovate assuming latency lags bandwidth
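The bandwidth-latency rule of thumb can be checked against the improvement factors on the Bandwidth versus Latency slide. This sketch assumes the rule in the form bandwidth_gain ≥ latency_gain², so a 2X bandwidth gain allows at most a √2 ≈ 1.41X latency gain, the upper end of the 1.2X-1.4X range.

```python
import math

# (low, high) improvement factors from the Bandwidth versus Latency slide.
MILESTONES = {
    "processors": {"bandwidth": (10_000, 25_000), "latency": (30, 80)},
    "memory and disks": {"bandwidth": (300, 1_200), "latency": (6, 8)},
}


def max_latency_gain(bandwidth_gain):
    """Largest latency improvement allowed if bandwidth >= latency ** 2."""
    return math.sqrt(bandwidth_gain)


def obeys_rule(bandwidth_gain, latency_gain):
    """True if the bandwidth gain is at least the square of the latency gain."""
    return bandwidth_gain >= latency_gain ** 2


if __name__ == "__main__":
    print(f"bandwidth x2 -> latency at most x{max_latency_gain(2):.2f}")
    for part, m in MILESTONES.items():
        # Worst case for the rule: smallest bandwidth gain vs. largest latency gain.
        worst_bw, worst_lat = m["bandwidth"][0], m["latency"][1]
        print(f"{part}: {worst_bw}X vs {worst_lat}^2 = {worst_lat ** 2}X ->",
              "holds" if obeys_rule(worst_bw, worst_lat) else "violated")
```

Even in the least favorable pairing of the quoted ranges, the bandwidth gain exceeds the square of the latency gain for both processors and memory/disks.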

Outline
• Trends in technology (Sec. 1.4)
• Trends in power and energy (Sec. 1.5)
• Trends in cost (Sec. 1.6)

Power Density Trend
$P = \alpha C V_{dd}^2 f + V_{dd} I_{st} + V_{dd} I_{leak}$
[Figure: power density trend. Source: Intel Corp.]
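To see how the terms of this power equation combine, the sketch below evaluates each one and divides the total by die area to obtain a power density. Here α is the activity factor, C the switched capacitance, V_dd the supply voltage and f the clock frequency, while I_st and I_leak are the two currents in the equation. All parameter values, including the die area, are hypothetical.

```python
def power_breakdown(alpha, c_farads, vdd, f_hz, i_st, i_leak):
    """Evaluate P = alpha*C*Vdd^2*f + Vdd*I_st + Vdd*I_leak (watts)."""
    switching = alpha * c_farads * vdd ** 2 * f_hz
    i_st_term = vdd * i_st
    leakage = vdd * i_leak
    return {
        "switching": switching,
        "I_st term": i_st_term,
        "leakage": leakage,
        "total": switching + i_st_term + leakage,
    }


if __name__ == "__main__":
    # Hypothetical chip-level values, for illustration only.
    p = power_breakdown(alpha=0.2, c_farads=5e-8, vdd=1.0, f_hz=3e9, i_st=2.0, i_leak=10.0)
    die_area_cm2 = 2.0  # hypothetical die area
    for term, watts in p.items():
        print(f"{term:>10}: {watts:6.1f} W")
    print(f"power density: {p['total'] / die_area_cm2:.1f} W/cm^2")
```

Because V_dd appears squared in the switching term but only linearly in the current terms, lowering the supply voltage cuts switching power fastest.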

Power
• The Intel 80386 consumed ~2 W, but the 3.3 GHz Intel Core i7 consumes 130 W
• Heat must be dissipated from the chip
• Today, power is the major limitation to using transistors, not silicon area

Power and Energy
• $P_{avg} = P_{dynamic} + P_{static}$
• Energy is related to power through time
• If power dissipation remains constant through time T, then $E = P_{avg} \times T$
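The relation E = P_avg × T assumes constant power. When power varies over time, energy is the integral of power over time, and P_avg is that energy divided by T. The sketch below integrates a sampled power trace with the trapezoidal rule; the function names and the varying trace are hypothetical, and the 130 W case echoes the Core i7 figure above.

```python
def energy_joules(times_s, power_w):
    """Energy E = integral of P dt, using the trapezoidal rule on sampled power."""
    if len(times_s) != len(power_w) or len(times_s) < 2:
        raise ValueError("need at least two (time, power) samples of equal length")
    energy = 0.0
    for i in range(1, len(times_s)):
        dt = times_s[i] - times_s[i - 1]
        energy += 0.5 * (power_w[i] + power_w[i - 1]) * dt
    return energy


def average_power(times_s, power_w):
    """P_avg such that E = P_avg * T over the sampled interval T."""
    return energy_joules(times_s, power_w) / (times_s[-1] - times_s[0])


if __name__ == "__main__":
    # Constant 130 W for 10 s: reduces to E = P_avg * T.
    print(energy_joules([0.0, 10.0], [130.0, 130.0]), "J")
    # Hypothetical varying trace (seconds, watts).
    t, p = [0.0, 1.0, 2.0, 3.0], [10.0, 30.0, 30.0, 10.0]
    print(energy_joules(t, p), "J, P_avg =", round(average_power(t, p), 2), "W")
```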

Dynamic Power and Energy
• For CMOS chips, the traditionally dominant energy consumption has been in switching transistors, called dynamic power
− Dynamic power = ½ × capacitive load × voltage² × frequency switched
• For mobile devices, energy is the better metric
− Dynamic energy per transition (0→1 or 1→0) = ½ × capacitive load × voltage²
• Reducing the clock rate reduces power, but not energy
• Reducing power:
− Do nothing well: turn off the clock of inactive modules
− Dynamic Voltage-Frequency Scaling (DVFS)
− Low-power states for DRAM, disks
− Overclocking, turning off cores

Static Power
• Because leakage current flows even when a transistor is off, static power is now important too
− Static power = static current × voltage
− Scales with the number of transistors
− Increases as transistors shrink and the number of transistors increases
• With 65 nm or better technologies, leakage can account for 50% of total power if not designed properly
− To reduce: power gating
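The claim that reducing the clock rate reduces power but not energy, and the role of leakage, can be illustrated numerically. This sketch assumes a task needs a fixed number of transitions, uses the ½ × C × V² per-transition energy from the Dynamic Power and Energy slide, and charges static power (static current × voltage) for the whole runtime, as if the chip were not power-gated. The capacitance, workload, and leakage current are hypothetical.

```python
def transition_energy(c_load, voltage):
    """Dynamic energy of one 0->1 or 1->0 transition: 1/2 * C * V^2 (J)."""
    return 0.5 * c_load * voltage ** 2


def dynamic_power(c_load, voltage, switching_rate):
    """Dynamic power: 1/2 * C * V^2 * frequency switched (W)."""
    return transition_energy(c_load, voltage) * switching_rate


def task_energy(c_load, voltage, switching_rate, transitions, static_current=0.0):
    """Energy (J) and runtime (s) of a task needing a fixed number of transitions.

    Static power (static_current * voltage) is charged for the whole runtime."""
    runtime = transitions / switching_rate
    dynamic = transitions * transition_energy(c_load, voltage)
    static = static_current * voltage * runtime
    return dynamic + static, runtime


if __name__ == "__main__":
    C, WORK = 1e-9, 2e9  # hypothetical: 1 nF switched, 2e9 transitions per task
    for label, v, rate in [("full speed", 1.0, 2e9), ("half clock", 1.0, 1e9),
                           ("DVFS: half clock, 0.8 V", 0.8, 1e9)]:
        p = dynamic_power(C, v, rate)
        e0, t = task_energy(C, v, rate, WORK)
        e1, _ = task_energy(C, v, rate, WORK, static_current=0.5)  # hypothetical 0.5 A
        print(f"{label}: {p:.2f} W, {t:.1f} s, {e0:.2f} J dynamic-only, "
              f"{e1:.2f} J with leakage")
```

Halving the clock alone halves dynamic power but leaves the dynamic energy per task unchanged, and leakage over the longer runtime makes total energy worse. Lowering the voltage as well (DVFS) cuts the V² term. Which option uses the least energy depends on the leakage current, which is why finishing fast and then power-gating can sometimes win.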

Implications for Computer Architecture
• Architectural designs for low power using metrics such as tasks per joule or performance per watt
− Use the right power/energy to do the right things
• Sometimes, doing things faster but at a higher power may be better → race to halt
− Often, techniques for performance also lead to power savings

Outline
• Trends in technology (Sec. 1.4)
• Trends in power and energy (Sec. 1.5)
• Trends in cost (Sec. 1.6)

VLSI Economics
• Selling price $S_{total}$
− $S_{total} = C_{total} / (1 - m)$
• m = profit margin
• $C_{total}$ = total cost
− Nonrecurring engineering cost (NRE)
− Recurring cost
− Fixed cost: data sheets and application notes, marketing and advertising, yield analysis

NRE
• Engineering cost
− Depends on size of design team, including benefits, training, computers
− CAD tools:
• Digital front end: $10K
• Analog front end: $100K
• Digital back end: $1M
• Prototype manufacturing
− Mask costs: $500K – 1M in 130 nm process
− Test fixture and package tooling
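The selling-price relation and the split between nonrecurring and recurring cost can be combined in a small calculation. This sketch assumes S_total = C_total / (1 − m) is applied per chip, with NRE and fixed costs amortized over the production volume; the function names and all dollar figures are hypothetical.

```python
def selling_price(total_cost, margin):
    """S_total = C_total / (1 - m), with profit margin m as a fraction in [0, 1)."""
    if not 0 <= margin < 1:
        raise ValueError("margin must be in [0, 1)")
    return total_cost / (1 - margin)


def cost_per_chip(nre, fixed, recurring_per_chip, volume):
    """Per-chip C_total when NRE and fixed costs are spread over `volume` chips."""
    return (nre + fixed) / volume + recurring_per_chip


if __name__ == "__main__":
    # Hypothetical figures in dollars, for illustration only.
    NRE, FIXED, RECURRING = 5_000_000, 1_000_000, 20
    for volume in (100_000, 1_000_000, 10_000_000):
        c = cost_per_chip(NRE, FIXED, RECURRING, volume)
        print(f"{volume:>10,} chips: cost ${c:.2f}, "
              f"price at 50% margin ${selling_price(c, 0.5):.2f}")
```

At low volume the amortized NRE dominates the per-chip cost; at high volume the recurring cost does.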

Recurring Cost of IC
− Defects per unit area = 0.016-0.057 defects per cm² (2010)
− N = process-complexity factor = 11.5-15.5 (40 nm, 2010)

Cost and Computer Architecture
• The only part of IC cost that computer architects control is die area, and hence only a portion of the cost
− What functions should be included or excluded in the design?
− Number of I/O pins
− Design complexities
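The defect density and process-complexity factor N above are parameters of a die-cost model. The sketch below assumes the form commonly used with these parameters (e.g., in Hennessy and Patterson's Computer Architecture: A Quantitative Approach): dies per wafer with an edge correction, and die yield = wafer yield / (1 + defects per unit area × die area)^N. Wafer yield is taken as 1, and the wafer cost is hypothetical.

```python
import math


def dies_per_wafer(wafer_diameter_cm, die_area_cm2):
    """Dies per wafer: wafer area / die area, minus partial dies along the edge."""
    radius = wafer_diameter_cm / 2
    return (math.pi * radius ** 2 / die_area_cm2
            - math.pi * wafer_diameter_cm / math.sqrt(2 * die_area_cm2))


def die_yield(defects_per_cm2, die_area_cm2, n, wafer_yield=1.0):
    """Fraction of good dies: wafer_yield / (1 + D * A) ** N."""
    return wafer_yield / (1 + defects_per_cm2 * die_area_cm2) ** n


def cost_of_die(wafer_cost, wafer_diameter_cm, die_area_cm2, defects_per_cm2, n):
    """Wafer cost divided by the number of good dies on the wafer."""
    good_dies = (dies_per_wafer(wafer_diameter_cm, die_area_cm2)
                 * die_yield(defects_per_cm2, die_area_cm2, n))
    return wafer_cost / good_dies


if __name__ == "__main__":
    D, N = 0.031, 13.5       # picked from within the 2010 ranges above
    WAFER_COST = 5_000       # hypothetical cost of a 30 cm wafer, dollars
    for area in (1.0, 2.25):
        print(f"{area} cm^2: {dies_per_wafer(30, area):.0f} dies/wafer, "
              f"yield {die_yield(D, area, N):.2f}, "
              f"${cost_of_die(WAFER_COST, 30, area, D, N):.2f} per good die")
```

With these numbers a 2.25x larger die costs roughly 4x as much per good die, because it both fits fewer times on the wafer and is more likely to contain a defect. This is why die area is the architect's main lever on IC cost.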

Technology and Architecture
• How to translate technology improvements into increases in computing performance?
• Increased transistor counts:
− Basic strategies: parallelism, speculation, overlapping, monitoring/profiling
− Modular and hierarchical architectures
− Constraints on power dissipation, localized communication, design and verification complexities
• Increasing clock frequency:
− Need to tackle power, heat, clock skew, wire delay
− Gap to memory and I/O devices, PC board design → multi-level cache (with on-chip cache)

Technology and Architecture
− Need scalable design with little complexity, parallelism → e.g., multiple functional units, RISC cores
− Need good locality, avoiding long-distance and rapid interaction → e.g., MP on a chip
• Shorter wires, lower complexity, scale with technology
• On-chip cache/DRAM, MP on a chip, multithreading, vector processing, VLIW
− For monitoring and learning a program's execution and subsequently recasting it for faster execution
− Self-adapting, self-management, self-healing, …
− More functionalities: multimedia, facilities for I/O and memory, bandwidth and latency improvement
Recap
• Trends in technology
• Trends in power and energy
• Do you understand the trends of IC technology?
• Can you explain the implications and relate the trends to computer architecture designs?