Uploaded by Md. Nasim Afroj Taj

Intel Chip Manufacturing Roadmap

Intel Chip Manufacturing Technology Roadmap
Research · February 2022
DOI: 10.13140/RG.2.2.11629.46566
Md Nasim Afroj Taj
University of Virginia
All content following this page was uploaded by Md Nasim Afroj Taj on 10 April 2022.
Intel Chip Manufacturing Technology Roadmap
Md. Nasim Afroj Taj
1606037, Department of EEE
Bangladesh University of Engineering and Technology
Dhaka, Bangladesh
1606037@eee.buet.ac.bd
Abstract— This enhanced 14nm manufacturing technology node delivers a 49% reduction in feature-neutral die area for Intel Core™ M and 5th generation Core™ CPUs (codenamed Broadwell). A 14nm process flavor was optimized for Core™ M to increase the energy efficiency of mobile devices. These techniques and improvements reduced TDP by 2.5x while increasing graphics performance by 60%. New process technologies and design strategies lowered the operating voltage by 50 mV. Improved droop control, a parallel boot LVR, and other power-saving improvements allow Broadwell to reduce active and standby power by 35% over the previous generation. Broadwell's 3DL inductor technology reduces package thickness by 30% and improves low-load efficiency. Repartitioning the SOC's IO and redesigning the DDR system reduced I/O power by 30%. Shutting off different parts of the SOC die in idle states (C* states) reduced idle power by 60%. New software-driven co-optimization approaches, including duty-cycle control and dynamic display support, were added to increase graphics and display subsystem energy efficiency. Lithography is the process that patterns semiconductor devices. Intel pioneered the industry's lithography transitions from DUV to 193nm, 157nm, and EUV, although the 157nm step was eventually abandoned. Intel's lithography roadmap is driven by its existing strategy and tactics.
Keywords— SOC, Fabrication, DDR, SKX, EUV.
I. INTRODUCTION
A new generation of Xeon® server processors, SkyLake-SP (Scalable Performance), is built on Intel® 14nm tri-gate CMOS technology with 11 metal layers [1,2]. The SKX CPU family has three core counts. Each SKX core has 1MB of dedicated L2 and 1.375MB of non-exclusive L3 (3rd-level cache). The SKX CPU provides 28 cores, 6 DDR4 channels (2666MT/s), 3x20 UPI processor-to-processor links (10.4GT/s), and x48+4 PCIe links (8GT/s). On-die fully integrated voltage regulators (FIVR) allow per-core power-performance tuning [3, 4]. The processor uses a new 2-dimensional synchronous on-die MESH fabric. Fig. 2.1.1 shows the SKX processor's general architecture. The SKX's higher maximum core count, faster frequency, and improved IPC enabled generational performance gains across all relevant server benchmarks. To meet the power and frequency targets, aggressive reduction of core dynamic capacitance (Cdyn) was used. SCAN coverage and analog debug capabilities for critical analog circuits were added to ease high-volume manufacturing and silicon debug.
SKX has a flexible floorplan to support multiple products and sockets. Its two building blocks are (1) the CORE-TILE and (2) the on-die fabric (MESH). The CORE-TILE combines the core, LLC, and core-to-MESH agent into one modular unit. The MESH is a 2D synchronous fabric with higher bandwidth and lower latency than its RING predecessor [3]. Each SKX layout began with a 5-by-6 CORE-TILE array. Left and right CORE-TILES were replaced with MEMORY CONTROLLER modules. The "NORTHCAP" comprises the IO agents, serial-IP ports, CGU, global power management, and the fuse unit. The on-die voltage regulators (FIVR) and DDR-IO complete the full-chip assembly. Fig. 2.1.2 depicts the 28-core floorplan progression.
Figure 2.1.3 shows how the 9 principal VCC domains of the 28-core SKX CPU are partitioned into 35 VCC planes, which are served by multiple FIVRs and 5 motherboard VRs (MBVRs). Besides die area and current compliance, the partitioning considered VCC noise requirements and power-delivery efficiency given motherboard (MB) and package IR losses. With a FIVR and a dedicated all-digital PLL (ADPLL) for each core, per-core voltage-frequency tuning is possible. Workload-based core droop mitigation was included to reduce core voltage droop. Two FIVRs provide the un-CORE VCC (VCCCLM) for the CORE-TILE LLC and non-CORE components, and one ADPLL serves the un-CORE as a whole. For critical analog circuits, such as the clock circuits in the high-speed IOs, LC-filtered VCCs using package inductance (L) are also available. The SKX processor uses clock distribution schemes similar to [2]. SKX also adds a block that detects and throttles the peak current of the FIVR input supply, which reduced the number of VR capacitors required to handle the current surge.
XXX-X-XXXX-XXXX-X/XX/$XX.00 ©2022 TAJ
Fig. 1. Intel Corporation [1]
Fig. 2.1.4 shows the 2D global synchronous fabric (MESH) linking AGENT, CORE, and CACHE. By connecting every CORE-TILE to its four neighbors, the MESH eliminates the RING-to-RING bridge logic of the prior RING design. Data is routed vertically first, then horizontally to its destination.
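The vertical-first routing rule can be illustrated with a short Python sketch of dimension-ordered (Y-then-X) routing on a 2D grid of tiles. The (row, col) coordinates, the function names `yx_route` and `hop_count`, and the 5-row by 6-column orientation of the array are illustrative assumptions, not Intel's implementation.

```python
from typing import List, Tuple

Tile = Tuple[int, int]  # (row, col) position in a hypothetical tile grid


def yx_route(src: Tile, dst: Tile) -> List[Tile]:
    """Return the tiles visited from src to dst, moving vertically
    (changing row) first and then horizontally (changing column)."""
    (r, c), (dr, dc) = src, dst
    path = [(r, c)]
    step = 1 if dr > r else -1
    while r != dr:  # vertical leg first
        r += step
        path.append((r, c))
    step = 1 if dc > c else -1
    while c != dc:  # then the horizontal leg
        c += step
        path.append((r, c))
    return path


def hop_count(src: Tile, dst: Tile) -> int:
    # Each hop moves to one of the four neighbors, so hops = |dRow| + |dCol|.
    return abs(dst[0] - src[0]) + abs(dst[1] - src[1])


if __name__ == "__main__":
    # Corner-to-corner route on a 5-row by 6-column array (orientation assumed).
    route = yx_route((0, 0), (4, 5))
    print(route)
    print(len(route) - 1, hop_count((0, 0), (4, 5)))
```

Because every route crosses all of its vertical hops before any horizontal ones, the vertical tile-to-tile delay appears on nearly every long transfer, which is why the text treats vertical latency as the dominant factor.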
Because of this dataflow ordering, vertical MESH latency is the most important factor in overall MESH performance. The vertical TILE-to-TILE RC delay makes single-cycle vertical MESH latency impossible above 2GHz at low unCORE power. The solution is to move the critical V-MESH portion to a higher, fixed-voltage VCCIO supply; powering this path from the higher supply achieves single-cycle vertical TILE-to-TILE delay.
The SKX features a performance-tuned core. The L2 cache (1MB, 1024 sets by 16 ways) is twice as fast as its predecessor's, and the cache bandwidth is doubled [3]. To support the larger L2, each CORE-TILE features a 2048-set by 11-way L3 and a 2048-set by 12-way snoop filter (SF). To improve manufacturability, the SKX caches include column and/or row repair. Achieving low VCC in a 28-CORE-TILE SKX with 28MB of L2 and 38.5MB of L3 required significant FUSE resources. Silicon data showed minimum-VCC (Vmin) margin against the product's specifications. The figure depicts the CORE-TILE, which contains the CORE, AVX, L2, LLC, and CACHE-HOME AGENT (CHA). The chip has 128 high-speed IOs: 48 PCIe, 4 DMI, and 16 on-package PCIe lanes at 2.5/5.0/8.0GT/s, plus 60 UPI lanes at 10.4GT/s.
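The cache sizes quoted above are consistent with simple set-associative arithmetic (capacity = sets × ways × line size). The sketch below assumes 64-byte cache lines, which the text does not state, and treats MB as 2^20 bytes; the IO tally at the end simply re-adds the lane counts listed above.

```python
LINE_BYTES = 64  # assumption: 64-byte cache lines (not stated in the text)
MIB = 1024 * 1024


def cache_bytes(sets: int, ways: int, line_bytes: int = LINE_BYTES) -> int:
    # Capacity of a set-associative cache = sets x ways x line size.
    return sets * ways * line_bytes


l2_per_tile = cache_bytes(1024, 16) / MIB  # L2: 1024 sets x 16 ways
l3_per_tile = cache_bytes(2048, 11) / MIB  # L3 slice: 2048 sets x 11 ways
tiles = 28

print(l2_per_tile, l3_per_tile)                  # per CORE-TILE, in MiB
print(tiles * l2_per_tile, tiles * l3_per_tile)  # whole 28-tile die, in MiB

# High-speed IO lanes as listed in the text.
io_lanes = {"PCIe": 48, "DMI": 4, "on-package PCIe": 16, "UPI": 60}
print(sum(io_lanes.values()))
```

With these assumptions the per-tile results are 1MB of L2 and 1.375MB of L3, and 28 tiles give the 28MB and 38.5MB totals quoted for the full chip.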
The TX architecture and circuitry are based on [5]. The RX architecture supports the PCIe Separate Refclk Independent SSC (SRIS) ECN. The RX front-end was re-architected to remove the variable gain amplifier (VGA) function from the CTLE so that the receiver reaches 10.4GT/s with low power impact; instead, a front-end attenuator is placed before the CTLE. To reduce PVT variation, the CTLE uses just two stages, and its critical performance factors are set by device parameters. Fig. 2.1.5 depicts the CTLE topology. Simulations show 10.8dB of AC peaking at the 5.2GHz Nyquist frequency. The entire transceiver is 20% smaller and uses 17% less power than the previous version [3].
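The 5.2GHz figure follows from the 10.4GT/s data rate, and the 10.8dB peaking can be converted to a linear boost. This minimal sketch assumes NRZ signaling (one bit per symbol) and interprets the peaking as a voltage/amplitude gain; the function names are illustrative.

```python
def nyquist_ghz(data_rate_gtps: float) -> float:
    # NRZ sends one bit per symbol; the fastest pattern (1010...) has its
    # fundamental at half the data rate, i.e. the Nyquist frequency.
    return data_rate_gtps / 2.0


def db_to_amplitude_ratio(db: float) -> float:
    # Convert a voltage/amplitude gain in dB to a linear ratio: 10^(dB/20).
    return 10 ** (db / 20.0)


print(nyquist_ghz(10.4))                      # Nyquist frequency for 10.4GT/s, in GHz
print(round(db_to_amplitude_ratio(10.8), 2))  # linear boost from 10.8dB of peaking
```

Under these assumptions, 10.8dB of peaking corresponds to roughly a 3.5x amplitude boost at Nyquist relative to low frequencies.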
SKX provides 6-channel DDR4 interfaces that support 2 DIMMs per channel at up to 2666MT/s, for a total memory bandwidth of 128GB/s. The 6-channel interface is physically split into two halves of three channels each, on the left and right sides of the die. Each channel places its data bytes in the north and south subsections, with command, control, clock, and PVT compensation circuitry in the centre. To improve signal integrity, this layout supports package routing escapes and matching pin-out order between the CPU and the DIMM card.
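The 128GB/s figure is the product of channel count, transfer rate, and bus width. This sketch assumes the standard 64-bit (8-byte) DDR4 data bus per channel, with ECC bits excluded; the function name is illustrative.

```python
def peak_bandwidth_gbs(channels: int, mts: float, bus_bytes: int = 8) -> float:
    # Peak bandwidth = channels x transfers per second x bytes per transfer.
    # bus_bytes=8 assumes a 64-bit DDR4 data bus per channel (ECC excluded).
    return channels * mts * 1e6 * bus_bytes / 1e9


bw = peak_bandwidth_gbs(channels=6, mts=2666)
print(f"{bw:.1f} GB/s")
```

Note that running 2 DIMMs per channel adds capacity, not peak bandwidth: the channel still moves 8 bytes per transfer.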
Fig. 2.1.6 shows the SKX DDR4 receiver (RX) architecture. The 14nm CMOS SKX server CPU is fully functional and meets all specifications. The figure depicts the 28-core SkyLake-SP CPU die.
II. 2ND GENERATION FIVR & 3DL TECHNOLOGY
The design of the Fully Integrated Voltage Regulator (FIVR) [2] has been substantially updated and improved to provide extra value for the low-power Intel Core™ M CPU. A technology known as 3D inductors (3DL) was developed to enable smaller and thinner packages. It replaces the air-core inductors in the package with a standalone inductor module that uses the space below the package cavity, allowing more volume for the air core, and extends along the Z-axis down into the motherboard, as shown in Fig. 2. 3DL increased efficiency over a wider load range while allowing a smaller, more compact design.
In addition, Broadwell introduced Enhanced PkgC7 (C7+) to improve average power even further, enabled by the parallel boot Linear Voltage Regulator (LVR). Efficiency rises in these states because of the lower Vccin (1.3V) operation and the minimization of FIVR static losses at low load. As shown in Fig. 3, the LVR outputs are connected to their corresponding FIVR rails. The FIVR Control Module (FCM)/Boot LVR FSM manages the hand-off between the FIVR and the LVR that power each rail. When di/dt is high in the compute domains, the output voltage droops substantially.
Under the Advanced Vector Extensions (AVX) power virus, current demand is very high, and the standard linear VR control loop in 22nm Haswell silicon could not keep up. As shown in Fig. 4, Broadwell added a fast Nonlinear Control Loop (NLC) to the FIVR to reduce droop. In addition, Broadwell dynamically adjusts the FIVR input supply according to the loadline, which yielded up to 10% power savings for workloads in the 1-2W range.
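One simplified way to read "adjusting the input supply according to the loadline": with a resistive loadline, the delivered voltage sags as current rises, so a fixed setpoint must budget for the worst-case current, while a load-dependent setpoint only budgets for the present current. The model, the names, and all numbers below are assumptions made for illustration; they are not Intel's control algorithm or Broadwell data.

```python
def required_setpoint(v_min: float, i_load: float, r_ll: float) -> float:
    # Resistive loadline: V_out = V_set - I * R_LL, so keeping V_out >= v_min
    # at current i_load needs V_set >= v_min + i_load * r_ll.
    return v_min + i_load * r_ll


# Hypothetical values chosen only to show the trend (not Broadwell data).
V_MIN = 1.60  # volts required at the regulator input
R_LL = 0.002  # ohms (a 2 mOhm loadline)
I_MAX = 20.0  # amps, worst-case current the static setting must cover

static_v = required_setpoint(V_MIN, I_MAX, R_LL)  # fixed, sized for worst case
for i_load in (1.0, 5.0, 20.0):
    dynamic_v = required_setpoint(V_MIN, i_load, R_LL)  # tracks present load
    print(f"I={i_load:4.1f} A  static={static_v:.3f} V  dynamic={dynamic_v:.3f} V")
```

In this toy model the gap between the two setpoints is largest at light load and vanishes at worst-case current, which is consistent with the savings being reported for low-power (1-2W) workloads.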
Broadwell reduced package footprint by 50%. The key enablers are a 0.63x die area (with 0.5mm ball pitch), a 200µm package core, and 3DL technology, all in one package. Together, these features reduce the power consumption of the Core™ M platform compared with earlier Intel Core™ CPUs.
III. MOORE’S LAW REVISITED THROUGH INTEL CHIP DENSITY
Fig. 2. 1 Broadwell Die Map (1.3B Transistors)
Intel introduced the Core™ M and 5th Generation Core™ family microprocessors on its 14nm process (die map shown in Fig. 1). At 4.5W, Core™ M has a 2.5x lower TDP than the 4th generation Core™ [1], enabling fanless 2-in-1s and smaller PC and mobile form factors. It relies on the following technologies. The 14nm process is optimized for power efficiency: area scales by 0.51x and capacitance by 0.65x (feature neutral). Fanless operation of Intel Core™ M CPUs required a new process flavor with >2x lower leakage. Voltage is the single most effective knob for scaling power, since power scales roughly as Voltage³ (dynamic power is proportional to CV²f, and frequency scales roughly with voltage). Broadwell improves Vmin by >10% through design, process, and architectural changes. These strategies include a novel method for optimal down-gridding of devices to lower Vmin and capacitance in various sections of the die, and moving the mid-level cache to a separate supply so that the core can run at a lower voltage without being constrained by the cache. ECC in the graphics L3 cache, mixed-Vt sequentials, and per-die read/write assist further improve cache Vmin.
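The voltage-cubed rule of thumb can be made concrete with a short Python sketch. It assumes frequency scales linearly with voltage and ignores leakage; the 0.9x voltage case only illustrates a 10% reduction and is not a measured Broadwell operating point, while the 0.65x capacitance factor is the feature-neutral figure quoted above.

```python
def relative_dynamic_power(c_scale: float, v_scale: float) -> float:
    # Dynamic power P = C * V^2 * f. Assuming frequency tracks voltage
    # (f proportional to V), the frequency factor equals v_scale,
    # so P scales as C * V^3.
    f_scale = v_scale
    return c_scale * v_scale ** 2 * f_scale


print(relative_dynamic_power(1.0, 0.9))   # 10% lower voltage alone
print(relative_dynamic_power(0.65, 1.0))  # 0.65x capacitance alone
print(relative_dynamic_power(0.65, 0.9))  # both combined
```

A 10% voltage reduction alone cuts dynamic power to about 0.73x, a larger effect than many capacitance optimizations, which is why Vmin is treated as the primary knob.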
The research described here examined whether the premise of simple exponential trends in computer processor technology is correct. In Figure 6, the trend lines A-B and D-E depict the successive patterns in which the technologies that drive the first logistic curve saturate and are replaced by new ones [5]. Interestingly, two patterns emerge from the same point but diverge (Fig 6B, 6C, 6E, and 6F), perhaps indicating self-propagating performance growth [7, 8]. During each cycle, transistor density grew tenfold in about six years and then slowed to a crawl for at least three years. Rapid transistor shrinking has therefore occurred during only about two-thirds of the transistor's history. This makes economic sense: manufacturers must keep earning revenue from products based on existing technology while also introducing innovative ones. This pattern allows them to recoup the exponentially increasing research and development expenditures required by each new pulse of developments [9]. Miniaturization waves (denser and even physically smaller circuits) may have enlarged markets as much as the growth in chip size measured in units such as transistors. As the transient logistic phases of the CPUs described above show, the waves that make up this process are driven by technical developments.
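The logistic "waves" described above can be sketched with the logistic (loglet) parameterization used by the Loglet Lab software, described in the Meyer, Yung, and Ausubel primer listed in the references: a saturation level k, a growth time dt (time to rise from 10% to 90% of k), and a midpoint tm. The pulse parameters below are hypothetical and are not fitted to Intel data; the function names are illustrative.

```python
import math


def logistic(t: float, k: float, dt: float, tm: float) -> float:
    """Logistic pulse: saturation level k, growth time dt (years to rise
    from 10% to 90% of k), and midpoint tm (year at which it reaches k/2)."""
    return k / (1.0 + math.exp(-math.log(81.0) / dt * (t - tm)))


def loglets(t: float, pulses) -> float:
    # Multi-logistic model: successive waves add up, each saturating in turn.
    return sum(logistic(t, k, dt, tm) for k, dt, tm in pulses)


# "Tenfold in about six years" as an annual growth factor and a doubling time.
annual_factor = 10 ** (1 / 6)
doubling_years = 6 * math.log(2) / math.log(10)

# Hypothetical pulses on log10(density): each adds one decade (tenfold).
pulses = [(1.0, 6.0, 1975.0), (1.0, 6.0, 1985.0)]
print(round(annual_factor, 3), round(doubling_years, 2))
print(round(loglets(1975.0, pulses), 3))
```

The ln 81 factor makes dt exactly the 10%-to-90% rise time, so each wave's fast phase and its later saturation can be read directly from the parameters. Tenfold growth in six years corresponds to roughly 47% per year, or a doubling about every 1.8 years.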
Fairchild Semiconductor's first commercial planar transistor, produced in 1959 [6], built on Bell Labs' demonstration of the silicon transistor and its adoption of photolithography methods in 1954 and 1955, respectively [6], providing the foundation for the first phase (line A). General Microelectronics patented and marketed the metal-oxide-semiconductor field-effect transistor (MOSFET), the cornerstone of all subsequent transistor technology, in 1964 [4], perhaps accounting for the start of the second logistic wavelet (line B). Intel [6] was the first to introduce silicon gate technology (SGT), which served as the foundation for all succeeding microprocessors, starting with the 4004 produced in 1971 [6] and followed by the 8080; this coincides with the start of the third wave (line C). Patented in 1977, high-density, short-channel MOS (HMOS) significantly boosted transistor density for the 8086, which was introduced in 1978 [7]. The 80486, launched in 1989, allowed for many more transistors, enabling complex hardware such as an 8 kB cache and a floating-point math coprocessor to be included (line E). Deep-UV excimer laser lithography, first developed in 1982 [8], was used commercially in the 1990s [9], perhaps marking the sixth wavelet (line F), since all CPUs launched after 1998 have been built using this technique. The technologies underpinning the third and sixth waves were possibly the most crucial in the history of transistors, each shaping the industry for two decades. While this provides a quick overview of some key causal developments, we recommend Seitz and Einspruch [2] and Lojek [3], as well as the IEEE article "25 Microchips That Shook the World" [7] and the Computer History Museum website "The Silicon Engine: A Timeline of Semiconductors in Computers" [7] for more information.
An additional benefit of the technique presented here is that it tracks changes in mean transistor size, the reciprocal of the density function, in contrast to the traditional technology-node convention, which is defined by the "minimum feature size" [2]. Even comparatively substantial breakthroughs, such as Intel's 3D tri-gate technology, have only partially offset the slowdown in transistor shrinking since 2000 (Fig 7).
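Mean transistor size as the reciprocal of density can be computed directly. In this sketch the 100-million-per-mm² density is a hypothetical input, and reporting a side length assumes each transistor occupies a square; both are illustrative assumptions rather than figures from this paper.

```python
import math


def mean_transistor_area_um2(density_per_mm2: float) -> float:
    # Mean area per transistor is the reciprocal of density (1 mm^2 = 1e6 um^2).
    return 1e6 / density_per_mm2


def mean_transistor_side_nm(density_per_mm2: float) -> float:
    # Assumption: treat that area as a square and report its side length in nm.
    return math.sqrt(mean_transistor_area_um2(density_per_mm2)) * 1000.0


density = 100e6  # hypothetical density: 100 million transistors per mm^2
print(mean_transistor_area_um2(density))  # mean area in um^2
print(mean_transistor_side_nm(density))   # equivalent square side in nm
```

Unlike a node name, this measure comes straight from counted transistors and die area, so a 4x density gain always shows up as exactly a 2x smaller equivalent side length.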
Assuming this trend continues, the significant slowdown in transistor shrinking over the past two decades indicates a deviation from the International Technology Roadmap for Semiconductors. This may also explain why 10nm, 7nm, and smaller technologies are proving challenging to fabricate. Manufacturing problems have worsened because of the "subwavelength gap" at each technology node [3]. Indeed, strained SiGe [4], high-k metal-gate transistors [5], Resolution Enhancement Technologies [6], and FinFET circuits [3] have all enabled further improvements in transistor density, although at a somewhat slower linear scaling rate, as illustrated.
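The "subwavelength gap" can be quantified with the standard Rayleigh resolution criterion, CD = k1·λ/NA, which the text does not state explicitly. The k1 and NA values below are typical textbook assumptions (193nm immersion at NA 1.35, 13.5nm EUV at NA 0.33, and k1 = 0.25 as the single-exposure limit for half-pitch), not Intel process parameters.

```python
def rayleigh_cd_nm(k1: float, wavelength_nm: float, na: float) -> float:
    # Rayleigh criterion: smallest printable half-pitch CD = k1 * lambda / NA.
    return k1 * wavelength_nm / na


# Illustrative parameter choices (assumptions, not Intel process data).
print(round(rayleigh_cd_nm(0.25, 193.0, 1.35), 1))  # 193nm immersion (ArF)
print(round(rayleigh_cd_nm(0.25, 13.5, 0.33), 1))   # 13.5nm EUV
```

Under these assumptions, single-exposure 193nm immersion bottoms out near 36nm half-pitch, so tighter pitches need multi-patterning (such as the self-aligned quad-patterning listed for 10nm) or a shorter wavelength such as EUV.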
IV. INTEL 2019-2029 MANUFACTURING ROADMAP
This section introduces Intel's manufacturing roadmap for the next ten years, which includes 7nm in 2021, 5nm in 2023, 3nm in 2025, 2nm in 2027, and 1.4nm in 2029, as well as new features and back porting.
The roadmap was reportedly disclosed at the IEEE International Electron Devices Meeting (IEDM) by one of Intel's partners, who indicated that Intel itself had first shown the slide in September. Intel had previously given an in-depth look at its 7nm process ambitions, but the information in this graphic goes much further. The rest of this section reviews what Intel has planned for the coming years based on this 10-year roadmap [5].
Fig. 4. A picture of a wafer from Intel's foundry, fabricated on the 14nm process.
Starting with the process roadmap, Intel will follow a 2-year cadence for each major node update. 10nm (10nm+) had a soft launch in 2019 and will be followed by 7nm in 2021, 5nm in 2023, 3nm in 2025, 2nm in 2027, and 1.4nm in 2029. Notably, Intel itself describes this 2-year cadence as the optimal cost-performance path. Following this path is therefore Intel's priority, but there is also a yearly cadence for the + / ++ nodes, which offer additional performance and scalability on an existing node. Before discussing the optimized nodes for each process, it is worth focusing on the key features of each major node update. For 7nm, Intel states that the biggest feature is its use of EUV (extreme ultraviolet lithography). All other major nodes will likewise come with new features, but Intel has not explicitly stated what they will be.
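The cadence described above (a major node every two years, with "+" and "++" refreshes one and two years later) can be reconstructed mechanically. This sketch covers only 7nm onward, since 10nm follows its own +/++/+++ schedule, and it omits refreshes for 1.4nm because the roadmap shows no optimized path for that node; the function name and defaults are illustrative.

```python
def roadmap(first_year: int = 2021, majors=("7nm", "5nm", "3nm", "2nm", "1.4nm")):
    """Rebuild the described cadence: a new major node every 2 years, then
    '+' and '++' refreshes 1 and 2 years after each major node."""
    schedule = []
    for i, node in enumerate(majors):
        year = first_year + 2 * i
        schedule.append((year, node))
        if node != majors[-1]:  # no optimized path is shown for the last node
            schedule.append((year + 1, node + "+"))
            schedule.append((year + 2, node + "++"))
    return sorted(schedule)


for year, node in roadmap():
    print(year, node)
```

The output makes the overlap visible: every "++" refresh lands in the same year as the next major node, for example 7nm++ and 5nm both in 2023.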
At the same time as Intel introduces its 10nm+++ products, it will also be ramping production and launch of its next-generation 7nm process node. Intel detailed the 10nm and 7nm nodes at its 2019 Investor Meeting. Starting with the 10nm family, Intel has confirmed that its 10nm technology node delivers significant improvements in performance per watt over previous generations.
Fig. 3. Decreasing mean transistor size since 2000.
The initial iteration of 10nm has been shown to be a significant efficiency improvement over the previous 14nm++ iteration, and Intel plans upgraded variants of 10nm, with 10nm+ in 2019, followed by 10nm++ in 2020 and 10nm+++ in 2021. Some of the significant improvements that 10nm brings are as follows:
• 2.7x density scaling vs 14nm
• Self-aligned Quad-Patterning
• Contact Over Active Gate
• Cobalt Interconnect (M0, M1)
• 1st Gen Foveros 3D Stacking
• 2nd Gen EMIB
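A density gain translates into area and linear-dimension scaling as shown in the short sketch below. It assumes both dimensions shrink equally, which real layouts need not do; the last line simply multiplies the 2.7x 10nm gain by Intel's stated 2x density scaling for 7nm over 10nm.

```python
import math


def area_scale(density_gain: float) -> float:
    # Area per transistor is the reciprocal of density.
    return 1.0 / density_gain


def linear_scale(density_gain: float) -> float:
    # Assumption: both dimensions shrink equally, so each scales by 1/sqrt(gain).
    return 1.0 / math.sqrt(density_gain)


print(round(area_scale(2.7), 3))    # area per transistor vs 14nm
print(round(linear_scale(2.7), 3))  # scaling of each dimension vs 14nm
print(round(2.7 * 2.0, 2))          # cumulative density gain vs 14nm at 7nm
```

A 2.7x density gain thus corresponds to about 0.37x the area per transistor, or roughly 0.61x in each linear dimension.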
Fig. 5. Intel's Process and Manufacturing Roadmap for the next 10
years shows 10nm, 7nm, 5nm, 3nm, 2nm, and 1.4nm.
According to Intel, the 7nm manufacturing node will continue to be optimized, with 7nm+ introduced in 2022 and 7nm++ in 2023. As with 10nm, 7nm will provide a long list of improvements over 10nm, including the following:
• 2x density scaling compared to 10nm
• Planned intra-node optimizations
• 4x reduction in design rules (EUV)
• Next-Gen Foveros and EMIB packaging
Remember that 10nm is the only process with a +++ optimization, since it was already on 10nm+ in 2019.
Although 1.4nm in 2029 seems a very promising development, Intel previously said that 10nm would be available by 2015 and that its following node would be available by 2017. In a recent interview, Intel's CEO, Bob Swan, stated that the company is prepared to compete with TSMC by releasing its first 7nm products in Q4 2021, which will compete with TSMC's 5nm node. He also stated that Intel expects to reach 5nm, which he claims is equivalent to TSMC's 3nm node, in the latter half of 2024, with products available in 2025.
V. CONCLUSION
The roadmap also covers back porting, which has been one of the more interesting topics of recent months given the uproar over the 14nm and 10nm nodes. Each major node receives at least two optimizations: 7nm will get 7nm+ (2022) and 7nm++ (2023), 5nm will get 5nm+ (2024) and 5nm++ (2025), 3nm will get 3nm+ (2026) and 3nm++ (2027), and 2nm will get 2nm+ (2028) and 2nm++ (2029). There is no mention of an optimized path for 1.4nm, but since this presentation covers only a 10-year plan, an optimized node path for 1.4nm can be expected at the very least. Going forward, each major node will thus be followed by an optimized '+' node and then a tail-end optimized '++' node. Interestingly, the '++' node (or, in the case of 10nm, the '+++' node) will debut alongside the next major node. The optimized node will have several benefits over the new node, including higher frequency and scalability from the prior two upgrades, as well as higher yields.
Intel has several paths to choose from at each node, so it can make some interesting decisions here. Given the timing of this roadmap, Intel may have already decided what to do with 10nm and 7nm. Intel also mentions back porting to an earlier but optimized node: a 7nm product could be back ported to 10nm+++, a 5nm product to 7nm++, a 3nm product to 5nm++, and a 2nm product to 3nm++. Back porting is not addressed for the 1.4nm node.
There have recently been reports and discussions of Intel back porting a 10nm++ product (Tiger Lake) to 14nm+++ (Rocket Lake). Although substantial evidence has surfaced, Intel has yet to make an official statement, given that the product is scheduled to ship in 2021. However, since this roadmap mentions back porting, Rocket Lake CPUs may include a back port of the Willow Cove cores, which run on the 10nm++ node in the mobile platform.
VI. REFERENCES
[1] S. Natarajan, et al., "A 14nm Logic Technology Featuring 2nd-Generation FinFET, Air-Gapped Interconnects, Self-Aligned Double Patterning and a 0.0588 μm2 SRAM Cell Size," IEDM, pp. 3.7.1-3.7.3, 2014.
[2] E. Fayneh, et al., "14nm 6th-Generation Core Processor SoC with Low Power Consumption and Improved Performance," ISSCC, pp. 72-73, 2016.
[3] B. Bowhill, et al., "The Xeon® Processor E5-2600 v3: A 22nm 18-Core Product Family," ISSCC, pp. 78-79, 2015.
[4] A. Nalamalpu, "Design Optimization of Computing Systems from the Transistor to the Data Center," ISSCC, 2017.
[5] F. Spagna, et al., "A 78mW 11.8Gb/s Serial Link Transceiver with Adaptive RX Equalization and Baud-Rate CDR in 32nm CMOS," ISSCC, pp. 366-377, 2010; T. Murata, "Petri Nets: Properties, Analysis and Applications," Proceedings of the IEEE, vol. 77, no. 4, pp. 541-580, April 1989.
[6] I. Ferain, C. A. Colinge, and J.-P. Colinge, "Multigate transistors as the future of classical metal-oxide-semiconductor field-effect transistors," Nature, vol. 479, pp. 310-316, 2011, pmid:22094690.
[7] P. S. Meyer and J. H. Ausubel, "Carrying capacity: a model with logistically varying limits," Technol. Forecast. Soc. Change, vol. 61, pp. 209-214, 1999.
[8] P. S. Meyer, J. W. Yung, and J. H. Ausubel, "A primer on logistic growth and substitution: the mathematics of the Loglet Lab software," Technol. Forecast. Soc. Change, vol. 61, pp. 247-271, 1999.
[9] T. Modis, "Forecasting the growth of complexity and change," Technol. Forecast. Soc. Change, vol. 69, pp. 377-404, 2002.
[10] H. Mujtaba, "Intel Lays Down 2019-2029 Manufacturing Roadmap - 1.4nm In 10 Years, 2 Year Cadence With Back Porting On Advanced ++ Nodes," Wccftech, December 11, 2019. https://wccftech.com/intel-2021-2029-process-roadmap10nm-7nm-5nm-3nm-2nm-1nm-back-porting/