Intel Chip Manufacturing Roadmap

Intel Chip Manufacturing Technology Roadmap

Md. Nasim Afroj Taj
1606037, Department of EEE, Bangladesh University of Engineering and Technology, Dhaka, Bangladesh
1606037@eee.buet.ac.bd

Research · February 2022 · DOI: 10.13140/RG.2.2.11629.46566 · https://www.researchgate.net/publication/359862158 (ResearchGate profile affiliation: University of Virginia; uploaded by the author on 10 April 2022)

Abstract— The enhanced 14 nm manufacturing technology node results in a 49% decrease in feature-neutral die area for Intel Core™ M and 5th generation Core™ CPUs (codenamed Broadwell). A 14 nm process flavor was optimized for Core™ M to increase mobile device energy efficiency. Techniques and improvements were used to reduce TDP by 2.5x while increasing graphics performance by 60%. New process technologies and design strategies lowered the operating voltage by 50 mV. Improved droop control, a parallel boot LVR, and other power-saving improvements allow Broadwell to reduce active and standby power by 35% over the previous generation. Broadwell's 3DL inductor technology reduces package thickness by 30% and improves low-load efficiency. Repartitioning the SOC's IO and redesigning the DDR system reduced I/O power by 30%. Shutting off different parts of the SOC die in idle states (C* states) reduced idle power by 60%. New software-driven co-optimization approaches, including duty-cycle control and dynamic display support, were added to increase graphics and display subsystem energy efficiency. Lithography is central to semiconductor manufacturing. Intel pioneered the industry's transition from DUV to 193nm, 157nm, and EUV. It, too, was abandoned. Intel's lithography roadmap is driven by existing strategy and tactics.
TDP was reduced by 2.5x while graphics performance increased by 60%, and the working voltage was reduced by 50 mV. Broadwell's improved droop control and parallel boot LVR save 35% of active and standby power. Low-load efficiency improves while package thickness is reduced. The redesigned IO and DDR architecture uses 30 percent less I/O power. Shutting off parts of the SOC die in idle states (C* states) lowers idle power, and the graphics and display subsystems became more energy efficient.

Keywords— SOC, Fabrication, DDR, SKX, EUV.

I. INTRODUCTION

A new generation of Xeon® server processors, Skylake-SP (Scalable Performance, SKX), is built on Intel® 14nm tri-gate CMOS technology with 11 metal layers [1,2]. The SKX CPU family has three core counts. Each SKX core has 1MB of dedicated L2 and 1.375MB of non-exclusive L3 (3rd-level cache). The SKX CPU provides 28 cores, 6 DDR4 channels (2666MT/s), 3x20 UPI processor-to-processor links (10.4GT/s), and x48+4 PCIE links (8GT/s). On-die integrated voltage regulators (FIVR) allow per-core power-performance tuning [3, 4]. It uses a novel 2-dimensional synchronous on-die MESH fabric. Fig. 2.1.1 shows the SKX processor's general architecture.

The SKX's greater maximum core count, higher frequency, and enhanced IPC enabled generational performance increases across all relevant server benchmarks. To meet the power and frequency objectives, core dynamic capacitance (Cdyn) was reduced aggressively. SCAN coverage and analog debug capabilities for crucial analog circuits were added to ease high-volume manufacturing and silicon debug.

SKX has a flexible floorplan design to support numerous products/sockets, based on two building blocks: (1) the CORE-TILE and (2) the on-die fabric (MESH). The CORE-TILE unit combines the core, LLC, and core-to-MESH agent into one modular item. The MESH is a 2D synchronous fabric with greater data capacity and lower latency than its RING [3] predecessor. Each SKX layout began with a 5-by-6 CORE-TILE array. Left and right CORE-TILES were replaced with MEMORY CONTROLLER modules. The "NORTHCAP" comprises the IO agents, serial-IP ports, CGU, global power management, and the fuse unit. The on-die voltage regulators (FIVR) and DDR-IO complete the full-chip assembly. Fig. 2.1.2 depicts the 28-core floorplan progression.

The 9 principal VCC domains of the 28-core SKX CPU are shown in Fig. 2.1.3 as 35 VCC planes. Multiple FIVRs and 5 MBVRs service these VCC partitions. In addition to die area and current compliance, VCC noise standards and power supply efficiency due to MB and package IR loss were considered. With a FIVR and a dedicated all-digital PLL (ADPLL) for each core, per-core voltage-frequency tuning is possible. To help reduce core voltage droop, workload-based core droop mitigation was included. Two FIVRs provide the un-CORE VCC (VCCCLM) for the CORE-TILE LLC and non-CORE components. One ADPLL serves the un-CORE as a whole. For important analog circuits (e.g. clock circuits in the high-speed IOs), LC-filtered VCCs using the package L are also available. The SKX processor uses clock distribution algorithms similar to [2]. SKX now has a block to detect and throttle the peak current of the FIVR input supply. This reduced the number of VR capacitors required to handle the current surge.

Fig. 1. Intel Corporation [1]

A 2D global synchronous fabric (MESH) links the AGENT, CORE, and CACHE (Fig. 2.1.4). By joining every CORE-TILE with its four neighbors, the MESH eliminates the RING-to-RING bridge logic of the prior RING design. Data is sent vertically first, then horizontally to the destination. Because of this dataflow, vertical MESH latency is the most important element affecting total MESH performance. The vertical TILE-to-TILE RC delay makes it impossible to achieve single-cycle vertical MESH latency beyond 2GHz with low un-CORE power. The solution is to move the critical portion of the V-MESH to a higher, fixed-voltage VCCIO supply.
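The vertical-then-horizontal dataflow described above can be illustrated with a small routing sketch. This is only an illustration under stated assumptions: the `(row, column)` tile coordinates, the function name `route_vertical_first`, and the idea of listing visited tiles are all hypothetical and are not taken from the SKX design.

```python
from typing import List, Tuple

# Hypothetical (row, column) position of a tile in the CORE-TILE array.
Tile = Tuple[int, int]


def route_vertical_first(src: Tile, dst: Tile) -> List[Tile]:
    """Return the tiles visited from src to dst, moving vertically first."""
    row, col = src
    path = [src]

    # Vertical leg: change the row until the destination row is reached.
    step = 1 if dst[0] > row else -1
    while row != dst[0]:
        row += step
        path.append((row, col))

    # Horizontal leg: change the column until the destination is reached.
    step = 1 if dst[1] > col else -1
    while col != dst[1]:
        col += step
        path.append((row, col))

    return path


if __name__ == "__main__":
    # Example inside a 5-by-6 array: from the top-left tile to row 2, column 3.
    path = route_vertical_first((0, 0), (2, 3))
    print(path)
    print("hops:", len(path) - 1)
```

Because every message completes its vertical leg before turning, the per-hop vertical latency is paid on most transfers, which is consistent with the text's emphasis on single-cycle vertical TILE-to-TILE delay.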
To achieve single-cycle vertical TILE-to-TILE delay, the un-core VCC was raised. The SKX features a performance-tuned core. Also, the L2 cache (1MB, 1024 sets by 16 ways) is twice as fast as the predecessor's and the cache bandwidth is doubled [3]. To support the L2 size, each CORE-TILE features a 2048 sets by 11 ways L3 and a 2048 sets by 12 ways snoop filter (SF). To increase manufacturability, SKX caches included column and/or row repair. To achieve low VCC in a 28-CORE-TILE SKX with 28MB L2 and 38.5MB L3, significant FUSE resources were used. Silicon data indicated a minimum VCC over the product's specs. This figure depicts the CORE-TILE, which contains the CORE, AVX, L2, LLC, and CACHE-HOME AGENT (CHA).

The chip has 128 high-speed IOs, including 48 PCIE, 4 DMI, 16 on-package PCIE at 2.5/5.0/8.0GT/s, and 60 UPI at 10.4GT/s. The TX architecture and circuitry are from [5]. The RX architecture supports the PCIE Separate Refclk Independent SSC (SRIS) ECN. The RX front-end was re-architected to remove the variable gain amplifier (VGA) capability from the CTLE in order to reach 10.4GT/s with low power impact. Instead, a front-end attenuator is placed before the CTLE. To reduce PVT variation, the CTLE had just two stages and critical performance factors were device parameters. Fig. 2.1.5 depicts the CTLE topology. Simulations show 10.8dB AC peaking at the 5.2GHz Nyquist rate. The entire transceiver is 20% smaller and uses 17% less power than the previous version [3].

SKX provides 6-channel DDR4 interfaces that can accommodate 2 DIMMs per channel at up to 2666MT/s for a total memory bandwidth of 128GB/s. The 6-channel interface is physically separated into two portions, each with three channels, on the die's left and right sides. The channel configuration has data bytes in the north and south subsections, and command, control, clock, and PVT compensation circuitry in the centre.
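The cache and memory figures quoted above can be cross-checked with simple arithmetic. The sketch below assumes a 64-byte cache line and a 64-bit (8-byte) data path per DDR4 channel; neither value is stated in the text, and the helper names are hypothetical.

```python
LINE_BYTES = 64          # assumed cache-line size (not stated in the text)
DDR_BUS_BYTES = 8        # assumed 64-bit data path per DDR4 channel
MIB = 1024 * 1024


def cache_bytes(sets: int, ways: int, line_bytes: int = LINE_BYTES) -> int:
    """Data capacity of a set-associative cache."""
    return sets * ways * line_bytes


def ddr_bandwidth_gb_s(channels: int, mega_transfers: float,
                       bus_bytes: int = DDR_BUS_BYTES) -> float:
    """Peak bandwidth in GB/s (decimal) for the given channel count and rate."""
    return channels * mega_transfers * 1e6 * bus_bytes / 1e9


CORES = 28
l2_mib = cache_bytes(1024, 16) / MIB      # per-core L2
l3_mib = cache_bytes(2048, 11) / MIB      # per-tile L3 slice

if __name__ == "__main__":
    print(f"L2 per core : {l2_mib} MiB, total {CORES * l2_mib} MiB")
    print(f"L3 per tile : {l3_mib} MiB, total {CORES * l3_mib} MiB")
    print(f"DDR4 peak   : {ddr_bandwidth_gb_s(6, 2666):.3f} GB/s")
```

Under these assumptions the set and way counts reproduce the 1MB L2 and 1.375MB L3 per tile, the 28MB and 38.5MB totals, and a peak of about 128GB/s for six channels at 2666MT/s.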
To increase signal integrity, this layout supports package routing escapes and pin-out order matching between the CPU and DIMM card. Fig. 2.1.6 shows the SKX DDR4 receiver (RX) architecture. The 14nm CMOS SKX server CPU is fully functional and meets all standards. The figure depicts the 28-core Skylake-SP CPU die.

II. 2ND GENERATION FIVR & 3DL TECHNOLOGY

The design of the Fully Integrated Voltage Regulator (FIVR) [2] has been considerably updated and improved to provide extra value for the low-power Intel Core™ M CPU. A technology known as 3D inductors (3DL) was developed to allow for smaller and thinner packages. It replaces the air-core inductors in the package with a standalone inductor module that uses the space below the package cavity, which allows increased volume for the air core, and extends in the Z-axis down into the motherboard, as shown in Fig. 2. 3DL increased efficiency over a wider load range while allowing for a smaller, more compact design. In addition, Broadwell introduced Enhanced PkgC7 (C7+) to improve average power even more, which was made possible by the parallel boot Linear Voltage Regulator (LVR). Due to the lower Vccin (1.3V) operation and the minimization of FIVR static losses under low-load conditions, efficiency rises in these states. In Fig. 3, the LVR outputs are connected to their corresponding FIVR rails. The FIVR Control Module (FCM)/Boot LVR FSM is responsible for the hand-off between the FIVR and the LVR when powering the rails.

When di/dt is high in the compute domains, there is substantial output voltage droop. Under the advanced vector extensions (AVX) power virus, the current demand is very high, and the standard linear VR control loop in Haswell 22nm silicon was unable to keep up with it. As shown in Fig. 4, Broadwell modified the FIVR to include a fast Nonlinear Control Loop (NLC) to decrease droop.
Furthermore, dynamic adjustment of the FIVR input supply depending on the loadline, which Broadwell adopted, led to power savings of up to 10% for workloads in the 1-2W range. With Broadwell, the package footprint was reduced by 50%. The key enablers are: 0.63x die area (with 0.5mm ball pitch), 200um package core, and 170 3DL technology (all in one package). When comparing the Core™ M platform to Intel Core™ CPUs, these features result in a decrease in power consumption.

III. MOORE'S LAW REVISITED THROUGH INTEL CHIP DENSITY

Fig. 2. 1 Broadwell Die Map (1.3B Transistors)

Intel introduced the Core™ M and 5th generation Core™ family microprocessors on the 14nm process (die map shown in Fig. 1). Core™ M at 4.5W has a 2.5x lower TDP than the 4th generation Core™ [1]. Fanless 2-in-1s and smaller PC and mobile form factors become possible. It includes the following technologies. The 14nm process optimizes power efficiency. Area scales by 0.51x and capacitance by 0.65x (feature neutral). Fanless optimization for Intel Core™ M CPUs required a new process flavor with >2x reduced leakage. Voltage is the single greatest knob for scaling power effectively (power scales as Voltage³). Broadwell improves Vmin by >10% via design, process, and architectural changes. The strategies include a novel method for optimum down-gridding of devices for lower Vmin and capacitance in various sections of the die, and shifting the mid-level cache to a separate supply so that the core can receive a lower voltage without being constrained by the cache. ECC in the graphics L3 cache, mixed-Vt sequentials, and per-die read-write assist help improve cache Vmin.

The research described here aimed to see whether the premise of simple exponential trends in computer processor technology was correct.
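The remark that power scales with the cube of voltage can be made concrete with a short calculation. This is a rough sketch, assuming the cubic relationship holds exactly and that the ">10%" Vmin improvement is applied as a 10% lower operating voltage; the function name is hypothetical.

```python
def relative_power(voltage_ratio: float, exponent: float = 3.0) -> float:
    """Power relative to baseline when voltage is scaled by voltage_ratio.

    Assumes power is proportional to voltage ** exponent (cubic by default,
    as stated in the text); real silicon will deviate from this idealization.
    """
    return voltage_ratio ** exponent


if __name__ == "__main__":
    ratio = 0.90  # a 10% lower operating voltage (illustrative assumption)
    saving = 1.0 - relative_power(ratio)
    print(f"relative power: {relative_power(ratio):.3f}")
    print(f"power saving  : {saving:.1%}")
```

Under these assumptions a 10% voltage reduction corresponds to roughly 27% lower power, which illustrates why voltage is described as the most effective knob.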
In Figure 6, the trend lines A-B and D-E depict the successive patterns in which the technologies that drive the first logistic curve saturate and are replaced by new ones [5]. Interestingly, two patterns emerge from the same place but diverge (Fig 6B, 6C, 6E, and 6F), perhaps indicating self-propagating performance increase [7, 8]. During each cycle, transistor density expanded tenfold in around six years, then slowed to a crawl for at least three years. Rapid transistor shrinking has therefore occurred during only about two-thirds of the transistor's existence. This makes sense from an economic standpoint, given the need to raise income by continuing to produce items based on existing technology while also introducing innovative ones. This enables economic rewards to be obtained from the exponentially increasing research and development expenditures required by each new pulse of developments [9]. Miniaturization waves (denser and even physically smaller circuits) may have enlarged markets as much as the expanding chip size measured in units like transistors.

The waves that make up the process are driven by technical developments, as seen in the transitory logistics of CPUs represented above. Fairchild Semiconductor's first commercial planar transistor, produced in 1959 [6], was based on Bell Labs' demonstration of the silicon transistor and adoption of photolithography methods in 1954 and 1955, respectively [6], providing the foundation for the first phase (line A). General Microelectronics patented and marketed the metal-oxide-semiconductor field-effect transistor (MOSFET), the cornerstone for all subsequent transistor technology, in 1964 [4], perhaps accounting for the start of the second logistic wavelet (line B). Intel [6] was the first to introduce silicon gate technology (SGT), which served as the foundation for all succeeding microprocessors, starting with the 4004 and 8080 produced in 1971 [6], coinciding with the start of the third wave (line C).
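The successive logistic waves discussed above can be sketched as a sum of logistic pulses in log-density space. This is a minimal illustration, not the fitting procedure used in the study: it uses the common parametrization in which the duration is the time to grow from 10% to 90% of saturation, and the wave midpoints (years 3 and 12 on an arbitrary time axis) are made-up values chosen only to mimic "tenfold in about six years, then about three years of slow growth".

```python
import math
from typing import Iterable, Tuple

# (saturation, midpoint, duration): one logistic pulse.
Wave = Tuple[float, float, float]


def logistic(t: float, saturation: float, midpoint: float, duration: float) -> float:
    """Logistic pulse; duration is the 10%-to-90% growth time."""
    rate = math.log(81.0) / duration
    return saturation / (1.0 + math.exp(-rate * (t - midpoint)))


def log10_density(t: float, waves: Iterable[Wave], base: float = 0.0) -> float:
    """Sum of logistic pulses, interpreted as log10 of transistor density."""
    return base + sum(logistic(t, k, tm, dt) for k, tm, dt in waves)


if __name__ == "__main__":
    # Two hypothetical waves, each adding one decade (10x) over ~6 years,
    # with midpoints 9 years apart (6 years of growth + 3 years of plateau).
    waves = [(1.0, 3.0, 6.0), (1.0, 12.0, 6.0)]
    for year in range(0, 19, 3):
        print(year, round(log10_density(year, waves), 3))
```

Plotting the output shows the staircase shape implied by the text: each pulse contributes a tenfold density increase, and growth flattens between pulses.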
Patented in 1977, high-density, short-channel MOS (HMOS) significantly boosted transistor density for the 8086, which was introduced in 1978 [7]. The 80486, which launched in 1989, allowed for many more transistors, enabling complicated hardware such as an 8 kB cache and a floating-point math coprocessor to be included (line E). Deep-UV excimer laser lithography, first developed in 1982 [8], was commercially used in the 1990s [9], perhaps suggesting the sixth wavelet (line F), since all CPUs launched after 1998 have been built using this technique. The technologies that underpin the third and sixth waves were possibly the most crucial in the history of transistors, influencing the industry for two decades each. While this provides a quick overview of some key causal developments, we recommend Seitz and Einspruch [2] and Lojek [3], as well as the IEEE article 25 Microchips That Shook the World [7] and the Computer History Museum's website The Silicon Engine: A Timeline of Semiconductors in Computers [7] for more information.

The ability to monitor changes in mean transistor size, which is the reciprocal of the density function, is an additional benefit of the technique provided here; this is in contrast to the traditional technology node procedure, which is defined by the "minimum feature size" [2]. Even comparatively substantial breakthroughs, such as Intel's 3D tri-gate technology, have merely slowed down transistor shrinking since 2000 (Fig 7). Advances in transistor shrinking have slowed significantly over the previous two decades, indicating a deviation from the International Technology Roadmap for Semiconductors, assuming that this trend continues. This might also explain why 10nm, 7nm, and smaller technologies are now proving challenging to fabricate. Because of the "subwavelength gap" at each technological node [3], manufacturing problems have worsened.
Indeed, strained SiGe [4], high-k metal-gate transistors [5], Resolution Enhancement Technologies [6], and FinFET circuits [3] have all permitted further improvements in transistor density, although at a somewhat slower linear scaling rate, as illustrated.

IV. INTEL 2019-2029 MANUFACTURING ROADMAP

Intel's manufacturing roadmap for the next ten years includes 7nm in 2021, 5nm in 2023, 3nm in 2025, 2nm in 2027, and 1.4nm in 2029, as well as brand new features and back porting. The roadmap was reportedly disclosed at the IEEE International Electron Devices Meeting by one of Intel's partners, who indicated that the slide was first shown by Intel itself back in September. While Intel has previously provided an in-depth look at its 7nm process ambitions, the information included in this graphic goes much further. So, let's see what Intel has in store for the following years based on this 10-year roadmap. [5]

Fig. 4. A picture of a wafer from Intel's foundry that was fabricated on the 14nm process.

Starting with the process roadmap, Intel will follow a 2-year cadence for each major node update. A soft launch of 10nm (10nm+) in 2019 will be followed by 7nm in 2021, 5nm in 2023, 3nm in 2025, 2nm in 2027, and 1.4nm in 2029. What's interesting here is that Intel itself refers to this 2-year cadence as the optimal cost-performance path. So it would be Intel's priority to follow this path, but there is also a yearly cadence for the + / ++ nodes, which offer more performance leverage and scalability opportunities on an existing node. Before we talk about the optimized nodes for each process, we should focus on the key features that each major node update has to offer. For 7nm, Intel says the biggest feature is that it is made using EUV (extreme ultraviolet lithography) technology.
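The cadence described above (a major node every two years, with yearly + and ++ optimizations in between) can be tabulated with a short script. This is a sketch under stated assumptions: the major-node years come from the roadmap as quoted, the rule "+ one year later, ++ two years later" is applied only to 7nm through 2nm, and 10nm (which has a +++ step) and 1.4nm (no optimized path listed) are deliberately left out. The names `MAJOR_NODES` and `optimized_schedule` are hypothetical.

```python
from typing import Dict

# Major node launch years for the nodes that follow the regular pattern.
MAJOR_NODES: Dict[str, int] = {"7nm": 2021, "5nm": 2023, "3nm": 2025, "2nm": 2027}


def optimized_schedule(major_nodes: Dict[str, int]) -> Dict[str, int]:
    """Expand each major node into node, node+ and node++ launch years."""
    schedule: Dict[str, int] = {}
    for node, year in major_nodes.items():
        schedule[node] = year             # major node: 2-year cadence
        schedule[node + "+"] = year + 1   # first optimization, one year later
        schedule[node + "++"] = year + 2  # second optimization, two years later
    return schedule


if __name__ == "__main__":
    for name, year in sorted(optimized_schedule(MAJOR_NODES).items(),
                             key=lambda item: (item[1], item[0])):
        print(year, name)
```

The listing makes one property of the roadmap easy to see: each ++ node lands in the same year as the next major node, for example 7nm++ and 5nm both in 2023.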
Similarly, all other major nodes will come with new features, but Intel hasn't explicitly stated what new features we could expect. The 10nm and 7nm nodes were already detailed by Intel during their 2019 Investors Meeting. Starting with the 10nm family, Intel has confirmed that their 10nm technology node is capable of delivering significant improvements in performance per watt over previous generations. The initial iteration of 10nm has been shown to be a significant improvement in efficiency over the previous 14nm++ iteration, and Intel expects to produce upgraded variations of 10nm in the future, with 10nm+ in 2019, followed by 10nm++ in 2020 and 10nm+++ in 2021. Some of the significant improvements that 10nm would bring about are as follows:

• 2.7x density scaling vs 14nm
• Self-aligned Quad-Patterning
• Contact Over Active Gate
• Cobalt Interconnect (M0, M1)
• 1st Gen Foveros 3D Stacking
• 2nd Gen EMIB

Fig. 3. Decreasing mean transistor size since 2000.

Fig. 5. Intel's Process and Manufacturing Roadmap for the next 10 years shows 10nm, 7nm, 5nm, 3nm, 2nm, and 1.4nm.

While Intel will be introducing its 10nm+++ products at the same time, they will also be preparing for the manufacturing and introduction of their next-generation 7nm process node. According to Intel, the 7nm manufacturing node will continue to be optimized, with 7nm+ being introduced in 2022 and 7nm++ being introduced in 2023. Similarly to 10nm, 7nm will provide a long list of improvements over 10nm, which will include the following:

• 2x density scaling compared to 10nm
• Planned intra-node optimizations
• Fourfold reduction in design rules
• EUV
• Next-Gen Foveros and EMIB packaging

Remember that 10nm is the only process with a '+++' optimization, since it is already on 10nm+ in 2019. Although 1.4nm in 2029 seems to be a very promising development, Intel had previously said that 10nm would be available by 2015 and 7nm by 2017. In a recent interview, Intel's CEO, Bob Swan, stated that his company is prepared to compete with TSMC by releasing their first 7nm products in Q4 2021, which will compete with TSMC's 5nm node. He also stated that his company expects to reach 5nm, which he claims is equivalent to TSMC's 3nm node, by the latter half of 2024, with product available in 2025.

V. CONCLUSION

The article also discusses back porting, which has been one of the more fascinating subjects to address in recent months, given the uproar over 14nm and 10nm nodes. Each main node has at least two optimizations. 7nm will receive 7nm+ (2022) and 7nm++ (2023), 5nm will get 5nm+ (2024) and 5nm++ (2025), 3nm will get 3nm+ (2026) and 3nm++ (2027), and 2nm will get 2nm+ (2028) and 2nm++ (2029). There's no mention of an optimized route for 1.4nm, but this presentation only covers a 10-year plan, so you may anticipate an optimized node path for 1.4nm at the very least. So each main node will be followed in the next year by an optimized '+' node, and then by a tail-end optimized '++' node. The '++' node (or, in the case of 10nm, the '+++' node) will debut alongside the next major node, which is rather fascinating. The optimized node will have various benefits over the new node, including increased frequency and scalability from the prior two upgrades, as well as higher yields. Intel has numerous pathways to pick from at each node, so they can make some intriguing decisions here. Given the timing of this roadmap, Intel may have already determined what to do with 10nm and 7nm.

Back porting to an earlier but optimized node is also mentioned by Intel. Back porting a 7nm product to 10nm+++, a 5nm product to 7nm++, a 3nm product to 5nm++, and a 2nm product to 3nm++ is possible. Back porting is not addressed for the 1.4nm node. Recently, there have been reports and discussions of Intel backporting a 10nm++ product (Tiger Lake) to 14nm+++ (Rocket Lake). Although substantial evidence has been found, Intel has yet to provide an official statement on the matter, given that the device is scheduled to ship in 2021. However, given that this roadmap mentions back porting, it's possible that Rocket Lake CPUs will include a back port of the Willow Cove cores, which will run on a 10nm++ node on the mobility platform.

VI. REFERENCES

[1] S. Natarajan, et al., "A 14nm Logic Technology Featuring 2nd-Generation FinFET, Air-Gapped Interconnects, Self-Aligned Double Patterning and a 0.0588 μm2 SRAM Cell Size," IEDM, pp. 3.7.1-3.7.3, 2014.
[2] E. Fayneh, et al., "14nm 6th-Generation Core Processor SoC with Low Power Consumption and Improved Performance," ISSCC, pp. 72-73, 2016.
[3] B. Bowhill, et al., "The Xeon® Processor E5-2600 v3: A 22nm 18-Core Product Family," ISSCC, pp. 78-79, 2015.
[4] A. Nalamalpu, "Design Optimization of Computing Systems from the Transistor to the Data Center," ISSCC, 2017.
[5] F. Spagna, et al., "A 78mW 11.8Gb/s Serial Link Transceiver with Adaptive RX Equalization and Baud-Rate CDR in 32nm CMOS," ISSCC, pp. 366-377, 2010. Murata, T. (1989). Petri Nets: Properties, Analysis and Applications, Proceedings of the IEEE, Vol. 77, No. 4, April 1989, 541-580.
[6] Ferain I, Colinge CA, Colinge J-P. Multigate transistors as the future of classical metal-oxide-semiconductor field-effect transistors. Nature. 2011;479: 310–316. pmid:22094690
[7] Meyer PS, Ausubel JH. Carrying capacity: a model with logistically varying limits. Technol Forecast Soc Change. 1999;61: 209–214.
[8] Meyer PS, Yung JW, Ausubel JH. A primer on logistic growth and substitution: the mathematics of the Loglet Lab software. Technol Forecast Soc Change. 1999;61: 247–271.
[9] Modis T. Forecasting the growth of complexity and change. Technol Forecast Soc Change. 2002;69: 377–404.
[10] B. Bowhill, et al., "The Xeon® Processor E5-2600 v3: A 22nm 18-Core Product Family," ISSCC, pp. 78-79, 2015.
[11] A. Nalamalpu, "Design Optimization of Computing Systems from the Transistor to the Data Center," ISSCC, 2017.
[12] Mujtaba, H. (2019, December 11). Intel Lays Down 2019-2029 Manufacturing Roadmap - 1.4nm In 10 Years, 2 Year Cadence With Back Porting On Advanced ++ Nodes. Wccftech. https://wccftech.com/intel-2021-2029-process-roadmap10nm-7nm-5nm-3nm-2nm-1nm-back-porting/
