2019 IEEE 69th Electronic Components and Technology Conference (ECTC)
2377-5726/19/$31.00 ©2019 IEEE, DOI 10.1109/ECTC.2019.00145

Multilayer RDL Interposer for Heterogeneous Device and Module Integration

Yi-Hang Lin, M.C. Yew, M.S. Liu, S.M. Chen, T.M. Lai, P.N. Kavle, C.H. Lin, T.J. Fang, C.S. Chen, C.T. Yu, K.C. Lee, C.K. Hsu, P.Y. Lin, F.C. Hsu, and Shin-Puu Jeng*
Taiwan Semiconductor Manufacturing Company, No.6, Creation Rd. II, Hsinchu Science Park, Hsinchu, Taiwan (R.O.C.) 30077
Email: *spjeng@tsmc.com

Abstract- In this paper, we demonstrate a large, high density heterogeneous package using an RDL interposer with six interconnection layers. Four Si chiplets and two HBM modules are connected with fine pitch copper lines to deliver a complete system-in-package solution for high performance computation. The multilayer interconnections provide excellent design flexibility to optimize signal, power, and ground planes. The RDL interposer has generic structural advantages in interconnection integrity and bump joint reliability, which allow further scaling up of the package size for more complicated functional integration.

Keywords- Fanout and Heterogeneous Integration; Interconnections; Chiplets; System in package

I. INTRODUCTION

In high performance computing applications, one of the key enabling components is the fine-pitch RDL, which provides the connection between logic and high-bandwidth memory (HBM) or between chiplets. The interconnection density determines the electrical performance of the packages. As the connection length between logic and JEDEC standard HBM memory is over 4 mm, long interconnects of this type require not only insertion loss reduction with low impedance, but also strong crosstalk protection in both horizontal and vertical directions. Fine pitch Cu vias and traces allow a finer power mesh, which reduces power delivery network (PDN) impedance and noise. In the chiplet scheme, multiple RDL interconnections are required to connect electrical interfaces, especially for high pin counts, and to enhance design flexibility.

Si interposers have been successfully adopted for chiplets and HBM integration [1]. These packages exhibit excellent performance, which meets increasing bandwidth demands and enables important applications in network and artificial intelligence computation.

In this paper, we demonstrate a large package that integrates four Si chiplets and two HBM modules on an RDL interposer with six layers of interconnections. The benefits of electrical performance using six RDL interconnections are analyzed. Furthermore, the generic mechanical advantages in RDL integrity and bump joint reliability of the new type of package are presented.

A. HBM integration
High-performance computing requires high density on-package integration with a high data rate. For example, the packages in references [2, 3] integrate three FPGA chiplets and two HBMs, and one big GPU and four HBMs, respectively. Higher computing efficiency can be achieved with more HBMs; for example, the package in reference [4] uses four HBMs to gain >2.7 TFLOPS, and reference [5] employs six HBMs. The data rate of HBM continues to increase: the HBM1 data rate is 1 Gbps, HBM2 goes up to 2.4 Gbps, and HBM3 plans to reach about 3.2 Gbps [6, 7]. The high-performance package needs to provide good SI/PI performance to support such high data rates.

B. CPU cores integration
To achieve higher computing performance, thread counts and the number of CPU cores increase year by year. In the HPC processor in reference [8], the main blocks are divided into three parts: the system agent part for I/O, the multicore part for central computing, and the GPU part. The size of the central computing part expands for high performance computing. In order to increase core numbers and reduce cost, the package in references [9, 10] divides one big processor into four chiplets.

C. System-level heterogeneous integration
From a system computation efficiency point of view, the seamless integration between FPGA, CPU, GPU, NPU, IO interfaces, SRAM, and HBM is critical. It is a challenge for package technology to integrate such diverse functional components. The package in reference [11], which integrates CPU, GPU, and HBM, is likely the beginning of such a trend. Here, we will demonstrate the potential of our multilayer RDL interposer for system level heterogeneous integration.

II. MULTIPLE RDL INTERPOSER FABRICATION

As shown in Figure 1, the basic integration scheme of the RDL fan-out interposer resembles that of the familiar Si interposer. Si chips and memory modules are attached to the interposer with protective molding compound, and the "chip-on-RDL interposer" structure is then joined onto a PCB substrate with C4 bumps. Figure 2 shows the detailed constituents of the structure, including Si chips, memory modules, micro bumps, RDL interposer, C4 bumps, PCB substrate, and BGA. The key components of the RDL interposer are listed in Table 1. The line width/spacing of the baseline RDL interposer is 2/2 um, which is larger than that of a typical Si interposer. The vertical interconnection is composed of fine pitch stacked vias and staggered vias, which allows flexible routing design without extra parasitic capacitance.

Figure 5(a) shows the cross-sectional view of the package. Both the Si chips and the HBM are joined onto a thin RDL interposer. The molded interposer structure is assembled to a PCB substrate with C4 bumps. Figure 5(b) shows the six-RDL structure. The high density fine-pitch Cu RDL is for the connection between the PHY of the chiplets and the HBM. A ground RDL mesh is used as shielding for good SI/PI performance.
Figure 5(c), (d), and (e) show the SEM pictures of staggered vias, two stacked vias, and four stacked vias, respectively. The use of stacked vias can reduce the RDL routing distance and increase design flexibility. Finally, Figure 6 shows the OM and SEM images of fine pitch Cu lines with a minimum width of 2 um.

Figure 3 shows a fully assembled RDL interposer package. There are four Si chips and two HBMs in this package. The sizes of the RDL interposer and PCB substrate are 32 x 35 mm2 and 55 x 55 mm2, respectively. Figure 4 shows the X-ray images of well self-aligned micro bump joints. The minimum bump pitch is 55 um.

Figure 1 Schematic cross-section of SOC and HBM modules on multilayer RDL interposer. The interposer stack is attached to a PCB substrate.

Figure 2 Schematic drawing showing the details and process sequence of heterogeneous integration of SOC and HBM on RDL interposer.

TABLE 1 KEY COMPONENTS IN RDL INTERPOSER INTEGRATION
Component                            RDL interposer
Heterogeneous integration            Yes
Si chip, module I/O                  Cu, solder bumps
Interposer: dielectric               Organic (polyimide)
Interposer: RDL                      Fine pitch Cu lines
Interposer: vertical interconnect    Cu via (staggered, stacked, or mixture of both)
C4                                   Cu, solder bumps

Figure 3 The interposer and PCB substrate dimensions are 32 x 35 mm2 and 55 x 55 mm2, respectively. (a), (b) The package has four Si chiplets and two HBM modules, (c) the backside of the interposer with C4 bumps, (d) HBM module on interposer.

Figure 4 X-ray images that show good micro-bump joints. (d), (e) are the micro-bump joints of HBM. The images of the bumps inside the HBM overlap with the ones on the interposer.

Figure 5 Cross-sectional views of (a) RDL interposer package on a PCB substrate, (b) six layers of Cu interconnections, (c) six layers of Cu interconnections with staggered vias, (d) Cu interconnections with two stacked vias, (e) Cu interconnections with four stacked vias.

Figure 6 (a) Optical microscope and (b) SEM images of 2 um Cu lines.

III. ELECTRICAL PERFORMANCE OF MULTILAYER RDL

The eye diagrams and the insertion loss (S parameter) of three different RDL arrangements are studied: a co-planar GSSG structure in the three-RDL scheme [Fig. 7(a)], a co-planar GSGSG structure in the three-RDL scheme [Fig. 7(b)], and a co-planar GSGSG structure shielded by three extra ground traces in the six-RDL scheme [Fig. 7(c)]. The signal integrity performance, i.e., the eye height and jitter noise of the eye diagrams, is simulated using HFSS and ADS. Figure 8(a), (b), and (c) compare the eye diagrams of these different isolation configurations. Both the eye height and jitter noise of the signal lines are significantly improved with additional ground isolation. The six-RDL scheme shows superior electrical performance compared to the three-RDL scheme. Table 2 summarizes the values of improvement: the eye height of the six-RDL design is 6%-16% better than that of the three-RDL scheme. For jitter noise, the six-RDL design is 2-3 times better than the three-RDL design.

Figure 7 Signal routing arrangements of three-RDL and six-RDL: (a) coplanar GSSG with three RDL interconnections, (b) coplanar GSGSG with three RDL interconnections, (c) coplanar GSGSG and interlayer ground shielding with six RDL interconnections.

The insertion loss performance is important for high frequency operation. High insertion loss degrades the signal intensity and increases the operational power. The insertion loss (S21 parameter) of these three signal routing structures is compared in Figure 9. Comparable performance is observed because the line width and thickness are equal for all signal routings.

The crosstalk performance of the two adjacent co-planar signal lines is compared in Figure 10(a).
The coplanar GSSG and GSGSG structures with three RDL interconnections exhibit larger crosstalk than the coplanar GSGSG structure with six RDL interconnections. Additional interlayer ground shielding with six RDL interconnections provides significant performance improvement. For the layer-to-layer crosstalk, the inserted ground plane in the six-RDL scheme is capable of completely isolating the signal lines, and produces nearly zero crosstalk, as shown in Fig. 10(b).

Figure 8 Simulated eye diagrams of (a) coplanar GSSG with three RDL interconnections, (b) coplanar GSGSG with three RDL interconnections, (c) coplanar GSGSG and interlayer ground shielding with six RDL interconnections.

TABLE 2 SIGNAL INTEGRITY
RDL type        Eye height    Jitter (rms)
3-RDL GSSG      0.84x         1x
3-RDL GSGSG     0.94x         0.49x
6-RDL GSGSG     1x            0.34x

Figure 9 Simulated insertion loss of different configurations is compared for HBM-SOC PHY connections.

Figure 10 (a) Simulated crosstalk of two adjacent co-planar signal lines, (b) simulated layer-to-layer crosstalk of two adjacent signal lines in the vertical direction.

IV. STRUCTURAL ADVANTAGES OF RDL INTERPOSER AND RELIABILITY ASSESSMENT

The four-chips-plus-two-HBM RDL interposer package successfully passes stringent reliability stress tests without failures. There are generic structural advantages of the RDL interposer, particularly in RDL integrity and bump joint reliability.

A. Multilayer RDL Integrity
Compared to the RDL elsewhere in the package, the fine pitch Cu lines underneath the gaps between the Si chip and the HBM have relatively low structural stiffness support. These lines can be deformed and broken during the reliability test. The stress on the RDL interposer under a temperature loading from room temperature to 250°C is characterized with finite element analysis. Because it is closest to the Si and HBM, RDL1 has the highest stress from CTE mismatch, as shown in the contour plot in Figure 11. Fortunately, the underfill material between the RDL and the Si chip/HBM serves as a good stress buffer layer, which significantly reduces the stress to below the risk level, as shown in Figure 12.

Figure 11 Cross-sectional schematic of the RDL interposer, and P1 stress contours of the RDL below the SoC-to-HBM gap.

Figure 12 Normalized P1 stress of each RDL layer.

B. Micro bump, C4 Joint Reliability
The micro solder joint reliability is investigated through mechanical stress simulation. The temperature cycling (TC) loading ranges from -40°C to 125°C with a 1-hour cycle duration. Figure 13 shows the accumulated strain energy density (SED) of the corner micro bump on the Si chip for both the RDL interposer and flip chip packages. The SED on the micro bump is significantly reduced by the RDL layer and the underfill layer for C4 bumps. The normalized maximum delta SED within one TC of the corner micro bump is 0.52, which is lower than the experimentally proven safe delta SED level.

Figure 13 Comparison of (a) micro-bump strain contour, (b) normalized micro-bump strain energy density between the RDL interposer package and the flip chip package with the same boundary condition.

Similarly, the accumulated SED of the corner C4 bump on the Si chip can be reduced by the flexible RDL layer, as shown in Figure 14. The C4 joint reliability window, i.e., the chip-package-interaction (CPI) window, is substantially larger than that of the typical flip chip package. This is the primary reason why the RDL interposer is scalable to large sizes.

Figure 14 Comparison of (a) C4 strain contour, (b) normalized C4 strain energy density between the RDL interposer package and the flip chip package with the same boundary condition.

V. CONCLUSION

The multilayer RDL interposer package is an excellent heterogeneous integration platform. Six layers of interconnection provide design flexibility for chiplets and HBM integration with good electrical performance, such as large eye height, low jitter, and nearly zero layer-to-layer crosstalk. Because the flexible organic RDL layers act as a stress buffer that protects the fine pitch Cu lines and bump joints, this scheme offers good package reliability and scalability to larger package sizes.

REFERENCES
[1] Suresh Ramalingam, "HBM Package Integration: Technology Trends, Challenges and Applications", IEEE Hot Chips Symposium, 2016.
[2] Gaurav Singh et al., "Xilinx 16nm Datacenter Device Family with In-Package HBM and CCIX Interconnect", IEEE Hot Chips Symposium, 2017.
[3] Jack Choquette, "Volta: Performance and Programmability", IEEE Hot Chips Symposium, 2017.
[4] Toshio Yoshida, "Fujitsu High Performance CPU for the Post-K Computer", IEEE Hot Chips Symposium, 2018.
[5] Yohei Yamada, "Vector Engine Processor of NEC's Brand-New Supercomputer SX-Aurora TSUBASA", IEEE Hot Chips Symposium, 2018.
[6] Jin Hee Cho et al., "A 1.2V 64Gb 341GB/s HBM2 Stacked DRAM with Spiral Point-to-Point TSV Structure and Improved Bank Group Data Control", ISSCC, 2018.
[7] Hongshin Jun et al., "HBM (High Bandwidth Memory) DRAM Technology and Architecture", International Memory Workshop, 2017.
[8] https://en.wikipedia.org/wiki/Intel_Core
[9] Kevin Lepak et al., "The Next Generation AMD Enterprise Server Product Architecture", IEEE Hot Chips Symposium, 2017.
[10] Noah Beck et al., "Zeppelin: An SoC for Multichip Architectures", ISSCC, 2018.
[11] Srinivas Chennupaty, "Thin & Light & High Performance Graphics", IEEE Hot Chips Symposium, 2018.
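
As a cross-check on the signal integrity comparison in Section III, the normalized values in Table 2 can be turned into relative figures with a few lines of arithmetic. The sketch below is illustrative only: the function name `compare_to_reference` is hypothetical, and it assumes that the eye-height improvement is expressed as a percentage of the six-RDL value, which is the convention that reproduces the 6%-16% range quoted in the text.

```python
# Normalized signal-integrity values transcribed from Table 2.
# Eye height: higher is better. Jitter (rms): lower is better.
TABLE_2 = {
    "3-RDL GSSG": {"eye_height": 0.84, "jitter_rms": 1.00},
    "3-RDL GSGSG": {"eye_height": 0.94, "jitter_rms": 0.49},
    "6-RDL GSGSG": {"eye_height": 1.00, "jitter_rms": 0.34},
}


def compare_to_reference(table, reference="6-RDL GSGSG"):
    """Compare every configuration against the reference design.

    Returns {name: (eye_gain_pct, jitter_ratio)} where
      eye_gain_pct = eye-height advantage of the reference, as a percentage
                     of the reference value (assumed convention),
      jitter_ratio = configuration jitter divided by reference jitter.
    """
    ref = table[reference]
    result = {}
    for name, row in table.items():
        if name == reference:
            continue  # nothing to compare the reference against
        eye_gain_pct = 100.0 * (ref["eye_height"] - row["eye_height"]) / ref["eye_height"]
        jitter_ratio = row["jitter_rms"] / ref["jitter_rms"]
        result[name] = (round(eye_gain_pct, 1), round(jitter_ratio, 2))
    return result


if __name__ == "__main__":
    for name, (eye_gain, jitter_ratio) in compare_to_reference(TABLE_2).items():
        print(f"{name}: six-RDL eye height is {eye_gain}% larger, "
              f"jitter is {jitter_ratio}x that of six-RDL")
```

With the Table 2 values, this gives an eye-height advantage of 16.0% over the three-RDL GSSG layout and 6.0% over the three-RDL GSGSG layout, and jitter ratios of about 2.94x and 1.44x, respectively.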