
Multilayer RDL Interposer for Heterogeneous Device and Module Integration

2019 IEEE 69th Electronic Components and Technology Conference (ECTC)
Yi-Hang Lin, M.C. Yew, M.S. Liu, S.M. Chen, T.M. Lai, P.N. Kavle, C.H. Lin, T.J. Fang, C.S. Chen, C.T. Yu, K.C. Lee, C.K.
Hsu, P.Y. Lin, F.C. Hsu and Shin-Puu Jeng*
Taiwan Semiconductor Manufacturing Company, No.6, Creation Rd. II,
Hsinchu Science Park, Hsinchu, Taiwan (R.O.C.) 30077
Email: *spjeng@tsmc.com
Abstract: In this paper, we demonstrate a high-density,
large heterogeneous package using an RDL interposer with six
interconnection layers. Four Si chiplets and two HBM
modules are connected with fine pitch copper lines to deliver a
complete system-in-package solution for high performance
computation.
The multilayer interconnections provide
excellent design flexibility to optimize signal, power, and
ground planes. The RDL interposer has generic structural
advantages in interconnection integrity and bump joint
reliability, which allow further scaling up of the package size
for more complicated functional integration.
packages in references [2,3] integrate three FPGA
chiplets and two HBMs and one big GPU and four HBMs,
respectively. Higher computing efficiency can be achieved
with more HBMs – for example, the package in reference [4]
uses four HBMs to gain >2.7 TFLOPS and reference [5]
employs six HBMs. The data rate of HBM continues to
increase: the HBM1 data rate is 1 Gbps, HBM2 goes up to
2.4 Gbps, and HBM3 plans to reach about 3.2 Gbps [6,
7]. The high-performance package needs to provide good
SI/PI performance to support such high data rates.
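As a rough illustration of what these per-pin data rates imply for the package, the following sketch converts them to peak bandwidth per HBM stack. It assumes the 1024-bit data interface per stack defined in the JEDEC HBM standards, which is not stated in this paper; the function and constant names are illustrative only.

```python
# Assumption: 1024 data bits per HBM stack (JEDEC HBM); not stated in the paper.
HBM_INTERFACE_BITS = 1024


def stack_bandwidth_gbytes(pin_rate_gbps, width_bits=HBM_INTERFACE_BITS):
    """Peak bandwidth of one stack in GB/s, given the per-pin data rate in Gbps."""
    return pin_rate_gbps * width_bits / 8


# Per-pin data rates quoted in the text above.
for name, rate in [("HBM1", 1.0), ("HBM2", 2.4), ("HBM3 (planned)", 3.2)]:
    print(f"{name}: {stack_bandwidth_gbytes(rate):.1f} GB/s per stack")
```

Under this assumption, every increase in per-pin rate must be carried by the same 1024 fine-pitch traces between the PHY and the memory, which is why SI/PI margins tighten as the HBM generation advances.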
Keywords- Fanout and Heterogeneous Integration;
Interconnections; Chiplets; System in package
B. CPU core integration
To achieve higher performance computing, thread
numbers and the number of CPU cores increase year by year.
In the HPC processor in reference [8], the main blocks are
divided into 3 parts: the system agent part for I/O, the
multicore part for central computing, and the GPU part. The
size of the central computing part expands for high
performance computing. In order to increase core numbers
and reduce cost, the package in references [9,10] divides one
big processor into four chiplets.
I. INTRODUCTION
In high performance computing applications, one of the
key enabling components is the fine-pitch RDL, which
provides connection between logic and high-bandwidth
memory (HBM) or between chiplets. The interconnection
density determines the electrical performance of the
packages. As the connection length between logic and
JEDEC standard HBM memory is over 4 mm, long
interconnects of this type require not only insertion loss
reduction with low impedance, but also strong crosstalk
protection in both horizontal and vertical directions. Fine
pitch Cu vias and traces allow finer power mesh, which
reduces power delivery network (PDN) impedance, and
reduces noise. In the chiplet scheme, multiple RDL
interconnections are required to connect electrical interfaces,
especially for high pin counts and to enhance design
flexibility.
Si interposers have been successfully adopted for chiplet
and HBM integration [1]. These packages exhibit excellent
performance, which meets increasing bandwidth demands and
enables various important applications in networking and
artificial intelligence computation. In this paper, we
demonstrate a large package that integrates four Si chiplets
and two HBM modules on an RDL interposer with six layers
of interconnections. The benefits of electrical performance
using six RDL interconnections are analyzed. Furthermore,
the generic mechanical advantages in RDL integrity and
bump joint reliability of the new type of package are
presented.
C. System-level heterogeneous integration
From a system computation efficiency point of view, the
seamless integration between FPGA, CPU, GPU, NPU, IO
interfaces, SRAM, and HBM is critical. It is a challenge for
package technology to integrate such diverse functional
components. The package in reference [11], which integrates
CPU, GPU and HBM, is likely the beginning of such a
trend.
Here, we will demonstrate the potential of our multilayer
RDL interposer for system level heterogeneous integration.
II.
As shown in Figure 1, the basic integration scheme of the
RDL fan-out interposer resembles that of the familiar Si
interposer. Si chips and memory modules are attached to the
interposer with protective molding compound, and the “chip-on-RDL interposer” structure is then joined onto a PCB
substrate with C4 bumps. Figure 2 shows the detailed
constituents of the structure, including Si chips, memory
modules, micro bumps, RDL interposer, C4 bumps, PCB
substrate and BGA. The key components of RDL interposer
are listed in Table 1. The line width/spacing of the baseline
RDL interposer is 2/2 µm, which is larger than that of a
A. HBM integration
High-performance computing requires high-density on-package integration with a high data rate. For example, the
2377-5726/19/$31.00 ©2019 IEEE
DOI 10.1109/ECTC.2019.00145
MULTIPLE RDL INTERPOSER FABRICATION
typical Si interposer.
The vertical interconnection is
composed of fine pitch stacking vias and stagger vias, which
allow flexible routing design without extra parasitic
capacitance.
Figure 5(a) shows the cross-sectional view of the
package. Both Si chips and HBM are joined onto a thin
RDL interposer. The molded interposer structure is
assembled to a PCB substrate with C4 bumps. Figure 5(b)
shows the six-RDL structure. The high density fine-pitch Cu
RDL is for the connection between the PHY of chiplets and
HBM. Ground RDL mesh is used as shielding for good SI/PI
performance. Figure 5 (c), (d) and (e) show the SEM pictures
of stagger vias, two stacking vias, and four stacking vias,
respectively. The use of stacking vias can reduce the RDL
routing distance and increase design flexibility. Finally,
Figure 6 shows the OM and SEM images of fine pitch Cu lines with a minimum width of 2 µm.
Figure 1 Schematic cross-section of SOC and HBM modules on
multilayer RDL interposer. The interposer stack is attached to a
PCB substrate.
Figure 2 Schematic drawing showing the details and process
sequence of heterogeneous integration of SOC and HBM on RDL
interposer.
TABLE 1 KEY COMPONENTS IN RDL INTERPOSER INTEGRATION
Component                    RDL interposer
Heterogeneous integration    Yes
Si chip, module I/O          Cu, solder bumps
Dielectric Interposer        Organic (Polyimide)
RDL                          Fine pitch Cu lines
Vertical Interconnect        Cu via (staggered, stacking or mixture of both)
C4                           Cu, solder bumps
Figure 3 The interposer and PCB substrate dimensions are 32 x 35
mm² and 55 x 55 mm², respectively. (a), (b) The package has four Si
chiplets and two HBM modules, (c) the backside of interposer with
C4 bumps, (d) HBM module on interposer.
Figure 3 shows a fully assembled RDL interposer
package. There are four Si chips and two HBMs in this
package. The sizes of the RDL interposer and PCB substrate are
32 x 35 mm² and 55 x 55 mm², respectively. Figure 4 shows the
X-ray images of good self-aligned micro bump joints. The
minimum bump pitch is 55 µm here.
Figure 4 X-ray images showing good micro-bump joints. (d),
(e) are the micro-bump joints of HBM. The images of the bumps
inside HBM overlap with the ones on interposer.
the height and jitter noise of eye diagrams, is simulated using
HFSS and ADS. Figure 8 (a), (b) and (c) compare the eye
diagrams of these different isolation configurations. Both
the eye height and jitter noise of signal lines are significantly
improved with additional ground isolation. The six-RDL
scheme shows superior electrical performance
compared to that of the three-RDL scheme.
Table 2 summarizes the improvement: the eye height
of the six-RDL design is 6%-16% better than that of the three-RDL
scheme. For jitter noise, the six-RDL design is 2-3 times
better than the three-RDL design.
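The comparison can be reproduced from the normalized entries of Table 2. The short sketch below is one possible reading of those numbers, taking the six-RDL design as the reference; the dictionary layout and function name are illustrative and not part of the original work.

```python
# Normalized values copied from Table 2.
table2 = {
    "3RDL GSSG": {"eye_height": 0.84, "jitter_rms": 1.00},
    "3RDL GSGSG": {"eye_height": 0.94, "jitter_rms": 0.49},
    "6RDL GSGSG": {"eye_height": 1.00, "jitter_rms": 0.34},
}


def compare_to_six_rdl(name, data=table2):
    """Return (eye-height deficit in %, jitter ratio) relative to the six-RDL design."""
    ref = data["6RDL GSGSG"]
    row = data[name]
    eye_gap_pct = (ref["eye_height"] - row["eye_height"]) / ref["eye_height"] * 100
    jitter_ratio = row["jitter_rms"] / ref["jitter_rms"]
    return eye_gap_pct, jitter_ratio


for name in ("3RDL GSSG", "3RDL GSGSG"):
    gap, ratio = compare_to_six_rdl(name)
    print(f"{name}: eye height {gap:.0f}% lower, rms jitter {ratio:.2f}x higher")
```

With this reading, the eye-height deficits are about 16% and 6%, and the rms jitter ratios are about 2.9 for the GSSG case and about 1.4 for the three-RDL GSGSG case.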
Figure 5 Cross-sectional views of (a) RDL interposer package on
a PCB substrate, (b) six layers of Cu interconnections, (c) six layers
of Cu interconnections with stagger vias, (d) Cu interconnections
with two stacking vias, (e) Cu interconnections with four stacking
vias.
Figure 7 Signal routing arrangements of three-RDL and six-RDL schemes: (a) coplanar GSSG with three RDL interconnections, (b)
coplanar GSGSG with three RDL interconnections, (c) coplanar
GSGSG and interlayer ground shielding with six RDL
interconnections.
Figure 6 (a) Optical microscope and (b) SEM images of 2 µm Cu
lines.
III. ELECTRICAL PERFORMANCE OF MULTILAYER RDL
The eye diagrams and the insertion loss (S parameter) of
three different RDL arrangements are studied: a co-planar
GSSG structure in the three-RDL scheme [Fig. 7(a)], a
co-planar GSGSG structure in the three-RDL scheme
[Fig. 7(b)], and a co-planar GSGSG structure shielded by
three extra ground traces in the six-RDL scheme
[Fig. 7(c)]. The signal integrity performance, i.e.,
The insertion loss performance is important for high
frequency operation. High insertion loss degrades the signal
intensity and increases the operational power. The insertion
loss (S21 parameter) performance of these three signal
routing structures is compared in Figure 9. Comparable
performance is observed due to equal line width and
thickness for all signal routings.
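For readers less familiar with S-parameters, insertion loss is conventionally quoted as the magnitude of S21 in decibels. The sketch below shows that conversion and the corresponding fraction of power delivered to the receiver. The sample magnitudes are made up for illustration and are not values read from Figure 9.

```python
import math


def s21_to_db(s21):
    """Insertion loss figure 20*log10|S21|; s21 may be real or complex."""
    return 20 * math.log10(abs(s21))


def delivered_power_fraction(s21_db):
    """Fraction of incident power that reaches the far end for a given S21 in dB."""
    return 10 ** (s21_db / 10)


# Hypothetical linear S21 magnitudes, for illustration only.
for mag in (0.9, 0.7, 0.5):
    db = s21_to_db(mag)
    print(f"|S21| = {mag:.2f} -> {db:.2f} dB, "
          f"{delivered_power_fraction(db) * 100:.0f}% of power delivered")
```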
The crosstalk performance of the two adjacent co-planar
signal lines is compared in Figure 10(a). The coplanar GSSG
and GSGSG structures with three RDL interconnections
exhibit larger crosstalk than that of coplanar GSGSG with
six RDL interconnections. Additional interlayer ground
shielding with six RDL interconnections provides significant
performance improvement. For the layer-to-layer crosstalk,
the inserted ground plane in the six-RDL scheme is capable
of completely isolating the signal lines, and produces nearly
zero crosstalk, as shown in Fig. 10 (b).
Figure 8 Simulated eye diagrams of (a) coplanar GSSG with
three RDL interconnections, (b) coplanar GSGSG with three RDL
interconnections, (c) coplanar GSGSG and interlayer ground
shielding with six RDL interconnections.
Table 2 Signal integrity
RDL type      Eye height    Jitter (rms)
3RDL GSSG     0.84x         1x
3RDL GSGSG    0.94x         0.49x
6RDL GSGSG    1x            0.34x
Figure 10 (a) Simulated crosstalk of two adjacent co-planar
signal lines, (b) simulated layer-to-layer crosstalk of two adjacent
signal lines in vertical direction.
IV. STRUCTURAL ADVANTAGES OF RDL INTERPOSER AND
RELIABILITY ASSESSMENT
The four-chips-plus-two-HBM RDL interposer package
passes stringent reliability stress tests without
failures. There are generic structural advantages of the RDL
Figure 9 Simulated insertion loss of different configurations is
compared for HBM-SOC PHY connections.
interposer, particularly in RDL integrity and bump joint
reliability.
B. Micro bump, C4 Joint Reliability
The micro solder joint reliability is investigated through
mechanical stress simulation. The temperature cycling (TC)
loading ranges from -40°C to 125°C with a 1-hour cycle
duration. Figure 13 shows the accumulated strain energy
density (SED) of the corner micro bump on the Si chip for both
the RDL interposer and flip chip packages. The SED on the micro
bump is significantly reduced by the RDL layer and by the
underfill layer for the C4 bumps. The normalized maximum
delta SED of the corner micro bump within one TC is 0.52, which
is lower than the experimentally proven safe delta SED level.
A. Multilayer RDL Integrity
Compared to the RDL layer elsewhere on the package,
the fine pitch Cu lines underneath the gaps between Si chip
and HBM have a relatively lower structural stiffness support.
These lines can be deformed and broken during the
reliability test.
The stress on the RDL interposer under a temperature loading
from room temperature to 250°C is characterized with finite
element analysis. Because it is closest to the Si and HBM,
RDL1 has the highest stress from CTE mismatch, as shown
in the contour plot in Figure 11. Fortunately, the underfill
material between the RDL and the Si chip/HBM serves as a good
stress buffer layer, which significantly reduces the stress to
below the risk level, as shown in Figure 12.
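To give a feel for the magnitude of the CTE mismatch, a first-order estimate of the unconstrained mismatch strain is the CTE difference multiplied by the temperature excursion. The sketch below applies that textbook relation only. The CTE values and the 25°C room temperature are illustrative assumptions that do not come from this paper, and the calculation is not a substitute for the finite element analysis described above.

```python
ROOM_TEMP_C = 25.0   # assumption: room temperature is not given numerically in the text
PEAK_TEMP_C = 250.0  # upper temperature of the loading described above

# Illustrative CTE values in ppm/°C; assumptions, not taken from the paper.
CTE_PPM = {"Si": 2.6, "Cu": 17.0, "polyimide": 35.0}


def mismatch_strain(material_a, material_b, t_low=ROOM_TEMP_C, t_high=PEAK_TEMP_C):
    """First-order thermal mismatch strain: |alpha_a - alpha_b| * (t_high - t_low)."""
    delta_alpha = abs(CTE_PPM[material_a] - CTE_PPM[material_b]) * 1e-6
    return delta_alpha * (t_high - t_low)


for pair in (("Si", "Cu"), ("Si", "polyimide")):
    strain_pct = mismatch_strain(*pair) * 100
    print(f"{pair[0]} vs {pair[1]}: about {strain_pct:.2f}% mismatch strain")
```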
Figure 11 Cross sectional schematic of RDL interposer, and P1
stress contours of the RDL below the SoC-to-HBM gap.
Figure 13 Comparison of (a) micro-bump strain contour, (b)
normalized micro-bump strain energy density between RDL
interposer package and flip chip package with same boundary
condition.
Similarly, the accumulated SED of the corner C4 bump on
Si chip can be reduced by the flexible RDL layer, as shown
in Figure 14. The C4 joint reliability window, i.e., the chip-package-interaction (CPI) window, is substantially larger than that of the
Figure 12 Normalized P1 stress of each RDL layer.
typical flip chip type package. This is the primary reason
why the RDL interposer is scalable to large sizes.
[3] Jack Choquette, "Volta: Performance and
Programmability", IEEE Hot Chips Symposium, 2017.
[4] Toshio Yoshida, "Fujitsu High Performance CPU for the
Post-K Computer", IEEE Hot Chips Symposium, 2018.
[5] Yohei Yamada, "Vector Engine Processor of NEC’s
Brand-New Supercomputer SX-Aurora TSUBASA", IEEE
Hot Chips Symposium, 2018.
[6] Jin Hee Cho et al., "A 1.2V 64Gb 341GB/s HBM2
Stacked DRAM with Spiral Point-to-Point TSV Structure
and Improved Bank Group Data Control", ISSCC, 2018.
[7] Hongshin Jun et al., "HBM (High Bandwidth Memory)
DRAM Technology and Architecture", International
Memory Workshop, 2017.
[8] https://en.wikipedia.org/wiki/Intel_Core
[9] Kevin Lepak et al., "The Next Generation AMD Enterprise
Server Product Architecture", IEEE Hot Chips Symposium,
2017.
[10] Noah Beck et al., "Zeppelin: An SoC for Multichip
Architectures", ISSCC, 2018.
[11] Srinivas Chennupaty, "Thin & Light & high
performance graphics", IEEE Hot Chips Symposium, 2018.
Figure 14 Comparison of (a) C4 strain contour, (b) normalized
C4 strain energy density between RDL interposer package and flip
chip package with same boundary condition.
V. CONCLUSION
The multilayer RDL interposer package is an excellent
heterogeneous integration platform.
Six layers of
interconnection provide design flexibility for chiplets and
HBM integration with good electrical performance, such as
large eye height, low jitter, and nearly zero layer-to-layer
crosstalk performance. This unique scheme, due to the
flexible organic RDL layers used as a stress buffer layer to
protect fine pitch Cu lines and bump joints, offers good
package reliability and scalability to larger package sizes.
REFERENCES
[1] Suresh Ramalingam, "HBM Package Integration:
Technology Trends, Challenges and Applications", IEEE
Hot Chips Symposium, 2016.
[2] Gaurav Singh et al., "Xilinx 16nm Datacenter Device
Family with In-Package HBM and CCIX Interconnect",
IEEE Hot Chips Symposium, 2017.