An Error-free 1 Tbps WDM Optical I/O Chiplet and Multi-wavelength Multi-port Laser

M. Wade†, E. Anderson, S. Ardalan, W. Bae, B. Beheshtian, S. Buchbinder, K. Chang, P. Chao, H. Eachempatti, J. Frey, E. Jan, A. Katzin, A. Khilo, D. Kita, U. Krishnamoorthy, C. Li, H. Lu, F. Luna, C. Madden, L. Okada, M. Patel, C. Ramamurthy, M. Raval, R. Roucka, K. Robberson, M. Rust, D. Van Orden, R. Zeng, M. Zhang, V. Stojanovic, F. Sedgwick, R. Meade, N. Chan, J. Fini, B. Kim, S. Liu, C. Zhang, D. Jeong, P. Bhargava, M. Sysak, C. Sun

Ayar Labs Inc., 3351 Olcott Street, Santa Clara, CA 95054, USA
† mark@ayarlabs.com

Abstract: We demonstrate 128 Gbps/port (8-λ × 16 Gbps/λ) natively error-free transmission across eight optical ports using an 8-port, 8-λ/port WDM remote laser source and a pair of monolithically integrated CMOS optical I/O chiplets with 4.96-5.56 pJ/bit optical Tx+Rx chiplet energy efficiency. © 2021 The authors.

Electrical link technologies face challenges in scaling bandwidth while achieving compelling energy efficiency, bandwidth density, latency, and reach. While digital processing performance continues to scale in how much compute can be achieved per unit area of silicon, for many high-value workloads useful compute power is limited by the ability to feed processing cores with I/O bandwidth. For these applications, existing optical solutions that have evolved from the Ethernet networking market are a mismatch in terms of power, bandwidth density, and cost.
Thus, a new I/O technology is needed that can match the performance requirements of computing fabrics and has a viable path to scale to high volumes.

In this paper, we present link characterization results from a new optical I/O architecture for computing fabrics. The architecture comprises an external multi-port WDM laser source (SuperNova) that is disaggregated from the bandwidth-dense CMOS optical I/O chiplet (TeraPHY) [1, 2]. SuperNova aggregates multiple wavelengths of continuous-wave light onto multiple output fibers, and the TeraPHY chiplet contains multiple full-duplex WDM Tx/Rx optical ports that contain all of the electronic circuitry (SerDes, drivers, TIA, equalization, digital control, etc.) and photonic components (waveguides, modulators, photodetectors, etc.) required for full WDM link operation. Additionally, there is a wide-parallel electrical interface on the chip to connect to host ASICs.

Fig. 1: (a) Block diagram of the multi-port, multi-wavelength laser module. (b) Picture of the assembled laser module without lid and (c) with lid. (d) Block diagram of the optical I/O CMOS chiplet. (e) Picture of the assembled optical I/O chiplet package. (f) Picture of the optical chiplet test board.
Fig. 2 panels: (a) power (dBm) vs. wavelength (nm) for Laser Ports 1-8; (b) table of λ (nm) and output power (mW) for Ports 1-8; (c) relative insertion loss (dB) vs. wavelength (nm) for Tx Ports 1-8; (d) 16 Gbps Tx and Rx eyes with energy breakdown Tx 6.5% (0.32 pJ/b), Rx 47.2% (2.34 pJ/b), Clk 46.4% (2.30 pJ/b); (e) 25 Gbps Tx and Rx eyes with energy breakdown Tx 7.8% (0.39 pJ/b), Rx 43.8% (2.17 pJ/b), Clk 48.4% (2.40 pJ/b).

Fig. 2: (a) Measured optical power per port. The dashed line is 5 dBm. (b) Tabulated laser wavelengths and power across all ports. (c) Measured relative insertion loss through the transmitter. (d) 16 Gbps and (e) 25 Gbps characterization. The 25 Gbps Rx eye is after on-chip decision feedback equalization (DFE) integrated in each receiver.

Figure 1(a) shows the main functional blocks of the laser source, and Figs. 1(b),(c) show pictures of the assembled SuperNova laser without and with the lid. The laser combines a 16-λ DFB laser array with a passive planar lightwave circuit (PLC) that performs the multiplexing and splitting to an 8-fiber output array, with each fiber carrying 8 wavelengths. The fiber array connects to an off-the-shelf 1x12 MPO connector.
Figure 1(d) shows a simplified functional block diagram of the TeraPHY optical I/O chiplet. Figure 1(e) shows a picture of the assembled TeraPHY chiplet package. Figure 1(f) shows a picture of the CMOS optical I/O chiplet test board.

Figure 2 shows a summary of characterization data for the SuperNova laser and the TeraPHY optical I/O chiplet. The output optical spectra for all eight optical ports are shown in Fig. 2(a) and tabulated in Fig. 2(b). The laser spectra are taken at ≈ 30 °C case temperature. The average optical power per laser wavelength is ≈ 5.3 mW across all wavelengths and all output ports. The variability in the output powers comes mainly from variability in the optical insertion losses through the PLC.

Figures 2(c),(d),(e) show a summary of key characterization data from the TeraPHY optical I/O chiplet. The untuned optical spectra for all eight transmitter ports are shown in Fig. 2(c), showing eight microring resonances per port. Note that the spectra show the untuned channel spacing. Each microring is thermally tuned upon link initialization to match the incoming laser grid. The nominal microring resonance spacing is designed to span the full free spectral range (FSR) of one microring, which in this design is ≈ 3.2 THz. Each optical channel can be configured to run at 16 to 25 Gbps per wavelength, as shown in the transmitter and receiver optical eye diagrams in Figs. 2(d),(e). The Tx/Rx characterization data is acquired using built-in self-test (BIST) modes of operation that use on-chip pseudo-random bit sequence (PRBS) bit-error-rate checkers.

The energy efficiency breakdown for the optical macros is shown in Figs. 2(d),(e). To characterize the energy efficiency, all eight optical ports are enabled, and we measure the current draw of all voltage rails to arrive at a measured total power. We then divide the measured power by the aggregate data rate to arrive at pJ/bit.
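The energy-efficiency calculation described above (total measured power divided by aggregate data rate) can be sketched in a few lines. This is a minimal illustration, not the authors' measurement script: the function names are hypothetical, the 5.08 W input is an illustrative value rather than a reported measurement, and the per-block pJ/bit values are read from the Fig. 2(d),(e) labels, with their grouping by data rate being an assumption.

```python
def energy_per_bit_pj(total_power_w, ports, wavelengths_per_port, rate_gbps):
    """Energy per bit in pJ/bit: total power divided by aggregate data rate."""
    aggregate_bps = ports * wavelengths_per_port * rate_gbps * 1e9
    if aggregate_bps <= 0:
        raise ValueError("aggregate data rate must be positive")
    # W / (bit/s) = J/bit; multiply by 1e12 to express in pJ/bit.
    return total_power_w / aggregate_bps * 1e12


# Per-block energy in pJ/bit as read from the Fig. 2(d),(e) labels.
# ASSUMPTION: the grouping of labels by data rate is inferred from the figure.
BREAKDOWN_PJ_PER_BIT = {
    "16 Gbps": {"Tx": 0.32, "Rx": 2.34, "Clk": 2.30},
    "25 Gbps": {"Tx": 0.39, "Rx": 2.17, "Clk": 2.40},
}


def breakdown_total(rate_label):
    """Sum the Tx, Rx, and clock contributions for one data rate."""
    return sum(BREAKDOWN_PJ_PER_BIT[rate_label].values())


if __name__ == "__main__":
    # HYPOTHETICAL total power, chosen only to illustrate the arithmetic.
    example_power_w = 5.08
    print(round(energy_per_bit_pj(example_power_w, 8, 8, 16), 2), "pJ/bit")
    for label in BREAKDOWN_PJ_PER_BIT:
        print(label, round(breakdown_total(label), 2), "pJ/bit")
```

With all eight ports and eight wavelengths enabled at 16 Gbps, the aggregate rate in the denominator is 1.024 Tbps. The Tx, Rx, and clock entries are summed per data rate as a consistency check on the breakdown.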
We measure 4.96 pJ/bit for both 16 Gbps and 25 Gbps operation per optical channel, with small differences in the power breakdown. The power is dominated by the receiver and high-speed clock distribution. Although the thermal tuning power was negligible in the direct measurement, it will vary depending on the operating conditions when co-packaged with a host ASIC. We estimate that the maximum thermal tuning power that would ever be required is ≈ 5 mW per optical channel. This adds at most 0.4 pJ/bit at 25 Gbps and 0.6 pJ/bit at 16 Gbps, which brings the estimated range of energy efficiencies to 4.96-5.56 pJ/bit.

Fig. 3 panels: (a) SuperNova laser (8 ports, 8-λ/port) feeding 1x8-λ laser inputs of Chip 1 and Chip 2 (each with 8 Tx ports, 8 Rx ports, 8 laser ports), linked by 1x8-λ Tx out / Rx in; (b) log10(BER) vs. UI for Optical Ports 1-8, Ch. 1-8; (c) table below.

Optical Port #   Total Bits Sent/Received   Errors
1                14.386 Terabits            0
2                15.106 Terabits            0
3                11.675 Terabits            0
4                14.106 Terabits            0
5                14.502 Terabits            0
6                10.488 Terabits            0
7                13.585 Terabits            0
8                10.317 Terabits            0

Fig. 3: (a) Experimental setup for characterization. One port at a time was characterized. (b) Measured BER bathtub curves for all channels, all ports. (c) Table summarizing error-free operation. A total of 104.167 Terabits was sent and received across all ports.

Figure 3 summarizes the full link results. An isolator is used on the output of the SuperNova laser, and all fibers are polarization-maintaining single-mode fibers. Each optical port is characterized individually with all eight WDM channels operational, such that there is 128 Gbps of simultaneous bandwidth.
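The zero-error counts in Fig. 3(c) can be turned into an upper bound on the bit-error rate. The sketch below applies the standard zero-error confidence bound, BER ≤ -ln(1 - CL)/N for N error-free bits. This estimate is not taken from the paper; it assumes independent bit errors, and the 95% confidence level and the function name are choices made for illustration.

```python
import math

# Total bits sent/received per optical port, in terabits, from Fig. 3(c).
BITS_TERABITS = {
    1: 14.386, 2: 15.106, 3: 11.675, 4: 14.106,
    5: 14.502, 6: 10.488, 7: 13.585, 8: 10.317,
}


def ber_upper_bound(n_bits, confidence=0.95):
    """Upper bound on BER after n_bits with zero observed errors.

    ASSUMPTION: bit errors are independent, so the probability of seeing
    no errors in n_bits is approximately exp(-BER * n_bits).
    """
    if n_bits <= 0:
        raise ValueError("n_bits must be positive")
    if not 0.0 < confidence < 1.0:
        raise ValueError("confidence must be between 0 and 1")
    return -math.log(1.0 - confidence) / n_bits


if __name__ == "__main__":
    for port, tbits in BITS_TERABITS.items():
        print(f"port {port}: BER < {ber_upper_bound(tbits * 1e12):.2e}")
    total_bits = sum(BITS_TERABITS.values()) * 1e12
    print(f"all ports: BER < {ber_upper_bound(total_bits):.2e}")
```

Under these assumptions, every port individually supports a bound below 1e-12, and pooling all ports tightens it further. The pooled figure treats the ports as statistically equivalent, which is a simplification.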
During initial link bring-up, the microrings on the Tx and Rx undergo automated optimization that establishes thermal lock to the incoming wavelength grid. Additionally, since the absolute microring resonance frequency is sensitive to manufacturing variation, the initialization routine also establishes the Tx microring to Rx microring mapping.

Figure 3(b) shows measured bathtub curves for all eight optical ports. For each plot, 1e12 bits are sent per channel, such that the center region of the plots represents error-free transmission at an aggregate 1 Terabit (1e12 bits) across all channels. All channels operate simultaneously during the measurements for each BER bathtub curve. Figure 3(c) tabulates the measured results and confirms error-free operation. In addition to the link characterization data shown in Fig. 3, we also confirmed simultaneous (all ports, all channels) 1.024 Tbps error-free operation using a reference benchtop laser with 4.5 mW average power per wavelength.

In conclusion, we have demonstrated error-free communication at 128 Gbps per optical port using an 8-λ microring-based WDM architecture. We measured the optical Tx/Rx chiplet energy efficiency to be 4.96 pJ/bit, with an estimated additional < 0.6 pJ/bit from thermal tuning depending on the data rate and the environmental operating conditions. Near-term future work will focus on quantifying the laser power contribution, enabling simultaneous operation of all optical ports to match the reference laser performance, increasing the data rate per optical channel during multi-port operation, and enabling full-duplex links. We expect further improvements in optical link budget to come from improved consistency of insertion loss across the 24-fiber v-groove array connected to TeraPHY. Additionally, better optimization of the assembly of the SuperNova module is expected to obviate the need for an external isolator.
We believe this demonstration shows the viability of microring-based WDM optical I/O architectures with remote laser sources to meet the emerging energy efficiency, bandwidth density, and latency (FEC-free) requirements of multi-Tbps computing fabrics.

Acknowledgments: We would like to acknowledge Dr. Gordon Keeler and the DARPA PIPES program. This research was developed with funding from the Defense Advanced Research Projects Agency (DARPA). The views, opinions and/or findings expressed are those of the author and should not be interpreted as representing the official views or policies of the Department of Defense or the U.S. Government.

References
1. M. Wade et al., “TeraPHY: A Chiplet Technology for Low-Power, High-Bandwidth In-Package Optical I/O,” IEEE Micro, vol. 40, no. 2, pp. 63-71, March-April 2020.
2. C. Sun et al., “TeraPHY: An O-Band WDM Electro-Optic Platform for Low-Power, Terabit/s Optical I/O,” VLSI Technology Symposium, 2020.