CXL MEMORY INTERCONNECT INITIATIVE
White Paper

CXL Memory Interconnect Initiative: Enabling a New Era of Data Center Architecture

©Rambus Inc.

Table of Contents
CXL Memory Interconnect Initiative: Enabling a New Era of Data Center Architecture
The CXL Protocols
CXL Memory Expansion
CXL Memory Pooling Via Switching
CXL Memory Pooling Via Direct Connect
CXL Security
The Rambus CXL Memory Interconnect Initiative

CXL Memory Interconnect Initiative: Enabling a New Era of Data Center Architecture

In response to the exponential growth in data, the industry is on the threshold of a groundbreaking architectural shift that will fundamentally change the performance, efficiency and cost of data centers around the globe. Server architecture, which has remained largely unchanged for decades, is taking a revolutionary step forward to address the growing demand for data and the voracious performance requirements of advanced workloads, with artificial intelligence/machine learning (AI/ML) being the marquee example.

The data center is moving from a model in which each server has dedicated processing and memory, as well as other resources such as networking and accelerators, to a disaggregated model that employs pools of shared resources which can be efficiently composed to meet the needs of specific workloads, whatever their requirements. Disaggregation and composability tailor computing resources to the workload, bringing many benefits including higher performance, greater efficiency and reduced total cost of ownership (TCO) for the data center.

While the concepts of disaggregation, or rack-scale architecture, and universal interfaces have been around for many years, the industry’s convergence on Compute Express Link (CXL) as a cache-coherent interconnect for processors, memory and accelerators provides the critical enabler and the opportunity to make those concepts a reality. Server architectures coming in the months ahead will include CXL interfaces, opening up the ability for memory expansion through CXL interconnects, and planning is underway for fully disaggregated architectures in the years ahead.

The CXL Protocols

CXL, now at the 3.0 specification level, provides low-latency links and memory coherency between computing devices. It builds on the enormous momentum of PCI Express (PCIe) technology by adopting the PCIe PHY as its physical interface. CXL 1.1/2.0 use the PCIe 5.0 PHY operating at 32 gigatransfers per second (GT/s), and CXL 3.0 scales signaling to 64 GT/s using the PCIe 6.0 PHY.

To support a broad number of computing use cases, the CXL standard defines three protocols: CXL.io, CXL.cache and CXL.memory. CXL.io provides a non-coherent load/store interface for I/O devices and is used for discovery, enumeration and register accesses. It is functionally equivalent to the PCIe protocol, and as the foundational communication protocol it is applicable to all use cases. CXL.cache enables devices such as accelerators to efficiently access and cache host memory for improved performance. As an example, using CXL.io plus CXL.cache, the performance of workloads shared between an accelerator-based NIC and a host CPU can be improved with local caching of data in the accelerator’s attached memory.

Protocol     PCIe-based   Function
CXL.io       Yes          Discovery, register access, interrupts, initialization, I/O virtualization, DMA (direct memory access)
CXL.cache    No           Supports device caching of host memory, with the host processor orchestrating the coherency management
CXL.memory   No           Supports host management and utilization of device-attached memory (memory node), similar to host (main) memory

The CXL protocols are combined to support a range of computing use cases.
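As a quick illustration of how these protocols combine, the fragment below encodes the pairings discussed in this paper as a small lookup table. This is a sketch for orientation only: the enum, struct and use-case names are ours, not part of the CXL specification, and the comments note the CXL device types (Type 1/2/3) these combinations are commonly associated with.

```c
/* Illustrative lookup table of CXL protocol combinations per use case.
 * Names are invented for this sketch; they are not a CXL-defined API. */
#include <stdio.h>

enum cxl_protocol {
    CXL_IO    = 1 << 0,  /* discovery, enumeration, register access (PCIe-equivalent) */
    CXL_CACHE = 1 << 1,  /* device caches host memory; host manages coherency */
    CXL_MEM   = 1 << 2,  /* host load/store access to device-attached memory */
};

struct cxl_use_case {
    const char *description;
    unsigned    protocols;   /* bitwise OR of the protocols employed */
};

static const struct cxl_use_case use_cases[] = {
    /* Caching device such as an accelerator-based NIC (CXL Type 1) */
    { "Accelerator NIC caching host memory",  CXL_IO | CXL_CACHE },
    /* Accelerator with its own memory, shared coherently with the host (Type 2) */
    { "Accelerator sharing memory with host", CXL_IO | CXL_CACHE | CXL_MEM },
    /* Memory expansion or pooled memory node (Type 3) */
    { "Memory expansion and pooling",         CXL_IO | CXL_MEM },
};

int main(void)
{
    for (size_t i = 0; i < sizeof use_cases / sizeof use_cases[0]; i++) {
        const struct cxl_use_case *u = &use_cases[i];
        printf("%-40s io:%d cache:%d mem:%d\n", u->description,
               !!(u->protocols & CXL_IO),
               !!(u->protocols & CXL_CACHE),
               !!(u->protocols & CXL_MEM));
    }
    return 0;
}
```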
The CXL.memory protocol enables a host, such as a processor, to access device-attached memory using load/store commands. This enables some very compelling use cases, which we describe in the CXL memory expansion and pooling sections that follow.

The use case of coherently sharing memory resources between computing devices, such as a CPU host and an AI accelerator, can be enabled by using all three of the CXL protocols. For instance, a server with a CXL-connected accelerator would allow the accelerator to use the CPU’s direct-attached memory for workloads that require greater memory capacity. Absent CXL, the accelerator would need to access SSD or hard disk storage for these workloads, incurring a significant latency and bandwidth penalty. The upshot is that CXL will dramatically increase the performance of these workloads over the legacy alternative.

CXL Memory Expansion

Memory cache coherency and low-latency connections make possible use cases that can have a dramatic impact on computing performance, efficiency and TCO. The first and most straightforward use case is CXL memory expansion. In this instance, significant additional capacity, above and beyond that of main memory, can be added to a CPU host via a CXL-attached device (device-attached memory).

Figure: CXL memory expansion provides more capacity to the host processor. The host connects over a CXL interconnect to a CXL single-host memory expansion chip (the memory node), which in turn drives its attached DDR memory devices.

If the device-attached memory is DRAM (such as DDR), the low-latency performance of the CXL link allows the host to use this memory as an extension of its main memory. As we described earlier, for workloads requiring a memory space that exceeds main memory capacity, the CXL solution delivers a dramatic improvement in performance. This memory expansion use case is supported at the 1.0 and all higher revision levels of the CXL standard.
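The load/store semantics of CXL.memory mean that software can treat CXL-attached DRAM much like ordinary system memory. On Linux, for example, an expander's capacity is commonly onlined as system RAM on a CPU-less NUMA node (an operating-system detail, not something this paper or the CXL specification mandates). The sketch below assumes such a setup, with libnuma installed and a hypothetical node number for the CXL memory; it allocates a buffer on that node and touches it with plain loads and stores, no special API being needed for the accesses themselves.

```c
/* Sketch: using CXL-attached DRAM as an extension of main memory on Linux.
 * Assumptions (not from the white paper): the expander's capacity is onlined
 * as system RAM on CPU-less NUMA node 2, and libnuma is available.
 * Build: cc cxl_expand.c -o cxl_expand -lnuma
 */
#include <numa.h>
#include <stdio.h>
#include <string.h>

#define CXL_NODE 2                 /* hypothetical node id of the CXL memory */
#define BUF_SIZE (1UL << 30)       /* 1 GiB working buffer */

int main(void)
{
    if (numa_available() < 0 || numa_max_node() < CXL_NODE) {
        fprintf(stderr, "NUMA/CXL memory node not available\n");
        return 1;
    }

    /* Bind the allocation to the CXL memory node; the CPU still reaches it
     * with ordinary loads and stores, carried over CXL.memory. */
    char *buf = numa_alloc_onnode(BUF_SIZE, CXL_NODE);
    if (buf == NULL) {
        perror("numa_alloc_onnode");
        return 1;
    }

    memset(buf, 0xA5, BUF_SIZE);   /* plain stores land in the CXL-attached DRAM */
    printf("first byte: 0x%02x\n", (unsigned char)buf[0]);

    numa_free(buf, BUF_SIZE);
    return 0;
}
```

Tiering policies that keep the hottest data in direct-attached DRAM can be layered on top of this, but the basic load/store programming model is unchanged.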
CXL Memory Pooling Via Switching

CXL 2.0 introduced the capability for switching, similar to that supported by the PCIe standard, making possible the use case of CXL memory pooling. With a CXL 2.0 switch, a host can access one or more devices from the pool. Since the hosts are responsible for coherency management, they necessarily need to be CXL 2.0 enabled. The memory pool devices, however, can be a mix of CXL 1.0, 1.1 and 2.0-enabled hardware. At 1.0/1.1, a device is limited to behaving as a single logical device accessible by only one host at a time. A 2.0-level device can be partitioned into multiple logical devices, allowing up to 16 hosts to simultaneously access different portions of its memory.

As an example, in the figure below, host 1 (H1) could use half the memory in device 1 (D1) and a quarter of the memory in device 2 (D2) to finely match the memory requirements of its workload to the available capacity in the memory pool. The remaining capacity in devices D1 and D2 could be used by one or more of the other hosts, up to a maximum of 16. Devices D3 and D4, CXL 1.0 and 1.1-enabled respectively, could each be used by only one host at a time.

Figure: CXL memory pooling with a switch enables multi-host access to CXL 2.0 memory devices. Hosts H1 through H# connect through a CXL 2.0 switch to a pool of devices D1 through D#, a mix of CXL 1.0, 1.1 and 2.0 hardware.

CXL 2.0 supports only a single-tier switch architecture: upstream ports connect only to hosts, downstream ports only to devices, and a CXL switch cannot be connected to another CXL switch. CXL 3.0 introduces fabric switching, allowing for greatly increased architectural flexibility and scale. All switching architectures introduce additional latency above and beyond that of direct-attached DRAM or memory expansion devices. With switching, multiple memory tiers could be provisioned, where hosts access larger and larger pools of memory at the cost of increased latency with each tier.

CXL Memory Pooling Via Direct Connect

A memory tier between CXL expansion and switched memory can be achieved with a CXL direct-connect architecture, which combines the performance benefits of main memory expansion with the efficiency and TCO benefits of pooled memory. This architecture requires that all hosts and devices are CXL 2.0-enabled or above. In this model, “switching” is incorporated into the memory devices via a crossbar in the CXL memory pooling chip. This keeps latency low but requires a more capable chip, since it is now responsible for the control-plane functionality performed by the switch in the previous case.

Figure: CXL memory pooling via direct connect keeps latency low, so attached devices can act as main memory expansion. Hosts H1 through H# connect over CXL 2.0 interconnects directly to a CXL multi-host memory pooling chip, whose crossbar routes accesses to its attached DDR memory devices.

With low-latency direct connections, attached memory devices can employ DDR DRAM to provide expansion of host main memory. This can be done on a very flexible basis, since a host is able to access all or portions of the capacity of as many devices as needed to tackle its workload. Analogous to ridesharing, memory is available to hosts on an “as needed” basis, delivering greater utilization and efficiency of memory. This architecture also provides the option to provision server main memory for nominal workloads, rather than worst case, with the ability to access the pool when needed for high-capacity workloads, offering further benefits to TCO.

Ultimately, the CXL memory pooling models, from direct connect to fabric switching, support the fundamental shift to server disaggregation and composability. In such an architecture, discrete units of compute, memory and storage can be composed on demand to efficiently meet the needs of any workload.
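To make the pooling model concrete, the sketch below is a minimal bookkeeping model, not a real CXL or fabric-manager API: the device names, capacities and requests are invented, while the sharing rules follow the description above (a CXL 2.0 multi-logical device can be divided among up to 16 hosts; a 1.0/1.1 device is owned whole by a single host). It reproduces the earlier example of H1 taking half of D1 and a quarter of D2.

```c
/* Illustrative model of CXL memory-pool allocation (invented API and sizes). */
#include <stdbool.h>
#include <stdint.h>
#include <stdio.h>

#define MAX_HOSTS_PER_MLD 16           /* CXL 2.0 multi-logical device limit */

typedef struct {
    const char *name;                  /* e.g. "D1" */
    bool        multi_logical;         /* CXL 2.0 MLD vs. 1.0/1.1 single logical device */
    uint64_t    capacity_gb;
    uint64_t    free_gb;
    int         num_hosts;             /* hosts currently holding a share */
} pool_device;

/* Try to assign ask_gb of dev to a host; returns true on success. */
static bool assign_share(pool_device *dev, const char *host, uint64_t ask_gb)
{
    if (ask_gb > dev->free_gb)
        return false;
    if (dev->multi_logical) {
        if (dev->num_hosts >= MAX_HOSTS_PER_MLD)
            return false;
    } else {
        if (dev->num_hosts > 0)        /* single logical device: one host at a time... */
            return false;
        ask_gb = dev->capacity_gb;     /* ...and it gets the whole device */
    }
    dev->free_gb -= ask_gb;
    dev->num_hosts++;
    printf("%s gets %llu GB from %s\n", host, (unsigned long long)ask_gb, dev->name);
    return true;
}

int main(void)
{
    pool_device d1 = { "D1", true,  512, 512, 0 };   /* CXL 2.0 MLD */
    pool_device d2 = { "D2", true,  512, 512, 0 };   /* CXL 2.0 MLD */
    pool_device d3 = { "D3", false, 256, 256, 0 };   /* CXL 1.0 device */

    assign_share(&d1, "H1", d1.capacity_gb / 2);     /* H1: half of D1 */
    assign_share(&d2, "H1", d2.capacity_gb / 4);     /* H1: a quarter of D2 */
    assign_share(&d1, "H2", d1.free_gb);             /* H2: the rest of D1 */
    assign_share(&d3, "H3", 64);                     /* H3: owns all of D3 */
    return 0;
}
```

In a real system this bookkeeping would live in switch or pooling-chip management firmware; the sketch only illustrates the allocation flexibility the pooling sections describe.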
CXL Security

Hand in hand with the rapid rise in the volume of data comes a concurrent rise in its value. We’ve discussed the benefits of disaggregation, but by separating the components of server architectures, we increase the attack surface that nefarious actors can attempt to exploit. With this in mind, CXL is specified with a secure-by-design approach. All three CXL protocols are secured via Integrity and Data Encryption (IDE), which provides confidentiality, integrity and replay protection. To meet the high-speed data rate requirements of CXL without introducing additional latency, IDE is implemented in hardware-level secure protocol engines instantiated in the CXL host and device chips.

In addition to protecting the data, CXL chips and systems themselves require safeguards against tampering and cyberattack. A hardware Root of Trust implemented in the CXL chips can provide this basis for security and support requirements for Secure Boot and Secure Firmware Download.

The Rambus CXL Memory Interconnect Initiative

Through the CXL Memory Interconnect Initiative, Rambus is researching and developing solutions to enable a new era of data center performance and efficiency. Announced on June 16, 2021, this initiative is the latest chapter in the 30-year history of Rambus leadership in the development of breakthrough memory and chip-to-chip interconnect technology and products.

Realization of the chip solutions needed for the memory expansion and pooling use cases described earlier, and ultimately those needed for server disaggregation and composability, will require the synthesis of a number of critical technologies. Rambus has been researching memory solutions for disaggregated architectures for close to a decade and leverages a system-aware design approach to solve next-generation challenges. This initiative brings together our expertise in memory and SerDes subsystems, semiconductor and network security, high-volume memory interface chips and compute system architectures to develop breakthrough CXL interconnect solutions for the future data center.

For more information, visit rambus.com/cxlmemoryinterconnects

©Rambus Inc. • rambus.com rev_2022OCT10