CXL MEMORY INTERCONNECT INITIATIVE
White Paper

CXL Memory Interconnect Initiative: Enabling a New Era of Data Center Architecture
Table of Contents

CXL Memory Interconnect Initiative: Enabling a New Era of Data Center Architecture
The CXL Protocols
CXL Memory Expansion
CXL Memory Pooling Via Switching
CXL Memory Pooling Via Direct Connect
CXL Security
The Rambus CXL Memory Interconnect Initiative
CXL Memory Interconnect Initiative: Enabling a New Era of Data Center Architecture
In response to an exponential growth in data, the industry is on the threshold of a
groundbreaking architectural shift that will fundamentally change the performance, efficiency
and cost of data centers around the globe. Server architecture, which has remained largely
unchanged for decades, is taking a revolutionary step forward to address the growing demand
for data and the voracious performance requirements of advanced workloads with artificial
intelligence/machine learning (AI/ML) being the marquee example.
The data center is moving from a model where each server has dedicated processing and memory, as well as other resources like networking and accelerators, to a disaggregated model that employs pools of shared resources, which can be efficiently composed to match the needs of specific workloads, whatever their requirements. Disaggregation and composability tailor
computing resources to the workload, bringing many benefits including higher performance,
greater efficiency and reduced total cost of ownership (TCO) for the data center.
While the concepts of disaggregation, or rack-scale architectures, and universal interfaces have been around for many years, the industry's convergence on Compute Express Link (CXL) as a cache-coherent interconnect for processors, memory and accelerators provides the critical enabler and opportunity to make those concepts a reality. Server architectures coming in
the months ahead will include CXL interfaces, opening up the ability for memory expansion
through CXL interconnects, and planning is underway for fully disaggregated architectures in
the years ahead.
The CXL Protocols
CXL, now at the 3.0 specification level, provides low-latency links and memory coherency
between computing devices. It builds on the enormous momentum of PCI Express (PCIe)
technology by adopting the PCIe PHY as its physical interface. CXL 1.1/2.0 use the PCIe 5.0 PHY operating at 32 gigatransfers per second (GT/s). CXL 3.0 scales signaling to 64 GT/s using the PCIe 6.0 PHY.
To support a broad range of computing use cases, the CXL standard defines three protocols: CXL.io, CXL.cache and CXL.memory. CXL.io provides a non-coherent load/store interface for I/O devices and is used for discovery, enumeration, and register accesses. It is functionally equivalent
to the PCIe protocol. CXL.io is the foundational communication protocol and as such is applicable
to all use cases.
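Because CXL.io reuses PCIe mechanisms, a CXL device is discovered and its registers accessed the same way as any PCIe device. Purely as an illustration (not from this paper), the sketch below reads the Vendor and Device ID from a device's PCIe configuration space via Linux sysfs; the bus/device/function address is a placeholder.

```c
/* Illustrative only: read the PCIe configuration-space header of a device.
 * A CXL.io-capable device is discovered through this same PCIe-compatible
 * mechanism. The BDF address below is a placeholder, not a real device. */
#include <stdio.h>
#include <stdint.h>

int main(void)
{
    /* Placeholder bus/device/function; a real system would walk /sys/bus/pci */
    const char *cfg = "/sys/bus/pci/devices/0000:3a:00.0/config";
    FILE *f = fopen(cfg, "rb");
    if (!f) { perror("open config space"); return 1; }

    uint8_t hdr[4];
    if (fread(hdr, 1, sizeof hdr, f) != sizeof hdr) { fclose(f); return 1; }
    fclose(f);

    /* Bytes 0-1: Vendor ID, bytes 2-3: Device ID (little-endian) */
    uint16_t vendor = (uint16_t)(hdr[0] | (hdr[1] << 8));
    uint16_t device = (uint16_t)(hdr[2] | (hdr[3] << 8));
    printf("Vendor ID: 0x%04x, Device ID: 0x%04x\n", vendor, device);
    return 0;
}
```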
CXL.cache enables devices such as accelerators to efficiently access and cache host memory
for improved performance. As an example, using CXL.io plus CXL.cache, the performance of
workloads shared between an accelerator-based NIC and the host CPU can be improved with local
caching of data in the accelerator’s attached memory.
Protocol   | PCIe-based | Function
CXL.io     | Yes        | Discovery, register access, interrupts, initialization, I/O virtualization, DMA (direct memory access)
CXL.cache  | No         | Supports device caching of host memory, with the host processor orchestrating the coherency management
CXL.memory | No         | Supports host management and utilization of device-attached memory (memory node) similar to host (main) memory

The CXL protocols are combined to support a range of computing use cases
The CXL.memory protocol enables a host, such as a processor, to access device-attached memory using load/store commands. This enables some very compelling use cases, which we'll describe in the CXL memory expansion and pooling sections that follow.
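To make the load/store model concrete, here is a minimal sketch under one assumption: the operating system has exposed the CXL-attached memory as a Linux device-dax node (the /dev/dax0.0 path is hypothetical). Once mapped, the host uses ordinary pointer loads and stores, with CXL.memory carrying the traffic underneath.

```c
/* Sketch: once CXL device-attached memory is surfaced by the OS (assumed here
 * to be a Linux device-dax node such as /dev/dax0.0), the host accesses it
 * with plain loads and stores -- no special I/O calls are required. */
#include <fcntl.h>
#include <stdio.h>
#include <stdint.h>
#include <sys/mman.h>
#include <unistd.h>

int main(void)
{
    const size_t len = 1UL << 21;               /* map 2 MiB of the region */
    int fd = open("/dev/dax0.0", O_RDWR);       /* assumed device path */
    if (fd < 0) { perror("open"); return 1; }

    uint64_t *mem = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_SHARED, fd, 0);
    if (mem == MAP_FAILED) { perror("mmap"); close(fd); return 1; }

    mem[0] = 0xC0FFEE;                                           /* ordinary store */
    printf("read back: 0x%llx\n", (unsigned long long)mem[0]);   /* ordinary load  */

    munmap(mem, len);
    close(fd);
    return 0;
}
```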
The use case of coherently sharing memory resources between computing devices, such as
a CPU host and an AI accelerator, can be enabled by using all three of the CXL protocols. For
instance, a server with a CXL-connected accelerator would enable the accelerator to use the CPU's direct-attached memory for workloads that require greater memory capacity. Absent CXL, the accelerator would need to access SSD or hard disk storage for these workloads, incurring a significant latency and bandwidth penalty. The upshot is that CXL will dramatically increase the
performance of these workloads over the legacy alternative.
CXL Memory Expansion
Memory cache coherency and a low-latency connection make possible use cases that can have a dramatic impact on computing performance, efficiency and TCO. The first and most straightforward use case is CXL memory expansion. In this instance, significant additional capacity, above and beyond that of main memory, can be added to a CPU host via a CXL-attached device (device-attached memory).
[Figure: CXL memory expansion provides more capacity to the host processor. The host's CXL interface subsystem (controller and PHY) connects over a CXL interconnect to a CXL single-host memory expansion chip, which drives DDR memory devices through its DDR memory interface subsystem.]
If the device-attached memory is DRAM (such as DDR), the low-latency performance of the CXL link would allow the host to use this memory as an extension of its main memory. As we described earlier, for workloads requiring a memory space that exceeds main memory capacity, the CXL solution would deliver a dramatic improvement in performance. This memory expansion use case is supported at the 1.0 and all higher revision levels of the CXL standard.
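On Linux, CXL-attached DRAM is typically presented to software as an additional, CPU-less NUMA node. The sketch below, which assumes the expansion memory is the highest-numbered node (a common but not guaranteed configuration, and not something this paper specifies), uses libnuma to place an allocation on that node.

```c
/* Sketch: treat CXL expansion memory as a NUMA node and allocate from it.
 * Assumes the CXL memory appears as the highest-numbered (CPU-less) node.
 * Build with: gcc expand.c -lnuma */
#include <numa.h>
#include <stdio.h>
#include <string.h>

int main(void)
{
    if (numa_available() < 0) {
        fprintf(stderr, "NUMA not available\n");
        return 1;
    }

    int cxl_node = numa_max_node();              /* assumed: CXL memory node */
    size_t size = 64UL << 20;                    /* 64 MiB */

    char *buf = numa_alloc_onnode(size, cxl_node);
    if (!buf) { fprintf(stderr, "allocation failed\n"); return 1; }

    memset(buf, 0, size);                        /* touch pages to place them */
    printf("64 MiB placed on node %d\n", cxl_node);

    numa_free(buf, size);
    return 0;
}
```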
CXL Memory Pooling Via Switching
CXL 2.0 introduced the capability for switching, similar to that supported by the PCIe standard,
making possible the use case of CXL memory pooling. With a CXL 2.0 switch, a host could access
one or more devices from the pool. Since the hosts are responsible for coherency management,
they would necessarily need to be CXL 2.0 enabled. The memory pool devices, however, could
be a mix of CXL 1.0, 1.1 and 2.0-enabled hardware. At 1.0/1.1, a device is limited to behaving as a single logical device accessible by only one host at a time. A 2.0-level device can be partitioned as multiple logical devices, allowing up to 16 hosts to simultaneously access different portions of
the memory.
As an example from the image below, host 1 (H1) could use half the memory in device 1 (D1)
and a quarter of the memory in device 2 (D2) to finely match the memory requirements of its
workload to the available capacity in the memory pool. The remaining capacity in devices D1 and
D2 could be used by one or more of the other hosts up to a maximum of 16. Devices D3 and D4,
CXL 1.0 and 1.1-enabled respectively, could be used by only one host at a time.
[Figure: CXL memory pooling with a switch enables multi-host access to CXL 2.0 memory devices. Hosts H1 through H# connect through a CXL 2.0 switch to a pool of devices D1 through D# comprising a mix of CXL 1.0, 1.1 and 2.0 hardware.]
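The capacity split in the example above can be thought of as a simple allocation table. The toy sketch below is purely illustrative (the structures and sizes are not from the CXL specification): each 2.0-level device partitions its capacity among up to 16 hosts, and H1 claims half of D1 plus a quarter of D2.

```c
/* Toy model of multi-logical-device pooling: each CXL 2.0 device's capacity
 * can be partitioned among up to 16 hosts. Structures and sizes here are
 * illustrative only, not taken from the CXL specification. */
#include <stdio.h>

#define MAX_HOSTS 16

struct pooled_device {
    const char *name;
    unsigned capacity_gb;
    unsigned allocated_gb[MAX_HOSTS];   /* per-host share of this device */
};

static unsigned host_total(const struct pooled_device *devs, int ndev, int host)
{
    unsigned total = 0;
    for (int d = 0; d < ndev; d++)
        total += devs[d].allocated_gb[host];
    return total;
}

int main(void)
{
    struct pooled_device pool[] = {
        { "D1", 256, {0} },   /* assumed capacities, for illustration */
        { "D2", 256, {0} },
    };

    /* Mirror the example: H1 (host index 0) takes half of D1, a quarter of D2 */
    pool[0].allocated_gb[0] = pool[0].capacity_gb / 2;
    pool[1].allocated_gb[0] = pool[1].capacity_gb / 4;

    printf("H1 sees %u GB of pooled memory\n", host_total(pool, 2, 0));
    return 0;
}
```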
CXL 2.0 supports only a single-tier switch architecture. Upstream ports connect only to hosts,
downstream only to devices; a CXL switch cannot be connected to another CXL switch. CXL 3.0
introduces fabric switching allowing for greatly increased architectural flexibility and scale.
All switching architectures will introduce additional latency above and beyond that of direct-attached DRAM or memory expansion devices. With switching, multiple memory tiers could be
provisioned where hosts could access larger and larger pools of memory at the cost of increased
latency with each tier.
CXL Memory Pooling Via Direct Connect
A memory tier between CXL expansion and switched memory can be implemented with a CXL direct-connect architecture, which achieves the performance benefits of main memory expansion along with the efficiency and total cost of ownership (TCO) benefits of pooled memory. This architecture
requires that all hosts and devices are CXL 2.0-enabled or above. In this model, “switching” is
incorporated into the memory devices via a crossbar in the CXL memory pooling chip. This keeps
latency low but requires a more powerful chip since it is now responsible for the control plane
functionality performed by the switch in the previous case.
[Figure: CXL memory pooling via direct connect keeps latency low so attached devices can act as main memory expansion. Hosts H1 through H# connect over CXL 2.0 interconnects to pooled devices D1 through D#; within each CXL multi-host memory pooling chip, a crossbar links multiple CXL 2.0 interface subsystems (controller and PHY) to a DDR memory interface subsystem driving the DDR memory devices.]
With low-latency direct connections, attached memory devices can employ DDR DRAM to
provide expansion of host main memory. This can be done on a very flexible basis, since a host is
able to access all or portions of the capacity of as many devices as needed to tackle its workload.
Analogous to ridesharing, memory is available to hosts on an “as needed” basis, delivering
greater utilization and efficiency of memory. And this architecture would provide the option to
provision server main memory for nominal workloads, rather than worst case, with the ability to
access the pool when needed for high-capacity workloads, offering further benefits to TCO.
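The TCO argument can be made concrete with back-of-the-envelope arithmetic. The figures below are assumptions chosen purely for illustration, not data from this paper: a rack of 32 servers provisioned for a 1 TB worst case is compared with 512 GB nominal per server plus a shared 8 TB pool.

```c
/* Back-of-the-envelope comparison of worst-case provisioning vs. nominal
 * provisioning plus a shared CXL memory pool. All numbers are illustrative
 * assumptions, not figures from this paper. */
#include <stdio.h>

int main(void)
{
    const int servers       = 32;
    const int worst_case_gb = 1024;   /* per-server worst-case requirement */
    const int nominal_gb    = 512;    /* per-server nominal requirement    */
    const int pool_gb       = 8192;   /* shared pool sized for peak demand */

    int dedicated = servers * worst_case_gb;
    int pooled    = servers * nominal_gb + pool_gb;

    printf("Worst-case provisioning: %d GB of DRAM\n", dedicated);
    printf("Nominal + shared pool:   %d GB of DRAM (%.0f%% of worst case)\n",
           pooled, 100.0 * pooled / dedicated);
    return 0;
}
```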
Ultimately, the CXL memory pooling models from direct connect to fabric switching support the
fundamental shift to server disaggregation and composability. In such an architecture, discrete
units of compute, memory and storage can be composed on-demand to efficiently meet the
needs of any workload.
CXL Security
Hand-in-hand with the rapid rise in the volume of data comes a concurrent rise in the value of data. We've discussed the benefits of disaggregation, but by separating the components of server architectures, we increase the attack surface that nefarious actors can attempt to exploit.
Understanding this, CXL is specified with a secure-by-design approach. All three CXL protocols are secured via Integrity and Data Encryption (IDE), which provides confidentiality, integrity and replay protection. To meet the high-speed data rate requirements of CXL without introducing additional
latency, IDE is implemented in hardware-level secure protocol engines instantiated in the CXL
host and device chips.
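CXL IDE is based on AES-GCM authenticated encryption, implemented in the hardware protocol engines of the link partners. Purely as a software illustration of the confidentiality-plus-integrity property (not the hardware implementation; key and counter management are greatly simplified here), the sketch below encrypts and authenticates a payload with OpenSSL's AES-256-GCM.

```c
/* Illustration only: AES-256-GCM authenticated encryption of a payload,
 * showing the confidentiality + integrity property that CXL IDE provides.
 * Real IDE runs in hardware protocol engines with spec-defined key and
 * counter management; this sketch simplifies both.
 * Build with: gcc ide_demo.c -lcrypto */
#include <openssl/evp.h>
#include <openssl/rand.h>
#include <stdio.h>

int main(void)
{
    unsigned char key[32], iv[12], tag[16];
    unsigned char plain[] = "CXL flit payload (illustrative)";
    unsigned char cipher[sizeof plain];
    int len = 0, clen = 0;

    RAND_bytes(key, sizeof key);   /* in real IDE, keys come from key exchange */
    RAND_bytes(iv, sizeof iv);

    EVP_CIPHER_CTX *ctx = EVP_CIPHER_CTX_new();
    EVP_EncryptInit_ex(ctx, EVP_aes_256_gcm(), NULL, NULL, NULL);
    EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_SET_IVLEN, sizeof iv, NULL);
    EVP_EncryptInit_ex(ctx, NULL, NULL, key, iv);

    EVP_EncryptUpdate(ctx, cipher, &len, plain, sizeof plain);
    clen = len;
    EVP_EncryptFinal_ex(ctx, cipher + clen, &len);
    clen += len;

    /* The authentication tag lets the receiver detect tampering */
    EVP_CIPHER_CTX_ctrl(ctx, EVP_CTRL_GCM_GET_TAG, sizeof tag, tag);
    EVP_CIPHER_CTX_free(ctx);

    printf("ciphertext bytes: %d, tag bytes: %zu\n", clen, sizeof tag);
    return 0;
}
```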
In addition to protecting the data, CXL chips and systems themselves require safeguards against
tampering and cyberattack. A hardware Root of Trust implemented in the CXL chips can provide
this basis for security and support requirements for Secure Boot and Secure Firmware Download.
The Rambus CXL Memory Interconnect Initiative
Through the CXL Memory Interconnect Initiative, Rambus is researching and developing
solutions to enable a new era of data center performance and efficiency. Announced on June
16, 2021, this initiative is the latest chapter in the 30-year history of Rambus leadership in the
development of breakthrough memory and chip-to-chip interconnect technology and products.
Realization of the chip solutions needed for the memory expansion and pooling use cases described earlier, and ultimately those needed for server disaggregation and composability, will require the synthesis of a number of critical technologies. Rambus has been researching memory
solutions for disaggregated architectures for close to a decade and leverages a system-aware
design approach to solve next-generation challenges.
This initiative brings together our expertise in memory and SerDes subsystems, semiconductor
and network security, high-volume memory interface chips and compute system architectures to
develop breakthrough CXL interconnect solutions for the future data center.
For more information, visit
rambus.com/cxlmemoryinterconnects
©Rambus Inc. • rambus.com
rev_2022OCT10