Practical Implementation of PCI Express Gen3 across Optical Cabling

The PCIe Gen3 electrical standard presents certain challenges in adapting commercially available optical technologies for use in low-cost PCIe Gen3 optical links. A test bed developed to explore these issues produced data that illustrate a solution with a full 64 Gbit/s of capacity for commercial applications.

by Christopher Wong, Avago Technologies

Fiber optic technology can provide a better alternative to copper coaxial cabling for PCI Express 3.0 (PCIe Gen3) inter-chassis connections. The serializer/deserializer (SerDes) technologies originally developed to carry PCIe Gen1/Gen2 bus signals across a PC's motherboard can be adapted to drive copper coaxial cabling for inter-chassis connections in data centers and server farms. Unfortunately, the faster 8 Gbit/s signals specified in the recently adopted PCIe Gen3 standard require a much more complex transceiver to achieve a successful connection across even a few feet of coax, making it difficult for electrical solutions to meet the market's price, performance and size/weight requirements. Fiber optic technology provides an attractive alternative for high channel count PCIe Gen3 interconnects, with dramatically longer link distances, lower size, weight and power, higher performance and competitive pricing. While standards efforts for fiber-based PCIe Gen3 interconnects are still in their initial stages, commercial products are already available to provide an interim solution.

PCIe Gen3 in a Nutshell

The PCI Express (PCIe) bus is a high-speed serial I/O technology intended to provide connections between a central processing unit (CPU) and its peripherals (graphics cards, memory/disk drives, external I/O cards). It has also gained popularity as a passive backplane interconnect in larger systems. At the physical layer (PHY), PCIe is implemented as one or more point-to-point connections, called lanes, between two endpoint devices (Figure 1). Each lane is composed of two low-voltage, AC-coupled differential signal pairs that form a high-speed, full-duplex byte stream between the link's endpoint devices.

When the PCIe 1.0a standard was introduced in 2003, it specified a link speed of 2.5 Gbit/s for each lane, although its 8b/10b line coding scheme reduces the usable capacity by 20%. PCIe 2.0 doubles the speed to 5 Gbit/s, enabling a 32-lane (x32) PCIe connector to support an aggregate bit rate of up to 160 Gbit/s. The PCIe Gen3 specification (finalized in 2010) doubles channel capacity once again. It replaces the 8b/10b line encoding used by Gen1 and Gen2 with 128b/130b encoding, which reduces the channel overhead to approximately 1.5%. PCIe Gen3's improved efficiency gives its 8 Gbit/s serial lanes twice the useful capacity of an equivalent 5 Gbit/s PCIe 2.0 connection. Because PCIe technology's high-frequency signals require an impedance-controlled channel and have relatively short "reach," it is best suited for making "inside-box" connections where both the central processor and peripherals are co-located.
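The bandwidth figures quoted above follow directly from each generation's line rate and encoding overhead. The short program below is a minimal sketch of that arithmetic; note that the 64 Gbit/s figure cited earlier is the raw x8 Gen3 aggregate, while the payload rate after 128b/130b encoding is slightly lower.

#include <stdio.h>

/* Effective (post-encoding) PCIe link throughput, per the figures in the text.
 * line_rate_gbps: raw signaling rate per lane
 * payload_bits/block_bits: 8/10 for 8b/10b (Gen1/Gen2), 128/130 for Gen3
 * lanes: link width (x1, x8, x32, ...)
 */
static double effective_gbps(double line_rate_gbps, int payload_bits,
                             int block_bits, int lanes)
{
    return line_rate_gbps * (double)payload_bits / block_bits * lanes;
}

int main(void)
{
    printf("Gen1 x8  (8b/10b)   : %5.1f Gbit/s payload\n",
           effective_gbps(2.5, 8, 10, 8));            /* 16.0 */
    printf("Gen2 x32 (8b/10b)   : %5.1f Gbit/s payload (160 Gbit/s raw)\n",
           effective_gbps(5.0, 8, 10, 32));           /* 128.0 */
    printf("Gen3 x8  (128b/130b): %5.1f Gbit/s payload (64 Gbit/s raw)\n",
           effective_gbps(8.0, 128, 130, 8));         /* ~63.0 */
    return 0;
}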
Extending PCIe's Reach

Thanks to its speed and efficiency, there is also growing interest in the use of native PCIe connections for inter-chassis applications, such as links between servers, switches and storage elements. The External PCI Express (ePCIe) specification was developed to enable transport of PCIe Gen1's 2.5 Gbit/s signals across multi-meter lengths of coaxial cabling, and it is already in use in storage systems, high-performance computers and other products that require high-capacity multi-chassis system interconnects. Work is underway to develop a practical PCIe Gen2 cabling specification, but any electrical solution that moves from Gen1 (2.5 Gbit/s) to Gen2 (5 Gbit/s) data rates will face signal integrity issues that shorten its reach. The higher cable losses resulting from Gen3's 8 Gbit/s line rate will further limit the practical reach of a copper cable interconnect. Consequently, implementing Gen3 PCIe over cable media may necessitate a move to a fiber optic solution in order to support the longer distances needed for multi-chassis interconnects. Once implemented in commercial volumes, optical PCIe interconnect is expected to consume fewer watts and cost less per Gbit/s of capacity than an equivalent copper-based solution. Using PCIe across the entire I/O connection also reduces or eliminates the need for intermediate protocol conversion chips, which, in turn, lowers overall system cost, power consumption and channel latency.

Pre-Standards PCIe 3.0 Solutions Available Today

Although it will be several years before the PCI-SIG releases standards for fiber-based PCIe Gen3 interconnects, commercial products are already available to provide interim solutions for critical markets that cannot afford to wait for the PCIe standards process. Since the interface between PCIe's MAC and PHY layers is simple and well documented (Figure 2), it is relatively easy to use off-the-shelf PCIe 3.0 switches or other endpoint components to drive a parallel optical transceiver module instead of a multi-channel electrical SerDes driver IC. Multi-lane optical endpoints can be implemented using vertical cavity surface emitting laser (VCSEL) arrays housed in commercially available parallel optical transmit/receive (Tx/Rx) modules from several vendors, including Avago Technologies. These modules support as many as 12 parallel channels, operate at 8 Gbit/s per lane or more, and provide up to 150 meters of connectivity.

In order to evaluate the feasibility of using commercial products, a proof-of-concept demonstration system was constructed. It consists of a host PC housing a PLX-designed adapter card employing the PEX8748 48-lane Gen3 switch (Figure 3). The switch drives Avago Technologies 12-lane, 10 Gbit/s MiniPOD optical modules (AFBR-81/82 Series), in which 8 of the optical lanes are made active and 4 lanes are left unused.
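Because the pre-standard optical link trains with ordinary PCIe semantics, a host can confirm that it actually came up at Gen3 x8 by reading standard configuration-space registers. The sketch below is a generic check assuming a Linux host; the device path is supplied by the reader (the path in the comment is purely illustrative), and the program is not part of the demonstration software.

#include <stdio.h>
#include <stdint.h>

/* Minimal sketch: report the negotiated PCIe link speed and width of a device
 * by decoding its configuration space. Example invocation (illustrative path):
 *   ./lnkcheck /sys/bus/pci/devices/0000:03:00.0/config
 * Reading beyond the first 64 bytes of config space normally requires root. */
int main(int argc, char **argv)
{
    if (argc != 2) {
        fprintf(stderr, "usage: %s <sysfs-config-file>\n", argv[0]);
        return 1;
    }

    uint8_t cfg[256] = {0};
    FILE *f = fopen(argv[1], "rb");
    if (!f) {
        perror("open");
        return 1;
    }
    size_t n = fread(cfg, 1, sizeof cfg, f);
    fclose(f);
    if (n < 64) {
        fprintf(stderr, "short config-space read\n");
        return 1;
    }

    /* Walk the capability list (pointer at offset 0x34) to the PCI Express
     * capability (ID 0x10); bail out after a sane number of hops. */
    uint8_t ptr = cfg[0x34];
    for (int hops = 0; ptr && cfg[ptr] != 0x10 && hops < 48; hops++)
        ptr = cfg[ptr + 1];
    if (!ptr || cfg[ptr] != 0x10) {
        fprintf(stderr, "no PCI Express capability found\n");
        return 1;
    }

    /* Link Status register: offset 0x12 within the PCIe capability.
     * Bits 3:0 = current link speed (1 = 2.5, 2 = 5.0, 3 = 8.0 GT/s),
     * bits 9:4 = negotiated link width. */
    uint16_t lnksta = (uint16_t)(cfg[ptr + 0x12] | (cfg[ptr + 0x13] << 8));
    printf("negotiated link: Gen%u x%u\n", lnksta & 0xF, (lnksta >> 4) & 0x3F);
    return 0;
}

On the test bed described above, such a check would be expected to report Gen3 x8 once the optical link has trained.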
Optical Domain Challenges

Constructing a proof-of-concept system proved the feasibility of adapting commercially available components for use in optical PCIe Gen3 links. The project also uncovered several issues that must be addressed by products serving these applications, including:

Receiver Detection: Where proper loading exists, the transmitter is triggered to operate in one of several modes based on what is detected at the device receiver. In particular, detection is used as a cue to begin sending a series of line probing signals, which allows the receiver to calculate the settings for its equalizer. In optical applications that use a standard PCIe MAC, the line probing and equalizer functions must somehow be disabled.

Electrical Idle Mode: The PCIe protocol defines an optional low-power Electrical Idle (EIDLE) mode that the link may enter when there is no data to transmit. Today's optical links have problems with entry into and exit from PCIe's low-power modes because the transceiver's longer warm-up times can produce line chatter or improper bias, which can lead to false EIDLE detection and/or exit from the EIDLE state.

Clocking: Optical PCIe endpoints must be capable of supporting asynchronous clock operation. This is because most optical PCIe links will not have both ends of the connection in the same enclosure and will not share the Reset or system clock signals required to implement a synchronous reset or clock across the link.

Remote Reset: In most applications, a PCIe link's remote optical card is powered ahead of the main system box (server/PC). In these applications, the remote card must be configured to undergo an autonomous reset upon power-up so that it is fully initialized and ready for link training once the host box becomes active.

External/Out-of-Band Signals: The current PCIe external cabling specification for copper coaxial cable defines extra signals that will not be carried in the AFBR-81/82 Series optical solution. For instance, CREFCLK, the 100 MHz cable reference clock, is not needed since the clock is recovered from the data stream by the PCIe transceivers. In addition, the SB_RTN, CPRSNT#, CPWRON, CWAKE# and CPERST pins are not applicable when using an optical cable.

Component Selection

Selecting the most suitable optical module for the test bed application involved consideration of several factors, including lane width, form factor and compatibility. An 8-lane configuration was chosen because it is commonly used in high-performance PCIe 2.0 designs. The CXP and MiniPOD form factors were the two most attractive options because of their wide availability and good performance. The MiniPOD form factor was chosen because its embedded parallel optics configuration mounts directly onto the PCB, enabling a better electrical and mechanical design (Figure 4). Unlike the board-edge mounting used by CXP modules, a MiniPOD optical module can easily be located mid-board, within five inches of the high-speed driver electronics, minimizing the loss and distortion that PCIe Gen3's 8 Gbit/s signals suffer from skin-effect and dielectric losses in the PCB traces.

The PEX8748 48-lane Gen3 switch, manufactured by PLX Technology, was selected to serve as the PCIe controller for both endpoints because it incorporates features that can be used to support optical domain operation. The key issues addressed by the switch include the following (a configuration sketch appears after this list):

• Switching devices in the PEX series have the ability to mask receiver detection and perform link speed negotiation by decoding the incoming data stream.
• The device used in this experiment solves potential EIDLE issues because it can be configured to ignore the changes in the data stream that would normally initiate electrical idle, while continuing to watch for the specific data symbols that signal a request for link speed negotiation.
• The PEX switch supports an asynchronous clocking mode for data recovery, allowing each end of the PCIe optical link to operate independently.
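The configuration sketch referenced above shows, in skeletal form, the order of the three settings just listed. The register names and the pex_write() helper are hypothetical placeholders invented for illustration; the article does not document the PEX8748's actual register map or software interface, so this is a structural outline rather than real device programming.

#include <stdio.h>
#include <stdint.h>

/* Hypothetical access helper: stands in for whatever path a real design uses
 * to reach the switch's configuration registers (memory-mapped CSRs, I2C,
 * vendor tools, etc.). Not a real PLX API; here it only logs the intent. */
static void pex_write(uint32_t reg, uint32_t value)
{
    printf("write reg 0x%02x <- 0x%08x\n", reg, value);
}

/* Placeholder register names for the three port settings described in the
 * text; these are NOT PEX8748 register definitions. */
enum {
    REG_RX_DETECT_MASK = 0x0, /* hypothetical */
    REG_EIDLE_POLICY   = 0x4, /* hypothetical */
    REG_CLOCK_MODE     = 0x8  /* hypothetical */
};

/* Bring-up order for an optical-facing switch port, per the discussion above. */
static void optical_port_init(void)
{
    /* 1. Mask receiver detection: the optical module always presents a valid
     *    electrical load, so treat the far-end receiver as present and let
     *    speed negotiation proceed by decoding the incoming data stream. */
    pex_write(REG_RX_DETECT_MASK, 1);

    /* 2. Ignore electrical-idle entry inferred from the data stream, while
     *    still watching for the symbols that request a link speed change, so
     *    transceiver chatter cannot falsely drop the link into EIDLE. */
    pex_write(REG_EIDLE_POLICY, 1);

    /* 3. Select asynchronous clocking for data recovery so each end of the
     *    optical link can run from its own local reference clock. */
    pex_write(REG_CLOCK_MODE, 1);
}

int main(void)
{
    optical_port_init();
    return 0;
}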
Implementation

The proof-of-concept demonstration consisted of a host PC housing a PLX-designed adapter card employing the PEX8748 48-lane Gen3 switch. Shown in Figure 5, the card carries a daughter mounting assembly on which the AFBR-81/82 Series optical transmitter and receiver modules are mated to the PEX8748 switch. At the opposite end of the optical link, a second switch card with another set of Tx/Rx modules resides on a distribution board, which can provide fan-out and upstream data aggregation for express peripherals such as SSD drives and Ethernet HBA cards. For this proof-of-concept demonstration, only 8 of the MiniPOD's 12 optical lanes are powered, with the remaining 4 lanes left unused.

Each end of the physical link is terminated using a PLX PCIe Gen3 switch IC. PLX PCIe switches include both clock/data recovery and Tx/Rx equalization for each high-speed port. Because the switch IC's transceiver runs in its optional asynchronous mode, clock and data recovery (CDR) is not required in the optical module, thus preserving PCIe's latency advantage. A simple AC coupling circuit is used to tie the Avago Technologies modules to the PLX switch IC's Tx/Rx signals. The MiniPOD module's electrical interface also includes a two-wire serial control channel that can be used to set the equalization/emphasis and amplitude circuits in each SerDes lane's transceiver for optimum performance.

Demonstration Test Results

In this demonstration, a PCIe Gen3 x8 link was successfully implemented over 30 meters of low-cost OM3 multimode optical fiber. As implemented, the link supports the following PCIe functionality:

• Asynchronous operation (no native SSC, but SSC isolation provisions)
• L0 active state only (link enable/disable functional under operating system control)
• PCIe normal link speed negotiation
• Configurable PCIe standard link width down-training

As a result of the technical issues discussed earlier, the link does not presently support PCIe active state power management or in-band synchronous resets; only out-of-band independent reset is supported. As seen in the representative eye quality plot (Figure 6), taken at the PLX receiver at the far end of a 30 meter cable, the links demonstrate good signal integrity and error-free data recovery.

It should also be noted that for this demonstration the MiniPOD optical modules support PCIe 3.0 operation at 8.0 Gbit/s per lane, but are capable of operating over a wide range of line rates, from 1 Gbit/s to over 10.3125 Gbit/s. As a result, these optical devices can also support PCIe 2.x operation at 5.0 Gbit/s and PCIe 1.x operation at 2.5 Gbit/s without configuration changes and without any trade-off in performance. This wide speed range is encouraging evidence that, besides providing an excellent option for implementing PCIe Gen3-compatible optical links today, the same technologies can serve as the foundation for backward-compatible, multi-speed optical links in upcoming generations of application-specific products.

Avago Technologies, San Jose, CA. (800) 235-0312. [www.avagotech.com].
PLX Technology, Sunnyvale, CA. (408) 774-9060. [www.plxtech.com].