Practical Implementation of PCI Express Gen3 across Optical Cabling
The PCIe Gen3 electrical standard presents certain challenges in adapting
commercially available optical technologies for use in low-cost PCIe Gen3 optical links. A test
bed developed to explore these issues produced data that illustrate a solution with a full 64 Gbit/s
capacity in commercial applications.
by Christopher Wong, Avago Technologies
Fiber optic technology can provide a better alternative to copper coaxial cabling for PCI Express
3.0 (PCIe Gen3) inter-chassis connections. The serializer/deserializer (SERDES) technologies
originally developed to carry PCIe’s Gen1/Gen2 bus signals across a PC’s motherboard can be
adapted to drive copper coaxial cabling for inter-chassis connections in data centers and server
farms. Unfortunately, the faster 8 Gbit/s signals specified in the recently adopted PCIe Gen3
standard require a much more complex transceiver to achieve a successful connection across
even a few feet of coax, making it difficult for electrical solutions to meet the market’s price,
performance and size/weight requirements.
Fiber optic technology provides an attractive alternative for high channel count PCIe Gen3
interconnects, offering dramatically longer link distances; lower size, weight and power; higher
performance and competitive pricing. While standards efforts for fiber-based PCIe Gen3
interconnects are still in their initial stages, there are already commercial products available to
provide an interim solution.
PCIe Gen3 in a Nutshell
The PCI Express (PCIe) bus is a high-speed serial I/O technology intended to provide
connections between a central processing unit (CPU) and its peripherals (graphics cards,
memory/disk drives, external I/O cards). It has also gained popularity as a passive backplane
interconnect in larger systems. At the physical layer (PHY), PCIe is implemented as one or more
point-to-point connections, called lanes, between two endpoint devices (Figure 1). Each lane is
composed of two low-voltage, AC-coupled differential signal pairs that form a high-speed,
full-duplex byte stream between the link's endpoint devices.
When the PCIe 1.0a standard was introduced in 2003, it specified a link speed of 2.5 Gbit/s for
each lane, although its 8b/10b line coding scheme reduces the usable capacity by 20%. PCIe 2.0
doubles the speed to 5 Gbit/s, enabling a 32-lane (x32) PCIe connector to support an aggregate
bit rate of up to 160 Gbit/s. The PCIe Gen3 specification (finalized in 2010) doubles channel capacity
once again. It replaces the 8b/10b line encoding used by Gen1 and Gen2 with 128b/130b
encoding, which reduces the coding overhead to approximately 1.5%. PCIe Gen3's improved
efficiency gives its 8 Gbit/s serial lanes roughly twice the useful capacity of an equivalent 5 Gbit/s
PCIe 2.0 connection.
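These efficiency figures are straightforward to verify. The short calculation below is a worked example (not part of the original test bed) that derives the usable per-lane rate for each generation from its line rate and coding scheme, and applies the result to the x8 link described later in this article.

# Effective PCIe lane throughput by generation: line rate x coding efficiency.
generations = {
    # name: (line rate in Gbit/s, payload bits, total coded bits)
    "Gen1 (8b/10b)":    (2.5, 8, 10),
    "Gen2 (8b/10b)":    (5.0, 8, 10),
    "Gen3 (128b/130b)": (8.0, 128, 130),
}

for name, (line_rate, payload, coded) in generations.items():
    per_lane = line_rate * payload / coded   # usable Gbit/s per lane
    print(f"{name}: {per_lane:.3f} Gbit/s per lane, "
          f"x8 link = {8 * per_lane:.1f} Gbit/s usable")

# Gen1: 2.000, Gen2: 4.000, Gen3: 7.877 Gbit/s per lane -- a Gen3 lane
# carries roughly twice the payload of a Gen2 lane, and an x8 Gen3 link
# delivers about 63 Gbit/s of the 64 Gbit/s raw capacity cited above.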
Because PCIe technology’s high-frequency signals require an impedance-controlled channel and
have relatively short “reach,” it is best suited for making “inside-box” connections where both
the central processor and peripherals are co-located.
Extending PCIe’s Reach
Thanks to its speed and efficiency, there is also growing interest in the use of native PCIe
connections for inter-chassis applications, such as links between servers, switches and storage
elements. The External PCI Express (ePCIe) specification was developed to enable transport of
PCIe Gen1's 2.5 Gbit/s signals across multi-meter lengths of coaxial cabling. It is already in use
in storage systems, high-performance computers and other products that require high-capacity
multi-chassis system interconnects.
Work is underway to develop a practical solution for a PCIe Gen2 cabling specification, but any
electrical solution that moves from Gen1 (2.5 Gbit/s) to Gen2 (5 Gbit/s) data rates will face
signal integrity issues that shorten its reach. The higher cable losses resulting from Gen3's
8 Gbit/s line rate will further limit the practical reach of a copper cable interconnect.
Consequently, implementing PCIe Gen3 over cable media may necessitate a move to a fiber
optic solution in order to support the longer distances needed for multi-chassis interconnects.
Once implemented in commercial volumes, optical PCIe interconnect is expected to consume
fewer watts and cost less per Gbit/s of capacity than an equivalent copper-based solution. Using
PCIe across the entire I/O connection also reduces or eliminates the need for intermediate
protocol conversion chips, which, in turn, lowers overall system costs, power consumption and
channel latency.
Pre-Standards PCIe 3.0 Solutions Available Today
Although it will be several years before the PCI-SIG releases standards for fiber-based PCIe
Gen3 interconnects, there are already commercial products available that can provide interim
solutions for critical markets that cannot afford to wait for the PCIe standards process. Since the
interface between PCIe’s MAC and PHY layers is simple and well documented (Figure 2), it is
relatively easy to use off-the-shelf PCIe 3.0 switches or other endpoint components to drive a
parallel optical transceiver module instead of a multi-channel electrical SerDes driver IC.
Multi-lane optical endpoints can be easily implemented using vertical cavity surface emitting
laser (VCSEL) arrays housed in commercially available parallel optical Transmit/Receive
(Tx/Rx) modules from several vendors, including Avago Technologies. They support as many as
12 parallel channels, operate at 8 Gbit/s per lane or more, and provide up to 150 meters of
connectivity.
In order to evaluate the feasibility of using commercial products, a proof of concept
demonstration system was constructed. It consists of a host PC housing a PLX-designed adapter
card employing the PEX8748 48-lane Gen3 switch (Figure 3). The switch drives Avago
Technologies 12-lane, 10 Gbit/s MiniPOD optical modules (AFBR-81/82 Series), with 8 of the
optical lanes active and 4 lanes left unused.
Optical Domain Challenges
Constructing a proof of concept system proved the feasibility of adapting commercially available
components for use in optical PCIe Gen3 links. The project also uncovered several issues that
must be addressed by products serving these applications, including:
Receiver Detection: When proper loading exists, the transmitter is triggered to operate in one of
several modes based on what is detected at the device receiver. In particular, receiver detection
serves as a cue to begin sending a series of line probing signals, which allow the receiver to
calculate the settings for its equalizer. In optical applications that use a standard PCIe MAC, the
line probing and equalizer functions must somehow be disabled.
Electrical Idle Mode: The PCIe protocol defines an optional low-power Electrical Idle (EIDLE)
mode that the link may enter when there is no data to transmit. Today's optical links have
problems entering and exiting PCIe's low-power modes because the transceiver's longer
warm-up times can produce line chatter or improper bias, which can lead to false EIDLE
detection and/or exit from the EIDLE state.
Clocking: Optical PCIe endpoints must be capable of supporting asynchronous clock operation.
This is because most optical PCIe links will not have both ends of the connection in the same
enclosure and will not share the Reset or system clock signals required to implement a
synchronous reset or clock across the link.
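To put the clocking requirement in perspective, the back-of-envelope calculation below (an illustrative sketch, not from the original article) assumes the PCIe base specification's ±300 ppm reference clock tolerance per endpoint and estimates how quickly two independently clocked endpoints drift apart; this is the slip that PCIe's SKP ordered sets and the receiver's elastic buffer must absorb in asynchronous operation.

# Worst-case clock slip between two independently clocked PCIe Gen3 endpoints.
# Assumes the PCIe base spec's +/-300 ppm reference clock tolerance per end.
LINE_RATE = 8.0e9                # Gen3 line rate in bits/s (1 UI per bit)
PPM_PER_END = 300                # allowed reference clock offset per endpoint
worst_case_ppm = 2 * PPM_PER_END          # 600 ppm total mismatch

bits_per_ui_slip = 1e6 / worst_case_ppm   # bits between one-UI slips
time_per_ui_slip = bits_per_ui_slip / LINE_RATE
print(f"One UI of slip every {bits_per_ui_slip:.0f} bits "
      f"(~{time_per_ui_slip * 1e9:.0f} ns)")

# Roughly one UI per ~1667 bits (~208 ns): the receiver must add or drop
# SKP symbols at least this often, which is why asynchronous clocking
# support in the switch is essential for a split-enclosure link.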
Remote Reset: In most applications, a PCIe link’s remote optical card is powered ahead of the
main system box (Server/PC). In these applications, the remote card must be configured to
undergo an autonomous reset upon power up so that it is fully initialized and ready for link
training once the host box becomes active.
External/Out-of-Band Signals: The current PCIe external cabling specification for copper coaxial
cable defines extra signals that will not be carried in the AFBR-81/82 Series optical solution.
For instance, CREFCLK, the 100 MHz Cable Reference Clock, is not needed since the clock is
recovered from the data stream by the PCIe transceivers. In addition, the SB_RTN, CPRSNT#,
CPWRON, CWAKE# and CPERST pins are not applicable when using an optical cable.
Component Selection
Selecting the most suitable optical module for the test bed application involved consideration of
several factors, including lane width, form factor and compatibility. An 8-lane configuration was
chosen because it is commonly used in high-performance PCIe 2.0 designs. The CXP and
MiniPOD form factors were the two most attractive options because of their wide availability
and good performance. The MiniPOD form factor was chosen because its embedded parallel
optics configuration mounts directly onto the PCB, enabling a better electrical and mechanical
design (Figure 4). Unlike the board edge mounts used by CXP modules, a MiniPOD optical
module can easily be located mid-board, within five inches of the high-speed driver electronics,
to minimize the loss and distortion that PCIe Gen3's 8 Gbit/s signals experience due to skin
effect and dielectric losses.
The PEX8748, a 48-lane Gen3 switch manufactured by PLX Technology, was selected to serve
as the PCIe controller for both endpoints because it incorporates features that can be used to
support optical domain operation. The key issues addressed by the switch include the following
(see the configuration sketch after this list):
• Switching devices in the PEX series have the ability to mask receiver detection and perform
link speed negotiation through decoding of the incoming data stream.
• The device used in this experiment solves potential EIDLE issues because it can be configured
to ignore the changes in the data stream that would normally initiate electrical idle, while
continuing to watch for the specific data symbols that signal a request for link speed negotiation.
• The PEX switch supports an asynchronous clocking mode for data recovery, allowing each end
of the PCIe optical link to operate independently.
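As an illustration of how such device-specific features are typically enabled from software, the sketch below writes configuration registers through the standard Linux sysfs PCI interface. The register offset and bit assignments are hypothetical placeholders invented for this example; the PEX8748's real register map is documented in the PLX data book, and only the sysfs access mechanism shown here is standard.

# Hedged sketch: enabling optical-friendly switch features via PCI config
# space on Linux. OFFSET AND BIT VALUES ARE HYPOTHETICAL placeholders --
# consult the PLX data book for the PEX8748's actual register map.
import struct

BDF = "0000:03:00.0"                    # example bus/device/function
CFG = f"/sys/bus/pci/devices/{BDF}/config"

# Hypothetical device-specific register and bit assignments:
OPTICAL_CTRL_REG = 0xF70                # placeholder offset
RX_DETECT_MASK   = 1 << 0               # force "receiver present"
EIDLE_INFER_DIS  = 1 << 1               # ignore analog EIDLE inference
ASYNC_CLK_EN     = 1 << 2               # independent clocking mode

def read32(f, off):
    f.seek(off)
    return struct.unpack("<I", f.read(4))[0]

def write32(f, off, val):
    f.seek(off)
    f.write(struct.pack("<I", val))

with open(CFG, "r+b") as f:             # requires root privileges
    val = read32(f, OPTICAL_CTRL_REG)
    val |= RX_DETECT_MASK | EIDLE_INFER_DIS | ASYNC_CLK_EN
    write32(f, OPTICAL_CTRL_REG, val)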
Implementation
The proof of concept demonstration consisted of a host PC housing a PLX-designed adapter card
employing the PEX8748 48-lane Gen3 switch. Shown in Figure 5, the card contains a daughter
mounting assembly on which the AFBR-81/82 Series optical transmitter and receiver modules
are mated to the PEX8748 switch. At the opposite end of the optical link, a second switch card
with another set of Tx/Rx modules resides on a distribution board, which can provide fan-out and
upstream data aggregation for express peripherals, such as SSD drives and Ethernet HBA cards.
For this proof of concept demonstration, only 8 of the MiniPOD’s 12 optical lanes are powered,
with the remaining 4 lanes left unused.
Each end of the physical link is terminated using a PLX PCIe Gen3 Switch IC. PLX PCIe
switches include both clock/data recovery and Tx/Rx equalization for each high-speed port.
Because the switch IC’s transceiver runs in its optional asynchronous mode, clock and data
recovery (CDR) are not required in the optical module, thus preserving PCIe’s latency
advantage. A simple AC coupling circuit is used to tie the Avago Technologies modules to the
PLX switch IC's Tx/Rx signals. The MiniPOD module's electrical interface also includes a
two-wire serial control channel that can be used to set the equalization/emphasis and amplitude
circuits in each SERDES lane's transceiver for optimum performance.
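To give a feel for that control channel, the sketch below shows how per-lane settings might be written over a two-wire (I2C-style) bus from a Linux host using the smbus2 package. The device address, register offsets and code values are hypothetical placeholders; the actual MiniPOD register map is defined in the AFBR-81/82 Series datasheet.

# Hedged sketch: writing per-lane equalization/amplitude settings over the
# MiniPOD's two-wire control channel. ADDRESS AND REGISTERS ARE HYPOTHETICAL
# placeholders -- the real map is in the AFBR-81/82 Series datasheet.
from smbus2 import SMBus

MODULE_ADDR   = 0x50        # placeholder two-wire device address
EQ_REG_BASE   = 0x40        # placeholder: input equalization, one reg/lane
AMPL_REG_BASE = 0x50        # placeholder: output amplitude, one reg/lane

def tune_lane(bus, lane, eq_code, ampl_code):
    """Write equalization and amplitude codes for one of the 12 lanes."""
    bus.write_byte_data(MODULE_ADDR, EQ_REG_BASE + lane, eq_code)
    bus.write_byte_data(MODULE_ADDR, AMPL_REG_BASE + lane, ampl_code)

with SMBus(1) as bus:       # I2C bus 1 on the host
    for lane in range(8):   # only the 8 active PCIe lanes
        tune_lane(bus, lane, eq_code=0x3, ampl_code=0x5)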
Demonstration Test Results
In this demonstration, a PCIe Gen3 x8 link was successfully implemented over 30 meters of
low-cost OM3 multimode optical fiber.
As implemented, the link supports the following PCIe functionality:
• Asynchronous operation (no native SSC, but SSC isolation provisions)
• L0 active state only (link enable/disable functional under controlled operating system)
• PCIe normal link speed negotiation
• Configurable for PCIe standard link width down training
As a result of the technical issues discussed earlier, the link does not presently support PCIe
active state power management or in-band synchronous resets; only out-of-band independent
reset is supported. As seen in the representative eye quality plot (Figure 6), taken at the PLX
receiver at the end of a 30 meter cable, the links demonstrate good signal integrity and error-free
data recovery.
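As an aside on what "error-free" means in practice, the calculation below (an illustrative sketch, not part of the original test report) uses the standard zero-error confidence bound to estimate how long a lane must run without errors before a given bit error ratio can be claimed.

# How long must an 8 Gbit/s lane run error-free to claim BER <= 1e-12?
# Standard zero-error confidence bound: CL = 1 - exp(-N * BER),
# so N = -ln(1 - CL) / BER bits must be observed with no errors.
import math

LINE_RATE  = 8.0e9    # bits/s per Gen3 lane
TARGET_BER = 1e-12
CONFIDENCE = 0.95

bits_needed = -math.log(1 - CONFIDENCE) / TARGET_BER
print(f"{bits_needed:.2e} bits -> {bits_needed / LINE_RATE:.0f} s per lane")
# ~3.0e12 bits, i.e. roughly six minutes of error-free traffic per lane.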
It should also be noted that for this demonstration, the MiniPOD optical modules support PCIe
3.0 operating at 8.0 Gbit/s per lane, but are capable of operation over a wide range of line rates,
from 1 Gbit/s to over 10.3125 Gbit/s. As a result, these optical devices can operate at PCIe 2.x
(5.0 Gbit/s) and PCIe 1.x (2.5 Gbit/s) rates without configuration changes and without any
trade-off in performance. This wide speed range is encouraging evidence that, besides providing
an excellent option for implementing PCIe Gen3-compatible optical links today, the same
technologies can serve as the foundation for backward-compatible multi-speed optical links in
upcoming generations of application-specific products.
Avago Technologies, San Jose, CA. (800) 235-0312. [www.avagotech.com].
PLX Technology, Sunnyvale, CA. (408) 774-9060. [www.plxtech.com].