Increasing System Bandwidth with CDS
Application Note 162
June 2001, ver. 1.0
Altera Corporation, A-AN-162-01

Introduction

As system speeds have increased, semiconductor and board designers have turned to source-synchronous clocking and differential signaling to improve chip-to-chip data transfer rates. While source-synchronous clocking does meet this need, it is not very flexible. Designers must closely match the clock and data line lengths, which complicates board design. Every chip-to-chip data transfer must have a clock as well as data lines, so every connection introduces a new clock domain. A device that receives data from several devices must have dedicated circuitry for each connection and must manage data flow among several clock domains.

A new clocking technique called clock-data synchronization (CDS) combines the advantages of traditional synchronous clocking and source-synchronous clocking by providing high-speed data transfer without the need to closely match clock and data lines. Unlike clock-data recovery (CDR), CDS does not require data to be encoded or scrambled to meet a run-length requirement. This application note discusses how CDS works and how it can be used in a variety of systems.

The look-up table (LUT)-based APEX II device family incorporates CDS circuitry in its differential I/O circuitry. These devices offer four banks of high-speed differential I/O pins: two output banks and two input banks. Each bank contains 18 channels and one clock, and supports the LVDS, LVPECL, PCML, and HyperTransport I/O standards at up to 1 gigabit per second (Gbps). The two input banks incorporate CDS, providing the advantages described below.

Source-Synchronous Clocking

Source-synchronous clocking has become a popular technique for high-speed designs. With this technique, the transmitting device sends a clock along with the data.
The advantage of this approach is that the maximum performance is no longer determined by the clock-to-out delay, propagation delay, and setup times of the devices and board. Instead, the maximum performance is limited by the maximum edge rate of the driver and by the skew between the data signals and the clock signal. Using this technique, data can be transferred at a 1-Gbps rate (1-ns bit period) even though the propagation delay from transmitter to receiver may exceed 1 ns. Figure 1 shows an example of source-synchronous transfer.

Figure 1. Source-Synchronous Transfer
In a source-synchronous system, trace lengths must be matched to minimize skew between data traces and the clock trace.
[Diagram labels: Transmitter; Receiver; Data1; Data2; Clock]

However, the source-synchronous clocking technique has some drawbacks. The board design must be tightly controlled so that there is minimal skew between the data and clock paths. Additionally, each set of data driven from a device must be sent with a clock signal. Therefore, if a device receives data from four other devices, it must also receive four clocks. These clocks complicate the design of the receiver, which now has to manage four clock domains using first-in first-out (FIFO) buffers.

Clock-Data Synchronization

CDS is a new solution to this design challenge. With CDS, the receiving device can synchronize multiple incoming streams of data to its own system clock. This technique simplifies board design because skew between data channels and the clock is no longer an issue. A receiver can use CDS to correct any amount of clock-to-channel or channel-to-channel skew.

CDS allows designers to easily implement various system topologies. Multiple devices can now feed into one receiving device, which processes all incoming data in one clock domain. Figure 2 shows an example of a system using CDS.
Figure 2. System Using CDS
[Diagram labels: APEX II Device 1, 1 to 36; APEX II Device 2, 1 to 36; APEX II Device 3, 1 to 36; APEX II Device 4; Clock Signal]

CDR has been used to address similar skew and topology requirements. CDR has one advantage over CDS: the data transmitters can run from separate crystals, because the receiver recovers an individual clock from each incoming data channel. Every channel can have phase variation as well as frequency variation within a specified limit. Although CDR provides this flexibility, the receiver design is more complicated because every data channel has its own clock domain. With CDS, the data channels may vary in phase, but must all be at precisely the same frequency. To ensure that all channels are at the same frequency, all transmitters must be clocked from the same system clock.

Compared to CDR, CDS has an advantage in data transmission efficiency. For a CDR receiver to recover the clock and data, the data channel must toggle periodically. This requirement is known as the maximum run length. For example, a common CDR technique is to use 8B/10B encoding, which ensures that no more than five consecutive ones or five consecutive zeros are ever transmitted. However, this encoding scheme creates inefficiency on the data channel. Because 8B/10B sends 10 line bits for every 8 data bits, a 1.25-Gbps data channel can only carry 1.25 × 8/10 = 1 Gbps of system data. CDS does not have a run-length requirement, so there is no need to encode the data stream. Therefore, the entire bandwidth of the transmission channel can be used for the system data; a 1.25-Gbps data channel can carry 1.25 Gbps of data.

CDS Implementation

The APEX II CDS receiver works by aligning itself to a known training pattern that the transmitter sends over the data channels. When sending the training pattern, the transmitter also enables a CDS pin on the receiving device to synchronize the data to the system clock.
The receiver's circuit captures the pattern with multiple phases of the system clock and then selects whichever clock phase correctly captured the pattern. After the training pattern is sent, the receiver uses the selected clock phase to capture the actual data. Figure 3 shows the circuitry that selects which phase of the clock captures the data.

Figure 3. CDS Implementation
[Diagram labels: Input Data; D registers; Control Logic Selects Register; Synchronized Data; System Clock; PLL (1); 0° Output; 90° Output]
Note to Figure 3:
(1) PLL: phase-locked loop.

When using source-synchronous clocking, the data stream can be automatically byte-aligned. For example, if the data stream is eight times as fast as the clock, the most significant bit (MSB) of each byte is the data transmitted during the third bit period after the clock edge. This relationship holds because skew between clock and data is limited.

There is no limit on skew between clock and data in a CDS system. Therefore, the designer cannot use the relationship between clock and data to byte-align the two signals. Instead, in a CDS system, a byte-alignment pattern is sent to the receiver after the training pattern. The receiving device uses this pattern to byte-align the data.

It takes only a few clock cycles to transmit and process the training and byte-alignment sequence, and this is performed once upon system power-up. If multiple transmitting devices are on the same board, they are subject to the same voltage and temperature variation, so the skews between them will not vary and retraining is not necessary. All transmitting devices send the training pattern simultaneously so that the receiver can adjust for all skews at the same time.
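The phase-selection and byte-alignment steps described above can be sketched in a few lines of Python. This is an illustrative model only, not the APEX II circuit: the four clock phases, the width of the unsafe window around each data edge, and the training and alignment bit patterns are all assumptions chosen for the example, and none of these values come from this application note.

```python
import math

# Assumed values for illustration only (not taken from the application note).
PHASES = (0.0, 0.25, 0.5, 0.75)  # candidate clock phases, as fractions of a bit period
EDGE_WINDOW = 0.15               # unsafe zone on each side of a data edge


def sample(stream, t, skew):
    """Sample a repeating bit stream at time t (in bit periods).

    Returns None when the sample lands too close to a bit boundary to be
    trusted. Every boundary is treated as a potential data edge.
    """
    pos = t - skew                 # time relative to the skewed data
    index = math.floor(pos)
    frac = pos - index             # position inside the current bit, 0..1
    if frac < EDGE_WINDOW or frac > 1.0 - EDGE_WINDOW:
        return None
    return stream[index % len(stream)]


def rotations(pattern):
    """All cyclic rotations of a bit pattern."""
    return [pattern[k:] + pattern[:k] for k in range(len(pattern))]


def select_phase(training, skew):
    """Return the first clock phase that captures the training pattern cleanly."""
    for phase in PHASES:
        captured = [sample(training, n + phase, skew) for n in range(len(training))]
        # Whole-bit offsets are left to byte alignment, so any rotation of the
        # training pattern counts as a correct capture. A capture containing
        # None never matches.
        if captured in rotations(training):
            return phase
    return None


def byte_offset(bits, marker):
    """Return how many leading bits to skip so that `marker` starts a byte."""
    width = len(marker)
    for offset in range(width):
        if bits[offset:offset + width] == marker:
            return offset
    return None


if __name__ == "__main__":
    training = [1, 0, 1, 1, 0, 0, 1, 0]      # hypothetical training pattern
    for skew in (0.0, 0.4, 0.9, 2.6):
        print("skew", skew, "-> phase", select_phase(training, skew))

    marker = [1, 1, 0, 0, 0, 1, 0, 1]        # hypothetical alignment byte
    print("byte offset", byte_offset([0, 0, 0] + marker + [0, 0, 0, 0, 0], marker))
```

In this model, a skew of 2.6 bit periods selects the same phase as a skew of 0.6, which mirrors the division of labor in the text: phase selection handles the fractional part of the skew, and the byte-alignment pattern resolves any whole-bit offset.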
However, if the transmitting devices are on different boards or subsystems, they may experience different voltage and temperature variation, and the design may need to resend the training pattern periodically, depending on the variation that the system sees. Although additional clock cycles are necessary to resend the training pattern, a CDS system is still more efficient than a CDR system that relies on an encoding scheme.

CDS System Applications

CDS improves system efficiency in several ways. It can correct for skew that cables and connectors introduce to data channels. CDS also adds flexibility to overall system designs. Two examples are implementing a switched backplane and breaking up large designs into multiple devices.

Many systems, including communications and storage systems, incorporate a backplane to transmit data from one subsystem to another. Historically, these designs have used a shared backplane (such as PCI). However, the need for faster data transfer has revealed the limitations of this approach. A shared backplane can only support one transaction at a time, and the bus speed cannot increase fast enough to support the data requirements.

The switched backplane approach is a solution to higher data transfer requirements. Rather than sharing a common bus, each card communicates over a point-to-point link to a master switch. The switch transfers the data to the destination point. Differential I/O standards are well suited to this architecture, as each point-to-point link can run at very high speeds. Furthermore, since the bus is not shared, multiple transactions can be executed simultaneously, as shown in Figure 4.

Figure 4. Switched Backplane Application
[Diagram labels: APEX II Device 1; APEX II Device 2; APEX II Device 3; APEX II Device 4; Clock Signal]

With source-synchronous clocking, every point-to-point link must have its own clock.
The master switch must implement multiple clock domains and manage data and clock skew across the backplane. CDS addresses these concerns because all cards use one system clock. The master switch can use CDS to correct for any skews caused by system clock skew, device-to-device variation, or data skew. Using CDS for this architecture simplifies the overall system design by keeping the entire system synchronized to one clock. The CDS circuitry in the APEX II device family provides the flexibility necessary to easily implement a switched backplane system.

Another example of a CDS application is design partitioning. Many complex designs, such as packet processing, cannot easily fit into one device or are partitioned for other reasons. For example, while software running on network processors is useful for general packet processing, ASICs or programmable logic devices (PLDs) are often used to accelerate specific functions.

Network processors and PLDs implement different functions within the system. For example, classification and queuing control are important to assure quality of service, and encryption is important for security. These functions can be implemented at a higher speed in a PLD than in a network processor. The size of these functions may prevent them all from being incorporated into one PLD.

Historically, partitioning these functions into multiple devices has resulted in very inefficient use of the devices. Each individual device would usually use up all of its I/O pins before using all of its logic. High-speed differential interconnects in conjunction with CDS enable very high bandwidth data transfer from device to device, so the required chip-to-chip data transfer can be implemented using only a few I/O pins.

Figure 5 shows a block diagram of an OC-192 data path. In this design, the packet processing is divided between a network processor and multiple PLDs.
CDS is used to implement high-speed data transfer among the multiple devices that make up the packet-processing function.

Figure 5. OC-192 Design Partitioning
[Diagram labels: SRAM and SDRAM Blocks; CDR Circuitry; PMD Device (1); Transceiver; Framer; Packet Processing; CDS System; APEX II Device (four instances); Packet Processing; Switch Fabric; Host Processor]
Note to Figure 5:
(1) PMD: physical medium dependent.

Because CDS enables easier design partitioning, it is also useful for ASIC prototyping. In many cases, a designer takes advantage of the flexibility and easy reconfiguration of programmable logic to prototype a design, and then moves a very large or extremely high-volume design to an ASIC. Since the ultimate capacity of a standard-cell device is larger than that of a programmable logic device, the designer will typically need to partition the prototype across multiple PLDs. As discussed earlier, this may lead to inefficient use of the logic within these devices. By using CDS, the designer can implement the required data transfer between devices and use the full logic capacity of the PLDs.

Summary

Increasing demand for data services has driven higher bandwidth requirements for system designers. Differential signaling has been successfully used to address this need. CDS builds on the success of differential signaling, giving designers more flexibility in the design of their boards and of their overall systems. By using CDS in APEX II devices, designers can enhance their systems to provide flexibility and performance.
101 Innovation Drive
San Jose, CA 95134
(408) 544-7000
http://www.altera.com
Applications Hotline: (800) 800-EPLD
Customer Marketing: (408) 544-7104
Literature Services: lit_req@altera.com

Altera, APEX, APEX II, and specific device designations are trademarks and/or service marks of Altera Corporation in the United States and other countries. Altera acknowledges the trademarks of other organizations for their respective products or services mentioned in this document. Altera products are protected under numerous U.S. and foreign patents and pending applications, maskwork rights, and copyrights. Altera warrants performance of its semiconductor products to current specifications in accordance with Altera's standard warranty, but reserves the right to make changes to any products and services at any time without notice. Altera assumes no responsibility or liability arising out of the application or use of any information, product, or service described herein except as expressly agreed to in writing by Altera Corporation. Altera customers are advised to obtain the latest version of device specifications before relying on any published information and before placing orders for products or services.

Copyright 2001 Altera Corporation. All rights reserved.