Elsevier Science

Measuring propagation delay over a coded serial communication channel using FPGAs

P.P.M. Jansweijer a,*, H.Z. Peek a

a Nikhef, Science Park 105, 1098 XG Amsterdam, Netherlands

Abstract

Measurement and control applications are increasingly using distributed system technologies. In such applications, which may be spread over large distances, it is often necessary to synchronize system timing and to know with great precision the time offsets between parts of the system. This paper describes how to measure the propagation delay over a coded serial communication channel using the Serializer/Deserializer (SerDes) functionality in FPGAs. The propagation delay between transmitter and receiver is measured with a resolution of a single Unit Interval (e.g. a serial link running at 3.125 Gbps provides a 320 ps resolution). The technique has been demonstrated to work over 100 km of fibre to verify its feasibility for application in the future KM3NeT telescope.

© 2010 Elsevier Science. All rights reserved

Keywords: Timing; Network Timing; Timing calibration; Propagation delay; FPGA; 8B10B; 8B/10B; Ethernet; IEEE1588; Neutrino Telescope

1. Introduction

Measurement and control applications are increasingly using distributed system technologies. In such applications, which may be spread over large distances, it is often necessary to synchronize system timing and to know with great precision the time offsets between parts of the system. An example of such an application is a Very Large Volume neutrino Telescope to study astrophysical sources. Such a telescope consists of a large three-dimensional array of optical modules which contain light-sensitive Photo Multiplier Tubes (PMTs). The photon detection time, with a sub-nanosecond resolution, is sent via a serial communication channel to a place where data is accumulated and further processed into a format suitable for physics analysis.
* Corresponding author. Tel.: +0031-20-5922039; e-mail: peterj@nikhef.nl.

For a serial communication channel the natural time resolution is a single symbol, defined as one bit-time (Unit Interval). Often the time resolution is degraded because the available hardware supports only the boundary of a code-group, or even of a couple of code-groups that define an ordered set. The technique described here can be efficiently implemented in FPGAs. It operates on a serial point-to-point communication channel and has a resolution of a single Unit Interval. For example, a 3.125 Gbps serial link provides a 320 ps resolution.

2. Communication channel

2.1. Serial communication channel coding

The industry has widely adopted 8B/10B coding [1, 2] as a standard for serial data transport. This coding scheme addresses three properties that are a concern for any serial connection.

1. Bit synchronization: It combines “clock” and “data” in one single serial stream by limiting the maximum run length. This results in enough edges in the serial stream for the bit-clock to be recovered at the receiver.
2. DC-Balance: To minimize transmission errors, DC-balance is achieved by mapping 8-bit characters onto 10-bit code-groups. There are not enough DC-balanced code-groups to map each 8-bit character directly onto one 10-bit code-group; therefore many of the 8-bit characters are mapped onto two 10-bit code-groups, one positive with more ones, the other negative with more zeros. Running disparity balances the number of positive and negative code-groups.
3. Special code-groups: For word synchronization and out-of-band information a number of special 10-bit code-groups are available that are not used for in-band data transmission.

2.2. Transfer of timing information

For timing information transfer, properties 1 and 3 are of special interest.
The recovered bit clock is used as the local clock at the receiving side (RX) and is locked (property 1) to the transmitter clock (TX), ensuring a high accuracy. The two locked clock domains are isochronous: they run at the same frequency, but with an uncalibrated timing offset between them. This offset can be measured by sending a timing marker signal back and forth. A special code-group (property 3) can be used as the timing marker signal.

2.3. Serial communication synchronisation

The serial communication is based on transferring words (built out of 1, 2 or 4 code-groups) at the system clock speed. The transmitter serialization architecture contains a PLL that multiplies the system clock to the serial bit rate. Initially the receiver uses a local free-running clock; after synchronisation the receiver is locked onto the transmitter system clock.

To completely understand the lock mechanism of a coded serial communication channel one needs to understand the steps that take place during synchronisation. First, a local free-running clock at the receiver is used to set the centre frequency of the PLL in the clock data recovery circuit to the nominal bit-rate of the serial data stream. Second, when a stable serial bit stream is received, the phase detector input of the PLL in the clock data recovery circuit switches from the local free-running clock to the edges of the received bit stream. Once the PLL has acquired lock to the received bit stream, it has locked at a random bit position.

The next step is to accomplish word synchronization. During synchronization the transmitter sends special code-groups from which the receiver can recognize the word boundaries. The deserialization architecture in the receiver aligns the data in the serial-to-parallel register. This process is called bit-slip and is achieved by a barrel shifter. Word synchronization is reached when the synchronization code-groups are recognized and have the proper position within the word.
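The word-alignment step described above can be illustrated in software. The following is a minimal sketch under stated assumptions, not the FPGA implementation: it assumes 20-bit words, uses the K28.5 code-group at negative running disparity (which contains the comma pattern 0011111) as the synchronization marker, pairs it with an arbitrary filler code-group chosen for illustration, and models the barrel shifter as a search over bit offsets. The function names are hypothetical.

```python
WORD_BITS = 20
COMMA_GROUP = "0011111010"   # K28.5 at negative running disparity (contains comma 0011111)
FILLER_GROUP = "1001000101"  # illustrative second code-group, assumed for this sketch


def make_stream(n_words, start_offset):
    """Serial bits as seen by a receiver whose PLL locked at a random bit position."""
    stream = (COMMA_GROUP + FILLER_GROUP) * (n_words + 1)
    return stream[start_offset:start_offset + n_words * WORD_BITS]


def count_bit_slips(stream):
    """Return how many single-bit slips are needed until every word starts with the marker."""
    for slips in range(WORD_BITS):
        words = [
            stream[i:i + WORD_BITS]
            for i in range(slips, len(stream) - WORD_BITS + 1, WORD_BITS)
        ]
        if words and all(word.startswith(COMMA_GROUP) for word in words):
            return slips
    raise ValueError("no word alignment found")


# A receiver that locked 3 bits into a word has to slip 17 more bits.
print(count_bit_slips(make_stream(4, 3)))  # -> 17
```

In this model the returned slip count plays the role of the number of barrel shifts: it records, in Unit Intervals, where the receiver happened to lock relative to the word boundary.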
The phase relation between the transmitter system clock, the communication channel and the receiver recovered clock is determined by the number of barrel shifts needed to properly align the receiver.

3. Verification of the propagation delay measurement principle

3.1. Test setup description

In order to demonstrate the propagation delay measurement principle a test setup was built that is able to measure the propagation delay of a serial communication channel. The data-link layer (OSI model [3]) of the communication channel uses 8B/10B coding and transfers 20-bit words (two 10-bit code-groups; figure 1) at a speed of 3.125 Gbps; the system clock therefore runs at 156.25 MHz (6.4 ns period).

figure 1: Timing marker over an 8B/10B coded channel

A “Start” pulse is generated when the transmitter sends a marker signal, which initiates a time-interval measurement. A “Stop” pulse is generated when the receiver has decoded the marker signal. In order to simplify the test setup an oscilloscope is used to measure the time between the “Start” and “Stop” pulses. The test setup (figure 2) consists of two separate FPGA evaluation boards connected by a test-bed to create a 100 km long communication channel.

figure 3: Propagation delay measurement using the test-bed

The “Start”/“Stop” delay measurement has a resolution of 6.4 ns. The “Stop” signal at the receiver is in the receiver recovered clock domain. The link propagation delay is computed by adding the number of barrel shifts multiplied by the Unit Interval (320 ps) to the measured “Start”/“Stop” delay. Thus the link propagation delay has a resolution of one Unit Interval.

Figure 4 shows the resynchronization behaviour of three “Stop” signal measurements. The oscilloscope is triggered on the “Start” signal. The picture shows a measured propagation delay of 491,547,440 ps at the rightmost trace, where the number of barrel shifts is zero (“00000”) as shown by the LEDs.
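The arithmetic of this section can be reproduced in a few lines. This is a minimal sketch that uses only numbers quoted in the text (3.125 Gbps, 20-bit words, 491,547,440 ps); the function names are hypothetical, and the second reading of 491,546,480 ps is an assumed value constructed for illustration, not a quoted measurement. Exact fractions in picoseconds are used to avoid rounding.

```python
from fractions import Fraction

BIT_RATE_BPS = 3_125_000_000  # 3.125 Gbps serial link
WORD_BITS = 20                # two 10-bit code-groups per word


def unit_interval_ps(bit_rate_bps):
    """Duration of one bit-time (Unit Interval) in picoseconds."""
    return Fraction(10**12, bit_rate_bps)


def word_period_ps(bit_rate_bps, word_bits):
    """System clock period: one word is transferred per system clock cycle."""
    return word_bits * unit_interval_ps(bit_rate_bps)


def link_delay_ps(start_stop_ps, barrel_shifts, bit_rate_bps=BIT_RATE_BPS):
    """Coarse Start/Stop delay refined by the number of barrel shifts."""
    return start_stop_ps + barrel_shifts * unit_interval_ps(bit_rate_bps)


print(unit_interval_ps(BIT_RATE_BPS))           # -> 320
print(word_period_ps(BIT_RATE_BPS, WORD_BITS))  # -> 6400
print(link_delay_ps(491_547_440, 0))            # -> 491547440
print(link_delay_ps(491_546_480, 3))            # -> 491547440 (assumed reading)
```

The last two calls show the intended property: a reading with zero barrel shifts and a reading that is 960 ps shorter with three barrel shifts resolve to the same link delay.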
The leftmost trace, where the number of barrel shifts is three (“00011”), shows the “Start”/“Stop” delay that corresponds to the link delay minus 960 ps (three times a Unit Interval of 320 ps).

figure 2: Propagation delay measurement test setup

The transmitter is implemented in a Lattice LFSCM25 FPGA [4, 5] and the receiver is implemented in a Xilinx Virtex-5 FPGA [6, 7]. Figure 3 shows a picture of the real test setup where the various sub-parts are labelled for reference. Note the two spools that contain 50 km of fibre each.

figure 4: Results for a connection over 100 km of fibre

The measured link delay of ~491 µs for 100 km of fibre corresponds to a propagation delay of 4.91 ns/m.

In most systems the main factors that determine the propagation delay of the communication channel vary only slightly and slowly. To track such slowly varying processes a calibration sequence can be run from time to time. In the test setup, the propagation delay varied slowly over time because of temperature fluctuations of the 100 km fibre and the stability of the transmitter crystal oscillator. After a short warm-up period the propagation delay remained stable to within 320 ps for a couple of minutes; long enough to repeat a series of serial link re-synchronization sequences and verify the measured “Start”/“Stop” delay and the number of barrel shifts.

3.2. Implementation remarks

Not all SerDes de-serialization architectures facilitate user-controllable bit-slip. However, by using a SerDes in transparent mode (bypassing code-group alignment), the bit-slip facility as well as the 8B/10B decoder can be implemented in FPGA fabric. The propagation delay measuring principle using SerDes barrel shift was verified in Xilinx Virtex-5 (using “RxSlide”) [8] and Altera Stratix-IV-GX (using “Rx_BitSlip”) [9, 10, 11] devices. In general, newer FPGA families support manual bit-slip. Older devices such as the Lattice “SC/M” family [12] do not provide manual bit-slip.

4.
Network hierarchy

Standard IEEE 1588 [13] describes a protocol for network-based measurement and control applications. The time resolution of standard IEEE 1588 can be as fine as a single Unit Interval when the described time stamping technique is implemented on the data-link layer.

Acknowledgment

This work is part of the research program of the 'Stichting voor Fundamenteel Onderzoek der Materie (FOM)', which is financially supported by the 'Nederlandse Organisatie voor Wetenschappelijk Onderzoek (NWO)'.

References

[1] A.X. Widmer, P.A. Franaszek. A DC-Balanced, Partitioned-Block, 8B/10B Transmission Code. IBM J. Res. Develop., Vol. 27, No. 5, September 1983.
[2] IEEE Std 802.3, “Carrier sense multiple access with collision detection (CSMA/CD) access method and physical layer specifications”, chapter 36.
[3] International Standard ISO/IEC 7498-1. Open Systems Interconnection, Basic Reference Model.
[4] http://www.latticesemi.com/products/fpga/sc/index.cfm
[5] http://www.latticesemi.com/products/developmenthardware/fpgafspcboards/scpciexpressx1evaluationb.cfm?source=sidebar
[6] http://www.xilinx.com/products/virtex5/
[7] http://www.xilinx.com/products/devkits/HW-V5-ML507UNI-G.htm
[8] Xilinx UG198 Virtex-5 FPGA RocketIO GTX Transceiver, User Guide.
[9] http://www.altera.com/products/devices/stratix-fpgas/stratixiv/stxiv-index.jsp
[10] http://www.altera.com/products/devkits/altera/kitsignal_integrity_sivgx.html
[11] http://www.altera.com/literature/hb/stratixiv/stx4_5v2_01.pdf
[12] http://www.latticesemi.com/documents/DS1005.pdf
[13] http://ieee1588.nist.gov/