1 SerDes Transceivers for High-speed Serial Communications
Dianyong Chen, Shoujun Wang, and Tad Kwasniewski

1 Introduction
To process and redistribute digital information, data are constantly exchanged between different systems and between different functional blocks inside a system. Serial and parallel communications have historically coexisted, and still do, serving the various requirements of intrasystem and intersystem data exchange. In parallel communications several symbols are sent at one time over a communications link, while in serial communications only one symbol is sent at a time. The choice of one method over the other is usually a tradeoff among factors such as speed, cost of materials, power consumption, and difficulty of physical realization. In modern telecommunication and computer systems, parallel and serial communications are often used simultaneously. Serializing and deserializing data streams is therefore an important task.

2 SerDes Transceiver for High-Speed Serial Communications
In principle, parallel communications are intrinsically faster than serial communications, because the speed of a parallel data link equals the number of symbols sent in parallel times the symbol rate of each individual path; doubling the number of symbols sent at once doubles the data rate. For this reason, parallel communications are widely used in the internal buses of integrated circuits and in short-distance chip-to-chip links. However, contrary to intuition, parallel communications are being replaced by serial communications in high-speed data links. These links include chip-to-chip communications on backplanes, computer networks, computer peripheral buses, long-haul communications, and so on. A conventional reason to choose serial communications instead of parallel communications is cost.
In long-haul communication systems, copper cables or optical fibers are expensive; doubling the number of cables or fibers doubles the cost. In chip-to-chip communications, parallel data links require more pins, which increases the cost of packaging. According to [1], packaging already represents 25% of the total system cost in some electronic products. Nevertheless, the main problems that count against parallel communications in these applications are clock skew [2], [3], data skew, and crosstalk.

Skew is the difference in arrival time of symbols transmitted at the same time. Symbols are basically electromagnetic pulses. Because no electromagnetic wave can travel faster than light in free space, the time it takes for a signal to travel from the transmitter to the receiver is determined by the length of the electrical or optical trace and the group velocity of the signal. Although the difference in arrival time of signals along different paths is usually very small, it can lead to a considerable phase difference in high-speed data links, since the frequency is very high. For example, a 1 cm path difference causes a 240 degree phase difference for a 10 GHz clock signal traveling at half the free-space speed of light. Capacitive coupling, component delay, and process, voltage, and temperature (PVT) variation make clock skew and data skew worse. Because a clock signal is periodic, clock skew can be corrected by a delay-locked loop (DLL) composed of a variable delay line (VDL) and a control loop [4]. In principle, data skew can also be corrected. However, due to the large number of links and the analog nature of the received signals, data skew is much more troublesome in parallel data links. As a consequence, the system has to slow down to wait for the path with the largest delay.

Crosstalk is the interference between adjacent data links. When the data rate and the number of links increase, crosstalk also tends to increase.
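The skew figure quoted above (1 cm of path mismatch at 10 GHz, with the signal traveling at half the free-space speed of light) can be reproduced with a few lines of arithmetic. The sketch below is only an illustration; the function name `skew_phase_deg` and its parameter names are chosen here and do not come from the chapter.

```python
# Phase skew caused by a path-length difference (values taken from the text).
C0 = 299_792_458.0  # speed of light in free space, m/s


def skew_phase_deg(path_diff_m, clock_hz, velocity_fraction):
    """Phase difference in degrees for a given path mismatch and clock."""
    velocity = velocity_fraction * C0   # propagation velocity on the trace
    delay_s = path_diff_m / velocity    # extra time of flight on the longer path
    return 360.0 * delay_s * clock_hz   # fraction of a clock period, in degrees


phase = skew_phase_deg(path_diff_m=0.01, clock_hz=10e9, velocity_fraction=0.5)
print(f"{phase:.1f} degrees")  # close to the 240 degrees quoted in the text
```

The same function shows how quickly the tolerance shrinks: halving the mismatch or the clock frequency halves the phase error.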
In addition, connectors and vias break the continuity of electromagnetic fields and increase the chance of crosstalk [5]. Therefore, in high-speed data links, serial communications are rapidly replacing parallel communications. High-speed serial data links include backplane links such as PCI Express, computer networking such as Ethernet, computer-to-peripheral links such as USB, multimedia interfaces such as HDMI, computer-to-storage interfaces such as Serial ATA and Serial Attached SCSI, and high-speed telecommunications such as SONET and SDH.

On the other hand, internal buses of integrated circuits and short-distance chip-to-chip data links use parallel communications to increase the data transfer rate and signal processing speed. In addition, massive data are usually stored in slow devices such as RAM; they have to be accessed in parallel to achieve high speed. A SerDes transmitter transmits those parallel data to the receiver through a high-speed serial data link; the SerDes receiver receives data from the serial data link and delivers parallel data to the next-stage electronic circuits for further signal processing. A simplified SerDes transceiver is shown in Fig.2.1.

[Figure: transmit path with Data Source, Source Encoder, Channel Encoder, PISO, Tx Buffer, and Clock, driving the Physical Channel; receive path with Rx Buffer, Bit Detection, SIPO, Channel Decoder, Source Decoder, Data Sink, and Clock]
Fig. 2.1. A block diagram of a simplified SerDes transceiver

The data source is basically a set of user information to be transmitted. It may be files, audio or video streams, etc. The data source usually has some kind of parallelism, and the unit is often a byte, a word, or a double word. Before transmission, source encoding is usually performed. Tasks in source encoding include framing, synchronization (SYNC) patterning, forward error correction (FEC) encoding, and so on. These tasks sometimes involve very complicated algorithms, and parallelism must be used to increase processing speed.
The channel encoding is usually arranged so that, after encoding, the data spectrum is a better fit to the physical channel and bit detection at the receiving end becomes easier. For example, the prevailing 8B/10B channel coding achieves a maximum run length of 5 and a maximum digital sum variation of 6 [6]. The output of the channel encoder is then fed to a parallel-in-serial-out (PISO) block to generate a serial symbol stream. Symbols in the stream may be binary or multilevel. The representation of symbols is sometimes called line code, signaling, or channel pulse modulation. Typical binary line codes are non-return-to-zero (NRZ), non-return-to-zero-inverted (NRZI), Manchester code (return-to-zero), etc. Typical multilevel line codes are duobinary (3 levels), 4-level pulse amplitude modulation (PAM-4), etc.

The serial symbol stream is sent to a transmitter driver, or Tx buffer, which converts it to electrical or optical pulses that can travel through the physical channel to the receiving end. The PISO and Tx buffer are usually controlled by a symbol-rate (or baud-rate) clock. In very high-speed serial data links, this clock is generated by a clock multiplier unit (CMU) from a crystal oscillator. In some applications, a Tx equalizer (pre-emphasis) is implemented before the Tx buffer to counteract some channel impairments.

There are many sources of impairments in the physical channel. They include frequency-dependent attenuation, frequency-dependent delay (dispersion), crosstalk, reflection, and noise. These impairments may be nonlinear and time-variant. However, in engineering practice, a linear time-invariant mathematical model describing the transceiver is still very useful. Fig.2.2 shows a linear mathematical model of SerDes transceivers.

[Figure: input sequence a_k, pulse modulator c(t), transmission line f(t), additive noise n(t) at a summing node, received signal r(t), sampled output r_k]
Fig. 2.2.
A linear mathematical model describing the transceiver

The input data stream $a_k$ is the output of the PISO. It is a digital sequence that is discrete in both time and amplitude. When NRZ signaling is used, $a_k \in \{0, 1\}$ or $a_k \in \{-1, +1\}$. The sampling rate is the baud rate $f_{baud}$, or clock frequency. The received signal $r(t)$ is a continuous-time analog signal:

$$ r(t) = \sum_{i} a_i\, h(t - iT) + n(t) \tag{2.1} $$

where $h(t)$ is the time-domain channel impulse response. Here the channel includes the pulse-shaping modulator of the transmitter $c(t)$, the physical channel response $f(t)$, and the Tx and Rx equalizer responses if they are involved. The receiver uses a clock to sample the received signal and decides which symbol was received. With $\tau$ denoting the sampling phase,

$$ r_k = r(kT + \tau) = \sum_{i} a_i\, h\big((k - i)T + \tau\big) + n(kT + \tau) \tag{2.2} $$

The input data sequence $a_k$ is usually quite random. Assuming there is no additive noise $n(t)$, the received sample $r_k$ can be rewritten as

$$ r_k = a_k\, h(\tau) + \sum_{i=-\infty}^{k-1} a_i\, h\big((k - i)T + \tau\big) + \sum_{i=k+1}^{\infty} a_i\, h\big((k - i)T + \tau\big) \tag{2.3} $$

Equation (2.3) shows that the sampling phase is very important. If we sampled at a phase where $h(\tau)$ is zero, it would be very difficult to recover the data. It is desirable to sample where $h(\tau)$ is at its maximum. The second part of equation (2.3) depends on the sequence before and after $a_k$. It is rather unpredictable because the input sequence is random. This part is usually called inter-symbol interference (ISI). It has been shown that if the channel response is a Nyquist N-1 pulse, the ISI can be eliminated completely [7]. A Nyquist N-1 pulse is defined as

$$ h(nT) = \begin{cases} 1, & n = 0 \\ 0, & n \neq 0 \end{cases} \tag{2.4} $$

where $n$ is an integer. However, the response of a physical channel is usually not a Nyquist N-1 pulse, especially when there is little excess bandwidth. The task of an equalizer is to shape the combined channel response into a Nyquist N-1 pulse. Unfortunately, this is not always possible if noise is taken into account. A Nyquist N-1 pulse has a flat spectrum across the passband. The additive noise is somewhat white.
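To make the decomposition in equation (2.3) concrete, the following minimal sketch splits one noise-free received sample into its main term and its ISI term. It assumes the channel is already described by its samples at the chosen sampling phase; the tap values of the "dispersive" channel and the helper name `sample` are hypothetical and serve only as an illustration.

```python
def sample(symbols, h, k):
    """Split the noise-free r_k into (main term, ISI term).

    h maps the integer offset m = k - i to the channel sample at that offset
    (taken at the chosen sampling phase); offsets not listed are zero.
    """
    main = symbols[k] * h.get(0, 0.0)
    isi = sum(a * h.get(k - i, 0.0) for i, a in enumerate(symbols) if i != k)
    return main, isi


symbols = [+1, -1, -1, +1, +1, -1]               # NRZ symbols in {-1, +1}
nyquist = {0: 1.0}                               # h(nT) = 1 for n = 0, else 0
dispersive = {-1: 0.1, 0: 0.7, 1: 0.3, 2: 0.1}   # hypothetical lossy channel

for name, h in (("nyquist", nyquist), ("dispersive", dispersive)):
    main, isi = sample(symbols, h, k=3)
    print(f"{name}: main = {main:+.2f}, ISI = {isi:+.2f}")
```

With the Nyquist pulse the ISI term vanishes; with the dispersive taps the neighbouring symbols remove a sizeable part of the main term, which is the situation an equalizer is meant to correct.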
A physical channel response, however, tends to have large attenuation at high frequencies. It even has zeros at some frequencies because of reflections and resonance of parasitics. If the target channel response were a Nyquist N-1 pulse, the equalizer would need enormous gain at frequencies with large attenuation (or even infinite gain at the zeros). As a consequence, the noise is boosted much more than the signal at those frequencies. This is called noise enhancement. In practice, linear least-mean-square (LMS) equalization or nonlinear decision feedback equalization (DFE) is used.

Ironically, there is no signal power at the clock frequency, although the sampling clock is so important to data recovery. Most high-speed serial data links are baseband systems despite the very high baud rate. Fiber-optic systems such as SONET are indeed modulated systems. However, their modulation and demodulation methods are noncoherent, so they are usually treated as baseband systems. In Fig.2.2 the pulse shaping modulator is usually a hold function of one symbol period. The power spectral density (PSD) of the received signal is expressed as

$$ P_r(e^{j\omega T}) = A(e^{j\omega T})\, \left| C(e^{j\omega T})\, F(e^{j\omega T}) \right|^2 \tag{2.5} $$

where $A$ is the PSD of the input sequence, and $C$ and $F$ are the Fourier transforms of $c(t)$ and $f(t)$. Since $c(t)$ is just a hold function of one symbol period, its Fourier transform is

$$ C(e^{j\omega T}) = T\, \mathrm{sinc}(\pi f T) = \frac{1}{f_{baud}}\, \mathrm{sinc}\!\left(\frac{\pi f}{f_{baud}}\right) \tag{2.6} $$

with $\mathrm{sinc}(x) = \sin(x)/x$. Equations (2.5) and (2.6) show that the received signal power is zero at nonzero integer multiples of the clock frequency. Therefore, it is not possible to recover the clock information with a linear method. However, in high-speed serial data links, there is no dedicated clock signal. The clock has to be derived from the received data waveform. A clock and data recovery (CDR) circuit is a nonlinear circuit that can recover the clock from the received data waveform, which has zero power at the clock frequency.
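The spectral nulls predicted by equation (2.6) are easy to check numerically. The sketch below evaluates the magnitude of the hold-pulse spectrum at a few multiples of the baud rate; the 10 Gbaud rate and the function name `hold_spectrum` are assumptions made for illustration, and the linear phase term of the hold pulse is ignored, as in (2.6).

```python
import math


def hold_spectrum(f_hz, f_baud):
    """|C(f)| of a one-symbol hold pulse: T * |sin(pi f T) / (pi f T)|."""
    T = 1.0 / f_baud
    x = math.pi * f_hz * T
    return T if x == 0.0 else T * abs(math.sin(x) / x)


f_baud = 10e9  # hypothetical 10 Gbaud link
for k in (0.0, 0.5, 1.0, 1.5, 2.0):
    mag = hold_spectrum(k * f_baud, f_baud)
    print(f"f = {k:3.1f} * f_baud   |C| / T = {mag * f_baud:.3e}")
```

The values at 1.0 and 2.0 times the baud rate are zero up to floating-point rounding, which is why a purely linear filter cannot extract a clock tone from the data.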
The tasks of a CDR are roughly as follows [8]: (1) generate a clock whose frequency is the baud rate; (2) adjust the phase of the clock so that it samples the received data waveform at the instants when the waveform has the maximum signal-to-noise ratio (SNR); (3) recover data with high accuracy in the presence of noise and ISI.

Since there is no signal power at the clock frequency or its harmonics, a CDR can lock to the clock frequency or to one of its harmonics. Without some prior knowledge of the data sequence, it cannot tell which one is correct. One way to prevent this is standardization. For example, a single-rate CDR for SONET STS-48 should lock to 2488.32 MHz, so a predefined reference clock can be used. Another method relies on prior knowledge of the shortest and longest run lengths of the data source. A counter is used to detect the run length of a sequence. If the detected shortest run length is shorter than the predefined shortest run length, the clock is slower than the data rate; if the detected longest run length is longer than the predefined longest run length, the clock is faster than the data rate [9].

Implementing a SerDes transceiver in a monolithic microelectronic integrated circuit involves many more considerations than the issues mentioned above. In this chapter, we review representative implementations of the main building blocks of SerDes transceivers for high-speed serial data links.

3 Design challenges
It is desirable to implement SerDes transceivers in mainstream CMOS technologies because of their low cost and low power consumption. CMOS circuits are typically slower than circuits implemented in non-mainstream technologies [10]. Although CMOS circuits keep getting faster and their power consumption keeps dropping with scaling, new circuit styles and architectures enabling low power and high speed are still critical for high-speed serial data links. SerDes transceivers are predominantly mixed-signal circuits.
With the scaling down of feature size, the supply voltage and threshold voltage drop accordingly, leaving less voltage headroom for signal processing. In the meantime, substrate noise tends to increase [11]. Furthermore, the most recent technologies are usually optimized for digital baseband signal processing and lack accurate models for analog signal processing. Therefore, mixed-signal processing is usually vulnerable to PVT variations. Many recent efforts in SerDes transceiver design aim to replace analog blocks with digital ones to make the transceiver more robust against PVT variations and noise.

4 Circuit Styles
A high-speed SerDes transceiver is a mixed-signal system, as shown in Fig.2.2. On the one hand, it takes a digital sequence $a_k$ from the host and passes a digital sequence $r_k$ to the client, performing digital signal processing when necessary. On the other hand, it must be able to drive the physical channel. The physical channel distorts the digital signals that travel through it and adds noise to them, so the received signals become analog. Data recovery relies on a locally regenerated clock and proper sampling. In addition, a high-speed SerDes transceiver is usually a subsystem of a larger system and/or is used in portable devices, so low power consumption is critical. The circuit style must therefore be tailored to meet these requirements.

4.1 High-Speed and Low-Power Digital Circuits
In a high-speed SerDes transceiver, not only the source encoder/decoder and channel encoder/decoder are digital circuits; the phase detector, phase frequency detector, and frequency divider are usually digital as well. Some of these digital circuits run at the baud rate. Therefore, digital circuits must be sufficiently fast while keeping power consumption sufficiently low. The main factor that limits the speed of a digital circuit is the capacitive load, including the parasitic capacitive load.
The voltage change on a capacitor can be written as

$$ \Delta V = \frac{\Delta Q}{C} = \frac{1}{C} \int_{t_0}^{t_0 + \Delta T} i(t)\, dt = \frac{I_a\, \Delta T}{C} \tag{4.1} $$

where $C$ is the total load capacitance, $\Delta T$ is the time it takes to change the voltage on the load capacitor by an amplitude of $\Delta V$, and $I_a$ is the average current charging or discharging the load capacitor. To make a circuit fast, the time $\Delta T$ must be sufficiently small:

$$ \Delta T = \frac{\Delta V \cdot C}{I_a} \tag{4.2} $$

It is straightforward to see how to make a circuit faster: reduce the voltage swing $\Delta V$, make the load capacitance $C$ smaller, and/or increase the charging current $I_a$. A large number of logic families exploit these fundamental methods to make digital circuits faster. For example, pseudo-nMOS and domino logic exclude pMOS capacitance from the input, because pMOS input capacitance is usually 2 to 3 times as large as nMOS input capacitance if they provide the same current. Technology scaling reduces capacitance as well.

4.1.1 Static Rail-to-Rail CMOS Logic
Static rail-to-rail CMOS logic is by far the most commonly used type of logic circuit. Despite the very high speeds required, rail-to-rail CMOS logic is still extensively used in some high-speed SerDes transceivers. The reasons are technology scaling, reduced power supply voltage, and the simplicity, maturity, and robustness of static CMOS logic. Static CMOS logic has a pull-up network and a pull-down network. At any time except during transitions, either the pull-up network is turned on to pull the output to the power supply voltage, or the pull-down network is turned on to pull the output down to ground. Since the pull-up and pull-down networks cannot be turned on simultaneously except during transitions, static CMOS logic in principle consumes zero static power. Therefore, static CMOS logic exhibits extremely low power consumption in low-frequency applications. The speed and power consumption of static CMOS logic in high-speed applications can be roughly estimated using two serially connected inverters, as shown in Fig.4.1.
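Equation (4.2) can be turned into a quick estimate of how the three knobs (swing, capacitance, current) trade against each other. The operating points below are hypothetical and chosen only for illustration; the helper name `switching_time` is likewise an assumption of this sketch.

```python
def switching_time(delta_v, c_load, i_avg):
    """Delta T = Delta V * C / I_a, following equation (4.2)."""
    return delta_v * c_load / i_avg


# Hypothetical operating points: same load and current, different swing.
base = switching_time(delta_v=1.0, c_load=20e-15, i_avg=200e-6)       # full swing
low_swing = switching_time(delta_v=0.4, c_load=20e-15, i_avg=200e-6)  # reduced swing

print(f"full swing:    {base * 1e12:.1f} ps")
print(f"reduced swing: {low_swing * 1e12:.1f} ps")
```

Because the relation is linear in each factor, cutting the swing to 40% shortens the switching time to 40% of its original value, which is the motivation for the low-swing logic styles discussed next.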
[Figure: two cascaded CMOS inverters (P1/N1 driving P2/N2, with nodes Vin, Vo1, Vo2 and parasitic capacitances) and the equivalent circuit of the first stage]
Fig.4.1. Two serially connected inverters and the equivalent circuit of the first stage

Assume the initial input signal Vin is high, so that pMOS transistor P1 is switched off, nMOS transistor N1 is switched on, and the voltage Vo1 is low. Let us further assume the input signal has very sharp edges and sufficiently large driving capability; then, when Vin jumps from high to low, the time it takes to turn on P1 and turn off N1 is negligible. The voltage Vo1 is pulled up to VDD through transistor P1. However, it cannot change abruptly, as it has to charge the gate capacitance of P2 and N2 as well as the parasitic capacitance of the four transistors. As Vo1 increases, the VDS of transistor P1 decreases until the channel is no longer pinched off. Transistor P1 then enters the triode region and the charging current decreases. When Vo1 reaches VDD, the energy ($W$) stored in the gate capacitors and parasitic capacitors ($C_{total}$) is

$$ W = \frac{1}{2} C_{total} V_{DD}^2 = \frac{1}{2} \frac{I_a^2\, \Delta T^2}{C_{total}} \tag{4.3} $$

Since VDD charges the capacitors through the channel of P1, some energy is dissipated in the channel resistance. When the input Vin goes from low to high, pMOS transistor P1 is switched off and nMOS transistor N1 is switched on. Assume this process is sufficiently fast; then the power supply VDD is cut off from the capacitors abruptly, so that the power consumption caused by the short-circuit effect is negligible and the supply provides no energy while the capacitors discharge. However, the stored energy $W$ is completely dissipated in the channel resistance of transistor N1 when the capacitors are discharged to ground. The energy consumption for an input cycle is the sum of $W$ and the energy dissipated in the charging process. The average power consumption of an inverter can be estimated as

$$ P = V_{DD}\, I_a\, \Delta T\, f = C_{total}\, V_{DD}^2\, f \tag{4.4} $$

Some conclusions can be drawn on the basis of this simple analysis.
Firstly, static CMOS logic has to drive the input capacitance of the pull-up network and the input capacitance of the pull-down network simultaneously. The pull-up network is composed of pMOS transistors and has the larger capacitance. According to equation (4.2), this large capacitance slows down the circuit. Secondly, static CMOS produces a rail-to-rail output. According to equation (4.2), the large swing also slows down the circuit, and according to equation (4.4), it greatly increases the power consumption. Thirdly, according to equation (4.4), static CMOS logic consumes considerable power at high frequencies because the power consumption is proportional to the switching frequency. Last but not least, static CMOS logic is sensitive to common-mode noise because it is not differential. Therefore, high-speed CMOS digital design favors current mode logic (CML).

An important observation from the two serially connected inverters is that the output of the first inverter has a finite slew rate. This differs from our previous assumption that the input to an inverter has very sharp edges and infinite driving capability. Therefore, the second inverter will not switch instantaneously, and additional delay is added. This realistic consideration applies to all digital circuits.

4.1.2 CML Logic
CMOS CML logic is based on the differential pair shown in Fig.4.2. At first glance, we see that there are no pMOS transistors, so it is potentially faster than static CMOS logic. It is fully differential, so it has excellent immunity to common-mode noise. When the input voltage $V_{in}$ is sufficiently large, one of the two branches is switched off while the other takes all of the tail current $I_0$. The minimum input voltage can be derived from the following equations:

$$ I_{1,2} = \frac{\mu C_{ox}}{2} \frac{W}{L} \left( V_{gs1,2} - V_{th} \right)^2 \tag{4.5} $$

$$ I_1 + I_2 = I_0 \tag{4.6} $$

$$ V_{in} = V_{gs1} - V_{gs2} \tag{4.7} $$

Solving equations (4.5) to (4.7) leads to an expression for $I_1$ (or $I_2$) [F.3].
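The solution of equations (4.5) to (4.7) is not written out in the chapter. As a sanity check, the sketch below assumes the usual textbook closed form for an ideal square-law pair, $I_1 = I_0/2 + (k/4)\,V_{in}\sqrt{4 I_0/k - V_{in}^2}$ with $k = \mu C_{ox} W/L$, and verifies numerically that it satisfies (4.5) to (4.7) and that the whole tail current is steered once $V_{in}$ reaches $\sqrt{2 I_0 / k}$. The device values and the function name `branch_current` are hypothetical.

```python
import math


def branch_current(v_in, i_tail, k):
    """I1 of an ideal square-law differential pair; k = mu*Cox*(W/L) in A/V^2."""
    v_min = math.sqrt(2.0 * i_tail / k)   # input voltage that fully switches the pair
    if v_in >= v_min:
        return i_tail                     # all tail current flows in branch 1
    if v_in <= -v_min:
        return 0.0                        # all tail current flows in branch 2
    return 0.5 * i_tail + 0.25 * k * v_in * math.sqrt(4.0 * i_tail / k - v_in ** 2)


i_tail, k = 1e-3, 5e-3                    # hypothetical: 1 mA tail, 5 mA/V^2
v_min = math.sqrt(2.0 * i_tail / k)

for v in (0.0, 0.3, 0.9 * v_min, v_min):
    i1 = branch_current(v, i_tail, k)
    print(f"Vin = {v:.3f} V  ->  I1 / I0 = {i1 / i_tail:.3f}")
```

A larger $W/L$ or a smaller tail current lowers the switching voltage, which matters when the swing of the preceding stage is reduced for speed.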
The minimum voltage that can fully switch the differential pair is reached when this current equals $I_0$. It is

$$ \min(V_{in}) = \sqrt{\frac{2 I_0}{\mu C_{ox} (W/L)}} \tag{4.8} $$

The voltage swing is

$$ \Delta V = V(i = 0) - V(i = I_0) = R I_0 \tag{4.9} $$

The voltage swing is the product of the load resistance and the tail current. Therefore, it is possible to reduce the voltage swing to improve the speed of the circuit. However, excessive reduction of the voltage swing reduces the noise margin. In addition, the reduced swing may not be able to fully switch the differential pair of the next stage.

[Figure: differential pair N1, N2 with load resistors R, supply VDD, tail current I0, input Vin and output Vout, together with a plot of the branch currents I1 and I2 versus Vin]
Fig.4.2. Differential pairs in CMOS CML Logic

As with static CMOS logic, the speed and power consumption of CMOS CML logic can be estimated using two serially connected inverters. We still assume the input to the first inverter has very sharp edges and sufficient driving capability. Then, to a first-order approximation, the output of the first inverter is essentially the step response of a capacitor being charged or discharged by a current source of finite internal impedance. The change of the output voltage in a branch is written as

$$ \Delta V_{o1,2}(t) = R I_0 \left( 1 - e^{-t/RC} \right) \tag{4.10} $$

where $R$ is the load resistance and $C$ is the load capacitance of the first inverter. Fast switching relies only on a small $RC$. However, to maintain the voltage swing, the tail current $I_0$ has to increase.

The speed of differential pairs can be enhanced by the inductive peaking technique. Physically this is quite straightforward. When all of the tail current is switched to one arm, the additional current is provided by the load resistor and the load capacitor. Since the voltage on a capacitor cannot change abruptly, at the very beginning the additional current is provided almost solely by the load capacitor, and the output voltage changes quickly. However, as the output voltage drops, the current provided by the load resistor increases and the current provided by the load capacitor decreases, so the slew rate of the output voltage drops. Inductive peaking connects an inductor in series with the load resistor. An inductor always opposes a change of current. Therefore, the current provided by the load resistor decreases and the current provided by the load capacitor increases, which helps to drain the charge stored on the load capacitor quickly, leading to a fast change of the output voltage.

The power consumption of a CMOS CML inverter can, to first order, be estimated quite simply:

$$ P = V_{DD}\, I_0 \tag{4.11} $$

Obviously CML logic consumes static power. However, to first order, the power consumption is independent of frequency. Therefore, CML is suitable for high-frequency applications in terms of speed and power consumption.

4.2 Driving Circuits and Impedance Matching
Generally a high-speed SerDes transceiver must drive a high-speed channel that is usually much longer than the representative wavelength of the signal. If the impedance of the driver does not match the characteristic impedance of the channel, the driver is unable to deliver maximum power to the channel because of reflection at the transmitter side. If the characteristic impedance of the channel does not match the impedance of the termination, the channel is unable to deliver maximum power to the termination because of reflection at the receiver side. If there is mismatch at both sides, some energy reflected from the termination experiences another reflection at the transmitter side. It takes some time (T) for this energy to complete the round trip, during which it suffers some loss. When it comes back, it adds to the signal that is sent T later. Therefore, impedance matching is important to high-speed SerDes transceivers. However, it increases power consumption because practical channels have low impedance. Industrial efforts on these topics have led to many standards. Low voltage differential signaling (LVDS) and CML are the two most popular.

[Figure: LVDS driver with supply VDD, inputs V+ and V-, current sources I0 and Ib, driving a terminated receiver]
Fig.4.3. A diagram of representative LVDS signaling

The main advantage offered by LVDS is its low voltage swing of 250–400 mV, which allows high-speed interface operation at a very low level of power consumption. In addition, true differential signaling increases the interface's tolerance to ground mismatch between transmitter and receiver. It also improves signal EMI immunity and compliance [E.1]. Fig.4.3 shows a representative LVDS configuration. At the transmitter side, the driver is configured in a push-pull topology. A matched impedance is added in front of the receiver buffer.

In high-speed SerDes transceivers, the signals traveling through the channels are broadband signals. It becomes harder and harder to achieve full-band impedance matching when parasitics are considered. Therefore, impedance matching at only one side cannot guarantee small reflections at very high speed. For this reason, impedance matching is desirable at both the transmitter side and the receiver side, and the topology of LVDS signaling becomes very similar to CML signaling.

A representative configuration of CML signaling is shown in Fig.4.4. Impedance matching is added to both the transmitter and the receiver. Since CML consumes static power, it is quite popular to switch off the driver when it is not in use.

[Figure: CML driver and receiver, each with load resistors R to VDD, termination resistors RL, input Vin, and tail currents I0 and I1]
Fig.4.4 A block diagram of CML signaling

If channel impairments are negligible and clock and data can be recovered without receiver equalization, high-speed SerDes receiver buffers can be constructed using nonlinear amplifiers such as limiting amplifiers and sense amplifiers. If receiver equalization is needed, linear amplifiers are required.

5 PISO and SIPO
As discussed in section 2, at the transmitter side user data are in parallel, and a PISO block is needed to convert these parallel data into serial data so that they can be transmitted via a high-speed channel. At the receiving end, a SIPO block is needed to reduce the bit rate so that the back-end circuit can perform further signal processing. It follows that a SIPO has a tree-type structure and a PISO has a reversed tree structure.

[Figure panels, each with control logic: (a) One Stage, (b) Heterogeneous, (c) Binary Tree]
Fig.5.1 Typical PISO and SIPO topologies

Typical topologies of PISO and SIPO are shown in Fig.5.1. The one-stage structure is slow due to the large parasitic capacitance at the converging node. The heterogeneous structure is faster, and the binary-tree topology is the fastest. For this reason, 2:1 multiplexers (MUX) and 1:2 demultiplexers (DEMUX) are important elements of high-speed SerDes transceivers.

In high-speed SerDes transceivers, a PISO usually has tens of input ports and a SIPO tens of output ports. Therefore, in a binary-tree topology, some stages require very high-speed circuits while others do not. CML logic circuits are used for the stages requiring very high speed, while static CMOS logic circuits can be used for the stages that do not.

Shift registers are very useful for PISO and SIPO with a large ratio. In a PISO, parallel data are loaded into the shift registers when a select signal is enabled. The parallel data are shifted out on every baud clock cycle while the select signal is disabled. In a SIPO, serial incoming data are sampled and shifted in on every baud-rate clock cycle. A lower-frequency clock is used to sample the output of each register, so the outputs of all registers are synchronized. To make PISO and SIPO work, a high-frequency clock (the baud-rate clock) and a low-frequency clock (the clock for parallel data) are needed. Therefore, a frequency divider and a clock multiplier are needed.

6 Clock Multiplier Unit
In a high-speed SerDes transceiver, the high-speed clock is very important.
Usually, at the transmitter side, each symbol is generated under the control of a baud-rate clock; at the receiver side, a clock whose frequency is the baud rate is needed to sample the received data where the SNR is at its maximum. Real implementations may vary in some aspects. For example, the clock frequency can be lower than the baud rate if a multi-phase clock is used. Even so, a very high-frequency clock is still needed in a multi-gigahertz transceiver. Quite often the transmitter and the receiver share a high-speed clock. The phase of this clock should be adjusted at the receiver side, because the delay of the channel is usually unknown a priori, and timing jitter and noise are added to the received data through the channel, making the sampling phase very critical.

The quality of the high-speed clock greatly influences the transceiver performance. Therefore, it should be clean and accurate. A free-running integrated oscillator cannot meet these requirements. The high-speed clock is therefore synthesized from an accurate low-frequency oscillator, such as a quartz oscillator whose frequency accuracy is within, e.g., 20 ppm. Usually this frequency synthesizer is not required to generate a clock of any frequency within a frequency band. Instead, a clock of a fixed frequency, or a clock that can be programmed to a few discrete frequencies, is wanted. In a high-speed SerDes transceiver it is called a clock multiplier unit (CMU). A CMU can be a common integer-N frequency synthesizer. The dominant and mature technique used in the design of a CMU is the PLL.

6.1 Basic PLL-Based CMU Structure and Performance
A representative structure of a PLL-based CMU is shown in Fig.6.1 (a). It is composed of a phase detector, a charge pump, a loop filter, a voltage-controlled oscillator (VCO), and a divider. The charge pump provides the necessary loop filter action.
In classic PLL, the combination of the charge-pump and the RC network is usually replaced with an operational amplifier (op-amp). Although it is highly non-linear in practice, it is customary to assume linearity when analyzing loops that have achieved lock [H.14]. A Linearized model is 7 shown in Fig.6.1. In the model all variables are phases rather than the actual inputs and outputs. VCO Phase Detector Kd1 Clock Out Reference in S ´ e Kd2 s tz +1 s Charge Pump and Loop Filter ¸N KD KO/s out ¸N Divider (a) (b) Fig.6.1 (a) a block diagram of a PLL-based CMU (b) a linear model 6.1.1 Second-Order PLL Dynamics A PLL circuit is a feedback system that is designed to bring the phase error signal e to zero. For several reasons, a secondorder PLL is a good choice. The first reason is that theoretically a second-order PLL is unconditionally stable [F.1]. Higher order PLL however may lead to instability. In practice, a second-order PLL is not absolutely stable, because a practical phase-detector and divider are sampled systems [H.14]. In addition, there are many parasitic poles. The second reason to choose a second-order PLL is that a first-order PLL can not reduce the phase error e to zero unless the forward loop gain is infinitely large. The closed-loop transfer function of a second-order PLL-based CMU is written as: F (s) out 1 t zs 1 in N s 2 /( K D K O / N ) t z s 1 (6.1) where 1/tz is the frequency of the loop zero. For the convenience of analysis, we can omit N so that equation (6.1) is the close-loop transfer function for a classic second-order PLL. The phase error can be written as: e ( s / n ) 2 in ( s / n ) 2 s / z 1 (6.2) Phase noise or timing jitter is an unwanted input variation. We generally want it is attenuated by the loop. However, this idea contradicts our original purpose to force the VCO to track the input (reference) at low frequencies. Therefore, there is a tradeoff between them. 
In fact, the design of a PLL is a tradeoff among many conflicting requirements.

Jitter Peaking

A desirable feature is that a PLL faithfully copies the input to the output at low frequencies but rejects the input at high frequencies. However, jitter peaking is an unwanted feature of a second-order PLL. The closed-loop transfer function of equation (6.1) has two poles and one zero. In a Bode plot, the magnitude rises with a slope of 20 dB/dec above the zero frequency, so it exceeds unity if both poles are located at higher frequencies than the zero. A flat region with a magnitude above 0 dB appears after the first pole, because the first pole contributes -20 dB/dec. The flat region ends at the second pole, after which the magnitude falls with a slope of -20 dB/dec and eventually drops below 0 dB. Jitter peaking appears in the frequency range where the magnitude exceeds one: input jitter whose frequency lies within this range is amplified at the output.

Rejection of Noise

Input jitter (reference noise) is only one noise source of a second-order PLL. Noise is generated in every component of a real circuit. In many linear models, noise is treated as additive. It is useful to examine the transfer function seen by each noise source to determine whether the PLL attenuates or amplifies it. As seen from equation (6.1), the transfer function of the reference noise is of a low-pass nature but may exhibit jitter peaking in a specific frequency range. The transfer functions of the phase-detector noise and charge-pump noise are similar to equation (6.1). The transfer function of the loop filter noise is

\[ \frac{\theta_{out}}{\theta_{in}} = \frac{s/K_D}{s^2/(K_D K_O) + \tau_z s + 1} \tag{6.4} \]

The natural frequency $\omega_n$ used in equation (6.2) is defined as:

\[ \omega_n = \sqrt{K_D K_O / N} \tag{6.3} \]

We can look into the properties of a second-order PLL by investigating its frequency-domain response and time-domain response.

Response to Input Variations and Input Noise

Equation (6.2) has two poles and one 2-fold zero, which represents a high-pass filter.
The phase error $\theta_e$ will be sufficiently small if the frequency of the input variations is significantly lower than the natural frequency $\omega_n$. In other words, the difference between the input phase $\theta_{in}$ and the output phase $\theta_{out}$ is small, and the PLL tracks the input variations. At frequencies much higher than the natural frequency, the phase error $\theta_e$ approaches the input phase $\theta_{in}$, which means that the loop does not respond to the input; it rejects the input variations almost completely. Therefore, a fundamental property of a second-order PLL is that it tracks changes of the input at low frequencies but rejects them at high frequencies.

The loop filter noise transfer function of equation (6.4) exhibits a band-pass nature. A time-domain step response can reveal much information. Assuming this noise causes a frequency error of $\Delta\omega_i$, the time-domain phase error can be expressed as [H.14]

\[ \theta_e(t) = \frac{\Delta\omega_i}{\omega_n\sqrt{\zeta^2-1}} \exp(-\zeta\omega_n t)\, \sinh\!\left(\omega_n\sqrt{\zeta^2-1}\; t\right) \tag{6.5} \]

where $\zeta$ is the damping ratio defined as

\[ \zeta = \frac{\omega_n \tau_z}{2} \tag{6.6} \]

The maximum phase error $\theta_{max}$ appears at $t_{max}$:

\[ \theta_{max} = \frac{\Delta\omega_i}{\omega_n\sqrt{\zeta^2-1}} \exp\!\left(-\frac{\zeta}{\sqrt{\zeta^2-1}} \tanh^{-1}\frac{\sqrt{\zeta^2-1}}{\zeta}\right) \sinh\!\left(\tanh^{-1}\frac{\sqrt{\zeta^2-1}}{\zeta}\right) \tag{6.7} \]

\[ t_{max} = \frac{1}{\omega_n\sqrt{\zeta^2-1}} \tanh^{-1}\frac{\sqrt{\zeta^2-1}}{\zeta} \tag{6.8} \]

In the case of high damping, they can be simplified to

\[ \theta_{max} \approx \frac{\Delta\omega_i}{\omega_c} \tag{6.9} \]

\[ t_{max} \approx \frac{2\ln 2\zeta}{\omega_c} \tag{6.10} \]

where $\omega_c$ is the crossover frequency (the frequency at which the open-loop transfer function is 0 dB). It is intuitively obvious that the VCO disturbance (jitter) gets integrated over a period of time before the loop takes action to correct it. Jitter integration is inversely proportional to the loop bandwidth (roughly the crossover frequency). Therefore, rejection of loop filter noise requires a high loop bandwidth. The transfer function for VCO noise is of a high-pass nature and is expressed analytically as

\[ \frac{\theta_{out}}{\theta_{in}} = \frac{(s/\omega_n)^2}{(s/\omega_n)^2 + \tau_z s + 1} \tag{6.11} \]

The noise of a VCO is not white. In an LC resonator, the broadband thermal noise of the passive components is shaped by the resonator Q, and the normalized phase noise is inversely proportional to the square of the frequency offset ($\Delta\omega$).
When the frequency offset is sufficiently large, the noise spectrum flattens due to active elements (such as buffers). At sufficiently small frequency offsets, the phase noise spectrum possesses a $1/(\Delta\omega)^3$ region. Leeson has proposed an empirical equation for VCO phase noise [F.2]:

\[ L(\Delta\omega) = 10\log\left[ \frac{2FkT}{P_{sig}} \left(1 + \left(\frac{\omega_0}{2Q\,\Delta\omega}\right)^2\right) \left(1 + \frac{\Delta\omega_{1/f^3}}{|\Delta\omega|}\right) \right] \tag{6.12} \]

Control Line Ripple and Higher-Order Poles

A real PLL circuit is not linear at all. Therefore, a linearized second-order PLL model fails to represent a real circuit in some important aspects. One issue that needs to be addressed is the ripple on the VCO control line. In the linearized model, the phase detector is assumed to be a linear subtractor. Under lock, the phase error is zero, so the VCO remains undisturbed when the loop is locked. However, in a real circuit the phase detector, or the combination of charge pump and phase detector, may be highly nonlinear. For example, if the phase detector is of the multiplier type, there will be higher-order mixing products; if the phase detector is of the bang-bang type, there are always pulses at the phase-detector output. Therefore, in some PLL circuits there are ripple components on the VCO control line even under lock. In general, the ripple components are at higher frequencies than the reference. Reducing the bandwidth increases the attenuation at those frequencies and can therefore help. However, rejection of VCO noise and fast acquisition require a high bandwidth. A better solution is to introduce additional poles. Determining how many poles to add and where to place them needs careful design, because too many poles can easily degrade the phase margin. It has been shown that one or two higher-order poles can attenuate the ripple by about an order of magnitude or more if they are placed a factor of 4-7 above crossover [F.4]. Fig.6.2 (a) shows a charge-pump loop filter with one extra pole. Fig.6.2 (b) shows a classic loop filter with an extra pole.
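The jitter-peaking and tracking behaviour described in Section 6.1.1 can be checked numerically. The following is a minimal sketch, not a design tool: it evaluates equations (6.1) and (6.2) with N omitted, uses equation (6.6) to set the zero time constant from a chosen damping ratio, and sweeps the frequency axis. The 1 MHz natural frequency, the function names, and the sweep range are assumptions made only for illustration.

```python
import math


def jitter_transfer(f_hz, omega_n, tau_z):
    """Closed-loop phase transfer of equation (6.1) with N omitted."""
    s = complex(0.0, 2.0 * math.pi * f_hz)
    return (tau_z * s + 1.0) / ((s / omega_n) ** 2 + tau_z * s + 1.0)


def error_transfer(f_hz, omega_n, tau_z):
    """Phase-error transfer of equation (6.2), using 1/omega_z = tau_z."""
    s = complex(0.0, 2.0 * math.pi * f_hz)
    return (s / omega_n) ** 2 / ((s / omega_n) ** 2 + tau_z * s + 1.0)


def peak_gain(omega_n, zeta, points=4000):
    """Largest |jitter_transfer| on a log sweep from 0.001 to 1000 times f_n."""
    tau_z = 2.0 * zeta / omega_n          # damping ratio, equation (6.6)
    f_n = omega_n / (2.0 * math.pi)
    best = 0.0
    for k in range(points + 1):
        f = f_n * 10.0 ** (-3.0 + 6.0 * k / points)
        best = max(best, abs(jitter_transfer(f, omega_n, tau_z)))
    return best


if __name__ == "__main__":
    omega_n = 2.0 * math.pi * 1.0e6       # assumed 1 MHz natural frequency
    for zeta in (0.5, 1.0, 2.0, 4.0):
        peak_db = 20.0 * math.log10(peak_gain(omega_n, zeta))
        print(f"zeta = {zeta}: jitter peaking = {peak_db:.2f} dB")
```

In this model the peaking never disappears completely, but it shrinks as the damping ratio grows (about 1.25 dB at a damping ratio of 1), which is one reason heavily damped loops are attractive when input jitter must not be amplified.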
Fig.6.2 Loop filter with one extra pole (a) charge-pump (b) active filter

In Leeson's equation (6.12), L is the normalized phase noise, F is an empirical factor, $\Delta\omega_{1/f^3}$ is a fitting parameter, Q is the quality factor of the resonator, and $P_{sig}$ is the signal power.

Charge sharing, current mismatch, and reference feedthrough in the charge pump can cause spurs in the PLL output [F.3]. The spurs add directly to the jitter budget of the VCO. Therefore, the charge pump needs careful design.

Acquisition Time

A rough definition of acquisition time is the time it takes for a free-running VCO to lock to the input. In some of the literature, acquisition time is divided into frequency acquisition time and phase acquisition time. Assume a free-running VCO is running at angular frequency $\omega$ and its phase is zero at the time instant t=0. The input signal has a frequency of $\omega + \Delta\omega_i$ and its phase is $\theta_0$ at t=0. According to the linearized second-order PLL model, the time-domain expression of the phase error is

\[ \theta_e(t) = \exp(-\zeta\omega_n t) \left[ \frac{\Delta\omega_i - \zeta\omega_n\theta_0}{\omega_n\sqrt{\zeta^2-1}} \sinh\!\left(\omega_n\sqrt{\zeta^2-1}\; t\right) + \theta_0 \cosh\!\left(\omega_n\sqrt{\zeta^2-1}\; t\right) \right] \tag{6.13} \]

Equation (6.13) is similar to equation (6.5). Therefore, a similar conclusion can be drawn: the acquisition time is inversely proportional to the loop bandwidth, and fast acquisition generally requires a high bandwidth.

6.1.2 Divider Delay and Phase-Detector Delay

A CMU based on an integer-N frequency synthesizer needs a frequency divider in order to synthesize a high-frequency clock from a relatively low-frequency reference. The divider implies that the phase detector is digital in nature. In fact, phase detectors and phase-frequency detectors for high-speed SerDes transceivers are almost always digital. Although the loop filter and the VCO may be analog, continuous-time circuits, knowledge about the phase error is available to the loop only at discrete instants.
It usually involves a sample-and-hold (S&H) operation to convert a continuous-time signal into a discrete-time signal. A zero-order hold (ZOH) has a transfer function given by

\[ H(s) = \int_0^{\infty} \left[u(t) - u(t-T)\right] e^{-st}\, dt = \frac{1 - e^{-sT}}{s} \tag{6.14} \]

Its frequency response is

\[ H(j\omega) = e^{-j\omega T/2}\; T\, \mathrm{sinc}\!\left(\frac{\omega T}{2}\right) \tag{6.15} \]

so the phase of this transfer function is $-\omega T/2$. The ZOH function therefore adds phase lag (delay) to the loop transfer function. The period of the divider output is usually much larger than the period of the VCO clock. For this reason, its delay is more critical. Divider delay and phase-detector delay erode the phase margin of a PLL. As a consequence, the loop bandwidth has to be reduced in order to avoid these effects. However, a reduced bandwidth may negatively influence settling time and noise performance. In practical implementations, the phase comparison rate is set to about 10 times the crossover frequency [H.14].

6.1.3 Granularity Problems

Since the PLL operates on a sampled basis and not as a straightforward continuous-time circuit, it has more stability problems than arise in continuous-time systems. In particular, an analog second-order PLL is unconditionally stable for any value of loop gain, but the sampled equivalent will become unstable if the gain is made too large. Even a first-order digital PLL can be unstable [7]. It has been shown in reference [F.1] that a second-order PLL based on a classic tristate phase detector and a charge pump has a stability limit given by

\[ K' < \frac{1}{\dfrac{\pi}{\omega_i\tau}\left(1 + \dfrac{\pi}{\omega_i\tau}\right)} \tag{6.16} \]

where $\omega_i$ is the angular frequency of the reference, $\tau = RC$ is the loop filter time constant, and K' is

\[ K' = \frac{K_O I_p R^2 C}{2\pi} \tag{6.17} \]

where $I_p$ is the charge-pump current, $K_O$ is the VCO gain, R is a resistor in the loop filter, and C is a capacitor in the loop filter. For a first-order digital PLL, the loop gain should be smaller than 2.
However, if a loop delay of M symbol intervals is introduced, the stability range is reduced to [7]

\[ 0 < K < 2\sin\frac{\pi}{2(2M+1)} \tag{6.18} \]

6.1.4 Digital PLL

A critical problem of a conventional PLL-based CMU is its sensitivity to process variations and to noise from the power supply and substrate. Another problem is the limited voltage headroom associated with low-power, deep sub-micrometer CMOS processes. In addition, the loop capacitor consumes chip area. A digital PLL is a solution to those problems. In a digital PLL, a digital accumulator replaces the loop capacitor (integrator), and a digitally controlled oscillator (DCO) replaces the VCO. In general, LC oscillators have phase noise performance superior to that of ring oscillators. However, it is difficult to make LC oscillators digital. A solution has been proposed in [F.5]. The DCO is a typical differential negative-resistance LC oscillator. Instead of one varactor, many varactors are used. The varactors are arranged in several banks and are connected in parallel. CMOS varactors made with low-voltage deep sub-micrometer technologies exhibit a very narrow linear tuning range. Interestingly, the capacitance versus tuning voltage curve looks like the input-output curve of a CMOS inverter. Therefore, the varactors can be made “digital” by setting the tuning voltage to two proper values. The differential varactor can be as small as a few attofarads (aF) [F.6]. Good frequency resolution can be achieved by switching a unit varactor on or off. Finer frequency resolution can be achieved by applying sigma-delta modulation to the unit varactor. The DCO enables true phase-domain signal processing. Therefore, spurs due to nonlinearity are greatly suppressed [F.7]. In addition, the whole digital PLL can be retimed to the VCO clock. For this reason, the digital switching noise is mixed down to a DC offset. The asynchronism between the VCO oscillation and the system reference clock is compensated by using a time-to-digital converter (TDC).
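The stability limits of equations (6.16) and (6.18) in Section 6.1.3 lend themselves to a quick numerical check. The sketch below evaluates both bounds and, for the delayed first-order loop, iterates an assumed error recursion e[n+1] = e[n] - K e[n-M] to see whether the phase error decays. That recursion is a simplification chosen for illustration and is not taken from [7]; all function names are hypothetical.

```python
import math


def gardner_limit(omega_i, tau):
    """Upper bound on K' from equation (6.16)."""
    x = math.pi / (omega_i * tau)
    return 1.0 / (x * (1.0 + x))


def delayed_loop_limit(m):
    """Upper bound on the loop gain K from equation (6.18)."""
    return 2.0 * math.sin(math.pi / (2.0 * (2 * m + 1)))


def settles(k, m, steps=5000):
    """Iterate e[n+1] = e[n] - k * e[n-m] and report whether the error decays."""
    history = [1.0] * (m + 1)             # oldest sample first, newest last
    for _ in range(steps):
        nxt = history[-1] - k * history[0]
        if abs(nxt) > 1.0e6:              # diverging: stop before overflow
            return False
        history = history[1:] + [nxt]
    return abs(history[-1]) < 1.0e-3


if __name__ == "__main__":
    for m in range(4):
        bound = delayed_loop_limit(m)
        print(f"M = {m}: K < {bound:.3f}, "
              f"0.8*bound settles: {settles(0.8 * bound, m)}, "
              f"1.2*bound settles: {settles(1.2 * bound, m)}")
    print(f"K' limit for omega_i*tau = 10*pi: {gardner_limit(10.0 * math.pi, 1.0):.3f}")
```

With no delay (M = 0) the bound reduces to K < 2, in agreement with the text, and each additional symbol of loop delay tightens it further.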
Tradeoffs in PLL-Based CMU Design

A PLL-based CMU faces many conflicting requirements. The following table shows some tradeoffs of a PLL-based CMU design.

Requirement                                                 High Bandwidth   Low Bandwidth
Rejection of input noise, PD noise, and charge pump noise                          x
Rejection of loop filter noise and VCO noise                      x
Fast acquisition                                                  x
Reduction of jitter integration                                   x
Rejection of ripple on VCO control line                                            x
Improved loop stability against parasitic poles                                    x

6.2 Basic DLL-Based CMU Structure and Performance

In PLL-based CMUs, the output clock is directly derived from the VCO oscillation, and the loop filter has a low-pass filtering effect on the input. If the reference is noisier than the VCO oscillation, this is an obvious advantage. However, in practice, the reference is much cleaner than the VCO oscillation in high-speed SerDes transceivers. Therefore, it is desirable to derive the output clock directly from the reference. This idea is the fundamental concept behind DLL-based CMU design. Basic DLL-based CMU structures are edge combiners and cyclic reference injection ring oscillators.

6.2.1 Tunable Delay Cells

The intrinsic delay of logic gates can be used in a DLL. If N identical gates are connected in series, the total delay is N times the delay of a unit cell. Adding or removing one gate changes the total delay. In high-speed SerDes transceivers, the delay of such a unit cell is not negligible compared with a symbol period. In addition, the delay is process dependent. Therefore, this method cannot provide very fine phase resolution and can only be used for coarse tuning [F.8], [F.9]. The beauty of this method is that it provides a kind of “digital delay”. If the unit delay provided sufficiently fine phase resolution, the problems of unit cell mismatch and process dependence could be solved by advanced digital signal processing algorithms such as calibration. Fig.6.3 shows a CMU based on digitally tuned delay cells [H.31].
Fig.6.3. A CMU based on digitally tuned delay cells (blocks: reference, phase detector, time-to-digital converter, digital integrator, divider, DCO, delayed clocks C[0] ... C[m+N], edge combiner)

Delay is a physical process. It can be roughly classified into two categories. The first is caused by charging or discharging a capacitive load. The second is caused by finite propagation speed; a representative example is a piece of transmission line, whose delay is hard to tune. The first type provides some freedom to tune the delay: since the time it takes to charge or discharge a capacitive load is determined by the current, the load capacitance, and the voltage swing, we can tune the delay by adjusting the current and/or the capacitance if the voltage swing is fixed.

Fig.6.4 Some schemes to tune unit delay cells: (a) current-starved inverter (b) RC delay cell (c) differential delay cell

Fig.6.4 shows some schemes to tune the delay of a unit cell [F.10], [F.11]. Fig.6.4 (a) is a current-starved inverter. The control voltage Vctr controls the current that flows through the inverter to the load capacitor and thereby tunes the delay. Fig.6.4 (b) is an RC delay cell. Vctr controls the on-state resistance of the nMOS transistor that connects the capacitor. Fig.6.4 (c) is a differential delay cell. The control voltage controls the current that flows through the positive-feedback pMOS transistors. The idea of controlling delay through current is also widely used in phase interpolators and phase mixers [F.12], [F.13].

6.2.2 Edge Combining DLL-Based CMU

A DLL can only generate one or several delayed versions of the reference. A DLL-based CMU must therefore include additional circuits to generate an output clock whose frequency is an integer multiple of the reference frequency. An edge combiner serves this purpose.

Fixed-Ratio Edge Combining CMU

In a fixed-ratio edge combining CMU, the reference is delayed by an N-stage voltage-controlled delay line (VCDL). A phase detector detects the phase difference between the reference and the output of the last unit delay cell. When the VCDL has a total delay of 2π, the phase error is zero and the DLL loop is locked. Assuming the unit delay cells are identical, each stage has a delay of 2π/N. The edge-combining logic combines the edges of each stage. The highest output clock frequency is N/2 times that of the reference [F.14]. A fixed-ratio edge combining CMU is shown in Fig.6.5.

Fig.6.5. Fixed-ratio edge combining CMU (blocks: reference, phase detector with Up/Down outputs, delayed clocks φ0 ... φ5, clock out)

Despite the simplicity of the fixed-ratio edge combining CMU, it has a few problems. The first problem is its susceptibility to false lock to harmonics: because the reference is a periodic signal, a delay of 2kπ is equivalent to a delay of zero, and the unit delay cell usually has a large tuning range. The second problem is its fixed ratio between input and output. The third problem is its sensitivity to mismatch. In practice it is impossible to make the unit delay cells identical. As a consequence, the output clock will have strong pattern-dependent jitter and duty cycle mismatch.

A number of solutions can be used to prevent false lock to harmonics. One method is to use a lock detector [F.15]. Lock to a harmonic can be detected if the phase information of each unit delay cell is used.

Programmable Edge Combining CMU

A programmable edge combining CMU extends the applications of the edge combining CMU. A straightforward method is to build logic circuits that selectively feed the delayed clocks (φ0, φ1, ... , φN) to the edge combiner [F.16], [F.17]. In [F.17], a total of N identical delay cells are connected in cascade to form a VCDL. A multiplier factor controller can select M delay cells out of them and mask the rest.
The output of the last selected delay cell is fed to the phase detector. Therefore, under lock, the delay of each cell is 2π/M. At the rising edge of the output of any of the M delay cells, the edge combiner toggles its output. Therefore, the frequency of the edge combiner output is M/2 times the frequency of the reference. Since the number M can be programmed, the edge combining CMU is programmable. There are many challenges in designing a programmable edge combining CMU. Firstly, it usually requires complicated logic circuits that are slow and power hungry. Secondly, the problems of an ordinary edge combining CMU remain, including harmonic locking and susceptibility to mismatch.

6.2.3 Cyclic Reference Injection DLL-based CMU

It is not easy to eliminate mismatch in high-speed, low-power deep sub-micrometer CMOS processes. Therefore, edge combining CMUs tend to have high spurs in the output. Conventional ring oscillators are not susceptible to mismatch, because the oscillation circulates through all delay cells. Conventional ring oscillators can also be made programmable by using a programmable divider. However, conventional ring-oscillator-based CMUs suffer from jitter integration. Cyclic reference injection can solve the problem [H.1]. Fig.6.6 shows a block diagram of a cyclic reference injection DLL-based CMU. The circuit can work in a ring oscillator mode or a direct delay line mode, depending on the switch (MUX). When the delay cells are connected in the form of a ring oscillator, the CMU becomes a conventional PLL. It suffers from jitter accumulation, but not from mismatch. However, the jitter accumulation can only persist for several periods, because it is eliminated periodically by the injected clean reference. The difficulty is that it is very challenging to align the edge of the reference with an edge of the ring oscillator. This misalignment usually leads to high spurs in the output.
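The edge combining principle of Section 6.2.2, and its sensitivity to mismatch, can be illustrated with a few lines of arithmetic. The sketch below is an idealized model under stated assumptions: the DLL is taken to force only the total delay of the M selected cells to one reference period, the combiner toggles once per cell edge, and the 100 MHz reference and the mismatch pattern are made-up numbers. All function names are hypothetical.

```python
def locked_delays(raw_delays, t_ref):
    """Scale the cell delays so that they add up to one reference period.

    This mimics the DLL loop, which only constrains the total delay (2*pi).
    """
    scale = t_ref / sum(raw_delays)
    return [d * scale for d in raw_delays]


def toggle_times(cell_delays):
    """Times within one reference period at which the combiner output toggles."""
    times, t = [], 0.0
    for d in cell_delays:
        t += d                            # edge at the output of this cell
        times.append(t)
    return times


def half_periods(cell_delays):
    """Spacing between consecutive toggles, i.e. the output half-periods."""
    times = [0.0] + toggle_times(cell_delays)
    return [b - a for a, b in zip(times, times[1:])]


def output_frequency(f_ref, m):
    """M toggles per reference period give M/2 output cycles per period."""
    return f_ref * m / 2.0


if __name__ == "__main__":
    f_ref = 100.0e6                       # assumed reference frequency
    t_ref = 1.0 / f_ref
    mismatch = [1.00, 1.03, 0.98, 1.01, 0.97, 1.02, 1.00, 0.99]  # made-up spread
    cells = locked_delays(mismatch, t_ref)
    print("output frequency:", output_frequency(f_ref, len(cells)))
    print("half-periods (ps):", [round(h * 1e12, 1) for h in half_periods(cells)])
```

Because each output half-period equals one cell delay in this model, any cell-to-cell mismatch appears directly as a deterministic error that repeats every reference period, which is consistent with the spur and duty-cycle problems noted above.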
Fig.6.6 A block diagram of a cyclic reference injection CMU (blocks: reference, PD with Up/Down outputs, Sel switch, ÷M divider, clock out)

6.2.4 Reduction of Phase Noise

In a DLL-based CMU, there are many sources of phase noise and timing jitter. The main sources include the in-lock error due to mismatch between the charging and discharging current sources in the charge pump, the mismatch of the phase detector outputs, and the phase noise due to mismatch among the delay stages and edge combining cells in edge-combiner-based CMUs, or the re-alignment error caused by the reference injection in cyclic reference injection multipliers. All of these errors can be regarded as systematic in-lock errors. A number of techniques can be applied to mitigate some of the problems. In [F.18], the static phase error due to mismatches in the PFD/CP is compensated by adding a second low-bandwidth loop. The compensation loop is digital; it comprises a bang-bang PD and an accumulator that together implement an integrator with infinite DC gain, and the output of the integrator controls a current digital-to-analog converter (DAC) that leaks current from either side of the charge pump. The harmonic-locking problem of a cyclic reference injection CMU is solved in [F.19] by adding a logic circuit to dynamically control the switch and the divider. In [F.20], chopping, auto-zeroing, and various other circuit techniques are employed to reduce static phase offset and crosstalk between the reference and the output clock.

6.3 Comparison between PLL-Based CMUs and DLL-Based CMUs

According to Mark A. Horowitz, in CMU design one can mess up either a DLL or a PLL, because each has its own strengths and weaknesses. If designed correctly, either works well, and jitter will be dominated by other sources [F.21]. Simple comparisons are made in the following paragraph in terms of ease of implementation, jitter accumulation, jitter transfer, stability, and acquisition time.

If the reference is much cleaner than the VCO, DLL-based CMUs are more advantageous than PLL-based CMUs in terms of jitter performance. Conversely, if the reference is noisier than the VCO, then PLL-based CMUs may be better than DLL-based CMUs. Both PLLs and DLLs have a filtering effect and a jitter-peaking effect on input jitter. A PLL always exhibits a low-pass filtering effect on input jitter; a DLL exhibits an all-pass filtering effect on input jitter, but its transfer function can be changed to a low-pass one if techniques such as loop filtering and phase filtering are used [F.22]. Jitter accumulation is generally a more serious problem for PLLs than for DLLs, while spurs are commonly a more severe problem for DLL-based CMUs than for PLL-based integer-N CMUs. PLLs and DLLs are usually modeled as second-order and first-order linear feedback systems, respectively. However, both of them can be unstable, since they are sampled systems and parasitic poles exist.

7 Equalization

In the past years, the transfer rate of high-speed serial data links has been ever increasing. Meanwhile, cheap, low-quality transmission lines are still extensively used in many applications to save cost. With the increased data rate, various impairments become more and more severe. For example, Fig.7.1 shows a measured backplane transfer function [G.1].

Fig.7.1. Measured performance of a Tyco backplane

The physical channel exhibits considerable phase distortion and amplitude attenuation at frequencies above 4 GHz. The time-domain impulse response of this channel dampens considerably within one symbol period of the amplitude peak if the baud rate is low, for example below 1 Gbps.
In this case, ISI is not a severe problem if the maximum run length is constrained. However, the time-domain impulse response of this channel does not dampen sufficiently even several symbol periods after the amplitude peak if the baud rate is high, for example 10 Gbps. In this case, some zero-crossing points may be missing and some sampling points at the data center change their polarity. The eye is completely closed, and clock and data recovery is impossible without proper equalization. In principle, the impairments can be reduced considerably by replacing the low-quality transmission lines with high-quality ones or by equalization [G.2]. High-quality transmission lines tend to vastly increase the cost, while sub-micrometer CMOS technologies and equalization usually provide excellent, cheap solutions. The impairments of the physical channel also depend strongly on the length of the transmission lines. A short transmission line may not need any equalization. In some applications, the length of the transmission lines may vary considerably and their properties are not time invariant. Therefore, considerable effort in the design of high-speed serial data links is devoted to adaptivity.

7.1 A catalogue of equalization schemes

Equalization methods can be linear or non-linear. An equalizer can be implemented on the transmitter side, the receiver side, or both. In microelectronic circuit implementation, it can be either continuous time (un-sampled) or discrete time (sampled). The signal amplitude can be discrete (digital) or continuous (analog). The equalizer can also be either adaptive or fixed, and the adaptive algorithm can be zero forcing (ZF), LMS (minimum mean-square error, MMSE), or some nonlinear approach. In addition, the equalizer's target response can be either full response or partial response. A sampling equalizer can be baud-rate sampled or over-sampled. The filter design can be either FIR or IIR. Therefore, there is quite a large set of combinations of the above equalization schemes. Each scheme has its pros and cons with regard to a specific application or a specific history. The equalization schemes interact profoundly with the CDR structure.

7.2 Tx equalization

Tx equalization is usually called pre-emphasis, as it emphasizes the frequencies where the attenuation is high or de-emphasizes the frequencies where the attenuation is low in order to obtain a flat frequency-domain response across the passband. Since the clock is readily available on the transmitter side, the inputs to the equalizer are usually binary data, and the noise from the channel does not play an important role, Tx equalizers are usually simpler than Rx equalizers. However, Tx equalizers cannot be adaptive unless a back channel is added. In addition, a Tx equalizer is constrained by the peak power of the transmitter, as the power is truly consumed by the load resistors [G.3]. A Tx equalizer can be implemented in continuous-time analog circuits [G.4]. This kind of equalizer is simply a continuous-time analog high-pass filter. It has a limited tuning range, and a constant group delay is difficult to achieve. In practice, Tx equalizers are almost exclusively discrete-time finite impulse response (FIR) filters [G.5], since the clock is available and the input to the equalizer is digital. Usually FIR Tx equalizers are implemented in the symbol-spaced current domain [G.6], [G.7], [G.8], [G.9]. Such an equalizer does not help with the loss at the Nyquist frequency. This scenario is shown in Fig.7.2. The tap delay is achieved by D-type flip-flops (DFFs), and the tap coefficients are controlled by bias currents. The bias current is set by a digital-to-analog converter (DAC). The control of the bias current usually involves linear transconductors and current mirrors.

\[ \frac{C(i)}{C(0)} = \frac{I_i}{I_0} \tag{7.1} \]

\[ D_{out}(k) = \sum_{i=0}^{N} C(i)\, D_{in}(k-i) \tag{7.2} \]

Fig. 7.2. A current domain Tx equalizer (differential data inputs DinP/DinN, clocked DFF delay chain, tap currents I0 ... IN, differential outputs DoutP/DoutN)

A Tx equalizer can also be implemented in the time domain by utilizing pulse width modulation [G.10]. In this scheme, the baseband shaping pulse c(t) shown in Fig.2.2 is not a hold function lasting one symbol period, but a biphase (Manchester code) pulse whose duty cycle is manipulated to shape the combined channel impulse response. This may be beneficial in deep sub-micrometer CMOS circuits, where the time resolution is better than the voltage resolution. Furthermore, this solution is less constrained by the voltage headroom. However, it may not work well when the transfer function of the physical channel is complicated by reflections and parasitic resonances, because duty cycle manipulation does not offer enough degrees of freedom to match channels with complicated transfer functions.

7.3 Rx continuous time analog linear equalizer

Rx equalizers come in many more varieties than Tx equalizers, and adaptivity can be realized in an Rx equalizer. The physical channel of a high-speed serial data link is usually of a low-pass nature. A passive or active high-pass filter (HPF) is able to flatten the joint frequency-domain channel response and reduce ISI. The HPF can be a continuous-time analog filter composed of a passive RLC network [G.11]. It can also be an active filter based on operational amplifiers (op-amps) [G.12] or transconductor-capacitor (Gm-C) cells [G.13].

Fig. 7.3. An HPF cell and its transfer function

A CML buffer is a kind of “natural” equalizer. This has been exploited in [G.6] and [G.14]. As shown in Fig.7.3, the low-frequency gain can be tuned by a MOS resistor M1 and the high-frequency gain can be tuned by varactors Cd1 and Cd2. The high-frequency gain can be further boosted by an on-chip inductor [G.15]. In addition, the HPF cells are usually connected in cascade to give more gain at high frequencies. Rx continuous-time analog linear equalizers are not sampled. Therefore, clock jitter does not affect their performance. However, there are many challenges for this kind of equalizer. Some of them are listed as follows:

- It has a limited tuning range and rarely matches the channel, especially when there are both frequency-dependent attenuation and frequency-dependent delay.
- Linearity is a challenge, especially when input swings vary greatly in amplitude.
- It is limited by the gain-bandwidth product of each differential-pair stage.
- It is sensitive to PVT variations.
- It is sensitive to device mismatch and non-linearity.
- Offset cancellation and calibration are difficult.
- Multiple stages can achieve high gain, but they can also lead to clipping.

Continuous-time analog Rx linear equalizers are sometimes used as pre-equalizers for a decision feedback equalizer. The task is to make the impulse response causal, with most of its energy concentrated at the time origin (with some fixed delay). It is also desirable to have a noise-whitening filter functionality so that the DFE works best [7].

7.4 Rx FIR equalizer

The basic structure of an Rx FIR equalizer is shown in Fig.7.4. The main building blocks are the delay cell, the multiplier, and the adder. There are many variations in practical implementations, since each block can be either continuous time or discrete time, either analog or digital. The delay cell can be implemented in continuous-time analog circuits. An ideal delay cell is an all-pass filter whose group delay is constant but tunable. An ideal all-pass filter is not realizable; in practice, a low-pass filter is used to approximate it. Active op-amp-MOSFET-C filters using Bessel-type polynomials [G.16], [G.17], [G.18] or Gm-C filters using Butterworth polynomials work very well up to gigahertz frequencies [G.19]. Inverter-based delay units with active inductor load (INV-AIL) are reported in fractionally spaced equalizers up to 2.5 Gbps [G.20]. At very high baud rates such as 40 Gbps, a passive LC network or transmission line is usually used as the delay cell [G.21]. Continuous-time analog delay cells are not affected by clock jitter. They enable Rx continuous-time analog FIR equalizers. However, they also suffer from the challenges of high-speed analog CMOS design.

Delay cells can also be implemented in a discrete-time manner. Unlike in a Tx FIR equalizer, the tapped delay cannot be a simple DFF, because the received signals are basically analog. The tapped delay usually requires sample-and-hold circuits.

Fig. 7.4. Schematic diagram of Rx FIR equalizer (data in, delay cells, tap coefficients C0 ... CN, adder, data out)

The multiplier is usually implemented in the current domain. A linear voltage-to-current (V-I) converter (transconductor) is needed to convert the voltage of each tap to a current [G.19], [G.22]. The coefficients of the equalizer are realized by weighting the current. The coefficients are first normalized so that they do not exceed one. The current is led to a network of differential CMOS pairs. At any time, one transistor of each differential pair is in the triode region while the other is off. The drains of all differential pairs are connected to the output of the V-I converter. In each pair, the source of one transistor is connected to ground, while the source of the other transistor is connected to a shared output resistor. The differential pairs are controlled by weight-setting logic circuits. When all transistors connected to ground are on, no current passes through the shared resistor and the weight is zero. If they are all off, the weight is one. The weighted current is mirrored.
The current adder can be realized simply by connecting the mirrored currents together.

An Rx FIR equalizer is difficult to implement. If it is continuous time, the delay cells are difficult to design, with little flexibility and limited tuning range. If it is discrete time, it is susceptible to clock jitter. If it is digital, it is very challenging to operate at high speed. In addition, if the discrete-time equalizer is symbol-spaced sampled, the output only contains samples at the data centers. Additional effort is needed to find the zero-crossing points if a threshold-type CDR is used. Furthermore, Rx FIR equalizers are linear equalizers, which tend to amplify noise and crosstalk. Therefore, in practice, the Tx FIR equalizer and the Rx decision feedback equalizer are more commonly used in SerDes transceivers.

7.5 Decision feedback equalizer

The decision feedback equalizer (DFE) has many advantages over linear equalizers in signal processing and microelectronic circuit implementations. A simple DFE diagram is shown in Fig. 7.5. The feedback equalizer has the same structure as a Tx FIR equalizer, thus the circuits and techniques for the Tx FIR equalizer directly apply to the feedback filter. From the system's point of view, a DFE does not need to have a flat spectrum across the passband, and thus does not enhance noise and crosstalk. In addition, although noise still injects into the feedback filter through each tap, the noise level is reduced by the nonlinear decision circuit.

The impulse response of a physical channel is rarely minimum phase, nor is most of its energy concentrated near the time origin. Suppose the maximum amplitude appears at n2T+τ (τ<T) and the first non-zero point emerges after n1T+τ. Since the DFE discards all data power that stems from past data symbols, it is desirable to make (n2+1)T+τ the first tap of the feedback equalizer to achieve the maximum SNR. Because some of the symbol energy is removed, the DFE is still suboptimum even when the additive noise is white [7].
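The decision feedback idea described above can be illustrated with a minimal sketch. It assumes binary ±1 symbols, a noise-free channel with no precursor ISI, and feedback taps that match the postcursor taps exactly. The channel taps, the data pattern and the function names (`channel`, `slicer`, `dfe`) are hypothetical and chosen only for illustration.

```python
def channel(symbols, h):
    """Noise-free ISI channel: x(k) = sum_i h[i] * a(k - i)."""
    return [
        sum(h[i] * symbols[k - i] for i in range(len(h)) if k - i >= 0)
        for k in range(len(symbols))
    ]


def slicer(samples):
    """Plain threshold decision without any equalization."""
    return [1 if x >= 0 else -1 for x in samples]


def dfe(samples, feedback_taps):
    """Direct-form DFE: subtract ISI rebuilt from past decisions, then slice."""
    history = [0] * len(feedback_taps)  # past decisions, most recent first
    decisions = []
    for x in samples:
        isi = sum(c * d for c, d in zip(feedback_taps, history))
        d = 1 if x - isi >= 0 else -1
        decisions.append(d)
        history = [d] + history[:-1]
    return decisions


# Hypothetical channel: main cursor 1.0 followed by two postcursor taps.
h = [1.0, 0.7, 0.5]
sent = [1, -1, -1, 1, 1, 1, -1, 1, -1, -1]
received = channel(sent, h)

print(slicer(received) == sent)      # False: postcursor ISI closes the eye
print(dfe(received, h[1:]) == sent)  # True: feedback taps match the channel
```

Because the feedback path uses decisions rather than the noisy analog samples, the postcursor ISI is removed without boosting high-frequency noise. A wrong decision would, however, feed back into later symbols.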
Fig. 7.5. Schematic diagram of DFE

Mathematically, the DFE can be written as

$$\hat{a}(kT) = a(k)\,h(n_2T+\tau) + \sum_{i=n_1}^{n_2-1} a(k-i)\,h(iT+\tau) + \sum_{i=n_2+1}^{n_2+M} a(k-i)\,\bigl[h(iT+\tau) - c(iT)\bigr] + \sum_{i=n_2+M+1}^{\infty} a(k-i)\,h(iT+\tau) \qquad (7.3)$$

where the second item is the precursor ISI; the third item is the postcursor ISI, which can be cancelled if the feedback filter matches the channel; and the fourth item is the residual ISI due to the finite number of taps if the feedback equalizer is FIR. When n1>=n2-1, the precursor ISI is effectively zero. The residual ISI is zero if the channel impulse response only lasts for a period shorter than the span of the feedback filter taps. As a physical channel usually does not satisfy these requirements, a pre-filter is needed to reshape the channel response, so as to reduce the precursor ISI and the number of taps of the feedback equalizer. This can be done with a Tx equalizer and/or a prefilter (feed-forward equalizer, FFE) at the receiving end [G.23], [G.24], [G.25], [G.26]. The FFE can be implemented in either continuous time [G.27] or discrete time.

The DFE can be implemented in either continuous time [G.28] or discrete time. When implemented in continuous time, it is not affected by clock jitter. From equation (7.3) we can see that the DFE is much less sensitive to clock jitter than a linear FIR equalizer. Nevertheless, a direct implementation of the DFE can consume significant power and area and add complexity, since it involves resolving the previous data and using them to add an analog value to the input within the next symbol period [G.24]. Loop unrolling can relax the requirements. Fig. 7.6 shows a one-tap unrolled DFE.

Fig. 7.6. Schematic diagram of an unrolled DFE

The received signal x(t) is sampled at time nT+τ and the sampled signal is denoted as x(n). Suppose the first two taps of the impulse response of the channel are 1 and a.
The signal to the decision circuit is

$$y(k) = x(k) - f(k) = x(k) - a\,a(k-1) = \begin{cases} x(k) - a, & \text{when } a(k-1) = 1 \\ x(k) + a, & \text{when } a(k-1) = -1 \end{cases} \qquad (7.4)$$

The decision circuit makes a decision based on y(k). As we see from equation (7.4), the decision can be made by comparing x(k) with +a or x(k) with -a. The previous symbol a(k-1) is used to choose between +a and -a. This unrolls the DFE into the structure shown in Fig. 7.6, where no multiplier or summation is needed. The unrolling method can be extended to partial response DFE and PAM-4 DFE. Unfortunately, the complexity of the unrolled DFE grows as 2^N (N is the number of taps), and calibration of the offset levels is needed. In addition, the probability of error propagation goes up when the number of taps increases. In practice, Tx FIR pre-emphasis and FFE are used together with the DFE. Those linear equalizers cancel precursor ISI and the DFE cancels postcursor ISI.

7.6 Digital equalizer

Digital equalizers have many advantages over analog equalizers. They do not inject noise from each tap node, which is a problem for FIR-based analog equalizers, and they are not sensitive to PVT variations or to power and substrate noise. In addition, it is possible to implement much more sophisticated equalization methods in the digital domain. Nevertheless, digital equalizers require a very high-speed ADC, which is very challenging in high-speed serial data links. However, with the scaling down of CMOS processes, a baud-rate digital equalizer has been reported in 65 nm CMOS technology [G.29]. According to the Nyquist sampling theorem, a sampling frequency higher than the baud rate is needed if there is frequency offset or phase drift in the incoming data. A sample rate converter is needed when the sampling clock frequency is not the baud rate [G.30].

7.7 Bandwidth compression

Full response aims to eliminate all ISI at data centers, which requires a flat (folded) spectrum across the passband. However, a physical channel always exhibits a strong lowpass nature.
In other words, in many applications, full response equalization does not match the channel response well. Partial response [G.31] does not aim to eliminate all ISI, but rather to make use of controlled or constructive ISI. Therefore, it requires less high-frequency boost, causes less noise enhancement and matches the channel better. Duobinary is a widely used partial response method [G.32]. Duobinary has the advantage of eliminating ISI at data transitions. For this reason, an equalizer whose target response is duobinary is also called an edge equalizer [G.33]. This is beneficial for clock recovery. However, the received signal has 3 levels because of ISI. Recovery of one bit becomes dependent on the current received symbol and the previous symbol. This leads to possible error propagation. Error propagation can be circumvented by a precoding technique, such as Tomlinson-Harashima precoding [G.34], [G.35]. The precoding technique also applies to DFE. Two-level signaling is also possible if the shortest transitions are removed [G.33].

Strictly speaking, duobinary is not a bandwidth compression method. Duobinary signaling has the same symbol rate and bit rate as NRZ. If there is a hard cutoff frequency, the maximum bit rate that duobinary achieves is the same as that of NRZ. In practice, duobinary is usually regarded as a bandwidth compression method because it does not require a flat spectrum as the target response. A true bandwidth compression method is based on multi-level signaling such as PAM-4. Equalization methods for multi-level transceivers are basically the same as for binary systems. However, the microelectronic implementation is more complicated because of the additional levels [G.26]. Multi-level transceivers are useful for highly lossy channels.

7.8 Adaptation

In high-speed serial data links, it is desirable for equalizers to be adaptive. First, there is an ensemble of fixed channels, but which one is present is a priori unknown.
For example, the length of a high-speed Ethernet cable may vary, and the properties of backplane channels differ due to fabrication variations. Second, a physical channel may not be time invariant. An adaptive equalizer is usually implemented at the receiving end, because the channel properties can be detected from the received signals. Adaptation is not possible on the transmission side unless a back channel is specified.

Fig. 7.7. A block diagram of an adaptive linear equalizer

Fig. 7.7 shows a block diagram of an adaptive linear equalizer. The sampled received signal r(k) contains various impairments such as ISI and noise. The linear filter c(k) is able to eliminate much of the ISI if the coefficients are correctly set to match the channel. The difference between the output of the equalizer and the original data a(k) is

$$e(k) = y(k) - \tilde{a}(k) = \sum_{n=0}^{N} r(k-n)\,c(n) - \tilde{a}(k) \qquad (7.5)$$

where ã(k) is a delayed version of the original data sequence a(k) to match the channel delay. In practice we do not know the original data sequence; otherwise, no equalization or adaptation would be needed. However, the decided sequence ā(k) can be an exact replica of ã(k) if correct decisions have been made. If the coefficients have been correctly set, y(k) should be equal to ã(k), and e(k) should always be zero. In practice, we inevitably have noise and residual ISI. Therefore, our cost function (error) is defined as the square of e(k). To optimize the system, we need to bring the error to its minimum. Conventionally, the gradient descent method is used to find the minimum. It is done by iteration:

$$\hat{c}_{n+1} = \hat{c}_n - \mu\,\nabla e^2(k) \qquad (7.6)$$

where ĉ is the coefficient vector and μ is the step. Since we have assumed a linear filter, the gradient can be derived as

$$\nabla e^2(k) = 2\,e(k)\,\check{r}(k) \qquad (7.7)$$

where ř(k) is a vector defined as

$$\check{r}(k) = [\,r(k)\;\; r(k-1)\;\; \ldots\;\; r(k-N)\,] \qquad (7.8)$$

Equation (7.7) shows that the gradient is reduced to the product of e(k) and ř(k). Even so, in practice it is still very challenging to design an analog multiplier that meets the very high speed of SerDes transceivers. To simplify the iteration, the step is made very small and the gradient is replaced with the product of the sign of e(k) and the sign of ř(k), which gives

$$\hat{c}_{n+1} = \hat{c}_n - \mu\,\mathrm{sign}[e(k)]\,\mathrm{sign}[\check{r}(k)] \qquad (7.9)$$

This approach will be very sensitive to data patterns and noise distribution if the coefficients are updated at the baud rate. Therefore, instead of minimizing the instantaneous error, we minimize the expectation of the error. In practical SerDes receivers, the effectiveness of equalization depends on the sampling phase of the clock, and the recovered clock depends on equalization too. Therefore, the equalizer adaptation loop interacts with the CDR loop. To prevent the two loops from fighting each other, their bandwidths must be very different. In practical designs, the CDR loop has a much wider bandwidth than the equalizer adaptation loop. We can write the cost function as

$$\mathrm{error} = E\left\{\left[\tilde{a}(k) - \sum_{n=0}^{N} r(k-n)\,c(n)\right]^2\right\} \rightarrow \min \qquad (7.10)$$

For this reason, the adaptation algorithm is called least mean square (LMS). The algorithm according to equation (7.9) is called sign-sign LMS. LMS may be the most widely used algorithm in equalizer adaptation.

Fig. 7.8. A block diagram of adaptive decision feedback equalizer

A block diagram of an adaptive DFE is shown in Fig. 7.8. The cost function is

$$\mathrm{error} = E\left\{\left[z(k) - \tilde{a}(k) - \sum_{n=1}^{N} \tilde{a}(k-n)\,c(n)\right]^2\right\} \rightarrow \min \qquad (7.11)$$

Equation (7.11) shows that the DFE is mimicking the channel, since [1, c(1), c(2), …] is actually the normalized channel impulse response if the error is zero. However, in practice the normalization factor is a priori unknown, or z(k) is not normalized. For this reason, a practical design may add one more adaptive parameter.
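The sign-sign update of equation (7.9) can be tried out in a short simulation. The sketch below is only an illustration under stated assumptions: the two-tap channel, the step size, the number of taps and the use of a known training sequence as ã(k) (instead of receiver decisions) are all hypothetical choices, and the helper names are not from the paper.

```python
import random


def sign(v):
    """Return -1, 0 or +1."""
    return (v > 0) - (v < 0)


def sign_sign_lms(received, desired, num_taps, mu):
    """Adapt a linear FIR equalizer with the sign-sign update of (7.9).

    Returns the final coefficients and the error e(k) seen at every symbol.
    """
    c = [0.0] * num_taps
    c[0] = 1.0  # start as a pass-through filter
    errors = []
    for k in range(len(received)):
        # r_vec = [r(k), r(k-1), ..., r(k-N)], zero before the first sample
        r_vec = [received[k - n] if k - n >= 0 else 0.0 for n in range(num_taps)]
        y = sum(cn * rn for cn, rn in zip(c, r_vec))
        e = y - desired[k]
        # Only the signs of e(k) and r(k-n) are used, so no multiplier is needed.
        c = [cn - mu * sign(e) * sign(rn) for cn, rn in zip(c, r_vec)]
        errors.append(e)
    return c, errors


random.seed(1)
a = [random.choice((-1, 1)) for _ in range(20000)]
# Hypothetical channel with one postcursor tap: r(k) = a(k) + 0.5 a(k-1)
r = [a[k] + (0.5 * a[k - 1] if k > 0 else 0.0) for k in range(len(a))]

coeffs, err = sign_sign_lms(r, a, num_taps=5, mu=0.001)
initial_mse = sum(e * e for e in err[:200]) / 200
final_mse = sum(e * e for e in err[-2000:]) / 2000
print(initial_mse, final_mse)
```

With a small step the coefficients move by a fixed amount per symbol, so convergence is slow but the residual dithering around the optimum stays small. This is the tradeoff implied by making the step very small.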
The additional parameter may be an adaptive reference level or a variable gain.

8 Data and Clock Recovery

In high-speed SerDes transceivers where there are strong impairments, ISI is severe and the eye is usually closed. After equalization, the impairments are compensated to a certain degree: ISI is reduced to an acceptable level and the eye is open. Since most equalization schemes aim to eliminate the ISI at data centers and the system impulse response is at its maximum at those positions, it is desirable to sample the incoming waveform at the data centers because they have the maximum SNR. An abstract eye diagram is shown in Fig. 8.1 [H.1]. A practical transition probability density curve may look quite different from the one shown in Fig. 8.1.

Fig. 8.1. An abstract eye diagram

As mentioned in section 2, there is no exclusively allocated clock signal in high-speed SerDes transceivers. However, a clock signal whose frequency is the baud rate and whose phase is aligned to the data centers is indispensable for correct data recovery. As shown in Fig. 8.1, a nonlinear clock recovery circuit extracts clock information from the incoming data waveform, and uses this regenerated clock to resample the data waveform to recover the data.

8.1 Design Considerations for CDR

Since the signal power at the clock frequency (the baud rate) is essentially zero in the incoming data waveform, the clock signal has to be generated locally at the receiving end through nonlinear methods. A microwave filter type clock recovery scheme, more commonly called the “nonlinear spectral line method”, usually involves a nonlinear device such as a rectifier, and a bandpass filter [H.2]. The nonlinear device generates signal power at the clock frequency and the filter eliminates spectra at other frequencies. Nevertheless, it is hard to achieve a high-Q bandpass filter with a wide tuning range in microelectronic circuits.
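The statement that random data carries essentially no power at the baud rate, while a nonlinear operation creates a spectral line there, can be checked numerically. The sketch below assumes ideal rectangular NRZ pulses and uses "differentiate, then rectify" as the nonlinear device. The differentiator, the oversampling ratio and all function names are assumptions made for illustration.

```python
import cmath
import random


def tone_at_baud(signal, samples_per_bit):
    """Normalized magnitude of the DFT of `signal` at the baud-rate frequency."""
    w = -2j * cmath.pi / samples_per_bit
    total = sum(x * cmath.exp(w * n) for n, x in enumerate(signal))
    return abs(total) / len(signal)


def nrz(bits, samples_per_bit):
    """Ideal rectangular NRZ waveform with levels -1 and +1."""
    return [1.0 if b else -1.0 for b in bits for _ in range(samples_per_bit)]


def edge_rectify(signal):
    """Differentiate, then rectify: one positive pulse at every data transition."""
    return [abs(signal[n] - signal[n - 1]) for n in range(1, len(signal))]


random.seed(0)
bits = [random.randint(0, 1) for _ in range(2000)]
x = nrz(bits, 8)

raw_tone = tone_at_baud(x, 8)                  # essentially zero
edge_tone = tone_at_baud(edge_rectify(x), 8)   # clearly non-zero
print(raw_tone, edge_tone)
```

The pulses produced at the transitions all fall on the same phase of the bit period, so they add coherently at the baud rate. A bandpass filter centered there would then extract the clock.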
A more commonly used method is to have a local oscillator generate the clock signal. The phase error and frequency error between the local oscillation and the incoming data waveform can be detected. The detected errors are fed to control loops that minimize those errors. This type of clock recovery scheme usually involves a phase-locked loop. Practical microelectronic implementation of a CDR is a tradeoff among many design considerations. Those design considerations are also called figures of merit (FOM).

8.1.1 Acquisition Time

The first two tasks for a clock recovery circuit are locking the clock frequency to the baud rate of the incoming data and locking the phase of the clock to the data centers. However, there are initial frequency and phase offsets due to the uncertainty of the data rate and the VCO frequency. For example, due to PVT variations, a free-running VCO has a frequency offset from the nominal oscillation frequency and its phase is a priori unknown after power-on. On the other hand, the data rate may change when switching from one application to another. The time it takes for a CDR to lock to the desired clock frequency and clock phase is called the acquisition time. It is desirable to use separate loops for frequency tracking and phase tracking so that frequency tracking can achieve fast frequency acquisition and phase tracking can achieve good jitter performance.

Frequency acquisition can be achieved by using a frequency sweep [H.3], a frequency discriminator [H.4], [H.5], [H.6], [H.7], or a frequency reference. In high-speed SerDes transceivers, the most commonly used solution is a frequency reference, since the data rate of an application is predefined by standardization organizations. Frequency acquisition can be achieved by locking the locally regenerated clock frequency to the reference before real data transfer starts. In practice, CDRs can be classified as continuous-rate, multi-rate, and single-rate CDRs.
A continuous-rate CDR accepts an input data stream of any rate within a certain range. A multi-rate CDR can lock to an input data stream whose rate is a priori unknown but is an integer multiple of a known base rate. A frequency reference helps fast frequency acquisition for single-rate and multi-rate CDRs, but is not so useful for a continuous-rate CDR. In a high-speed SerDes transceiver, the transmitter clock is usually generated by a CMU that synthesizes a high-frequency clock from a low-frequency but very accurate oscillator such as a quartz oscillator [H.8] or even an atomic clock [H.54].

When the frequency is locked, the frequency lock loop (FLL) is usually frozen (or switched off), and the PLL begins to align the clock phase to the incoming data. Usually the PLL can only tolerate a very small frequency offset. As a consequence, the FLL must reduce the difference between the regenerated clock frequency and the baud rate to a very small value, for example 250 ppm, before handing control to the PLL. The dual-loop CDR structure embodies the idea of fine-loop and coarse-loop control.

8.2 Phase Selection (Picking)

After the FLL locks, the main task of a CDR circuit is to adjust the phase of the clock to sample the incoming data at the optimum sampling instants. In a wide sense, this process is phase selection or phase picking. Within a clock period, the number of available phases may be infinite or finite, depending on the microelectronic implementation. The conventional notion of phase picking or phase selection may refer exclusively to a finite number of available (discrete) phases. Wide-sense phase selection can be achieved indirectly by adjusting the clock frequency or directly by adjusting the phase of the clock. Indirect phase selection usually involves a PLL and control of the VCO frequency, while direct phase selection usually involves a DLL and control of delay.
Since phase is the integral of frequency over time, in principle the indirect way of phase selection can give infinitely fine phase resolution and an infinitely large number of available phases. However, when the clock frequency is already equal to the baud rate, a phase error can only be compensated by first forcing the clock frequency to deviate from the baud rate and later pulling it back, which usually leads to a long acquisition time and jitter integration. The direct way is desirable because it can adjust the sampling phase without disturbing the clock frequency, and thus can achieve faster acquisition and low jitter integration. One disadvantage is that reference input jitter can directly propagate to the regenerated clock. Another disadvantage is that it can tolerate only a very small frequency offset. Nevertheless, from the point of view of wide-sense phase selection, PLL and DLL are just two methods for phase selection. Although their differences have been much debated, good designs using either of them can satisfy most applications.

8.3 Jitter Performance

As shown in Fig. 8.1, in order to correctly recover the incoming data, a clock period T must meet

$$T = T_u + T_r + T_h + T_m + T_{cjitter} \qquad (8.1)$$

where Tu is the uncertainty of the zero-crossing, Tr is the time it takes to cross the threshold (vm), Th is the time that the incoming waveform must be held above or below the threshold, Tcjitter is the clock jitter, and Tm is the time margin. When the sum of Tu, Tr, Th, and Tcjitter is larger than a clock period, the time margin disappears and bit detection errors increase drastically.

8.3.1 Jitter Generation and Jitter Transfer

There are basically two types of jitter. The first type is pattern-dependent jitter. Although ISI at data centers can be suppressed greatly by equalization, ISI may still be severe at zero-crossing points, especially when there is little excess bandwidth.
In addition, DC offset and duty cycle mismatch in the transmitter clock also contribute to pattern-dependent jitter. An analysis of data-induced zero-crossing jitter and the consequent pattern-dependent jitter can be found in [7], [H.9] and [H.10]. For a symmetric system response, these papers show that zero-crossing jitter has a power spectral density with a depression near DC. The second type is noise-induced jitter. Noise can cause zero-crossing points to shift. In some literature [H.16], jitter is further classified as random jitter (RJ), deterministic jitter (DJ), ISI, periodic jitter (PJ), duty cycle distortion (DCD), etc. Because a CDR is a highly nonlinear circuit, its jitter performance is highly nonlinear. Modeling of jitter in CDRs can be found in [H.11], [H.12], and [H.16]. Fig. 8.2 shows various sources of jitter.

Fig. 8.2. Noise sources in a conventional CDR

Traditionally the input jitter is separated from the jitter generated by the CDR. Jitter transfer is the ratio of the output jitter to the input jitter over jitter frequencies, which mainly describes the attenuation or amplification of the input jitter by the CDR [H.13]. In Fig. 8.2, the jitter transfer function can be approximated by a linear model as

$$H(f) = \frac{\phi_{out}}{\phi_{in}} = \frac{F(f)}{1 + F(f)} \qquad (8.2)$$

where F(f) is a linear approximation of the forward loop transfer function. F(f) is usually of a lowpass nature and has large gain near DC. Therefore, the jitter transfer function is approximately unity at low frequency and smaller than unity at high frequency. However, depending on the zeros and poles of the forward loop transfer function, there might be a frequency range where the jitter transfer function is larger than one. This is called jitter peaking. Jitter peaking is undesirable because it amplifies the input jitter.
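The behaviour of equation (8.2) can be evaluated numerically for one assumed loop. The sketch below takes a hypothetical forward transfer function F(s) = K(1 + s/ωz)/s², a common linear model for a charge-pump loop with a stabilizing zero. The loop gain, the zero frequency and the frequency grid are illustrative values only and are not taken from the paper.

```python
import math


def jitter_transfer(f, loop_gain, zero_freq):
    """|H(f)| = |F / (1 + F)| for the assumed F(s) = K (1 + s/wz) / s^2."""
    s = 2j * math.pi * f
    wz = 2 * math.pi * zero_freq
    forward = loop_gain * (1 + s / wz) / (s * s)
    return abs(forward / (1 + forward))


# Hypothetical loop: natural frequency 1 MHz, zero also placed at 1 MHz.
K = (2 * math.pi * 1e6) ** 2
freqs = [10 ** (3 + 0.05 * i) for i in range(101)]  # 1 kHz to 100 MHz
gains = [jitter_transfer(f, K, 1e6) for f in freqs]

peak = max(gains)
peak_freq = freqs[gains.index(peak)]
print(gains[0], gains[-1])                 # close to 1 at 1 kHz, far below 1 at 100 MHz
print(peak_freq, 20 * math.log10(peak))    # peaking frequency and peaking in dB
```

For this particular choice of poles and zero the transfer function rises above unity near the loop natural frequency, which is the jitter peaking described above. Moving the zero to a lower frequency (heavier damping) reduces the peak.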
It has been shown in [H.14] that jitter peaking can be totally eliminated by using a voltage-controlled delay element.

After separation of the input jitter, jitter generation is defined as the amount of jitter added to the data signal or clock signal by the CDR circuit when the input data stream is essentially clean. In an analog CDR, an important source of jitter generation is the VCO. In a digital CDR, spurs are another main source of jitter generation due to the finite number of available (discrete) phases.

8.3.2 Jitter Tolerance and Input Jitter Tolerance Mask

A CDR circuit must work properly under the disturbance of noise. Jitter tolerance is defined as the capability of a CDR circuit to achieve a specified bit error rate (BER) under the worst-case jitter conditions. Jitter tolerance is a function of frequency. At high jitter frequencies comparable with the symbol rate, jitter tolerance is small, while at low frequencies it is large. At high frequency the loop gain is small, and jitter is rejected by averaging rather than tracking. Jitter larger than 1 UI can lead to a decision error. At low frequency, by contrast, the loop gain is large and jitter is effectively tracked and compensated by moving the clock phase accordingly. Very slow frequency drift such as baseband wandering can lead to an enormous phase error (jitter) if the observation time is very long and the phase tracking loop is switched off. Luckily this jitter can be tolerated, since the frequency drift only causes a small phase error in one symbol period. This error can be effectively tracked and compensated by the PLL.

Different applications require different jitter tolerance. For a specific application, the input jitter tolerance mask is usually defined by a standard. In many cases, jitter transfer exhibits a lowpass characteristic, while jitter generation exhibits a highpass characteristic.
When jitter generation is taken into account, the input jitter mask has some turning points and looks like what is shown in Fig. 8.3, which is the input jitter tolerance mask of SONET OC-12.

Fig. 8.3. Sinusoidal input jitter tolerance mask of OC-12

Fig. 8.3 only shows the sinusoidal jitter tolerance, or jitter tolerance in the frequency domain. Apart from the sinusoidal part, the total jitter is also composed of random jitter and deterministic jitter. In the case of XAUI, the tolerance should be at least 0.37 UI for deterministic jitter, 0.1 UI for sinusoidal jitter, and 0.18 UI for random jitter. Altogether this amounts to 0.65 UI of total jitter. Therefore, a XAUI receiver has to work properly with only 112 ps (35%) of stable data within a 320-ps data cycle [H.17]. For this reason, sometimes time-domain jitter tolerance such as peak-to-peak jitter or root-mean-square (rms) jitter is used. The equivalence of timing jitter and phase jitter has been debated in some literature [H.18].

8.4 Basic CDR Topologies

As mentioned before, the basic tasks of a CDR circuit must include frequency acquisition and phase acquisition to ensure BER performance despite PVT variations and noise added to the circuit. A critical difficulty in modern CDR circuits stems from the use of low supply voltages. The gain of VCOs must increase as the supply is scaled down because the tuning range must remain a constant percentage of the center frequency. As a result, for a given ripple on its control line, the VCO suffers from greater jitter. A method to alleviate this issue is to decompose the VCO control into fine and coarse inputs, allowing the latter to remain quiet after the system is phase-locked [8]. This concept leads to dual-loop topologies.

Referenceless Dual-Loop Topology

Shown in Fig. 8.4, the referenceless dual-loop topology is useful for continuous-rate CDRs [9], [H.7]. In this topology the frequency-lock loop is designed to lock the VCO frequency and the phase-lock loop is designed to lock the VCO phase. The frequency detector usually involves quadrature phase-detectors [H.29]. In order to prevent locking to harmonics, a harmonic detector (HD) based on special data patterns is usually used. Careful design is needed to prevent the two loops from fighting each other. It is desirable to hand over control of the VCO from the FLL to the PLL after frequency acquisition is achieved, and to switch back to the FLL when the PLL loses lock. Usually a loss-of-lock detector (LOLD) and a lock detector (LD) are designed to serve these purposes.

Fig. 8.4. A block diagram of a referenceless CDR

Referenced Dual-Loop Topology

It is rare in practical applications to have a continuous-rate CDR circuit. In most applications, the data rate is well defined in various standards. For instance, the data rate of SONET is defined on the basis of OC-1, which is 51.840 megabits per second (Mbps); OC-X is X times OC-1. The data rate of 10GBASE Ethernet is 10 gigabits per second (Gbps). The nominal data rate of 1X Blu-ray Disc (BD) is 66 Mbps, and X-speed BD has a data rate of 66X Mbps. The accuracy and stability of the data rate vary from 1 part in 10^11 for the primary reference clock (PRC) in SONET and SDH to a fraction of a percent in optical disk drives. In most high-speed SerDes transceiver applications, the rate stability is as good as tens of ppm. Therefore, it is advantageous to use a reference for single-rate or multi-rate CDRs. A block diagram of this topology is shown in Fig. 8.5.

Fig. 8.5. A block diagram of a dual-loop CDR with a reference

The frequency acquisition is achieved by locking the VCO frequency to the reference. It can start before actual data transfer takes place.
Once transfer starts, acquisition can be fast because only the phase error needs to be dealt with.

Shared Reference Topology

In some multi-channel receiver applications, the conventional dual-loop topology is not recommended. If each channel were allocated a VCO, there would be many VCOs. These VCOs could interact with each other through the power supply and substrate. Thus, the architecture would suffer from inter-channel crosstalk problems [H.30]. To avoid this problem, a global reference is shared by all channels. A block diagram of the shared reference topology is shown in Fig. 8.6. A shared global reference also helps to reduce power consumption and chip area. In the shared reference topology, the frequency offset between the bit rate of each channel and the reference clock is dynamically compensated by manipulating the phase of the clock. This may involve a phase interpolator, phase shifter or phase rotator. The shared reference topology also applies to single-channel applications because of its potential to provide better immunity to supply and substrate noise and PVT variations [H.8]. Many digital CDR circuits have this topology.

Fig. 8.6. A block diagram of shared reference topology

In practical microelectronic implementations, there are many variations of each topology. For example, the phase-lock loop in a dual-loop CDR can be a conventional PLL or a phase-picking PLL [H.7]. The VCO can be an LC oscillator or a ring oscillator. The VCO can also be analog or digital [H.30], [H.31], [H.32]. In the following section, some representative microelectronic implementations are discussed.
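For the shared reference topology described above, a rough feel for how fast the clock phase must be manipulated follows from simple arithmetic. The sketch below is only an illustration: the 100 ppm offset and the 64 phase steps per UI are assumed values, the function names are hypothetical, and the 3.125 Gbaud rate is chosen to match the 320-ps data cycle quoted earlier for XAUI.

```python
def rotator_step_rate(baud_rate_hz, offset_ppm, steps_per_ui):
    """Average phase steps per second needed to absorb a frequency offset."""
    drift_ui_per_second = baud_rate_hz * offset_ppm * 1e-6
    return drift_ui_per_second * steps_per_ui


def symbols_per_step(offset_ppm, steps_per_ui):
    """Average number of symbols between two consecutive phase steps."""
    return 1.0 / (offset_ppm * 1e-6 * steps_per_ui)


# Assumed example: 3.125 Gbaud lane, 100 ppm offset, 64 phase steps per UI.
rate = rotator_step_rate(3.125e9, 100, 64)
spacing = symbols_per_step(100, 64)
print(rate)     # about 2e7 steps per second
print(spacing)  # about 156 symbols between steps
```

Under these assumptions the phase selection logic has to advance or retard the clock by one step roughly every 156 symbols, which is slow compared with the symbol rate and can be handled by low-speed digital control.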
8.5 Representative CDR Architectures

There are a number of clock and data recovery schemes, for example maximum-likelihood clock recovery, nonlinear spectral line clock recovery, zero-forcing clock recovery, MMSE clock recovery, etc. [7]. In high-speed SerDes transceivers, threshold-crossing clock recovery is the most common scheme because of its simplicity and suitability for high-speed microelectronic implementation. In the following sections, some representative CDR architectures are briefly discussed.

8.5.1 Phase-detector and Phase Frequency Detector

Clock recovery circuits in high-speed SerDes transceivers are similar to frequency synthesizers. However, the reference in a frequency synthesizer is strictly periodic, whereas the input data stream in a CDR circuit is random. Therefore, some phase-detectors and phase frequency detectors that work well for frequency synthesizers do not work for CDRs at all. A general problem is that they usually misinterpret missing edges as frequency errors. Therefore, phase-detectors and phase frequency detectors are very important to CDR circuits. In high-speed SerDes transceivers, phase-detectors are predominantly logic circuits. If a threshold-crossing timing scheme is used, phase errors are generated by comparing data transition edges with clock transition edges. Therefore, the exclusive-or (XOR) gate is a fundamental building block of phase-detectors.

Hogge Phase-detector

A simple way to avoid misinterpreting missing edges is to generate phase errors only at data transitions. The design of the Hogge PD reflects this idea [H.33]. The phase error signal e is generated by XORing the input datum and its delayed (retimed) replica. An XOR gate outputs "0" when its inputs are equal and "1" when they are different. Therefore, e is generated only at transitions. The width of e is linearly proportional to the phase difference between the input data and the regenerated clock. A reference signal with a width of half a clock period is introduced to facilitate microelectronic implementation. As a consequence, the phase-detector output is e minus the reference. Under the lock condition, e and the reference are equally wide, and the average of the phase-detector output is zero.

A Hogge PD is sensitive to duty cycle mismatch because of the half-clock-period reference. In addition, a Hogge PD is also sensitive to transition density [H.14]. To understand why, we have to take the loop integrator into account. Under the lock condition, the output of a Hogge PD at a data transition is composed of a positive rectangular pulse and a negative rectangular pulse, each half a clock period wide; nevertheless, the output of the loop integrator has a positive net area. The presence or absence of such a pulse affects the average output of the loop integrator. The data-dependent jitter thus introduced is often large enough to be objectionable. A modified triwave PD, shown in Fig. 8.7 (b), solves this problem by forcing the net area of the loop integrator output to zero at each data transition under the lock condition.

Fig. 8.7. A Hogge PD (a) and a modified triwave PD (b)

Under the lock condition, the DC content of the output of a Hogge PD is zero. Therefore, CDRs with Hogge PDs are generally less noisy than those with bang-bang PDs. However, linear phase-detectors used in CDRs suffer from a very small frequency locking range and slow acquisition. These disadvantages are due to the fact that linear phase-detectors do not respond correctly in the presence of a frequency difference between the clock and data [H.34].

Bang-Bang Phase-detector

Bang-bang phase-detectors (BBPD) are also commonly called binary phase-detectors or early-late phase-detectors.
Compared with linear phase-detectors such as the Hogge PD, bang-bang phase-detectors are simpler and easier to implement in monolithic microelectronic circuits. A binary bang-bang PD can be a simple DFF [H.35], [H.38] or a sample-and-hold (S&H) cell [H.29]. In the bang-bang PD shown in Fig.8.8, the input data signal is used to trigger the DFF. When the clock is leading the data, a “down” signal is generated. Conversely, when the clock is lagging the data, an “up” signal is generated. The output of this binary PD is evidently independent of transition density. Despite this advantage and its simplicity, this type of binary PD has two main disadvantages. First, an additional data recovery circuit is needed. Second, it is sensitive to consecutive identical digits (CID). A CID pattern fixes the binary phase comparator output until the next data pulse appears. This causes clock phase drift and jitter generation [H.38].

Fig.8.8. A simple binary phase-detector and a timing chart

Alexander Phase-detector

The Alexander PD [H.36] is a kind of tristate bang-bang PD. It is not as sensitive to CID as a conventional binary PD, since it outputs neither “up” nor “down” when there is no transition; that is, it has a third, “do nothing” state. In such a bang-bang PD the incoming data is oversampled, and transitions are found where there is a polarity change. Each transition is compared with an ideal transition to give the phase error [H.37]. This idea is shown in Fig.8.9.

A   B   C   UP   DOWN   Remark
0   0   0   0    0      No transition
0   0   1   0    1      Clock is early
0   1   1   1    0      Clock is late
0   1   0   -    -      Out of range
1   1   0   0    1      Clock is early
1   1   1   0    0      No transition
1   0   1   -    -      Out of range
1   0   0   1    0      Clock is late

Fig.8.9. PD outputs based on three successive sampling points A, B, C

If the oversampling rate is 2, the oversampled bang-bang PD is an Alexander PD. An implementation based on DFFs can be easily derived from the truth table and it is shown in Fig.8.10. Since the input is buffered, the output of an Alexander PD changes only at clock edges.

Fig.8.9. An Alexander PD based on DFFs

It can be seen from the truth table that the output under lock condition is not defined. As a consequence, an Alexander PD generates phase error signals even when there is actually no phase difference between data and clock. For this reason, CDRs with Alexander PDs are generally regarded as noisier than those with linear PDs. Theoretically, the gain of an Alexander PD is infinite around zero input phase error. Therefore, CDRs with Alexander PDs exhibit fast acquisition and a wide frequency locking range. In practice, there is a linear region, as shown in the right side of Fig.8.9, because of the effect of metastability [H.55]. Therefore, linear methods of analyzing the loop dynamics are still widely used. An Alexander PD is also sensitive to transition density, because it does not generate a phase error signal if there is no data transition, and the net area is not zero under lock condition.

In [H.38] a two-mode bang-bang PD is proposed to improve the performance against CID and transition density. It has a pulse stretcher working as a CID detector. The PD works as a binary PD when the input data CID is shorter than the stretcher time constant (t), and as a ternary PD when the data CID is longer than the stretcher time constant.

Quadrature Correlation Phase Frequency Detector

Phase-detectors usually have a very small frequency pull-in range. They do not work properly when there is a large frequency difference between the input data and the regenerated clock. Therefore, frequency detectors (FD) or phase frequency detectors (PFD) are sometimes needed. In conventional frequency synthesizer applications, the quadricorrelator FD and the rotational FD are widely used [H.39], [H.40]. The special challenge for FDs in CDRs is that the input data are of a random nature.
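The three-sample decision rule tabulated in Fig.8.9 can also be written as a few lines of software. The following Python sketch is illustrative only: the function name `alexander_pd` and the use of `None` for the two out-of-range rows are assumptions of this example and are not taken from [H.36]. A, B and C are treated as three successive samples; "up" is asserted when the transition lies between A and B (clock is late) and "down" when it lies between B and C (clock is early).

```python
def alexander_pd(a, b, c):
    """Return (up, down) for three successive samples A, B, C.

    Hypothetical helper for illustration: returns None for the
    "out of range" rows of the truth table (A != B and B != C).
    """
    late = a != b    # transition between A and B -> clock is late  -> UP
    early = b != c   # transition between B and C -> clock is early -> DOWN
    if late and early:
        return None  # 0-1-0 or 1-0-1: out of range, output undefined
    return (int(late), int(early))


if __name__ == "__main__":
    # Print the full truth table in the row order used in Fig.8.9.
    for abc in [(0, 0, 0), (0, 0, 1), (0, 1, 1), (0, 1, 0),
                (1, 1, 0), (1, 1, 1), (1, 0, 1), (1, 0, 0)]:
        print(abc, alexander_pd(*abc))
```

When there is no transition the function returns (0, 0), which is the third, "do nothing" state that makes this PD less sensitive to CID than a plain binary PD.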
In [H.29] a quadrature correlation phase frequency detector was introduced. Fig.8.10 illustrates how it works. Whenever a transition of Q1 occurs while Q2 is low, the sign of the transition corresponds to the sign of the frequency difference between the clock and the bit rate. Therefore, at a rising edge of Q1 the FD output is set to “down” and at a falling edge to “up”, thus reducing the frequency difference. At any transition of Q1 while Q2 is high, “up” and “down” can be reset. This can only extend the frequency pull-in range to a limited extent, which is bounded by two conditions. First, it cannot be larger than half the hold-in range, due to the 50% duty cycle of the frequency detector output. Second, at least one transition of the NRZ signal should occur within one quarter of the beat note period. The latter condition ensures that the phase positions of Q1 and Q2 can be analyzed correctly by the FD.

Fig.8.10 A quadrature correlation PFD

If the shortest runlength and the longest runlength of the input data are known, more powerful and harmonic-free frequency detectors can be devised [9], [H.6]. Those frequency detectors usually involve a counter to count the number of clock cycles between two consecutive transitions.

Phase-detectors Working Under Reduced Rate

At very high speeds, it may be difficult to design oscillators that provide an adequate tuning range with reasonable jitter. For this reason, CDR circuits may sense the input random data at full rate but utilize a VCO running at a reduced rate. This technique also relaxes the speed requirements for the phase-detectors and, in some CDR configurations, the frequency dividers [8]. This concept results in the design of the half-rate PD [H.41], quarter-rate PD [H.42], and 1/8-rate PD [H.43], [H.44].
The basic idea of a reduced-rate PD was described in [8] and [H.41]. Fig.8.11 shows a linear quarter-rate PD and part of a timing chart. The basic requirements are still the same as for a full-rate linear PD. Each data transition must produce an “error” pulse whose width is equal to the phase difference. The PD must also generate a “reference” pulse with a fixed width. The rising edge of an “error” pulse is triggered by a data transition and the falling edge is triggered by a clock transition. The pulse width increases if the recovered clock transition falls behind the center of the input data and decreases otherwise. For “reference” pulses, however, both rising and falling edges are triggered by clock transitions, and thus the pulse width is always kept constant.

The speed requirements for some parts of this linear PD are relaxed due to the reduced frequency of the multiphase clock. However, the speed requirements for the whole PD are not reduced dramatically, for two reasons. First, “error” pulses become very narrow when the phase error is close to -π. Second, at a node in the circuit, the four phase error signals are combined and fed to a single loop filter. The second challenge can be relaxed by using parallel charge-pumps [H.45].

Fig.8.11 A block diagram of a quarter-rate PD

Reduced-rate PDs are not limited to linear PDs; a half-rate bang-bang PD and a quarter-rate bang-bang PD are reported in [H.46] and [H.47], respectively. In general, reduced-rate PDs share the basic design concepts with their full-rate counterparts. However, the larger the rate reduction, the more complicated the circuits needed to deliver the phase error signals.
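As a rough behavioural illustration of the linear phase-detector requirement restated above (an "error" pulse whose width tracks the phase difference, compared against a fixed-width "reference" pulse), the following Python sketch averages the net pulse width over a bit pattern. It is a simplified model under stated assumptions, not a circuit simulation: the function name, the normalisation to unit intervals (UI), and the choice of an error pulse that is 0.5 UI plus the phase error wide against a 0.5 UI reference (the full-rate Hogge case) are all choices made for this example.

```python
def linear_pd_average(bits, phase_error_ui):
    """Average (error - reference) pulse width per bit interval, in UI.

    Assumed model: every data transition yields one error pulse of width
    (0.5 + phase_error_ui) UI and one reference pulse of width 0.5 UI.
    Bit intervals without a transition contribute nothing.
    """
    transitions = sum(1 for prev, cur in zip(bits, bits[1:]) if prev != cur)
    error_width = 0.5 + phase_error_ui   # grows with phase error
    reference_width = 0.5                # fixed half-period reference
    net_area = transitions * (error_width - reference_width)
    return net_area / (len(bits) - 1)


if __name__ == "__main__":
    dense = [0, 1] * 8          # a transition in every bit interval
    sparse = [0, 0, 1, 1] * 4   # roughly half the transition density
    for name, pattern in (("dense", dense), ("sparse", sparse)):
        print(name, linear_pd_average(pattern, 0.1))
```

In this model the average output is zero at zero phase error regardless of the pattern, while for a non-zero phase error it scales with the number of transitions. This is one way to see why the gain of a linear PD depends on transition density.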
8.5.2 Analog CDR

CDRs may be implemented in different technologies such as silicon CMOS, SiGe, GaAs, or HBT to meet different requirements. Even in mainstream CMOS technologies, numerous CDRs have been reported. Those CDRs can be classified into a few categories according to different criteria. The first criterion is how the sampling phase is adjusted. According to this, CDRs can be classified into continuous-phase CDRs and discrete-phase CDRs. The second criterion is the structure through which the sampling phase is obtained. According to this, CDRs can be classified into PLL-based CDRs and DLL-based CDRs. Another criterion is how much digital circuitry is used in a CDR. According to this, CDRs can be classified as analog CDRs, semi-analog CDRs, and digital CDRs. CDRs for high-speed SerDes transceivers are predominantly mixed-signal circuits. Pure analog CDRs and pure digital CDRs are rare. Even in pure digital CDRs, analog properties are important and static rail-to-rail CMOS logic circuits are seldom used. We roughly divide CDR structures into analog ones and digital ones only for the convenience of discussion.

Referenceless Analog CDR

A representative analog CDR structure is the referenceless analog CDR. Some emerging telecommunication applications require CDRs to operate over a continuous range of frequencies [H.7]. In those CDRs, a reference can be used [H.5]. However, the referenceless structure is more common, since the rate of the incoming data is unknown prior to reception [9], [H.6], [H.7]. Fig.8.12 shows a block diagram of such a CDR. This CDR needs a wide tuning range VCO whose tuning range is at least 2/3 of its center frequency. It is very challenging for a conventional LC-based VCO to realize such a wide tuning range. The VCOs reported in [9], [H.6] are ring oscillators. Ring oscillators generally have inferior phase noise performance to LC-based oscillators and tend to increase the jitter generation of CDRs.
Therefore, an LC-based VCO is still desirable. To extend the tuning range, two LC-tank VCOs with switched capacitors to select subbands are proposed in [H.7]. An LC-based digitally controlled oscillator (DCO) with a wide tuning range was proposed in [H.32], [H.48]. It may be a good candidate to improve the performance against PVT variations.

Special care is needed to deal with frequency detection and locking to harmonics because of the wide tuning range of the VCO. The locking range of a conventional quadrature correlation phase frequency detector (QPFD) is only 20% for full-rate systems [H.49] and 15% for half-rate systems [9]. For a QPFD, harmonic locking will occur when the frequency range of the VCO exceeds twice the bit rate of the data. An additional auxiliary circuit is required to avoid harmonic locking. In [H.6] a harmonic-free frequency tracing circuit (FTC) is proposed. A weakness of the FTC is that it does not work if the maximum runlength and the minimum runlength are unknown. In [H.7] a coarse frequency detector (CFD) is proposed that does not need prior knowledge of the runlength. The CFD uses a frequency sweeping technique. The VCO frequency is initially set to the minimum. When the presence of a single NRZ pulse within a single clock period is detected, the clock is slow, and the VCO band is incremented. To eliminate sensitivity to bimodal jitter, the implemented CFD detects the presence of two consecutive pulses within two clock periods. The CFD can reduce the frequency error to typically less than 10%, which is within the locking range of a conventional QPFD or a rotational frequency detector (RFD) [H.40], [H.50], [H.51]. The RFD or QPFD is able to reduce the remaining frequency error to an accuracy of better than 250 ppm. That frequency error is within the pull-in range of a conventional PLL or DLL.
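The frequency sweeping idea behind the CFD can be illustrated with a very small, idealised Python sketch. Everything here is an assumption made for illustration and is not the circuit of [H.7]: the function name, the list of band frequencies, and the jitter-free "clock is slow" test, which simply checks whether two minimum-width NRZ pulses (2 UI) fit inside two clock periods. The residual error after the sweep depends entirely on the assumed band spacing.

```python
def coarse_frequency_sweep(bit_rate_hz, band_freqs_hz):
    """Step the VCO band upward from the minimum until the clock is no longer slow.

    band_freqs_hz: hypothetical VCO band frequencies in ascending order.
    Returns (band_index, vco_frequency_hz), or None if no band is fast enough.
    """
    for band, f_vco in enumerate(band_freqs_hz):
        two_pulses = 2.0 / bit_rate_hz   # duration of two minimum-width pulses
        two_clocks = 2.0 / f_vco         # duration of two clock periods
        clock_is_slow = two_pulses < two_clocks
        if not clock_is_slow:
            return band, f_vco           # stop sweeping: clock no longer slow
    return None


if __name__ == "__main__":
    # Assumed band plan: 20 bands spaced 8% apart, starting at 1 GHz.
    bands = [1.0e9 * 1.08 ** k for k in range(20)]
    band, f_vco = coarse_frequency_sweep(2.5e9, bands)
    print(band, f_vco, (f_vco - 2.5e9) / 2.5e9)
```

With bands spaced less than 10% apart, the sweep in this model stops within 10% of the bit rate, which is the kind of residual error a finer frequency detector would then have to remove.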
In [H.7] a delay and phase locked loop (D/PLL) was proposed to decouple the jitter transfer and jitter tolerance corner frequencies, since in a conventional second order PLL, good jitter transfer performance and good jitter tolerance performance are not compatible.

Fig.8.12 A block diagram of an analog continuous-rate CDR

Continuous-rate CDRs are generally analog-intensive circuits requiring careful design of the VCO and control loops. In practical applications, additional control logic, such as lock detectors and loss-of-lock detectors, is implemented to prevent the loops from fighting each other.

Although referenceless analog CDRs are able to lock to data streams of any bit rate within a frequency range, the acquisition time is usually too long. In [H.7] the acquisition time is 1 ms for a 2.5 Gb/s input, which is equal to 2.5 million symbol periods. In practice, the rate is predefined, and spending so long to determine it is wasteful. Therefore, CDRs with a reference are much more common than referenceless CDRs.

Full-Rate Edge Tracking CDR With A Reference

A reference is useful as an acquisition aid. It is especially useful for single-rate CDRs, because the VCO frequency can be locked to the reference before data transmission actually starts. Therefore, the main phase-locked loop can achieve fast acquisition once data transmission begins. Fig.8.13 shows a typical structure of a referenced full-rate edge-tracking CDR.

Fig.8.13 A full-rate edge tracking CDR with a reference clock

The CDR employs a dual-loop architecture where the FLL acts as an acquisition aid to the PLL and is disengaged during normal operation, when the CDR is locked to the serial input data. Thus the structure realizes a large frequency acquisition range while maintaining precise control of phase alignment. This structure is suitable for single-rate CDRs, and also for multi-rate CDRs if the divider is programmable.
Edge-tracking is a method in which phase information is extracted from data transition edges and data sampling is performed 0.5 unit interval (UI) away from clock edges.

Data Eye Tracking CDR

In the above-mentioned CDRs, phase information is extracted from data transition edges, and data is sampled 0.5 UI away from the regenerated clock edges. However, in a severe and asymmetric jitter environment, it is difficult to maintain the optimum BER performance. In [H.35] a data eye tracking CDR is proposed to improve the performance. The concept of data eye tracking can be understood through Fig.8.14. If we assume the optimum sampling point is DCLK, the transition probability and bit error rate at this point are theoretically at their minimum. Away from this point on either side, the transition probability and bit error rate increase. For each DCLK, two sampling points LCLK and RCLK are added on its left side and right side. On the one hand, under lock condition, the transition probability on both sides should be equal. On the other hand, if the transition probability on both sides is equal, DCLK should be very close to the optimum sampling point, since the transition probability is proportional to the jitter histogram area whose origin is the optimum sampling point. When DCLK strays from the optimum position, the transition probability on one side grows and that on the other side shrinks.

Fig.8.14 Phase detection concept of data eye tracking

Fig.8.15 shows a block diagram of a data eye-tracking CDR. It is composed of three loops. The reference loop is a frequency acquisition aid. The second loop is an edge tracking loop. The third loop is an eye measuring loop.

Fig.8.15 A block diagram of a data eye-tracking CDR

Whenever there is a transition between LCLK and DCLK, the sampling position is thought to be too close to the left, the clock is assumed to be too fast, and fDN is asserted high. If there is a transition between DCLK and RCLK, the sampling position is thought to be too close to the right, the clock is assumed to be too slow, and fUP is asserted high. In the case of a very small eye opening, or when LCLK and DCLK are too far apart, both fDN and fUP can be high. The tracking loop uses fDN and fUP as phase error signals. Under lock condition, fDN and fUP have the same expected value. When LCLK and RCLK are too close, CDR jitter performance may deteriorate, because data transitions happening before LCLK and after RCLK do not contribute to the detection of phase error. Therefore, the eye measuring loop needs careful design to control the interval between LCLK and RCLK. In [H.35], when both fUP and fDN are low for n UIs, dUP is asserted high; otherwise dDN is asserted high. The requirements are expressed mathematically as:

$$E[f_{UP}] = E[f_{DN}]$$ (8.3)

$$E[d_{DN}] = \left(1 - E[f_{UP}]\right)^{n} \left(1 - E[f_{DN}]\right)^{n} = 0.5$$ (8.4)

If n is 4, E[fUP] is 0.083 and E[dDN] is 0.5: setting $E[f_{UP}] = E[f_{DN}] = p$ in (8.4) gives $(1 - p)^{2n} = 0.5$, so $p = 1 - 2^{-1/8} \approx 0.083$.

Since there are three loops, attention must be paid to avoid instability. The bandwidth of the tracking loop can be much larger than that of the eye measuring loop. Therefore, the phase of DCLK is adjusted by the tracking loop and will not be affected by the operation of the eye measuring loop.

Delay Cell Based Analog CDR

The VCO is a very important component of CDRs. In many implementations, the VCO is an LC oscillator whose oscillating frequency can be tuned by adjusting the voltage applied to the varactor. In some applications, such as the quadricorrelator PFD, the VCO is required to deliver quadrature clocks. Design options to generate quadrature signals include: (a) a combination of VCO, polyphase filter (or R-C C-R filter), and output buffers (or limiters) as used in, e.g., [H.56], [H.57]; (b) a VCO at double frequency followed by master-slave flip-flops; (c) two cross-coupled VCOs as proposed in [H.58], [H.59].

It is straightforward to delay the oscillation of a VCO by 90° so that a quadrature clock is obtained. This can be achieved by using a delay cell. A delay cell can be a piece of transmission line. In the gigahertz range the transmission line may be too long to fit in an integrated circuit. Another issue is that a piece of transmission line can only provide a 90° phase delay at one frequency. Capacitor-based delay cells are more advantageous, as the capacitor or the charging current can be tuned to deliver a variable delay. Therefore, a 90° phase delay can be achieved over a wide range of frequencies. A ring oscillator composed of two or more stages of differential inverters can realize a very wide tuning range [H.60].

8.5.3 Digital CDR

Digital CDR has many advantages over analog CDR [H.19]. First, digital processing units are noise-immune. The only remaining noise-sensitive components are the clock generator and the samplers, and clock-generating PLLs are generally less sensitive to self-induced noise (e.g., the phase noise of the voltage-controlled oscillator) than clock recovery PLLs. This is because the reference clock is likely to have less noise than the random NRZ data stream. Second, digital circuits are easy to port to another process. This is an important criterion, as the SerDes is becoming a frequently demanded intellectual property (IP) component in system-on-chip (SoC) development. Third, instantaneous phase acquisition is possible if we store the samples until the phase decision is made. This is useful, for instance, in switch network applications where the CDR must acquire lock within a few bytes of the preamble period.

Digital CDRs can be classified as baud-rate sampling CDRs [H.20], [7] and oversampling CDRs according to the sampling rate [H.21]. They can also be classified as asynchronous sampling CDRs and synchronous sampling CDRs according to whether the sampling clock is synchronized to the incoming data.
Asynchronous (blind) oversampling CDR

Fig.8.4 shows the topology of a blind oversampling CDR [H.19]. This type of CDR is mainly composed of three blocks: a multi-phase clock generator, parallel samplers, and phase decision logic. The sampling is inevitably asynchronous, because the sampling phases are defined by the reference clock. The samples are stored in the sample storage. The input data stream is binary after a limiting amplifier or a slicer.

The working principle of blind oversampling is as follows. If there is a polarity change between two directly adjacent samples, there is a transition between these two samples. If the system has a symmetrical response, the central point between two adjacent transitions is the best sampling instant, and it is regarded as the nominal sampling phase of the multi-phase clock. Accordingly, the nominal transition position is between phase 0 and phase 1. As seen from Fig.8.1, the probability density function has its maximum at (M+1)/2 away from the nominal sampling phase in an M-times oversampling system. A watching window of W bits is used to decide the proper sampling phase.

Let us assume the best sampling phase changes to (M+3)/2 due to low frequency jitter or frequency offset; the most probable transition position then moves to between phase 1 and phase 2. A simple majority voting method can be applied. For example, it can be done in this way: assuming the most probable bit boundary is between phase 0 and phase 1, we count the number of actual transitions between phase 0 and phase 1 as N1. Assuming the most probable bit boundary is between phase 1 and phase 2, we count the actual transitions as N2. We continue until NM. We then compare the numbers N1, N2, ..., NM and find that N2 is the maximum. Therefore, we update the nominal sampling phase to (M+3)/2. The length of the watching window W and the oversampling rate M evidently determine the jitter tolerance of the system.
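The majority voting step described above can be sketched in Python as follows. This is a minimal illustration under assumptions that are not in the source: phases are numbered 0 to M-1, the sample stream is a flat list with M samples per bit so that sample i belongs to phase i mod M, M is odd, and the helper names `pick_sampling_phase` and `recover_bits` are invented for this example. The sampling phase is placed (M+1)/2 phases after the lower phase of the winning bit boundary, following the convention used in the text.

```python
def pick_sampling_phase(samples, M):
    """Majority vote over one watching window of oversampled binary data.

    samples: flat list of 0/1 values, M samples per bit (phase = index % M).
    Returns (sampling_phase, counts), where counts[k] is the number of
    transitions seen between phase k and the following phase.
    """
    counts = [0] * M
    for i in range(len(samples) - 1):
        if samples[i] != samples[i + 1]:      # polarity change -> transition
            counts[i % M] += 1
    boundary = max(range(M), key=counts.__getitem__)  # most probable bit boundary
    sampling_phase = (boundary + (M + 1) // 2) % M    # centre of the data eye
    return sampling_phase, counts


def recover_bits(samples, M, sampling_phase):
    """Pick one sample per bit at the chosen phase."""
    return samples[sampling_phase::M]


if __name__ == "__main__":
    bits = [1, 0, 1, 1, 0, 0, 1, 0]
    M = 3
    # One leading sample of the previous bit puts the boundary between phase 0 and 1.
    samples = [0] + [b for b in bits for _ in range(M)]
    phase, counts = pick_sampling_phase(samples, M)
    print(phase, counts, recover_bits(samples, M, phase))
```

In a real receiver the window would span W bits of jittered data, so the counts would be spread over neighbouring boundaries and the vote would average out the high frequency jitter.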
The high frequency jitter is averaged out, and the low frequency jitter is tracked because the nominal sampling phase is updated every W bits. It has been shown in [H.19] that WM should be very large in order to achieve a reasonable jitter tolerance and a low bit error rate. M is usually selected to be 3, 5, or 7, while W is usually a few thousand. Therefore, the sample storage is costly in chip area and power consumption. This method is sometimes called phase picking or phase selection because it selects the best phase of a multi-phase clock to sample the incoming data.

Fig.8.4. A block diagram of a blind oversampling CDR.

The above-mentioned method usually suggests a binary input stream. However, the incoming data stream for an asynchronously sampled CDR does not need to be binary, and the sampling rate need not be oversampled or an integer multiple of the baud rate. It has been shown in [7] and [H.22] that asynchronous-to-synchronous conversion can be performed with the aid of an interpolator [H.23] under the control of a numerically controlled oscillator (NCO). A block diagram of this type of asynchronously sampled CDR is shown in Fig.8.5.

Fig.8.5 A block diagram of a digital CDR with an interpolator

This type of CDR involves a very high-speed ADC and a digital signal processing block, which are very challenging to implement in very high-speed SerDes transceivers. The recovered clock of a digital CDR usually contains many spurs due to the discrete phase. This is generally not a big issue because, once the data is correctly detected, another clock can be used for jitter-free transmission [H.23]. For example, an elastic buffer and a system clock can be used.

8.5.4

Asynchronously sampled digital CDRs usually incur an implementation that is costly in area and power consumption. Synchronously sampled digital CDRs are advantageous in these aspects.

Synchronous oversampling CDR

A synchronous oversampling CDR can detect the phase error and adjust the sampling phase promptly according to the phase error. In this way, it does not need a large sample storage and consumes less chip area and power than asynchronous ones. Fig.8.6 shows the block diagram of a synchronously sampled CDR [H.24].

Fig.8.6. A block diagram of a synchronously sampled CDR

The phase error between the incoming data and the clock is expressed in the phasor diagram. When the phasor exceeds 2π, it is taken modulo 2π, so that the remainder falls back into the range from 0 to 2π. However, as a consequence, two adjacent nominal sampling phases may be within one clock period or beyond two periods if there is a frequency offset or large low frequency jitter. This phenomenon is a kind of cycle slip. It is generally not a big issue, as low frequency jitter or a small frequency offset can only cause a very small phase shift during one symbol period. When updating the sampling phase to a new one, the old sampling phase can still be kept without causing a big problem if the number of sampling phases is sufficient. Nevertheless, some logic circuits have to be developed in order not to confuse the phase selection control. In [H.25] a flywheel method is proposed to solve this problem. The low frequency jitter tolerance and the frequency offset tolerance are decided by the number of available phases and the CDR loop bandwidth. A treatise on tolerance of frequency offset is [H.26]. In [H.27], a detailed implementation of this type of digital CDR is discussed.
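The phasor wrap-around described above can be made concrete with a small Python sketch. It is only an idealised illustration under assumptions introduced here: a constant frequency offset given in ppm, phase measured in UI so that one full turn of the phasor (2π) equals 1 UI, and `n_phases` equally spaced sampling phases. The function name `phase_after_drift` is hypothetical.

```python
import math


def phase_after_drift(freq_offset_ppm, n_symbols, n_phases):
    """Phase index and number of wraps after n_symbols of constant offset.

    The accumulated drift (in UI) is reduced modulo 1 UI, which corresponds
    to taking the phasor modulo 2*pi; each wrap is one cycle slip candidate.
    """
    drift_ui = freq_offset_ppm * 1e-6 * n_symbols   # total drift in UI
    wraps = math.floor(drift_ui)                    # completed phasor turns
    residual_ui = drift_ui - wraps                  # phase after the modulo
    phase_index = int(residual_ui * n_phases) % n_phases
    return phase_index, wraps


if __name__ == "__main__":
    # 100 ppm offset, 8 sampling phases: the drift per symbol is 1e-4 UI,
    # far smaller than one phase step of 1/8 UI.
    for n in (1, 1000, 23000):
        print(n, phase_after_drift(100, n, 8))
```

The per-symbol drift in this example is tiny compared with one phase step, which is consistent with the remark that a small frequency offset causes only a very small phase shift during one symbol period, while the wrap count shows how the selected phase still walks through full turns over many symbols.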
In the implementation of [H.27], the phase-detector is a bang-bang PD and the digital loop filter is derived from an analog loop filter by using a backward difference substitution of s with z:

$$z^{-1} = e^{-sT}, \qquad s = \frac{1 - z^{-1}}{T}$$ (8.3)

The output of the digital filter is fed to a digital-to-phase converter (DPC) to select the proper phase for sampling. In multi-gigahertz SerDes transceivers, it is very difficult to make the digital filter work at baud rate. Therefore, the output of the bang-bang PD is decimated before it enters the digital loop filter. Nevertheless, since the implementation simply replaces analog blocks with their digital counterparts, some problems that trouble analog CDRs, such as slow acquisition time, still exist. Furthermore, the power consumption is expected to be high due to the large amount of high-resolution digital processing.

An improved synchronous digital CDR is proposed in [H.13], and its block diagram is shown in Fig.8.7. It replaces the high-resolution digital loop filter with a logic decision circuit, which benefits power, chip area, and acquisition speed. Furthermore, instead of using a large number of sampled data to decide the proper sampling phase, it decides the proper sampling phase by comparing the current nominal phase with a stored phase pointer. The phase pointer can be regarded as a historically decided phase based on a group of past sampled data. Therefore, it does not require a large sample storage. In addition, the chance of cycle slip is greatly reduced, since it oversamples the incoming data and decides the correct sampling phase by using a watching window. A similar concept is explored in [H.28].

Fig. 8.7.
A synchronous oversampling CDR

The theoretical high frequency jitter tolerance of the digital CDR shown in Fig.8.7 can be derived as:

$$JT_{p\text{-}p}(\mathrm{UI}) = 1 - \frac{k}{N}$$ (8.4)

where N is the oversampling factor and k is the threshold distance between a transition edge and its adjacent nominal data center. The low frequency jitter tolerance is

$$JT_{p\text{-}p}(\mathrm{UI}) = \frac{1}{M N F_j}$$ (8.5)

where M is the longest runlength of the input data stream, or continuous identical data (CID), and Fj is the jitter frequency normalized to the baud rate. The theoretical frequency offset tolerance is

$$\frac{\Delta f}{f_0} = \frac{1}{M N}$$ (8.6)

When jitter tolerance is considered, the frequency tolerance becomes much smaller than that given by equation (8.6). Nevertheless, in most high-speed SerDes transceivers, both the transmitter reference frequency and the receiver reference frequency are generated by very stable oscillators with a stability of 100 ppm or better [H.8]. Therefore, digital CDR is very suitable for high-speed single-rate or multi-rate SerDes transceivers.

8.6 Reference

[1]. Adam, J.; Chi Shih Chang; Stankus, J.J.; Iyer, M.K.; Chen, W.T., “Addressing packaging challenges,” IEEE Circuits and Devices Magazine, vol. 18, Issue 4, July 2002, Page(s):40-49.
[2]. Friedman, E.G., ed., Clock Distribution Networks in VLSI Circuits and Systems, IEEE Press, 1995.
[3]. Maheshwari, N., and Sapatnekar, S.S., Timing Analysis and Optimization of Sequential Circuits, Kluwer, 1999.
[4]. Lee, T.H.; Bulzacchelli, J.F.; “A 155-MHz clock recovery delay- and phase-locked loop,” IEEE Journal of Solid-State Circuits, vol. 27, Issue 12, Dec. 1992, Page(s):1736-1746.
[5]. Chia, B.; Kollipara, R.; Oh, D.; Yuan, C.; Boluna, L.S.; “Study of PCB trace crosstalk in backplane connector pin field,” IEEE Conference on Electrical Performance of Electronic Packaging, Oct. 2006, Page(s):281-284.
[6]. A. X. Widmer and P. A.
Franaszek, “A DC-balanced, partitioned-block, 8B/10B transmission code,” IBM J. Res. Develop., vol. 27, no. 5, September 1983, Page(s):440-451.
[7]. J. W. M. Bergmans, Digital Baseband Transmission and Recording, Kluwer Academic Publishers, 1996, ISBN 0-7923-9775-4.
[8]. Razavi, B.; “Challenges in the design of high-speed clock and data recovery circuits,” IEEE Communications Magazine, vol. 40, Issue 8, Aug. 2002, Page(s):94-101.
[9]. Rong-Jyi Yang; Kuan-Hua Chao; Sy-Chyuan Hwu; Chuan-Kang Liang; Shen-Iuan Liu; “A 155.52 Mbps-3.125 Gbps continuous-rate clock and data recovery circuit,” IEEE Journal of Solid-State Circuits, vol. 41, Issue 6, June 2006, Page(s):1380-1390.
[10]. Horowitz, M.; Chih-Kong Ken Yang; Sidiropoulos, S.; “High-speed electrical signaling: Overview and limitations,” IEEE Micro, vol. 18, Issue 1, Jan.-Feb. 1998, Page(s):12-24.
[11]. K. Muhammad and R. B. Staszewski, “Direct RF sampling mixer with recursive filtering in charge domain,” in Proceedings of the International Symposium on Circuits and Systems (ISCAS ’04), vol. 1, pp. I-577-I-580, Vancouver, BC, Canada, May 2004, sec. ASPL29.5.
[12]. [G.1] P. J. Pupalaikis, “Advanced tools for high-speed serial data measurements: equalizer emulation and virtual probing,” DesignCon 2007.
[13]. [G.2] Sinsky, J.H.; Duelk, M.; Adamiecki, A.; “High-speed electrical backplane transmission using duobinary signaling,” IEEE Transactions on Microwave Theory and Techniques, Volume 53, Issue 1, Jan. 2005, Page(s):152-160.
[14]. [G.3] B. Casper, P. Pupalaikis, and J. Zerbe, “Serial equalization,” DesignCon 2007 TechForum.
[15]. [G.4] Zhilong Tang, “A 100 MHz Gm-C analog equalizer for 100Base-TX application,” Proceedings of 6th International Conference on Solid-State and Integrated-Circuit Technology, vol. 1, 22-25 Oct. 2001, Page(s):240-242.
[16]. [G.5] T. M. Hollis, D. J. Comer, D. T. Comer, and B. Young, “Self-calibrating continuous-time equalization targeting Inter-symbol Interference,” IEEE North-East Workshop on Circuits and Systems, June 2006, Page(s):109-112.
[17]. [G.6] Miao Li; Kwasniewski, T.; Shoujun Wang; Yuming Tao; “A 10Gb/s transmitter with multi-tap FIR pre-emphasis in 0.18 µm CMOS technology,” Proceedings of Asia and South Pacific Design Automation Conference, vol. 2, 18-21 Jan. 2005, Page(s):679-682.
[18]. [G.7] A. Fiedler et al., “A 1.0625 Gb/s transceiver with 2X-oversampling and transmit signal pre-emphasis,” in IEEE Int. Solid-State Circuits Conf. Dig. Tech. Papers, Feb. 1997, pp. 238-239.
[19]. [G.8] W. J. Dally and J. Poulton, “Transmitter equalization for 4-Gb/s signaling,” IEEE Micro, vol. 17, pp. 48-56, Jan./Feb. 1997.
[20]. [G.9] R. Farjad-Rad, C.-K. Yang, M. Horowitz, and T. H. Lee, “A 0.4-µm CMOS 10-Gb/s 4-PAM pre-emphasis serial link transmitter,” IEEE J. Solid-State Circuits, vol. 34, pp. 580-585, May 1999.
[21]. [G.10] J. H. R. Schrader, E. A. M. Klumperink, J. L. Visschers, and B. Nauta, “Pulse-Width Modulation Pre-Emphasis Applied in a Wireline Transmitter, Achieving 33 dB Loss Compensation at 5-Gb/s in 0.13-µm CMOS,” IEEE Journal of Solid-State Circuits, vol. 41, no. 4, April 2006, Page(s):990-999.
[22]. [G.11] Ruifeng Sun; Jaejin Park; O'Mahony, F.; Yue, C.P.; “A low-power, 20-Gb/s continuous-time adaptive passive equalizer,” IEEE International Symposium on Circuits and Systems, Vol. 2, 23-26 May 2005, Page(s):920-923.
[23]. [G.12] P. Amini and O. Shoaer, “A low-power gigabit ethernet analog equalizer,” Circuits and Systems, 2001. ISCAS 2001. The 2001 IEEE International Symposium on, Volume 1, 6-9 May 2001, Page(s):176-179 vol. 1, Digital Object Identifier 10.1109/ISCAS.2001.921819.
[24]. [G.13] Aliparast, P.; Khoei, A.; Hadidi, KH.; “A Novel Fully-Differential Gm-C Filter Structure for Communication Channel Equalizer,” 14th International Conference on Mixed Design of Integrated Circuits and Systems, 21-23 June 2007, Page(s):209-214.
[25]. [G.14] J. S. Choi, M. S. Hwang, and D. K.
Jeong, "A 0.18-µm CMOS 3.5-Gb/s Continuous-Time Adaptive Cable Equalizer Using Enhanced Low-Frequency Gain Control Method," IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 39, NO. 3, MARCH 2004, page(s): 419-425.
[26]. [G.15.] Mohan, S.S.; Hershenson, M.D.M.; Boyd, S.P.; Lee, T.H.; “Bandwidth extension in CMOS with optimized on-chip inductors,” IEEE Journal of Solid-State Circuits, Volume 35, Issue 3, March 2000, Page(s):346 - 355
[27]. [G.16.] Groenewold, G.; "Low-power MOSFET-C 120 MHz Bessel allpass filter with extended tuning range," IEE Proceedings on Circuits, Devices and Systems, Volume 147, Issue 1, Feb. 2000, Page(s):28 – 34.
[28]. [G.17.] Johnson, J.; Johnson, D.; Boudra, P.; Stokes, V.; “Filters using Bessel-type polynomials,” IEEE Transactions on Circuits and Systems, Volume 23, Issue 2, Feb 1976, Page(s):96 – 99.
[29]. [G.18.] Sands, N.P., Hauser, M.W., Liang, G., Groenewold, G., Lam, S., Lin, C.H., Kuklewicz, J., Lang, L., and Dakshinamurthy, R., “A 200Mb/s analog DFE read channel,” Proceedings of the ISSCC, 1996.
[30]. [G.19] Burlingame, E.; Spencer, R.; "An analog CMOS high-speed continuous-time FIR filter," Proceedings of the 26th European Solid-State Circuits Conference, 19-21 Sept. 2000 Page(s):288 – 291.
[31]. [G.20] Xiaofeng Lin; Hoi Lee; Jin Liu; "A continuous-time adaptive FIR equalizer with LNV-AIL delay line for 2.5Gb/s data communication," Proceedings of the IEEE Custom Integrated Circuits Conference, 18-21 Sept. 2005 Page(s):413 – 416.
[32]. [G.21] Sewter, J.; Carusone, A.C.; “A CMOS finite impulse response filter with a crossover traveling wave topology for equalization up to 30 Gb/s,” IEEE Journal of Solid-State Circuits, Volume 41, Issue 4, April 2006 Page(s):909 – 917.
[33]. [G.22] J. E. Jaussi, S. R. Mooney, “Discrete-time analog filter,” US patent 6,791,399, Sep. 14, 2004.
[34].
[G.23] Balan, V.; Caroselli, J.; Chern, J.-G.; Chow, C.; Dadi, R.; Desai, C.; Fang, L.; Hsu, D.; Joshi, P.; Kimura, H.; Liu, C.Y.; Tzu-Wang Pan; Park, R.; You, C.; Yi Zeng; Zhang, E.; Zhong, F.; “A 4.8-6.4-Gb/s serial link for backplane applications using decision feedback equalization,” IEEE Journal of Solid-State Circuits, Volume 40, Issue 9, Sept. 2005 Page(s):1957 – 1967.
[35]. [G.24] Krishna, K.; Yokoyama-Martin, D.A.; Caffee, A.; Jones, C.; Loikkanen, M.; Parker, J.; Segelken, R.; Sonntag, J.L.; Stonick, J.; Titus, S.; Weinlader, D.; Wolfer, S.; “A multigigabit backplane transceiver core in 0.13-µm CMOS with a power-efficient equalization architecture,” IEEE Journal of Solid-State Circuits, Volume 40, Issue 12, Dec. 2005 Page(s):2658 – 2666.
[36]. [G.25] Payne, R.; Landman, P.; Bhakta, B.; Ramaswamy, S.; Song Wu; Powers, J.D.; Erdogan, M.U.; Yee, A.-L.; Gu, R.; Lin Wu; Yiqun Xie; Parthasarathy, B.; Brouse, K.; Mohammed, W.; Heragu, K.; Gupta, V.; Dyson, L.; Wai Lee; “A 6.25-Gb/s binary transceiver in 0.13-µm CMOS for serial data transmission across high loss legacy backplane channels,” IEEE Journal of Solid-State Circuits, Volume 40, Issue 12, Dec. 2005 Page(s):2646 – 2657.
[37]. [G.26] Zerbe, J.L.; Werner, C.W.; Stojanovic, V.; Chen, F.; Wei, J.; Tsang, G.; Kim, D.; Stonecypher, W.F.; Ho, A.; Thrush, T.P.; Kollipara, R.T.; Horowitz, M.A.; Donnelly, K.S.; "Equalization and clock recovery for a 2.5-10Gb/s 2-PAM/4-PAM backplane transceiver cell," IEEE Journal of Solid-State Circuits, Volume 38, Issue 12, Dec 2003 Page(s):2121 – 2130.
[38]. [G.27] Momtaz, A.; Chung, D.; Kocaman, N.; Jun Cao; Caresosa, M.; Bo Zhang; Fujimori, I.; “A Fully Integrated 10-Gb/s Receiver With Adaptive Optical Dispersion Equalizer in 0.13-µm CMOS,” IEEE Journal of Solid-State Circuits, Volume 42, Issue 4, April 2007 Page(s):872 – 880.
[39]. [G.28] Mark J. Marlett, and Mark D. Rutherford, “Continuous-time decision feedback equalizer,” US patent 2006/0239341, Oct. 26, 2006.
[40].
[G.29] M. Harwood, N. Warke, R. Simpson, T. Leslie, A. Amerasekera, S. Batty, D. Colman, E. Carr, V. Gopinathan, S. Hubbins, P. Hunt, A. Joy, P. Khandelwal, B. Killips, T. Krause, S. Lytollis, A. Pickering, M. Saxton, D. Sebastio, G. Swanson, A. Szczepanek, T. Ward, J. Williams, R. Williams, T. Willwerth, “A 12.5Gb/s SerDes in 65nm CMOS Using a Baud-Rate ADC with Digital Receiver Equalization and Clock Recovery,” IEEE International Solid-State Circuits Conference, 2007, page(s): 436-591.
[41]. [G.30] E.F. Stikvoort and J.A.C. v. Rens, “An all-digital bit detector for compact disc players,” IEEE J. Selected Areas Commun., vol. SAC-10, No. 1, Jan. 1992, page(s) 191-200.
[42]. [G.31] P. Kabal and S. Pasupathy, “Partial-response signaling,” IEEE Trans. Commun., vol. COM-23, No. 9, Sept. 1975, pages: 921-934.
[43]. [G.32] A. Lender, “The duobinary technique for high-speed data transmission,” IEEE Trans. Commun. Electron., vol. 82, May 1963, pages: 214-218.
[44]. [G.33] Yamaguchi, K.; Sunaga, K.; Kaeriyama, S.; Nedachi, T.; Takamiya, M.; Nose, K.; Nakagawa, Y.; Sugawara, M.; Fukaishi, M.; “12Gb/s duobinary signaling with ×2 oversampled edge equalization,” Digest of Technical Papers, IEEE International Solid-State Circuits Conference, 6-10 Feb. 2005 Page(s):70 – 71.
[45]. [G.34] M. Tomlinson, “New automatic equalizer employing modulo arithmetic,” Electron. Lett., vol. 7, Nos 5/6, March 1971, pp. 138-139.
[46]. [G.35] H. Harashima and H. Miyakawa, “Matched-transmission technique for channels with intersymbol interference,” IEEE Trans. Commun. Technol., vol. COM-20, Aug. 1972, pp. 774-780.
[47]. [H.1] Farjad-Rad, R.; Dally, W.; Hiok-Tiaq Ng; Senthinathan, R.; Lee, M.-J.E.; Rathi, R.; Poulton, J; “A low-power multiplying DLL for low-jitter multigigahertz clock generation in highly integrated digital chips,” IEEE Journal of Solid-State Circuits, Volume 37, Issue 12, Dec. 2002 Page(s):1804 – 1812.
[48]. [H.2] W. R.
Bennett, “Statistics of regenerative data transmission,” Bell Syst. Tech. J., Vol. 37, pp. 1501-1542, Nov. 1958.
[49]. [H.3] F. M. Gardner, Phase Lock Techniques, New York: Wiley, 2nd edition, 1979.
[50]. [H.4] F. M. Gardner, “Properties of frequency difference detectors,” IEEE Trans. Commun., vol. COM-33, No. 3, Feb. 1985, pages: 131-138.
[51]. [H.5] Frambach, J.-P.; Heijna, R.; Krosschell, R.; “Single reference continuous rate clock and data recovery from 30 Mbit/s to 3.2 Gbit/s,” Proceedings of the IEEE Custom Integrated Circuits Conference, 12-15 May 2002, Page(s):375 – 378.
[52]. [H.6] Rong-Jyi Yang; Kuan-Hua Chao; Shen-Iuan Liu; “A 200-Mbps~2-Gbps continuous-rate clock-and-data-recovery circuit,” IEEE Transactions on Circuits and Systems I: Regular Papers, Volume 53, Issue 4, April 2006 Page(s): 842 – 847.
[53]. [H.7] Dalton, D.; Kwet Chai; Evans, E.; Ferriss, M.; Hitchcox, D.; Murray, P.; Selvanayagam, S.; Shepherd, P.; DeVito, L.; “A 12.5-Mb/s to 2.7-Gb/s continuous-rate CDR with automatic frequency acquisition and data-rate readback,” IEEE Journal of Solid-State Circuits, Volume 40, Issue 12, Dec. 2005 Page(s): 2713 – 2725.
[54]. [H.8] C. Kromer, G. Sialm, C. Menolfi, M. Schmatz, F. Ellinger, and H. Jackel, "A 25-Gb/s CDR in 90-nm CMOS for high-density interconnects," IEEE Journal of Solid-State Circuits, vol. 41, No. 12, Dec. 2006, pages: 2921-2929.
[55]. [H.9] E. Panayirci, “Jitter analysis of a phase-locked digital timing recovery system,” IEE Proc.-I, vol. 139, No. 3, June 1992, pp. 267-275.
[56]. [H.10] B. R. Saltzberg, “Timing recovery for digital synchronous data transmission,” Bell Syst. Tech. J., vol. 46, pp. 593-622, March 1967.
[57]. [H.11] Jri Lee; Kundert, K.S.; Razavi, B.; "Modeling of jitter in bang-bang clock and data recovery circuits," Proceedings of the IEEE Custom Integrated Circuits Conference, 21-24 Sept. 2003 Page(s):711 – 714.
[58].
[H.12] Buckwalter, J.F.; Hajimiri, A.; "Analysis and equalization of data-dependent jitter," IEEE Journal of Solid-State Circuits, Volume 41, Issue 3, March 2006 Page(s):607 – 620.
[59]. [H.13] Qingjin Du, “A Low-Power, High-Jitter-Tolerance, All-Digital CDR and a Programmable, Anti-Harmonic DLL Clock Multiplier with a Period Error Compensation Loop,” Ph.D. dissertation, Carleton University, 2007.
[60]. [H.14] Thomas H. Lee, The Design of CMOS Radio-Frequency Integrated Circuits, Cambridge University Press, 1998.
[61]. [H.15]
[62]. [H.16] Ou, N.; Farahmand, T.; Kuo, A.; Tabatabaei, S.; Ivanov, A.; “Jitter models for the design and test of Gbps-speed serial interconnects,” IEEE Design & Test of Computers, Volume 21, Issue 4, July-Aug. 2004 Page(s):302 – 313.
[63]. [H.17] Jitter Fundamentals: Jitter Tolerance Testing with Agilent 81250 ParBERT, available at http://cp.literature.agilent.com/litweb/pdf/5989-0223EN.pdf
[64]. [H.18] Chin, J.; Cantoni, A.; “Phase jitter ≡ timing jitter?” IEEE Communications Letters, Volume 2, Issue 2, Feb. 1998 Page(s):54 – 56.
[65]. [H.19] Jaeha Kim; Deog-Kyoon Jeong; “Multi-gigabit-rate clock and data recovery based on blind oversampling,” IEEE Communications Magazine, Volume 41, Issue 12, Dec. 2003 Page(s):68 – 74.
[66]. [H.20] Harwood, M.; Warke, N.; Simpson, R.; Leslie, T.; Amerasekera, A.; Batty, S.; Colman, D.; Carr, E.; Gopinathan, V.; Hubbins, S.; Hunt, P.; Joy, A.; Khandelwal, P.; Killips, B.; Krause, T.; Lytollis, S.; Pickering, A.; Saxton, M.; Sebastio, D.; Swanson, G.; Szczepanek, A.; Ward, T.; Williams, J.; Williams, R.; Willwerth, T.; “A 12.5Gb/s SerDes in 65nm CMOS Using a Baud-Rate ADC with Digital Receiver Equalization and Clock Recovery,” IEEE International Solid-State Circuits Conference, Digest of Technical Papers, 11-15 Feb. 2007 Page(s):436 – 591.
[67].
[H.21] Hyung-Rok Lee; Moon-Sang Hwang; Bong-Joon Lee; Young-Deok Kim; Dohwan Oh; Jaeha Kim; Sang-Hyun Lee; Deog-Kyoon Jeong; Kim, W.; “A 1.2-V-only 900-mW 10 Gb Ethernet transceiver and XAUI interface with robust VCO tuning technique,” IEEE Journal of Solid-State Circuits, Volume 40, Issue 11, Nov. 2005 Page(s):2148 – 2158.
[68]. [H.22] G. Ascheid, M. Oerder, J. Stahl, and H. Meyr, “An all digital receiver architecture for bandwidth effective transmission at high data rates,” IEEE Trans. Commun., vol. COM-37, No. 8, Aug. 1989, pages: 804-813.
[69]. [H.23] F. M. Gardner, “Interpolation in digital modems – part I: fundamentals,” IEEE Trans. Commun., vol. COM-41, No. 3, March 1993, pp. 501-507.
[70]. [H.24] Demir, A.; Feldmann, P.; “Stochastic modeling and performance evaluation for digital clock and data recovery circuits,” Proceedings of Design, Automation and Test in Europe Conference and Exhibition, 27-30 March 2000, Page(s):340 – 344.
[71]. [H.25] W. Rhee, S. V. Rylov, and D. Friedman, “Semidigital delay-locked loop using an analog-based finite state machine,” United States Patent 6927611, Aug. 9, 2005.
[72]. [H.26] A. L. Coban, M. H. Koroglu, and K. A. Ahmed, “A 2.5–3.125-Gb/s Quad Transceiver With Second-Order Analog DLL-Based CDRs,” IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 40, NO. 9, SEPTEMBER 2005, Pages: 1940-1947.
[73]. [H.27] Sonntag, J.L.; Stonick, J.; “A Digital Clock and Data Recovery Architecture for Multi-Gigabit/s Binary Links,” IEEE Journal of Solid-State Circuits, Volume 41, Issue 8, Aug. 2006 Page(s):1867 – 1875.
[74]. [H.28] Y. Miki, T. Saito, H. Yamashita, F. Yuki, T. Baba, A. Koyama, M. Sonehara, “A 50-mW/ch 2.5-Gb/s/ch Data Recovery Circuit for the SFI-5 Interface With Digital Eye-Tracking”, IEEE Journal of Solid State Circuits, Vol. 39, No. 4, April 2004.
[75].
[H.29] Pottbacker, A.; Langmann, U.; Schreiber, H.-U.; “A Si bipolar phase and frequency detector IC for clock extraction up to 8 Gb/s,” IEEE Journal of Solid-State Circuits, Volume 27, Issue 12, Dec. 1992 Page(s):1747 – 1751.
[76]. [H.30] Kreienkamp, R.; Langmann, U.; Zimmermann, C.; Aoyama, T.; Siedhoff, H.; “A 10-Gb/s CMOS clock and data recovery circuit with an analog phase interpolator,” IEEE Journal of Solid-State Circuits, Volume 40, Issue 3, March 2005 Page(s):736 – 743.
[77]. [H.30] Oh, Do-Hwan; Kim, Deok-Soo; Kim, Suhwan; Jeong, Deog-Kyoon; Kim, Wonchan; “A 2.8 Gb/s All-Digital CDR with a 10b Monotonic DCO,” Digest of Technical Papers of IEEE International Solid-State Circuits Conference, 11-15 Feb. 2007 Page(s):222 – 598.
[78]. [H.31] Olsson, T.; Nilsson, P.; “A digitally controlled PLL for SoC applications,” IEEE Journal of Solid-State Circuits, Volume 39, Issue 5, May 2004 Page(s):751 – 760.
[79]. [H.32] Staszewski, R.B.; Chih-Ming Hung; Barton, N.; Meng-Chang Lee; Leipold, D.; “A digitally controlled oscillator in a 90 nm digital CMOS process for mobile phones,” IEEE Journal of Solid-State Circuits, Volume 40, Issue 11, Nov. 2005 Page(s):2203 – 2211.
[80]. [H.33] C. Hogge, “A Self-Correcting Clock Recovery Circuit,” IEEE Journal of Lightwave Technology, vol. LT-3, pp. 1312-1314, December 1985.
[81]. [H.34] Xinyu Chen; Green, M.M.; “A CMOS 10 Gb/s clock and data recovery circuit with a novel adjustable Kpd phase-detector,” Proceedings of the 2004 International Symposium on Circuits and Systems, Volume 4, 23-26 May 2004 Page(s): IV - 301-304.
[82]. [H.35] Rennie, David; Sachdev, Manoj; “Comparative Robustness of CML Phase-detectors for Clock and Data Recovery Circuits,” 8th International Symposium on Quality Electronic Design, 26-28 March 2007 Page(s):305 – 310.
[83]. [H.36] J. D. H. Alexander, “Clock recovery from random binary signals,” Electron. Lett., vol. 11, pp. 541–542, Oct. 1975.
[84].
[H.37] Youngdon Choi, Deog-Kyoon Jeong, and Wonchan Kim, "Jitter Transfer Analysis of Tracked Oversampling Techniques for Multigigabit Clock and Data Recovery," IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 50, No. 11, NOVEMBER 2003, pages: 1573-1580.
[85]. [H.38] Nosaka, H.; Ishii, K.; Enoki, T.; Shibata, T.; “A 10-Gb/s data-pattern independent clock and data recovery circuit with a two-mode phase comparator,” IEEE Journal of Solid-State Circuits, Volume 38, Issue 2, Feb. 2003 Page(s):192 – 197.
[86]. [H.39] Soyuer, M.; Meyer, R.G.; “Frequency limitations of a conventional phase-frequency detector,” IEEE Journal of Solid-State Circuits, Volume 25, Issue 4, Aug. 1990 Page(s):1019 – 1022.
[87]. [H.40] Messerschmitt, D.; “Frequency Detectors for PLL Acquisition in Timing and Carrier Recovery,” IEEE Transactions on Communications, Volume 27, Issue 9, Sep 1979 Page(s):1288 – 1295.
[88]. [H.41] Savoj, J.; Razavi, B.; “A 10-Gb/s CMOS clock and data recovery circuit with a half-rate linear phase-detector,” IEEE Journal of Solid-State Circuits, Volume 36, Issue 5, May 2001 Page(s):761 – 768.
[89]. [H.42] Byun, S.; Lee, J.C.; Shim, J.H.; Kim, K.; Yu, H.K.; “A 10-Gb/s CMOS CDR and DEMUX IC With a Quarter-Rate Linear Phase-detector,” IEEE Journal of Solid-State Circuits, Volume 41, Issue 11, Nov. 2006 Page(s):2566 – 2576.
[90]. [H.43] T. Toifl, C. Menolfi, P. Buchmann, C. Hagleitner, M. Kossel, T. Morf, J. Weiss, M. Schmatz, "A 72mW 0.03mm2 Inductorless 40Gb/s CDR in 65nm SOI CMOS," ISSCC 2007, session 12, pages: 226-227, 598.
[91]. [H.44] Seong-Jun Song; Sung Min Park; Hoi-Jun Yoo; “A 4-Gb/s CMOS clock and data recovery circuit using 1/8-rate clock technique,” IEEE Journal of Solid-State Circuits, Volume 38, Issue 7, July 2003 Page(s):1213 – 1219.
[92].
[H.45] Ohtomo, Y.; Nishimura, K.; Nogawa, M.; “A 12.5-Gb/s Parallel Phase Detection Clock and Data Recovery Circuit in 0.13-µm CMOS,” IEEE Journal of Solid-State Circuits, Volume 41, Issue 9, Sept. 2006 Page(s):2052 – 2057.
[93]. [H.46] Mehrdad Ramezani and C. Andre T. Salama, "Analysis of a Half-Rate Bang–Bang Phase-Locked-Loop," IEEE TRANSACTIONS ON CIRCUITS AND SYSTEMS—II: ANALOG AND DIGITAL SIGNAL PROCESSING, VOL. 49, NO. 7, JULY 2002, pages: 505-509.
[94]. [H.47] Jri Lee and Behzad Razavi, "A 40-Gb/s Clock and Data Recovery Circuit in 0.18-µm CMOS Technology," IEEE JOURNAL OF SOLID-STATE CIRCUITS, VOL. 38, NO. 12, DECEMBER 2003, Pages: 2181-2190.
[95]. [H.48] Staszewski, R.B.; Leipold, D.; Muhammad, K.; Balsara, P.T.; “Digitally controlled oscillator (DCO)-based architecture for RF frequency synthesis in a deep-submicrometer CMOS Process,” IEEE Transactions on Circuits and Systems II: Analog and Digital Signal Processing, Volume 50, Issue 11, Nov. 2003 Page(s):815 – 828.
[96]. [H.49] B. Stilling, “Bit rate and protocol independent clock-and-data-recovery,” Electron. Lett., vol. 36, pp. 824–825, Apr. 2000.
[97]. [H.50] L. DeVito, “A versatile clock recovery architecture and monolithic implementation,” in Monolithic Phase-Locked Loops and Clock Recovery Circuits, B. Razavi, Ed. Piscataway, NJ: IEEE Press, 1996, pp. 405–420.
[98]. [H.51] L. DeVito et al., “A 52MHz and 155 MHz clock recovery PLL,” in IEEE Int. Solid-State Circuits Conf. (ISSCC) Dig. Tech. Papers, San Francisco, CA, Feb. 1991, pp. 142–143.
[99]. [H.52] J. Cao, M. Green, A. Momtaz, K. Vakilian, D. Chung, K.-C. Jen, et al., “OC-192 Transmitter and Receiver in Standard 0.18-um CMOS”, IEEE Journal of Solid State Circuits, Vol. 37, No. 12, December 2002, pp. 1964 - 1973.
[100]. [H.53] S.-H. Lee, M.-S. Hwang, Y. Chor, S. Kim, Y. Moon, B.-J. Lee, D.-K. Jeong, W. Kim, Y.-J. Park and G.
Ahn, “A 5-Gb/s 0.25-um CMOS Jitter-Tolerant Variable-Interval Oversampling Clock/Data Recovery Circuit”, IEEE Journal of Solid State Circuits, Vol. 37, No. 12, December 2002.
[101]. [H.54] International Engineering Consortium, “SONET Tutorial”, available at http://www.iec.org/online/tutorials/sonet/index.html
[102]. [H.55] J. Lee, K. S. Kundert, and B. Razavi, “Analysis and modeling of bang-bang clock and data recovery circuits,” IEEE Journal of Solid-State Circuits, Vol. 39, No. 9, Sept. 2004, pp. 1571-1580.
[103]. [H.56] J. Craninckx and M. S. J. Steyaert, “A fully integrated CMOS DCS-1800 frequency synthesizer,” IEEE J. Solid-State Circuits, vol. 33, pp. 2054–2065, Dec. 1998.
[104]. [H.57] M. Borremans and M. Steyaert, “A CMOS 2V quadrature direct up-converter chip for DCS-1800 integration,” in Proc. 26th Eur. Solid-State Circuits Conf. (ESSCIRC), Stockholm, Sweden, Sept. 2000, pp. 88–91.
[105]. [H.58] M. Rofougaran, A. Rofougaran, J. Rael, and A. A. Abidi, “A 900-MHz CMOS LC-oscillator with quadrature outputs,” in Proc. IEEE Int. Solid-State Circuits Conf., New York, NY, 1996, p. 392.
[106]. [H.59] Tiebout, M.; “Low-power low-phase-noise differentially tuned quadrature VCO design in standard CMOS,” IEEE Journal of Solid-State Circuits, Volume 36, Issue 7, July 2001 Page(s):1018 – 1024.
[107]. [H.60] Grozing, M.; Phillip, B.; Berroth, M.; “CMOS ring oscillator with quadrature outputs and 100 MHz to 3.5 GHz tuning range,” Proceedings of the 29th European Solid-State Circuits Conference, 16-18 Sept. 2003 Page(s):679 – 682
[108]. [F.1] Gardner, F.; “Charge-Pump Phase-Lock Loops,” IEEE Transactions on Communications, Volume 28, Issue 11, Nov 1980 Page(s):1849 – 1858
[109]. [F.2] D. B. Leeson, “A simple model of feedback oscillator noise spectrum,” Proc. IEEE, vol. 54, Feb. 1966, pp. 329-330.
[110]. [F.3] J. Rogers, C. Plett, and F. Dai, Integrated circuit design for high-speed frequency-synthesis, Artech House, 2006.
[111]. [F.4] J.
Bulzacchelli, “A delay-locked loop for clock recovery and data synchronization,” Master’s thesis, Massachusetts Institute of Technology, 1990.
[112]. [F.5] Staszewski, R.B.; Leipold, D.; Muhammad, K.; Balsara, P.T.; “Digitally controlled oscillator (DCO)-based architecture for RF frequency synthesis in a deep-submicrometer CMOS Process,” IEEE Transactions on Circuits and Systems II, Volume 50, Issue 11, Nov. 2003 Page(s):815 – 828
[113]. [F.6] Waheed, K.; Staszewski, R.B.; “Characterization of deep-submicron varactor mismatches in a digitally controlled oscillator,” Proceedings of the IEEE Custom Integrated Circuits Conference, 18-21 Sept. 2005 Page(s):605 – 608.
[114]. [F.7] Staszewski, R.B.; Balsara, P.T.; “Phase-domain all-digital phase-locked loop,” IEEE Transactions on Circuits and Systems II, Volume 52, Issue 3, March 2005 Page(s):159 – 163
[115]. [F.8] C. Chung and C. Lee, “An all-digital phase-locked loop for high-speed clock generation,” IEEE J. Solid-State Circuits, vol. 38, pp. 347–351, Feb. 2003.
[116]. [F.9] T. Hsu, C. Wang, and C. Lee, “Design and analysis of a portable high speed clock generator,” IEEE Trans. Circuits Systems II: Analog and Digital Signal Processing, vol. 48, pp. 367–375, Apr. 2001.
[117]. [F.10] Johnson, M.G.; Hudson, E.L.; “A variable delay line PLL for CPU-coprocessor synchronization,” IEEE Journal of Solid-State Circuits, Volume 23, Issue 5, Oct. 1988 Page(s):1218 – 1223.
[118]. [F.11] Chen C.-C.; Chang J.-Y.; Liu S.-I.; “DLL-Based Variable-Phase Clock Buffer,” IEEE Transactions on Circuits and Systems II, accepted for future publication, Volume PP, Issue 99, 2007 Page(s):1 – 1
[119]. [F.12] Sidiropoulos, S.; Horowitz, M.A.; “A semidigital dual delay-locked loop,” IEEE Journal of Solid-State Circuits, Volume 32, Issue 11, Nov. 1997 Page(s):1683 – 1692.
[120].
[F.13] Lee, T.H.; Donnelly, K.S.; Ho, J.T.C.; Zerbe, J.; Johnson, M.G.; Ishikawa, T.; “A 2.5 V CMOS delay-locked loop for 18 Mbit, 500 megabyte/s DRAM,” IEEE Journal of Solid-State Circuits, Volume 29, Issue 12, Dec. 1994 Page(s):1491 – 1496
[121]. [F.14] G. Chien and P. R. Gray, “A 900-MHz Local Oscillator Using a DLL-Based Frequency Multiplier Technique for PCS Applications”, IEEE Journal of Solid State Circuits, Vol. 35, No. 12, December 2000.
[122]. [F.15] D. J. Foley and M. P. Flynn, “CMOS DLL-Based Synthesizer and Temperature-Compensated Tunable State Circuits, Vol. 36, No. 3, March 2001.
[123]. [F.16] C.-C. Wang, H.-C. She and R. Hu, “A 1.2 GHz Programmable DLL-Based Frequency Multiplier for Wireless Application”, Transaction on VLSI, 2002.
[124]. [F.17] J.-H. Kim, Y.-H. Kwak, S.-R. Yoon, M.-Y. Kim, S.-W. Kim, C. Kim, “A CMOS DLL-Based 120MHz to 1.8GHz Clock Generator for Dynamic Frequency Scaling”, IEEE ISSCC, 2005, pages: 516-517.
[125]. [F.18] G.-Y. Wei, J. T. Stonick, D. Weinlader, J. Sonntag, S. Searles, “A 500MHz MP/DLL Clock Generator for a 5Gb/s Backplane Transceiver in 0.25 um CMOS”, IEEE ISSCC, February 2003.
[126]. [F.19] Qingjin Du; Jingcheng Zhuang; Kwasniewski, T.; “A Low-Phase Noise, Anti-Harmonic Programmable DLL Frequency Multiplier With Period Error Compensation for Spur Reduction,” IEEE Transactions on Circuits and Systems II, Volume 53, Issue 11, Nov. 2006 Page(s):1205 – 1209.
[F.20] Maulik, P. C.; Mercer, D. A.; “A DLL-Based Programmable Clock Multiplier in 0.18 [...]
[131]. [...] µm CMOS,” IEEE Journal of Solid-State Circuits, Vol. 36, No. 4, April 2001, pages: 706-771.
[132].