Active Optical Clock Distribution

by

Travis L. Simpkins

Submitted to the Department of Electrical Engineering and Computer

Science in partial fulfillment of the requirements for the degree of

Master of Science in Electrical Engineering and Computer Science at the

MASSACHUSETTS INSTITUTE OF TECHNOLOGY

May 2002

© Massachusetts Institute of Technology 2002.

All rights reserved.

Author

Department of Electrical Engineering and Computer Science

May 24, 2002

Certified by

Anantha P. Chandrakasan

Associate Professor

Thesis Supervisor

Accepted by

Arthur C. Smith

Chairman, Department Committee on Graduate Students


Active Optical Clock Distribution

by

Travis L. Simpkins

Submitted to the Department of Electrical Engineering and Computer Science on May 24, 2002, in partial fulfillment of the requirements for the degree of

Master of Science in Electrical Engineering and Computer Science

Abstract

Clock distribution has become a major problem in integrated circuits. Although clock cycle times continue to decrease, the time allocated to uncertainty in the clock due to skew and jitter has remained constant. Therefore, the percentage of the clock budget devoted to uncertainty has become significant.

One solution to the clock uncertainty problem is to distribute the clock optically.

Conventionally, this has involved using a transimpedance pre-amplifier to convert the optical current pulses from the photodetector into voltage waveforms. An inverter-based cascade is then used to amplify the clock pulses into full-swing signals that drive the local clock buffers. Past research has shown that this approach is limited by the imperfect matching of amplifiers from one block to another. Arising from process, voltage, and temperature, these variations can significantly increase the skew, thus negating the benefits of distributing a skewless optical clock.

This thesis will focus on an alternative approach to optical clock distribution.

Whereas the cascaded amplifier approach attempts to convert optical current pulses into an electrical waveform, the architecture to be explored in this thesis will use an optical reference clock to deskew an electrical clock. The architecture resembles that of a delay-locked loop (DLL) in that a voltage-controlled delay line is used to synchronize the fully-buffered electrical clock with the optical current pulses from the photodetector. The use of a feedback-based architecture allows the loop to compensate for slow variations in process, voltage, and temperature, and thus minimize skew.

Thesis Supervisor: Anantha P. Chandrakasan

Title: Associate Professor


Acknowledgments

I would like to thank Prof. Anantha Chandrakasan for his technical contributions to this thesis, as well as his guidance and encouragement during the course of the project. It is an honor to have the opportunity to conduct research under a true visionary in the field. This research would also not have been possible without the work of Dr. Paul-Peter Sotiriadis, who provided the transistor-level design of the phase detector and contributed to the architecture. Additionally, I must thank Ben Ruedlinger, who offered his optoelectronics experience to the project.

Next, I would like to thank my parents Jerry and Mary Ellen. Their unwavering support of my pursuits over the past twenty-four years is nothing short of amazing.

I cannot begin to thank them enough for everything they have done.

Chip design is always a challenging endeavor, and as such, requires the transfer of accumulated knowledge down through the generations of designers. For this reason, I am indebted to Seong-Hwan Cho, Chee We Ng, and Andrew Chen for offering their help and expertise during the design process. I am also appreciative of the support offered by the rest of the research group, including Benton Calhoun, Francis Honore, Alice Wang, Fred Lee, Rex Min, Nathan Ickes, Theodorus Konstantakopoulos, Raul Blasquez-Fernandez, Piyada Phanaphat, and Puneet Newkasar, as well as alumni Manish Bhardwaj, Amit Sinha, and James Kao. I would also like to thank David Wentzloff for his consultations on the art of analog circuit design.

Many people have contributed to my technical development as an engineer. My friends from Suite 820 (Dan Baker, Aaron Carkin, Kent Lee, Dan Prorok, Ben Ruedlinger, and Steven Troyer) have had a phenomenal influence on my career and my life. I would also like to thank my friends and colleagues at Agilent Technologies in Fort Collins, CO, for jump-starting my career in chip design over the course of two summer internships. I am particularly indebted to my mentor-manager, Stephen Clarke, who has repeatedly offered his technical expertise, both to my projects at Agilent and to my research at MIT.

The following people have also contributed to my success: Prof. Francis Merat, Prof. David Smith, Paul Maccoux, Jeff Tracey, Ray Rosenberry, Jim Duxbury, James Powell, Bill Yerman, Joyce Fast, David Tibbitts, and the staff of Oak Street Elementary in Orrville, OH.

Finally, I am also grateful to the National Defense Science and Engineering Graduate Fellowship (NDSEG) and to MARCO for financial support throughout the project.


Contents

1 Introduction
1.1 Background
1.1.1 Electrical Clock Distribution
1.1.2 Optical Clock Distribution

2 Architecture of the Optical Deskew Buffer
2.1 System Operation
2.2 The Local Controller
2.3 The Delay Line
2.4 Stability of the ODB Architecture

3 Circuit Implementation of the Optical Deskew Buffer
3.1 Circuits of the Local Controller
3.1.1 The Phase Detector
3.1.2 Amplifiers
3.1.3 Latched Comparator
3.1.4 Control Block
3.1.5 Charge Pump
3.1.6 Local Controller Synchronization
3.2 Circuits of the Delay Line
3.2.1 Bias Generator
3.2.2 Delay Elements
3.2.3 Differential-to-Single-Ended Converter
3.3 Local Controller and Delay Circuits Interaction
3.4 Auxiliary Components
3.4.1 Current Pulse Generator
3.4.2 Ring Oscillator
3.4.3 XOR Phase Detector

4 Optoelectronics
4.1 Background
4.2 Lateral-PIN Photodetectors
4.3 Implementation

5 On-Chip Measurement of System Performance
5.1 Background
5.2 Previous Work
5.2.1 Time-to-Voltage Converters
5.2.2 Time-to-Digital Converters
5.3 Overview of the Time-to-Digital Converter
5.4 Time-to-Digital Converter Implementation
5.4.1 Operation of the TDC
5.4.2 Resolution and Calibration
5.5 Summary

6 The Test Chip
6.1 Dual-Optical Deskew Buffers
6.1.1 DODB Results
6.2 Closed-Loop Simulated Pulsing
6.2.1 CLSP Results
6.3 Summary

7 Conclusions
7.1 Results
7.2 Performance Limitations of the ODB Architecture
7.3 Summary
7.4 Future Work
7.4.1 Optoelectronics
7.4.2 Circuitry
7.4.3 Testing

A Test Chip Implementation Details
A.1 Open-Loop Simulated Pulsing
A.1.1 OLSP Results
A.2 Layout Techniques
A.3 Summary

B Bonding Diagram

List of Figures

1-1 Illustration of skew and jitter.
1-2 Balanced H-tree clock distribution network.
1-3 Intel deskew buffer (IDSK) architecture [6].
1-4 Optical clock distribution using waveguides [9].
1-5 Transimpedance amplifier-based optical clock distribution.

2-1 Optical deskew buffer architecture.
2-2 Local controller block diagram.
2-3 Illustration of A. lead, B. lag, and C. locked with the corresponding phase detector output.
2-4 Timing.
2-5 ODB modes of operation.

3-1 Block diagram of phase detector.
3-2 Fully-differential offset-cancelling switched capacitor amplifier.
3-3 Timing diagram of the switched-capacitor amplifier.
3-4 Latched comparator.
3-5 Timing diagram of the latched comparator.
3-6 Control block.
3-7 Charge pump schematic.
3-8 Amplifier and comparator timing diagram.
3-9 Clock generation block.
3-10 Delay circuits block diagram.
3-11 Delay Line bias generator.
3-12 Delay line.
3-13 Delay element.
3-14 Normalized delay vs. control voltage.
3-15 Normalized delay vs. Up pulses received by the charge pump.
3-16 Differential-to-single-ended converter.
3-17 Control voltage of ODB while in lock.
3-18 Current pulse generator.
3-19 Ring oscillator.
3-20 Ring oscillator and delay line.
3-21 XOR phase detector.

4-1 Cross-sectional structure of a typical lateral-PIN photodetector.
4-2 Layout of a single finger of the photodetector.
4-3 Layout of the complete photodetector.
4-4 Cross-sectional structure of the implemented photodetector.

5-1 Illustration of skew.
5-2 TDC overview.
5-3 TDC simplified timing diagram.
5-4 TDC block diagram.
5-5 TDC schematic slice.
5-6 TDC timing diagram.

6-1 Dual-Optical Deskew Buffer (DODB) architecture.
6-2 DODB simulation results showing the control voltages of each ODB.
6-3 Normalized phase of each ODB local clock output.
6-4 Closed-Loop Simulated Pulsing (CLSP) architecture.
6-5 CLSP system control voltage and XOR phase detector output.
6-6 CLSP reference clock and local clock phase relationship during acquisition and lock modes.
6-7 CLSP system control voltage while in lock, showing quantization noise.
6-8 CLSP power consumption.

A-1 Open-Loop Simulated Pulsing (OLSP) architecture.
A-2 OLSP results.
A-3 Layout of the chip.

B-1 Bonding diagram.

List of Tables

1.1 Variation induced skew in optoelectronic circuits.

3.1 Delay line dynamic range at 100 MHz, 1.8 V, and 25 °C.

4.1 Photodetector design summary

5.1 TDC design summary
5.2 TDC pins

6.1 DODB pins
6.2 CLSP pins
6.3 Simulated results

A.1 OLSP pins
A.2 Test chip summary
A.3 Other pins

Chapter 1

Introduction

Clock distribution has become a significant challenge in VLSI design. Modern microprocessors contain tens of thousands of synchronous elements whose correct operation relies upon the distribution of a precise clock. Traditionally, the primary means of increasing the performance of these devices has been to increase the clock rate, or equivalently, to decrease the clock cycle time. Therefore, the maximum operating frequency of a device occurs when the slowest sequence of register-bounded logic gates can just complete execution within the corresponding cycle time. For this reason, it is advantageous to fully utilize the entire clock period. In typical systems, however, the full clock period cannot be devoted to logical operations due to imperfections in the clock waveform, commonly known as skew and jitter.

Whereas skew refers to the static phase difference between the clock signals at two unique points in the network, jitter refers to the tendency for a particular clock edge to dynamically shift either forward or backward in time. Fig. 1-1 illustrates the difference between skew and jitter. Together, skew and jitter are known as clock uncertainties and often account for as much as 10% of the clock budget in a modern microprocessor.

Figure 1-1: Illustration of skew and jitter. [Example clock waveforms with period T = 8 ns, skew = 1 ns, and jitter = 0.5 ns.]

With technology scaling, propagation delays have decreased, leading to higher clock frequencies and greater performance. Uncertainties in the clock waveform, however, have remained generally constant. When combined with shorter cycle times, constant amounts of skew and jitter have led to increased portions of the timing budget being devoted to uncertainty. Recently reported figures indicate that during Intel's transition from 0.18 µm to 0.1 µm technology, the skew budget increased by 37% [1].
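The effect of a constant uncertainty on a shrinking cycle time can be illustrated with a short calculation. The sketch below is only an illustration: the function name `uncertainty_fraction` and the skew and jitter values are assumptions chosen for the example, and summing skew and jitter directly is a simplification of how a real timing budget is built.

```python
def uncertainty_fraction(period_ns: float, skew_ns: float, jitter_ns: float) -> float:
    """Return the fraction of the clock period consumed by skew plus jitter."""
    if period_ns <= 0:
        raise ValueError("period_ns must be positive")
    return (skew_ns + jitter_ns) / period_ns


# Hypothetical, fixed uncertainty (illustrative values, not measured data).
SKEW_NS = 0.1
JITTER_NS = 0.05

# As the period shrinks, the same absolute uncertainty takes a larger share.
for period_ns in (8.0, 4.0, 2.0, 1.0):
    share = uncertainty_fraction(period_ns, SKEW_NS, JITTER_NS)
    print(f"T = {period_ns:4.1f} ns: {100 * share:5.2f}% of the period is uncertainty")
```

Halving the period doubles the share of the budget lost to uncertainty, which is the trend described in the paragraph above.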

This chapter will discuss the background and motivations for the work presented in this thesis, including the previous work in both electrical and optical clocking.

1.1 Background

Since the creation of microprocessors in the 1970s, electrical clocks have been used to coordinate the operation of the various synchronous elements contained on the integrated circuit. This approach has proven widely successful across many generations of designs and architectures. As frequencies increase beyond 10 GHz, however, electrical clocks are expected to become far less reliable, thus threatening the future development of high-speed designs [1].

In anticipation of this obstacle, alternative clock distribution schemes have been considered. Possibly the most promising of these, optical clock distribution, was first proposed in 1985 [2, 3]. Although significant research has been conducted since that time, optical clocking has yet to be successfully demonstrated on a commercial microprocessor.

1.1.1 Electrical Clock Distribution

To help reduce skew, many microprocessors use a hierarchical clock distribution scheme based on a balanced H-tree network as shown in Fig. 1-2 [4, 5]. This approach uses additional buffers at each level of the hierarchy to provide the proper amount of drive strength. Since all of the buffers at a given level are driven by the same higher-level buffer and are located equidistant from this buffer, the clock edges transmitted from the higher-level buffer theoretically arrive simultaneously at each local buffer. Although careful routing can ensure that the H-tree is designed in a perfectly balanced manner, process variations during fabrication now produce enough imperfections that their effects are significant. For this reason, basic H-tree distribution schemes are becoming less viable for clock distribution.

Figure 1-2: Balanced H-tree clock distribution network.

One solution that compensates for the imperfect H-tree distribution networks is to deskew the clock at the second-level buffers [6, 7], as shown in Fig. 1-3. Proposed by Intel and implemented on the IA-64 microprocessor, this approach distributes two clocks in symmetric H-trees. One of the clocks is the global core clock, while the other is a reference clock that sees a fixed load and undergoes less buffering. These characteristics cause it to experience less skew than the primary global clock. At the local level, the reference clock is used to deskew, or synchronize, the global clock before it is distributed to the local clock domain via a clock grid. The deskewing, however, is only performed upon startup; once the local clock is synchronized to the reference clock, the deskewing operation ends. As a result, this approach can compensate for process-induced variations in the H-tree distribution network and the higher-level buffers of the global core clock. To increase the stability of the system, Intel currently stops deskewing once the startup phase is complete. Therefore, the IDSK does not presently compensate for skew induced by dynamic voltage and temperature gradients, or for a time-varying load. Using this approach, Intel has reported a 75% reduction in skew, from 110 ps to 28 ps [6].

Figure 1-3: Intel deskew buffer (IDSK) architecture [6].

1.1.2 Optical Clock Distribution

One commonly proposed solution to the clock skew problem is to distribute the clock optically [2, 3, 8]. In such a system, an off-chip optical source, such as a laser, generates a pulse train of photons at the desired frequency. These optical pulses are then distributed across the chip, either using off-chip holographic distribution or optical waveguides on-chip, to the local clock domains where they are converted into a conventional electrical clock. Fig. 1-4 shows one possible implementation of optical clocking in which the traditional H-tree topology is still employed, first to distribute the optical pulses, and later to distribute the electrical clock after the optical-to-electrical conversion has occurred. However, since the highest level of the hierarchy involves photons traveling through optical waveguides rather than electrons in metal routes, virtually no skew is introduced. Once the photons reach the end of the top-level waveguides, they are converted to electrical pulses before being further distributed to synchronous elements in the local block.

Figure 1-4: Optical clock distribution using waveguides [9].

The conversion of the optical current pulses to voltage waveforms is performed by a transimpedance amplifier (TIA) [10]. Since the waveforms out of the TIA are typically not full swing, a cascade of voltage buffers is used to amplify the pulses into logic-level signals. A block-level diagram of this transimpedance-based architecture is shown in Fig. 1-5. The problem with this approach is that it requires excellent matching between the amplifiers that are located across the chip. Variations arising from process, voltage, and temperature combine to create skew between the local clock domains. Recently reported figures for this skew are shown in Table 1.1 [11].

Figure 1-5: Transimpedance amplifier-based optical clock distribution.

Table 1.1: Variation induced skew in optoelectronic circuits.

Variation Source    Skew
-10% VDD            24 ps
+50 °C              18 ps
+10% Lpoly          80 ps
-10% Vt             70 ps

Research has shown that these variations can nullify nearly all of the gains associated with distributing an ideal optical clock [12].


Chapter 2

Architecture of the Optical Deskew Buffer

The architecture of the optical deskew buffer resembles that of Intel's deskew buffer (IDSK) [6]. Both are modeled after a traditional delay-locked loop architecture in which the output clock is synchronized to an input clock. Likewise, both systems utilize a local controller and a delay line to accomplish the deskewing operation.

At this point, however, the similarities end. Whereas the IDSK uses an electrical clock as the reference, the Optical Deskew Buffer (ODB) presented here utilizes an optical clock for this purpose. Using an optical clock for the reference allows for further reduction in the skew arising from mismatches in the reference clock H-tree.

The two systems also differ in when the deskewing is performed. In the case of the IDSK, deskewing only takes place during the startup sequence of the microprocessor, while the ODB performs continuous deskewing of the global clock. By deskewing continuously, the ODB is able to compensate for skew induced by process variations as well as slowly changing voltage and temperature gradients. As it is currently implemented, the IDSK can only compensate for process-induced skew, although the architecture is reportedly capable of continuous deskewing.

The remainder of this chapter will discuss the details of the architecture and operation of the optical deskew buffer.

Figure 2-1: Optical deskew buffer architecture.

2.1 System Operation

The ODB synchronizes an electrical local clock to an optical reference clock by sampling the local clock and adjusting the amount of delay added to the global clock, such that the local clock becomes matched in phase to the reference clock. The local controller directs this operation. By comparing the optical current output from the photodetector with a feedback version of the local clock, the local controller determines whether the local clock leads or lags the optical reference. The local controller then either increases or decreases the control voltage of the delay line accordingly.

When the local clock is synchronized with the optical clock, the ODB has attained lock, and the control voltage of the delay line is maintained. The local controller does not stop sampling its inputs, however. If fluctuations in voltage, temperature, or load should cause changes in the skew, the local controller issues corrections to the delay line to compensate for these variations in the phase. Therefore, the ODB employs active deskewing.
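The feedback behaviour described above can be sketched as a simple behavioural model. This is a minimal illustration under stated assumptions rather than the circuit developed in this thesis: the names `wrap` and `simulate_deskew`, the period, and the step size are all hypothetical, the lock point is modelled as zero phase error, and the delay is measured relative to the middle of the delay line's range so that it may go negative.

```python
def wrap(phase: float, period: float) -> float:
    """Wrap a phase error into the interval [-period/2, period/2)."""
    return (phase + period / 2) % period - period / 2


def simulate_deskew(initial_error: float, period: float = 10.0,
                    step: float = 0.05, samples: int = 400) -> list:
    """Bang-bang deskew loop: each sample moves the delay by one fixed step.

    Returns the phase error (local clock minus reference) seen at each sample.
    A negative error means the local clock leads the reference.
    """
    delay = 0.0  # relative to mid-range of the delay line (assumption)
    history = []
    for _ in range(samples):
        error = wrap(initial_error + delay, period)
        history.append(error)
        if error < 0:
            delay += step   # local clock leads: add delay
        else:
            delay -= step   # local clock lags: remove delay
    return history


errors = simulate_deskew(initial_error=-1.3)
print(f"first error: {errors[0]:+.2f} ns, last error: {errors[-1]:+.2f} ns")
```

Once the error reaches zero the loop does not stop; it keeps sampling and dithers by one step around the lock point, which is what allows it to follow later changes in skew.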

2.2 The Local Controller

The local controller integrates the functions typically performed by the phase detector and charge pump of a traditional DLL. The operation of the local controller begins with the phase detector, which continuously compares the feedback local clock to the optical current output from the photodetector. The output of the phase detector is a fully-differential small-signal voltage that is proportional to the phase difference of the input signals. This voltage is then amplified in two fully-differential offset-canceling switched-capacitor amplifiers before being latched in a comparator. The function of the comparator is to convert the amplified analog voltage representing the phase difference into a digital CMOS-level output. A block diagram of the local controller is shown in Fig. 2-2.

Figure 2-2: Local controller block diagram.

The phase detector outputs a voltage that indicates whether the electrical input leads or lags the optical input as illustrated in Fig. 2-3. Since the goal of the ODB is to align the respective clock edges, a lead signal should result in more delay being added to the global clock while a lag signal should reduce the delay. As shown in Fig. 2-3 C, the phase detector considers the clocks to be in lock when they have a phase offset of 90°, and thus the phase detector performs quadrature locking. This condition is a result of the characteristics of the phase detector, and does not affect the performance of the system.

Given a random phase relationship of the inputs, the probability of receiving either a lead or lag signal from the phase detector is 50%. This random phase state is exactly the condition that occurs upon startup since the phases of both the electrical and optical inputs to the phase detector are unknown. Therefore, if the output of the comparator is directly used to control the charge pump, a condition could occur upon startup in which the charge pump is instructed to decrease the voltage of the loop capacitor. Of course, since the system has just started, the voltage on the capacitor is already zero, and hence, the charge pump will be unable to lower it further. With the control voltage fixed at zero, the phase of the local clock will remain the same, and hence, this condition will be terminal.

Figure 2-3: Illustration of A. lead, B. lag, and C. locked with the corresponding phase detector output.
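The startup problem can be reproduced with a small behavioural model in which the comparator drives the charge pump directly. This is a sketch under assumptions, not the implemented circuit: the names `wrap` and `naive_loop` are hypothetical, the control value is mapped one-to-one onto added delay in nanoseconds, and the lock point is modelled as zero phase error.

```python
def wrap(phase: float, period: float) -> float:
    """Wrap a phase error into the interval [-period/2, period/2)."""
    return (phase + period / 2) % period - period / 2


def naive_loop(initial_error: float, period: float = 10.0,
               step: float = 0.05, samples: int = 200):
    """Comparator output drives the charge pump directly (no control block).

    The control value models the loop capacitor voltage and cannot fall
    below zero. Returns (final control value, final phase error).
    """
    control = 0.0  # the loop capacitor starts fully discharged
    for _ in range(samples):
        error = wrap(initial_error + control, period)
        if error < 0:
            control += step                      # lead: Up, add delay
        else:
            control = max(0.0, control - step)   # lag: Down, clamped at zero
    return control, wrap(initial_error + control, period)


# A local clock that starts out leading is corrected normally...
print(naive_loop(-2.0))
# ...but one that starts out lagging is stuck: Down cannot lower a zero voltage.
print(naive_loop(+2.0))
```

In the second case the control value never leaves zero, so the phase error never changes, which is the terminal condition described above.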

Since the problem arises because the loop capacitor is initially at a voltage of zero, the solution is to increase the control voltage regardless of the output of the phase detector. Therefore, the local controller always attempts to align the clocks by adding delay to the global clock. In terms of Fig. 2-3, this means that the global clock will be continuously shifted to the right until the edge of the local clock properly aligns with the reference clock. To add this capability, a control block of digital logic is inserted between the comparator and the charge pump.

The process just described is depicted in Fig. 2-4. The top plot in the figure shows the reference clock and the local clock. At the start, the local clock slightly lags the reference. The corresponding phase detector output is shown in the second plot. Due to the relative position of the A and B outputs, the comparator issues a Down signal. The control block overrides this signal, however, and issues an Up signal to the charge pump. By time M, the global clock has been delayed sufficiently far that the outputs of the phase detector are now flipped, and hence, the commands issued by the comparator now match those of the control block. At time N, the edges of the two clocks are properly aligned, and the loop is locked.

Just as it is important to begin unconditionally charging the loop capacitor upon startup, it is also critical to know when to exit this process and begin controlling the charge pump based on the output of the comparator. Since the preprogrammed routine will eventually result in the comparator issuing an Up command, the control block should relinquish control when the comparator issues an Up followed immediately by a Down command. When this Up/Down sequence is detected, the system has entered lock, and the control voltage should be maintained at its present level.

Figure 2-4: Timing.

From this point onward, the control block will forward the signals from the phase detector to the charge pump, and thus allow the system to continue adjusting the delay to compensate for future changes in the skew due to voltage, temperature, or load variations.
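The override-then-forward behaviour of the control block can be sketched as a two-state machine. The class and method names below are hypothetical, the comparator decisions are represented as the characters "U" and "D", and forwarding the Down decision that completes the Up/Down sequence (rather than holding the charge pump idle for that sample) is an assumption made for this illustration.

```python
from enum import Enum


class Mode(Enum):
    ACQUISITION = "acquisition"
    LOCK = "lock"


class ControlBlock:
    """Behavioural sketch of the logic between comparator and charge pump."""

    def __init__(self) -> None:
        self.mode = Mode.ACQUISITION
        self._previous = None  # last comparator decision seen during acquisition

    def step(self, comparator: str) -> str:
        """Take one comparator decision and return the charge pump command."""
        if comparator not in ("U", "D"):
            raise ValueError("comparator decision must be 'U' or 'D'")
        if self.mode is Mode.ACQUISITION:
            # An Up immediately followed by a Down marks the lock point.
            if self._previous == "U" and comparator == "D":
                self.mode = Mode.LOCK
            self._previous = comparator
        if self.mode is Mode.LOCK:
            return comparator   # forward the comparator decision
        return "U"              # acquisition: unconditionally charge


block = ControlBlock()
commands = "".join(block.step(c) for c in "DDDUUUDUDD")
print(commands, block.mode)
```

In this run the initial Down decisions are overridden, the Up decisions coincide with the override, and every decision from the first Up/Down transition onward is passed through unchanged.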

The second role of the control block is to decrease the acquisition time of the ODB. Typically defined as the amount of time it takes for the system to obtain lock after starting from a known initial state, acquisition time represents idle time for the digital elements of the microprocessor since no useful computation can be performed during this period. Since microprocessors generally initiate the startup sequence infrequently, acquisition time is of less importance for PLL/DLLs found in these systems. Nevertheless, it is still desirable to minimize the acquisition time to some extent.

Figure 2-5: ODB modes of operation.

Since the Up/Down sequence from the comparator indicates that the system is approaching lock, this sequence can also be used to change the amount of charge delivered by the charge pump onto the loop capacitor. By enabling a special mode upon startup, the charge pump delivers large quantities of charge initially, which causes the loop to approach lock faster. Once in lock, smaller quantities of charge are delivered to reduce the phase noise of the system. This leads to three logical modes of operation as shown in Fig. 2-5. While Reset is asserted, the loop capacitor is grounded, thus negating the effect of either Up or Down pulses. When Reset is deasserted, the system enters acquisition mode in which the control block begins unconditionally charging the loop capacitor. During this mode, the Turbo signal is asserted so as to reduce the acquisition time by increasing the amount of charge delivered by the charge pump. When the edges of the reference clock and the local clock are properly aligned, the system enters lock mode in which the phase detector continually monitors the phase of the system, and issues corrections as necessary.
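The benefit of the Turbo setting during acquisition can be illustrated by counting how many samples the unconditional charging phase needs before the Up/Down sequence appears. This is a rough behavioural sketch: the names `wrap` and `samples_to_lock`, the step sizes, and the one-to-one mapping from control value to delay are assumptions, and the lock point is modelled as zero phase error.

```python
def wrap(phase: float, period: float) -> float:
    """Wrap a phase error into the interval [-period/2, period/2)."""
    return (phase + period / 2) % period - period / 2


def samples_to_lock(initial_error: float, acquisition_step: float,
                    period: float = 10.0, max_samples: int = 100_000):
    """Count samples from the release of Reset until Up is followed by Down.

    Returns (sample count, phase error at the moment lock is declared).
    """
    control = 0.0    # Reset has grounded the loop capacitor
    previous = None
    for n in range(max_samples):
        error = wrap(initial_error + control, period)
        decision = "U" if error < 0 else "D"
        if previous == "U" and decision == "D":
            return n, error            # Up/Down sequence: enter lock mode
        previous = decision
        control += acquisition_step    # acquisition mode: unconditional Up
    raise RuntimeError("lock was not reached within max_samples")


turbo = samples_to_lock(2.0, acquisition_step=0.1)
fine = samples_to_lock(2.0, acquisition_step=0.01)
print(f"Turbo step: {turbo[0]} samples, residual error {turbo[1]:.3f} ns")
print(f"Fine step:  {fine[0]} samples, residual error {fine[1]:.3f} ns")
```

The larger step reaches lock in roughly a tenth of the samples but leaves a larger residual error, which is why switching to smaller charge packets once in lock is attractive.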

The operation of the local controller is now complete. At this point, the phase detector has compared the electrical input to the optical input and the control block has operated the charge pump accordingly. The loop capacitor stores the charge from the charge pump, and serves as the link from the local controller to the delay line. Implementation details for the local controller will be discussed in the next chapter, while the next section will describe the operation of the delay line.

2.3 The Delay Line

Since the local controller always issues Up commands while approaching lock, the delay line needs to be able to provide up to one full period of delay in the worst case. To allow the system to compensate for voltage, temperature, and load variations, however, added dynamic range is needed, which is provided by adding more delay stages. So long as this dynamic range requirement is met, any type of voltage-controlled delay line is compatible with the system. Since the ODB architecture performs clock synchronization rather than clock generation, the local clock output jitter will be at least as large as the global electrical clock input jitter. In other words, the ODB must accept the jitter generated upstream, but should strive to minimize the jitter added at this level. For this reason, the Maneatis self-biased architecture was selected for the delay elements, as well as the accompanying bias circuitry, differential-to-single-ended converter, and charge pump [13, 14]. Combined, these elements have been shown to exhibit low jitter, high power-supply rejection, and high substrate noise rejection. Implementation details of these circuits will be provided in Chapter 3.

2.4 Stability of the ODB Architecture

Delay-locked loops have been shown to be first-order systems, meaning that they are generally stable by design. Since the ODB is based on a DLL-type architecture, stability is not a significant issue. Nevertheless, stability problems can occur in DLLs if the control voltage of the delay line is updated without knowing the results of the previous correction. Designing the system such that it waits longer before sampling the inputs and issuing corrections is commonly referred to as "slowing down the loop," and can generally be used to avoid stability problems in DLLs.

Due to the characteristics of the phase detector in the local controller, it is important to limit the rate at which corrections to the control voltage are administered.

While the phase detector continually compares the local clock to the reference clock, it cannot respond instantly to changes in phase. Therefore, the phase detector must be given adequate time such that its outputs reflect the phase relationship of its inputs.

This is accomplished by having the amplifiers and comparator sample the outputs of the phase detector at a rate much lower than the global clock.

The slower sampling rate is also necessary to ensure that the amplifiers have time to settle between each stage of the sampling. Specifically, the outputs of the first amplifier must have settled before the second amplifier samples its inputs. Likewise, in order for the comparator to accurately latch the input, the outputs of the second amplifier must have first settled. Finally, adequate time must be allocated for the amplifiers to reset before the next sampling cycle begins.


Chapter 3

Circuit Implementation of the Optical Deskew Buffer

This chapter will describe the circuit implementation of the optical deskew buffer.

The target process for the design was the TSMC 0.18 µm Logic process, as available through MOSIS. Discussion will follow the pattern established in the previous chapter; it will start with the local controller and proceed to the delay line. The chapter will conclude with a description of auxiliary circuit blocks not fundamental to the ODB itself, but useful during the test chip implementation.

3.1 Circuits of the Local Controller

The local controller consists of five discrete circuit blocks, namely the phase detector, amplifiers, latched comparator, control block, and charge pump. Although many of these blocks contain analog circuitry, they require precise synchronization to function properly. The following subsections will present the circuitry of each block, as well as a timing diagram of its operation. An overall timing diagram of the entire local controller will then be presented to explain the interaction of the various blocks.


3.1.1 The Phase Detector

Schematic design of the phase detector was completed by Paul-Peter Sotiriadis. For this thesis, this circuitry will be considered a black box which accepts a pair of non-overlapping differential electrical clocks and one optical reference clock in the form of current pulses. As shown in Fig. 3-1, it produces a pair of small-signal differential signals proportional to the phase difference of the inputs. The common-mode output of the phase detector is proportional to the magnitude of the optical current input by a 10:1 ratio. With optical current of ~2 µA, a common-mode output of ~20 mV is expected. Layout of the device was completed as part of this thesis.

3.1.2 Amplifiers

Since the outputs of the phase detector are small-signal, significant amplification is necessary before the signals can be latched in the comparator. The amplifiers were specifically designed to exhibit minimum input offset voltage, high power-supply rejection, high common-mode rejection, and good noise performance. By using smaller transistor sizes, input capacitance of the amplifiers was minimized in order to match the drive capabilities of the phase detector. The frequency response of the amplifiers was not a design concern since the inputs are basically DC signals. Settling time, however, was considered, since the outputs of the system should be stable before the comparator latches. This was a flexible design specification since the sampling rate of the comparator can be slowed to accommodate the settling time of the amplifiers.

Nonetheless, a settling time of 20 ns was targeted for the amplifier.

Figure 3-1: Block diagram of phase detector. (Inputs: optical reference current and differential electrical clock; output: differential.)

Figure 3-2: Fully-differential offset-cancelling switched capacitor amplifier.

Figure 3-3: Timing diagram of the switched-capacitor amplifier. (Signals shown: Clk, Clkc, InA, InB, OutA, OutB.)

A cascade of fully-differential offset-cancelling switched-capacitor amplifiers was implemented to meet these requirements. Each stage was designed to provide a gain of A_v ≈ 10, such that when cascaded, the overall gain would be A_v ≈ 100. The fully-differential topology of the amplifiers was selected for its good common-mode noise rejection. A switched-capacitor offset-cancellation scheme was employed to minimize the input offset voltage. To reduce thermal noise, PMOS inputs were used for the amplifier, at the expense of a lower gm. To reduce the output impedance, and therefore minimize the charge-injected coupling from one stage to the next, an output buffer was added to the design. The schematic of a single amplifier is shown in Fig. 3-2.

Switched-capacitor amplifiers are derived from sample-and-hold circuits and therefore require precise synchronization to ensure proper operation. Fig. 3-3 shows a basic timing diagram for the switched-capacitor amplifier. When Clkc is asserted, the amplifier is in the reset state. At this time, nodes n1 and n2 are connected to the common-mode input voltage while nodes n3 and n4 are shorted to the output. This feedback configuration allows the amplifier to adjust the voltage of nodes n3 and n4 such that current through each side is symmetric and the outputs are matched. Once this balanced state has been attained, the resulting voltages at nodes n3 and n4 are equal to the offset voltage of the amplifier. When Clkc is deasserted, the common-mode input is disconnected and the offset voltage is "stored" on the capacitor. Likewise, when Clk is asserted, nodes n1 and n2 are connected to the source signals, which in turn causes nodes n3 and n4 to respond proportionally. The amplifier is now in the amplify state, and the outputs adjust to reflect the relative level of the inputs.
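The reset/amplify sequence can be illustrated with a small behavioral sketch. It is not a model of the actual circuit: it assumes an idealized stage whose reset phase stores exactly the input-referred offset, and the names `amplify_stage` and `cascade`, the 5 mV offset, and the 0.2 mV input are hypothetical values chosen only for illustration. The per-stage gain of 10 is taken from the text.

```python
def amplify_stage(v_in, gain=10.0, v_offset=0.005, cancel_offset=True):
    """Idealized two-phase model of one switched-capacitor stage (volts)."""
    # Reset phase (Clkc asserted): the feedback loop settles and the
    # offset is stored on the capacitors. Assumed ideal: stored == offset.
    v_stored = v_offset if cancel_offset else 0.0
    # Amplify phase (Clk asserted): the stored value subtracts from the
    # offset, so only the input signal is amplified.
    return gain * (v_in + v_offset - v_stored)


def cascade(v_in, stages=2, **kwargs):
    """Chain identical stages; two stages of gain 10 give a gain of ~100."""
    for _ in range(stages):
        v_in = amplify_stage(v_in, **kwargs)
    return v_in


if __name__ == "__main__":
    print(cascade(0.0002))                       # offset cancelled
    print(cascade(0.0002, cancel_offset=False))  # offset amplified as well
```

Under these assumptions the cancelled cascade amplifies only the signal, whereas without cancellation the assumed offset dominates the output, which is why the offset-cancellation scheme matters for small-signal inputs.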

3.1.3 Latched Comparator

By detecting which of its two inputs has a higher voltage, the latched comparator, shown in Fig. 3-4, acts as a sense-amplifier followed by an RS-latch. The circuit was designed to have minimum input capacitance, low input offset, and a reasonable settling time. Ideally, a latch with no offset voltage would be preferred.

Figure 3-4: Latched comparator.

The circuit operates in one of two states, pre-charge and sample, and is controlled by two clocks. A timing diagram of the circuit is shown in Fig. 3-5. During the precharge state, ClockComp and ClockCompA are low, and nodes R and S are precharged to Vdd. When ClockComp and ClockCompA transition high, the latch enters the sample state. Depending on which of the inputs is stronger, either node R or S will be pulled down. The positive feedback provided by the cross-coupled transistors ensures a fast execution of this event. Since nodes R and S are connected to the inputs of an RS-latch, the downward-going transition on node R or S has the effect of setting or resetting the latch. These values are then stored until the inputs of the comparator change, even when the comparator enters the precharge state.

Figure 3-5: Timing diagram of the latched comparator.

3.1.4 Control Block

The control block is designed to accept the inputs from the latched comparator and produce the Up and Down pulses for the charge pump. Due to the characteristics of the system described in Chapter 2, the control block also regulates whether the system is in the acquisition or locked mode of operation, and thus whether the outputs of the latched comparator are used to control the charge pump.

Figure 3-6: Control block.

The control block, shown logically in Fig. 3-6, uses a few simple storage elements to track the outputs of the latched comparator, and thereby determine the mode of operation. Specifically, the control block looks for the latched comparator to issue an Up signal followed immediately by a Down signal. (Recall that the latched comparator has differential outputs, such that exactly one of the CompU or CompD outputs is always asserted, indicating the intended signal.) The first D-type flip-flop (DFF) of the upper path stores the output of the Up signal from the comparator. When the output of this DFF and the current Down signal are both asserted, the desired Up/Down sequence has been detected and the system changes from the acquisition to the locked state. The RS flip-flop then stores this state and outputs the appropriate Control signal. The Turbo signal is easily generated as well.

The second path of logic generates the control pulses used to direct the operation of the charge pump. The top multiplexor and its ensuing logic generate the Up signal and its complement, Upc. If the system is in the acquisition state, the Control signal directs the multiplexor to sample a logical Hi (VDD). Therefore, when the Pulse signal arrives at the NAND gate, an Up pulse is issued to the charge pump. Conversely, when the system is in the locked state, the Control signal causes the output of the comparator to be sampled when the Pulse signal arrives. The lower multiplexor performs the same operation except that it accepts the CompD signal from the comparator as well as a logical Lo (GND), and outputs the Down and Downc signals to the charge pump.

The final path of logic in the block accepts the system control clock, which is the same clock as used by the first amplifier, and outputs a short Pulse signal used by the DFFs and the NAND gates within the block. The width of the pulse is determined by the number of inverters preceding the NAND gate, and is currently set to produce a pulse of ~200 ps. Since the Up and Down pulses delivered to the charge pump are derived from this pulse, the charge delivered by the charge pump will be directly proportional to its width.
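The mode logic of the control block can be summarized with a short behavioral sketch. This is a simplified software model, not the gate-level design: the class name `ControlBlock` and the method `on_pulse` are hypothetical, and the sketch assumes that the comparator output is already honored on the same Pulse event at which the Up/Down sequence is detected, and that Turbo is simply active during acquisition.

```python
class ControlBlock:
    """Behavioral sketch of the mode logic; not a gate-level model."""

    def __init__(self):
        self.locked = False    # state held by the RS flip-flop
        self.prev_up = False   # first DFF: CompU from the previous pulse

    @property
    def turbo(self):
        # Turbo is assumed to be active only during acquisition.
        return not self.locked

    def on_pulse(self, comp_up):
        """Process one Pulse event; comp_up is CompU (CompD is its complement)."""
        comp_down = not comp_up
        # An Up immediately followed by a Down marks the transition to lock.
        if self.prev_up and comp_down:
            self.locked = True
        self.prev_up = comp_up
        if not self.locked:
            return "Up"  # acquisition: the multiplexor selects logical Hi
        return "Up" if comp_up else "Down"  # lock: follow the comparator


if __name__ == "__main__":
    block = ControlBlock()
    comparator = [False, False, True, True, False, True, False]
    print([block.on_pulse(c) for c in comparator])
```

In this sketch the block issues unconditional Up pulses until the comparator reports an Up followed immediately by a Down, after which the pulses follow the comparator.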

3.1.5 Charge Pump

The charge pump is based on the offset-cancelled charge pump published by Maneatis and is shown in Fig. 3-7 [13, 14]. When an Up pulse is received, node n1 is momentarily pulled down, which causes a burst of charge to be deposited on the loop capacitor.

Conversely, when a Down pulse is received, charge is removed from the loop capacitor.

Since the charge pump and the delay line interact closely, a feedback signal, VBn, is used to dynamically adjust the amount of charge transferred based on the state of the delay line. This helps to ensure that each Up or Down pulse causes an equal amount of delay to be added or subtracted from the global clock. Therefore, as VBn decreases, the charge transferred by the circuit during each pulse decreases, but the relative adjustment in delay is the same.

Figure 3-7: Charge pump schematic.

The topology differs from the Maneatis version in that two extra transistors have been added to provide a Turbo mode of operation. In this mode, the charge pump delivers more charge per pulse to the loop capacitor, thus decreasing system acquisition time. Additionally, a path to ground was added to provide a means of discharging the loop capacitor during testing.

3.1.6 Local Controller Synchronization

The preceding sections have presented the various circuits of the local controller as well as timing diagrams of the signals needed for their operation. To function properly, however, each of these blocks must be precisely synchronized with the others. A timing diagram showing the complete operation of the local controller is provided in Fig. 3-8.

The sequence of events begins when the first amplifier samples the outputs of the phase detector. While the first amplifier is holding its outputs, the second amplifier samples and further amplifies these signals. Finally, after the second amplifier's outputs have settled, the comparator latches and the amplified output of the phase detector is stored as CompU and CompD. As the figure shows, CompU and CompD only transition during cycles when the outputs of the phase detector change. At all other times, their values remain constant. The rising edge of the first amplifier clock also causes the control block to issue a correction to the charge pump. Depending on whether this pulse is an Up or a Down, the charge pump either delivers charge to or removes charge from the loop capacitor, causing its voltage to rise or fall.

Figure 3-8: Amplifier and comparator timing diagram. (Signals shown: the two amplifier clocks, the two comparator clocks, the phase detector outputs, CompU, CompD, Up, Down, and the analog control voltage.)

Six clocks are used to synchronize the operation of the local controller. Each of the amplifiers requires two differential non-overlapping clocks and the comparator requires two clocks, one being a slightly advanced version of the other. Since the amplifiers, comparator, and charge pump are designed to operate in a nested fashion, the phase and duty-cycle of the clocks are critical. While the phase and duty-cycle of each clock vary, the frequencies of the clocks are identical and equal to the feedback rate of the system. The feedback rate controls the rate at which the local controller makes corrections to the amount of delay being added to the global clock. As implemented, the feedback rate is set to 1:512, meaning that the local controller updates the control voltage of the delay line once for every 512 cycles of the global clock.

The clock generation block used to create these six clocks is shown in Fig. 3-9.


Figure 3-9: Clock generation block.

The top row of DFFs is used to divide the global clock. For simplicity, only four DFFs are shown, although the actual implementation would require nine to produce the 1:512 feedback ratio. The block labeled CS in the schematic is a clock-separation block based on an RS flip-flop. It is used to generate non-overlapping clocks from purely differential clocks. Essentially, it "separates" the clock pulses in time, and thus is referred to as a clock-separator. These non-overlapping clocks prevent glitches from occurring during the various logical operations performed in the block. The output clocks of the block are shown in the timing diagram of Fig. 3-8.
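The relationship between the feedback ratio and the number of divider flip-flops is simple arithmetic, sketched below. The function names are hypothetical, and the 200 MHz global clock used in the example is only an illustrative assumption (it corresponds to the 5 ns period mentioned in the delay line simulation), not a stated operating point of the divider.

```python
def divider_stages(ratio):
    """Number of cascaded divide-by-two flip-flops for a 1:ratio divider."""
    if ratio < 1 or ratio & (ratio - 1):
        raise ValueError("ratio must be a power of two")
    return ratio.bit_length() - 1


def update_period_ns(global_clock_mhz, ratio):
    """Time between control-voltage updates, in nanoseconds."""
    return ratio * 1000.0 / global_clock_mhz


if __name__ == "__main__":
    print(divider_stages(512))            # flip-flops needed for 1:512
    print(update_period_ns(200.0, 512))   # update interval at 200 MHz
```

For a 1:512 ratio this gives nine divide-by-two stages, matching the count stated above.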

3.2 Circuits of the Delay Line

The delay line and supporting circuits are based on the self-biased architecture published by Maneatis [13, 14]. This includes the bias circuit, delay elements, and differential-to-single-ended converter as shown in the block diagram of Fig. 3-10.

As with the local controller, the layout of the delay line employed spherical common-centroid layout wherever possible in order to increase matching. The self-biased topology of the Maneatis architecture increases the symmetry of the design, and therefore is quite amenable to this type of layout. The following sections will discuss the topology as it pertains to the ODB, with particular attention devoted to any modifications made.


Figure 3-10: Delay circuits block diagram. (Signal path: differential global clock, delay line, differential-to-single-ended converter, local clock.)

3.2.1 Bias Generator

The bias generator was implemented exactly as described by Maneatis and is shown below in Fig. 3-11 [13, 14]. This block receives the control voltage from the loop capacitor as its input, and generates two bias voltages, VBp and VBn, as outputs.

The VBp output is a buffered version of the control voltage, and typically tracks it over the operating region of the circuit. VBn is basically the complement of VBp since it inversely tracks the control voltage. Since the control voltage of the system always begins at zero, the reset signal is asserted upon startup to force the circuit into a known state.

Figure 3-11: Delay line bias generator.

3.2.2 Delay Elements

Maneatis delay elements were implemented in the delay line [13, 14]. As mentioned in Chapter 2, the architecture requires the delay line to have a minimum of one period of delay across all operating regions to allow the system to acquire lock, plus added delay to enable tracking of voltage, temperature, and load fluctuations. To ensure adequate dynamic range, the delay line was designed to provide about two periods of delay at operating frequencies ranging from 100 MHz to 500 MHz at all process corners. This required an array of fourteen delay elements as illustrated in Fig. 3-12.

Figure 3-12: Delay line.

The dynamic range of the delay line across process corners is summarized in Table 3.1. These measurements were made by sweeping the control voltage entering the bias generator and observing the change in delay across the 14-stage delay line. As shown, the delay line provides at least two periods of delay across all process corners, and thus meets the design requirements. At the extreme corners, where both NMOS and PMOS devices are either fast or slow, the elements have additional delay range.

The schematic of a single delay element is shown in Fig. 3-13. The element uses symmetric loads composed of two PMOS transistors, one of which is diode-connected. This helps linearize the delay vs. control voltage characteristic of the element. Due to the self-biased architecture of the delay line circuits, considerable symmetry exists between the delay elements and the bias generator. By including a replica of a delay element in the bias generator, the control voltages produced by the bias generator are better matched to the operating point of the delay elements.

Table 3.1: Delay line dynamic range at 100 MHz, 1.8 V, and 25 °C.

Corner    Delay    Periods of Delay
TT        20 ns    2.0
SS        28 ns    2.8
FF        21 ns    2.1
SF        20 ns    2.0
FS        20 ns    2.0
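The "Periods of Delay" column of Table 3.1 follows directly from the measured delay range and the 100 MHz clock period (10 ns). A minimal sketch of that arithmetic, with a hypothetical function name and the delay values copied from the table:

```python
# Delay range per process corner, in nanoseconds, as listed in Table 3.1.
TABLE_3_1_DELAY_NS = {"TT": 20.0, "SS": 28.0, "FF": 21.0, "SF": 20.0, "FS": 20.0}


def periods_of_delay(delay_ns, clock_mhz=100.0):
    """Express a delay range as a multiple of the clock period."""
    period_ns = 1000.0 / clock_mhz
    return delay_ns / period_ns


if __name__ == "__main__":
    for corner, delay_ns in TABLE_3_1_DELAY_NS.items():
        print(corner, periods_of_delay(delay_ns))
```

Every corner evaluates to at least two periods, consistent with the design target stated above.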

Figure 3-13: Delay element.

A plot of delay vs. control voltage is shown in Fig. 3-14. While the control voltage is below 0.4 V, the input clock is delayed by a fixed amount which is the intrinsic propagation time of the delay line. Above this level, incremental delay is added as the control voltage increases. Once the voltage reaches 1.2 V, however, the delay line fails and the end of the dynamic range is reached. As the plot shows, delay is a nonlinear function of control voltage, meaning that incremental increases in the control voltage do not result in uniform increases in total delay. Indeed, when the control voltage is around 1.0 V, the system is operating in the high-gain region, meaning that very small increases in voltage cause very large amounts of delay to be added to the global clock.

Figure 3-14: Normalized delay vs. control voltage. (x-axis: control voltage, 0 to 1.4 V; y-axis: normalized delay.)

The Maneatis self-biased architecture compensates for this non-linearity through the use of feedback between the delay line bias generator and the charge pump. By modulating the tail current source of the charge pump with the VBn output of the bias generator, the charge pump is able to deliver smaller quantities of charge to the loop capacitor when the delay line is in the high-gain region. Since the charge pump is ultimately operated by administering either Up or Down pulses to it, the delay vs. Up pulses relationship of Fig. 3-15 is a better performance metric of the delay line.

To generate this plot, a simulation of the charge pump, bias generator, and delay line was run. Beginning with the system in the reset state, the charge pump was directed to administer continuous Up pulses at a rate of 20 MHz. The rising edge of the local clock exiting the delay line was then measured. By normalizing the delay of the local clock to its period (5 ns), the delay vs. Up pulse relationship was determined. As the figure shows, this relationship is highly linear over most of the operating range.

3.2.3 Differential-to-Single-Ended Converter

Figure 3-15: Normalized delay vs. Up pulses received by the charge pump. (x-axis: Up pulses, 0 to 2000; y-axis: normalized delay.)

Since the outputs of the delay line are not full-swing waveforms, some form of amplification is necessary. One possibility would be to simply amplify one of the outputs, but this would result in both duty-cycle distortion and skew. To avoid this, a differential-to-single-ended converter published by Maneatis is used to generate a single-ended full-swing waveform [13, 14]. A schematic of the block is shown in Fig. 3-16. The circuit consists of two differential amplifiers that accept the low-swing inputs followed by two common-source amplifiers connected by a current mirror. A pair of buffers at the end produces a final full-swing clock output. At this point, a buffer chain would typically be inserted to increase the drive capability of the signal. For the purposes of this research, those blocks were deemed unnecessary and the outputs of the delay line were directly used to generate the feedback clock that is sampled by the phase detector. The additional buffers required to drive a large clock network would simply add further latency within the feedback loop.

Figure 3-16: Differential-to-single-ended converter.

For proper operation of the phase detector, the electrical feedback clock inputs must be both complementary and non-overlapping. To facilitate this second requirement, a modified SR-latch was inserted between the converter and the phase detector. This latch served to ensure a uniform non-overlap period between each pulse and its complement. Since this block required complementary inputs, the differential-to-single-ended converter was designed to provide complementary outputs as shown in the figure.

3.3 Local Controller and Delay Circuits Interaction

The steady state phase error of the system is a function of the step size of the charge pump, the delay vs. pulse signal characteristic of the delay line, the size of the loop capacitor, and the feedback ratio of the system. Since phase error must be accounted for as clock uncertainty in the timing budget, it is desirable to minimize or eliminate phase error. In the system implemented, the charge pump continues to administer both Up and Down pulses to the loop capacitor even after the system is in lock. If each Up and Down pulse resulted in symmetric changes in the control voltage, every pulse would exactly cancel the previous one, and the control voltage would oscillate between two constant voltages. This would cause the delay line to alternately add and subtract an incremental amount of delay. In this situation, the phase error would be exactly equal to the incremental delay of the delay line.


Figure 3-17: Control voltage of ODB while in lock. (The control voltage steps about its equilibrium level for the pulse sequence U D U D U D D U D U D U D D.)

Perfect matching of Up and Down pulses of the charge pump is not possible for the design implemented. With even the slightest mismatch, the charge pump will still alternate between Up and Down pulses, but instead of each pulse exactly cancelling the previous one, a residue will now be left behind, thus causing the system to slowly drift away from the equilibrium point. Once the system is a full step away from equilibrium, the charge pump administers two consecutive pulses of the same direction to bring the system back to equilibrium. In this scenario, shown in Fig. 3-17, the system can drift as far as one step out of lock in each direction, such that its range of control voltage is equal to three steps. Therefore, the system implemented will, at best, experience a phase error equal to the phase associated with two quantization steps of the charge pump. Given a loop capacitor of 2 pF with a control voltage of 800 mV, one Up pulse results in a 0.9 mV increase in the control voltage, while each Down pulse produces a 0.5 mV decrease in the control voltage. Therefore, the control voltage should fluctuate within a 2 mV range due to the mismatch in Up and Down pulses. From the delay vs. control voltage characteristic of the delay line presented in Fig. 3-14, a voltage change of 2 mV (at 800 mV control voltage) yields an incremental delay of ~6 ps. As a result, the local clock output from the delay line can be expected to experience jitter of 6 ps due to quantization noise on the loop capacitor.
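The drift-and-correct behavior caused by mismatched Up and Down steps can be reproduced with a small bang-bang model. This is a hedged sketch, not the simulation used in the thesis: it assumes an ideal comparator with no latency or noise, works in integer microvolts to avoid rounding effects, and uses a hypothetical function name. The 0.9 mV and 0.5 mV step sizes and the ~3 ps/mV sensitivity (6 ps per 2 mV) are taken from the text. Because the model is idealized, it is not expected to reproduce the ~2 mV range quoted above exactly.

```python
def simulate_lock(up_uv=900, down_uv=500, start_uv=0, pulses=200):
    """Return (pulse string, list of control-voltage errors in microvolts)."""
    error_uv = start_uv       # control voltage minus the equilibrium level
    sequence, trace = [], []
    for _ in range(pulses):
        if error_uv < 0:      # below equilibrium: comparator requests Up
            error_uv += up_uv
            sequence.append("U")
        else:                 # at or above equilibrium: comparator requests Down
            error_uv -= down_uv
            sequence.append("D")
        trace.append(error_uv)
    return "".join(sequence), trace


if __name__ == "__main__":
    seq, trace = simulate_lock()
    ripple_mv = (max(trace) - min(trace)) / 1000.0
    ps_per_mv = 6.0 / 2.0     # from the text: 2 mV corresponds to ~6 ps
    print(seq[:14])
    print(ripple_mv, "mV ->", ripple_mv * ps_per_mv, "ps")
```

In this model the pulse stream mostly alternates, with occasional pairs of consecutive Down pulses absorbing the accumulated residue, and the control voltage stays within a bounded band around equilibrium.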

The phase error also depends on the feedback rate of the system. This affects how often the local controller updates the control voltage, and thus, how often the delay line adjusts the delay added to the global clock. As the feedback rate of the system is increased, stability declines. This causes the control voltage to oscillate around the equilibrium level by more than the inherent two quantization steps as previously described. These larger oscillations result in increased phase error since the delay line is continually adding and then subtracting larger amounts of delay from the global clock. Fundamentally, this occurs because corrections to the phase do not have time to fully propagate through the delay line to the phase detector before the next correction is applied. This causes the local controller to issue the next correction before knowing the effect of the last correction. This problem can be alleviated by simply slowing down the rate at which the loop controller updates the control voltage. This does not come without a price, however, as slower updates to the control voltage mean that the acquisition time, or the time it takes for the loop to reach lock, will increase.

Whereas a longer acquisition time hinders the performance of a microprocessor at startup, phase error degrades performance during normal operation by reducing the useful portion of each clock cycle. For this reason, the implementation of the ODB architecture attempts to minimize phase error at the expense of increased acquisition time. The feedback rate of the ODB system is controlled by the clock generation block which, through a variety of logical operations, generates the four amplifier clocks as well as the two comparator clocks from the global clock. (The clock controlling the charge pump pulses is the same as the first amplifier clock.) To decrease the feedback rate, the global clock is further divided upon entering the clock generation block, such that all of the ODB system clocks run slower. Through simulation, this proper feedback rate was determined to be 1:512, meaning that the local controller updates once every 512 cycles of the global clock. At this feedback rate, the system is stable, and thus all phase error is contributed by quantization effects of the charge pump.

3.4 Auxiliary Components

The components described in this section are not fundamental to the ODB architecture. They were designed and implemented in order to better test the design and inclusion of them here is for completeness.


3.4.1 Current Pulse Generator

Since the photodetector was expected to have a maximum operating frequency of ~200 MHz, a higher-speed alternative to stimulating the phase detector was sought. To provide this, a current pulse generator was designed. The current pulse generator uses a basic charge pump topology as shown in Fig. 3-18. By varying the frequency of the clock input, current pulses can be generated at frequencies up to 1 GHz. The current pulses have a 50% duty-cycle and a peak amplitude of approximately 10 µA.

Figure 3-18: Current pulse generator.

3.4.2 Ring Oscillator

Since even high-performance I/O pads typically have a bandwidth of less than 200 MHz, most modern microprocessor designs rely on on-chip phase-locked loops (PLLs) to multiply the incoming clock up to the GHz range. In order to avoid designing a PLL for the test chip, a ring oscillator operated in open-loop configuration was used to generate on-chip clocks at frequencies above 200 MHz. In order to leverage previously designed components, the ring oscillator consisted of three delay elements, as well as the accompanying bias circuitry. The control voltage to the bias generator was input from off-chip, and a frequency-divided version of the output was ported off-chip. By varying the control voltage, the frequency of the ring oscillator, and therefore the output clock, could be adjusted. A block diagram of this device is shown in Fig. 3-19.

Figure 3-19: Ring oscillator.

A second similar block was designed in which the outputs of the ring oscillator were connected to the inputs of a delay line, the control voltage of which was input from off-chip. The outputs of this block consisted of two sets of clocks, one from the output of the ring oscillator, and one from the outputs of the delay line. By varying the control voltage of the ring oscillator, the frequency of both sets of clocks could be modified. Similarly, by varying the control voltage of the delay line, the skew between the two sets of clocks could be adjusted. A block diagram of this architecture is shown in Fig. 3-20. This block was included on the test chip to simulate the random and variable skew between the optical input (simulated with the current pulse generator) and the global clock.

The electrical global clock and the optical reference clock must be derived from the same source to ensure that they have precisely the same frequency. Therefore, the ring oscillator and delay line block can only be used with the current pulse generator, not the photodetectors.

Figure 3-20: Ring oscillator and delay line.

3.4.3 XOR Phase Detector

To determine whether the ODB has locked the output local clock to the input global clock, it is useful to know how the relative phase between the two clocks is changing. By connecting both clocks to an XOR-based phase detector, the change in relative phase can be determined. Since the output of the phase detector is only a DC voltage, the exact phase difference between the two signals cannot be known. Moreover, since the output of the phase detector is unlikely to be linearly related to the phase difference of the inputs, comparisons between various output voltages (and hence phase offsets) cannot be reliably measured. Despite these limitations, the XOR-based phase detector can be used to determine whether the phase difference between the input signals is changing with time, or whether the phase has settled to a constant value. A schematic of the phase detector is shown in Fig. 3-21. A second device for measuring skew was also developed as part of this thesis and will be presented in Chapter 5.

Figure 3-21: XOR phase detector
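The ambiguity described above can be illustrated with an idealized model. The sketch below is not the implemented circuit: it assumes ideal 50% duty-cycle square waves and a perfect averaging filter, and the function name `xor_average` is hypothetical. Even under these favorable assumptions, the averaged XOR output is identical for a lead and a lag of the same magnitude, so a single DC reading cannot identify the phase uniquely, although a reading that stops changing does indicate that the phase has settled.

```python
def xor_average(phase_fraction: float, samples_per_period: int = 1000) -> float:
    """Mean of (A XOR B) over one period for two ideal 50% duty-cycle square waves.

    phase_fraction is the delay of B relative to A, in fractions of a period.
    The result is normalized to the logic swing (0.0 to 1.0).
    """
    shift = round(phase_fraction * samples_per_period) % samples_per_period
    half = samples_per_period // 2
    high_count = 0
    for k in range(samples_per_period):
        a = k < half                                   # A is high for the first half period
        b = ((k - shift) % samples_per_period) < half  # B is A delayed by `shift` samples
        high_count += a != b                           # XOR of the two logic levels
    return high_count / samples_per_period


if __name__ == "__main__":
    for phase in (0.0, 0.1, 0.25, 0.5, 0.75, 0.9):
        print(f"phase = {phase:4.2f} period -> normalized DC output = {xor_average(phase):.2f}")
```

In this idealized model, offsets of 0.1 and 0.9 periods give the same output, which is one reason the thesis relies on a separate on-chip measurement for quantitative skew.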


Chapter 4

Optoelectronics

Photodetectors are the bridge between optics and electronics, and therefore represent the fundamental component of all optical clock distribution architectures. In the most basic sense, photodetectors receive light in the form of photons, and convert it to electricity in the form of electrons. The speed and efficiency of this conversion define the quality of the photodetector and therefore its range of applications. Although extensive research is ongoing in this area, this thesis was concerned with selecting the best known photodetector for the given process rather than contributing to the field.

This chapter will review the various types of photodetectors and their applications.

It will then describe the photodetector selected for implementation on the test chip and provide details on its design.

4.1 Background

Photodetectors can be categorized by structure, orientation, and by the type of semiconductor from which they are made. Two common structures include the PIN and the Metal-Semiconductor-Metal, although only the PIN will be further discussed.

PIN photodetectors are composed of a diode-like structure in which heavily doped p+ and n+ regions are separated by a region of intrinsic material [15]. This PIN structure has two general orientations, either lateral or vertical. In a vertical PIN, the incident photons must travel through either the p+ or n+ region before reaching the intrinsic region. Conversely, in a lateral PIN the illuminating source is located perpendicular to the plane of the intrinsic region and thus the photons reach the intrinsic region directly.

Whereas the orientation influences the processing steps involved in fabricating the device, the material from which the photodetector is made impacts the performance.

Two performance metrics of importance are responsivity and bandwidth. The responsivity measures how much photocurrent is produced per unit of incident optical power, and thus can be thought of as a measure of efficiency with units of Amps/Watt. Bandwidth measures the operating frequency of the device, and is largely influenced by how fast carriers in a particular material can move to the appropriate terminal, either through drift or diffusion processes.

Research has shown that the low absorption coefficient of silicon forces designers to trade responsivity for bandwidth, thus making silicon a less than optimal candidate for optoelectronics [16, 17]. For this reason, much effort has been devoted to developing devices in GaAs, SiGe, or InP [18, 19]. Structures fabricated in these materials promise both higher responsivity and greater bandwidths. Since the cost of producing chips in these processes is significantly greater than those composed of silicon, recent research has attempted to bond these structures onto silicon substrates. This will combine the higher-speed devices with the proven silicon process.

4.2 Lateral-PIN Photodetectors

A lateral-PIN photodetector in an N-well was selected for implementation on the test chip. The primary motivation for choosing this topology was its compatibility with standard CMOS processes and its simplicity of design [17]. Unlike the other topologies considered, the lateral-PIN photodetector can be fabricated completely in silicon, and therefore requires no post-processing. This reduction in cost and complexity does not come without a price, however, since this photodetector is known to have much lower performance than other designs.

The lateral-PIN photodetector is constructed in an n-well on a p-type substrate, as shown in Fig. 4-1. Two highly doped regions are implanted into the n-well, which, along with the intrinsic nature of the n-well, form the PIN structure. The n+ region is connected to a voltage source, such that the n-well is biased at a positive potential. This has the effect of reverse-biasing the diode formed by the n-well to p-substrate interface. Likewise, the p+ region is connected to the output of the photodetector, and is assumed to have very low potential. With the n+ region connected to a positive supply and the p+ region virtually grounded, an electric field is set up between these nodes. This field serves to deplete the region between the implants of carriers, thus creating the depletion region.

Figure 4-1: Cross-sectional structure of a typical lateral-PIN photodetector.

When the intrinsic region of the photodetector is illuminated with light of the proper wavelength, electron-hole pairs (EHPs) are generated in the depletion region from the energy of the incoming photons. The EHPs are then separated by the electric field, such that the electrons and holes are swept towards the n+ and p+ regions, respectively, thus creating an electric current. If EHPs are generated in the neutral n+ and p+ regions, the carriers must diffuse, rather than drift, to the appropriate junction. Since diffusion is a much slower process than drift, it is desirable to have all EHP generation take place within the depletion region. Assuming that this is the case, the maximum operating frequency of the photodetector is inversely proportional to the time that it takes these carriers to drift out of the depletion region. To ensure that EHPs are primarily generated in the depletion region, either the depletion region can be made large compared to the n+ and p+ regions, or the n+ and p+ regions can be covered with an opaque material. Since the time it takes carriers to drift across the depletion region is proportional to the width of this region, the protective covering approach is typically used so as not to limit the speed of the device.

4.3 Implementation

Since the laser source illuminating the device tends to produce a circular pattern, it is desirable for the photodetector to have an aspect ratio of approximately 1:1. Furthermore, to produce an output current of about 10 µA, the device was designed to occupy an area of nearly 10,000 µm². To meet these specifications while still maintaining minimum-sized depletion regions, the photodetector was laid out as a series of interdigitated fingers, each 95 µm in length. Each finger contained the complete PIN structure such that multiple fingers could be joined by abutment. The layout of a single finger is shown in Fig. 4-2. The widths of the n+, p+, and intrinsic regions were kept as small as possible without violating the design rules. The complete photodetector consisted of 58 abutted fingers and is shown in Fig. 4-3.

Figure 4-2: Layout of a single finger of the photodetector. (Labeled dimensions: 0.5 µm, 0.5 µm, 0.25 µm, and 95 µm.)

Figure 4-3: Layout of the complete photodetector. (Labeled dimensions: 95 µm and 85 µm; terminals: reverse bias and optical current output.)

To prevent spurious EHPs from being generated elsewhere on the wafer and interfering with the analog circuitry, a protective metal covering was placed over the entire die except for the pads. Windows were then designed into this covering to allow for optical stimulation of the photodetector. The protective covering was composed of overlapping strips of metal 5 and metal 6. These strips were then connected together to form a single net which was grounded. A cross-section of the overall structure showing the abutment of individual fingers and the protective metal covering is shown in Fig. 4-4.

Figure 4-4: Cross-sectional structure of the implemented photodetector.

Six additional photodetectors were included around the periphery of the chip for characterization. These devices had separate pins for their optical current outputs, such that each could be tested independently of the others. The n+ nodes of these devices were connected, however, such that they share a common pin for reverse-biasing. A summary of the photodetector design is shown in Table 4.1.

Table 4.1: Photodetector design summary

Type                        Lateral-PIN
Finger length               93.95 µm
Finger width                1.48 µm
P+ width                    .58 µm
N+ width                    .48 µm
Intrinsic region width      .23 µm
Number of fingers           58
Intrinsic area per finger   44.6 µm²
Total intrinsic area        2587 µm²
Device area                 0300 µm²
Expected current            10 µA
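The geometric entries of Table 4.1 can be cross-checked with a few lines of arithmetic. The sketch below uses only the finger length, finger width, finger count, and per-finger intrinsic area listed in the table. The footprint it computes assumes the fingers abut side by side with no overlap or guard structures, which is a simplification made here for illustration and not a statement about the actual layout.

```python
# Values taken from Table 4.1.
FINGER_LENGTH_UM = 93.95
FINGER_WIDTH_UM = 1.48
NUM_FINGERS = 58
INTRINSIC_AREA_PER_FINGER_UM2 = 44.6


def array_width_um() -> float:
    """Width of the abutted finger array (assumes simple side-by-side abutment)."""
    return NUM_FINGERS * FINGER_WIDTH_UM


def total_intrinsic_area_um2() -> float:
    """Total intrinsic (light-collecting) area summed over all fingers."""
    return NUM_FINGERS * INTRINSIC_AREA_PER_FINGER_UM2


def footprint_um2() -> float:
    """Rectangular footprint of the finger array under the abutment assumption."""
    return array_width_um() * FINGER_LENGTH_UM


def aspect_ratio() -> float:
    """Width-to-length ratio of the finger array."""
    return array_width_um() / FINGER_LENGTH_UM


if __name__ == "__main__":
    print(f"array width          : {array_width_um():.2f} um")
    print(f"total intrinsic area : {total_intrinsic_area_um2():.1f} um^2")
    print(f"array footprint      : {footprint_um2():.0f} um^2")
    print(f"aspect ratio         : {aspect_ratio():.2f}")
```

Under these assumptions the array is roughly 86 µm wide by 94 µm long, which is consistent with the stated goal of an aspect ratio near 1:1, and the summed intrinsic area rounds to the 2587 µm² listed in the table.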

Chapter 5

On-Chip Measurement of System Performance

The primary contribution of this thesis has been the development and implementation of the Optical Deskew Buffer architecture for optical clock distribution. This system has the potential to reduce the skew and jitter associated with distributing a traditional electrical clock. While the preceding chapters have presented the architecture and circuits of the ODB, this chapter will examine methods of measuring its performance.

Performance measurements of clock distribution architectures are difficult to obtain. It is not feasible to port the clock signal off-chip for direct measurement due to the high frequency of the clock and the limited bandwidth of the I/O pads. Furthermore, once the clock has been successively divided down, its phase information may no longer be accurate. For these reasons, it is necessary to explore means of measuring the performance of the clock with on-chip circuitry.

The remainder of this chapter will discuss the various architectures for on-chip clock performance measurement before presenting the implementation of the one best suited to this application.

Figure 5-1: Illustration of skew. (Clock_A and Clock_B, period T = 8 ns, skew = 1 ns.)

5.1 Background

In this research, the time period to be measured is defined as the phase difference, or skew, between two clocks as shown in Fig. 5-1. In one instance, the two clocks consist of the outputs of adjacent Optical Deskew Buffers, such that the phase difference between them corresponds to the relative skew that would exist in neighboring clock domains of a microprocessor. Ideally, the skew in this configuration would be zero across all operating conditions, a condition indicating that both systems are in lock with the skewless reference clock and in lock with each other. The other set of clocks to be measured is within one of the characterization instances of the architecture. In this setup, the skew to be measured is the difference between the input to the current pulse circuit and the local clock output of the ODB. This phase difference represents the characteristic skew between the optical reference and the electrical local clock. Although this skew will not be zero, it is expected to be finite and constant across all regions of operation.

In both cases described, the primary purpose of measuring the skew is to confirm that the loop is locked, while determining the exact amount of skew is of slightly less importance. Therefore, the measurement device need not have extremely high resolution. Furthermore, since the clocks to be measured operate between 100-500 MHz, a large dynamic range was also unnecessary. Without the need for high-resolution or wide dynamic range, the primary objective was to minimize complexity and thus decrease the design time.


5.2 Previous Work

A variety of architectures for sub-nanosecond time measurement have been proposed.

Despite the large number of variations, most designs can generally be separated into two basic categories, time-to-voltage converters (TVCs) and time-to-digital converters (TDCs). These architectures tend to make trade-offs between three parameters, namely dynamic range, sampling resolution, and design simplicity.

5.2.1 Time-to-Voltage Converters

TVCs consist of a pair of switches that regulate the flow of current onto a capacitor, and hence produce a voltage that is proportional to the sampling period of the signal [20]. Through proper selection of the integrating current and the size of the capacitor, these converters can be designed with large dynamic range. The resolution, however, tends to be limited by the transition times of the switches. The analog nature of the circuitry involved increases the complexity.
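The proportionality between time and voltage in a TVC follows from charging a capacitor with a constant current, V = I·Δt / C. The sketch below illustrates that relation only. The component values (100 µA and 1 pF) are hypothetical and are not taken from the thesis or from [20], and the model ignores switch transition times, which the text identifies as the practical limit on resolution.

```python
def tvc_output_voltage(current_a: float, interval_s: float, capacitance_f: float) -> float:
    """Voltage developed on an ideal capacitor charged by a constant current for interval_s."""
    return current_a * interval_s / capacitance_f


def interval_from_voltage(voltage_v: float, current_a: float, capacitance_f: float) -> float:
    """Invert the relation: recover the measured time interval from the output voltage."""
    return voltage_v * capacitance_f / current_a


if __name__ == "__main__":
    current = 100e-6      # hypothetical integrating current, 100 uA
    capacitance = 1e-12   # hypothetical integrating capacitor, 1 pF
    for interval in (0.1e-9, 0.5e-9, 1.0e-9):
        v = tvc_output_voltage(current, interval, capacitance)
        print(f"interval = {interval * 1e12:6.0f} ps -> output = {v * 1e3:6.1f} mV")
```

Choosing a smaller current or a larger capacitor extends the measurable interval for a given voltage swing, which is the dynamic-range flexibility mentioned above.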

5.2.2 Time-to-Digital Converters

Time-to-digital converters (TDCs) have received considerable research attention in recent years due to the finer resolution offered by these devices [21, 22, 23, 24, 25, 26]. TDCs generally consist of an array of registers that sample the input waveform at closely spaced clock intervals. By comparing the outputs of successive registers, the relative time of the input transition can be observed. Since the resolution of the TDC is set by the spacing of the sampling phases, arbitrarily fine resolutions can theoretically be achieved through the use of delay-locked loops and interpolation schemes. As the resolution becomes finer, however, the dynamic range decreases for a given number of sampling elements.

5.3 Overview of the Time-to-Digital Converter

A time-to-digital converter was developed to meet the design criteria. A simplified block diagram is shown in Fig. 5-2. The converter was designed to accept one of three pairs (A,B) of clock sources and to produce two thermometer-coded outputs representing the relative skew between the signals. These thermometer codes appear as a series of 0's followed by a series of 1's, with the boundary representing the relative time of the transition on the source signal. The converter uses an external reference clock both to sample the source signals and to shift the resulting data out of the device. The converter also requires two select signals to operate the multiplexors which allow the converter to switch among the three sets of source signal pairs.

Figure 5-2: TDC overview.

Figure 5-3: TDC simplified timing diagram.

The TDC allows the skew between the two signals to be measured by first quantizing and subsequently magnifying the time difference between the rising edges of the two signals. As the timing diagram of Fig. 5-3 shows, the two signals initially have a skew relative to each other denoted as Skew_i. During the sampling operation, the TDC quantizes this skew in the form of a thermometer code. In the process of shifting the code off-chip, the time difference between the edges is magnified. The resulting skew of the output signals is denoted as Skew_o. The output skew has been magnified by a factor proportional to the reference clock period and inversely proportional to the sampling period (quantization factor) of the sampling elements.

$$\text{Magnification} = \frac{Skew_o}{Skew_i} = \frac{T_{ref}}{T_{sample}} \qquad (5.1)$$

With knowledge of the magnification factor, the skew between the source signals can be calculated once the skew between the output signals has been measured. Since the magnification factor is dictated by the implementation, it will be discussed in the next section.
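As a worked illustration of Eq. 5.1, the sketch below computes the magnification factor and converts a measured output skew back to the input skew. The numerical values (a 20 ns reference clock period and a 30 ps sampling period) are the ones quoted for this implementation in Section 5.4.1. The function names are hypothetical.

```python
def magnification(t_ref_s: float, t_sample_s: float) -> float:
    """Eq. 5.1: ratio of reference clock period to sampling period."""
    return t_ref_s / t_sample_s


def input_skew_s(output_skew_s: float, t_ref_s: float, t_sample_s: float) -> float:
    """Recover the skew at the TDC inputs from the skew measured at its outputs."""
    return output_skew_s / magnification(t_ref_s, t_sample_s)


if __name__ == "__main__":
    t_ref = 20e-9       # 50 MHz reference clock period
    t_sample = 30e-12   # nominal sampling period (one inverter delay)
    print(f"magnification = {magnification(t_ref, t_sample):.1f}")
    skew_in = input_skew_s(200e-9, t_ref, t_sample)
    print(f"200 ns at the output corresponds to {skew_in * 1e12:.0f} ps at the input")
```

With these values the magnification is about 667, so an output skew of 200 ns corresponds to an input skew of 300 ps.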

5.4 Time-to-Digital Converter Implementation

The time-to-digital converter consists of three primary blocks, namely the delay line, sampling elements, and shift elements, as well as a small control block. After being triggered by the delay line, each block of sampling elements captures the transition of its source signal in the form of a thermometer code. Once the transition is stored, the shift elements slowly transfer the data off-chip, one bit at a time. As shown in Fig. 5-4, the TDC has a high degree of symmetry since both source inputs use identical circuitry. For simplification, further discussion will concentrate on only one source, although the same analysis applies to both sides of the device.

Figure 5-4: TDC block diagram.

While the architecture of the TDC is described as consisting of three primary blocks, the implementation of the TDC is best visualized as a series of symmetric slices. Whereas the delay line, sampling elements, and shift elements are each composed of identical modules performing the same function, the vertical slice is composed of one of each component necessary for operation, specifically one inverter, one sampling element, one shift element, and one multiplexor. Considering the TDC as a series of vertical slices, rather than as horizontal blocks, provides a better intuition into the operation of the device. A diagram showing three vertical slices is shown in Fig. 5-5 with dashed lines representing the architectural blocks and dotted lines enclosing the components of each slice. As implemented, the TDC consists of 400 vertical slices tiled in successive columns.

The individual sampling elements each consist of a D-type flip-flop (DFF) with the D input connected to the source signal. Therefore, when the DFF receives a rising edge at its clock input, it samples the source and stores the corresponding value at its Q output. By connecting the clock input of each DFF in the block to consecutive taps of the delay line, each DFF receives a successively delayed clock signal.¹

Figure 5-5: TDC schematic slice.

When a clock pulse enters the delay line, it creates a series of progressively delayed clocks, or sampling phases, which cause the DFFs to sample the source at increasingly later points in time. When this operation is complete, the collective sampling block contains 400 samples of the source signal, each having been recorded at equally spaced intervals in time. The 400 samples are then shifted off-chip via the shift register. After each shift element has loaded the value sampled by its associated sampling element, the multiplexors change state such that the input to each shift element is now connected to the output of the previous one to form a shift register.

Since the last element is connected to the I/O pad, one bit is shifted off-chip each time the shift register is clocked by the reference clock. By comparing successive bits of the thermometer code, the point at which the source transition occurred can be determined.
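The sampling and shift-out sequence described above can be mimicked with a small behavioral model. This is only a sketch under simplifying assumptions: sampling phase k is taken to fire at exactly k times the step delay, the source is modeled as a single ideal rising edge, and flip-flop setup and hold effects are ignored. The function names and the six-element default are illustrative choices and do not come from the thesis.

```python
def sample_thermometer(edge_time_ps: float, num_elements: int = 6,
                       step_ps: float = 30.0) -> list[int]:
    """Sample an ideal rising edge with successively delayed sampling phases.

    Element k (1-indexed) samples at k * step_ps and stores 1 if the source
    has already risen at that instant, otherwise 0.
    """
    return [1 if k * step_ps >= edge_time_ps else 0
            for k in range(1, num_elements + 1)]


def shift_out(samples: list[int]) -> str:
    """Serialize the samples as seen off-chip: the last element leaves first."""
    return "".join(str(bit) for bit in reversed(samples))


if __name__ == "__main__":
    samples = sample_thermometer(edge_time_ps=135.0)
    print("sampling elements 1..6:", samples)
    print("bits received off-chip:", shift_out(samples))
```

For an edge that arrives between the fourth and fifth sampling phases, the stored samples are a run of 0's followed by 1's, and the serialized stream is that pattern in reverse order.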

¹ Because consecutive taps of the delay line produce alternating rising-edge and falling-edge sampling phases, a second version of the slice containing a negative-edge-triggered DFF was required.

5.4.1 Operation of the TDC

The operation of the TDC begins with the 50 MHz reference clock which is input from an off-chip source. Within the control block, the reference clock is divided by 512, creating a low-frequency clock of about 100 kHz. This clock is used to stimulate the delay line, thus initiating the creation of the sampling phases. As the waveform propagates through the delay line, a series of sampling phases is created, each spaced apart by the propagation time of one inverter, or ~30 ps. When each of the sampling elements is triggered by the appropriate sampling phase, it stores the value of the source signal at its output. Since there are 400 slices in the TDC, the sampling process takes 12 ns to complete. With a reference clock period of 20 ns and a sampling period of 30 ps, the skew between the TDC output signals will be 666 times the skew of the input signals due to the magnification characteristic of the device. Therefore, if the measured skew at the output is 200 ns, the skew at the input of the device is 300 ps. Since the device quantizes the skew before magnifying it, the accuracy of the measurement is ±15 ps.

The second role of the control block is to create the Control signal which determines whether the TDC is in the sample or the shift state. While in the sampling state, each shift element receives its input directly from the sampling element via the multiplexor. Once the sampling process has completed, however, the multiplexor connects the input of the shift element to the output of the previous shift element such that the data can be transferred off-chip. Since the shift register can transfer one bit per pulse of the 50 MHz reference clock, it takes 8 µs to shift all 400 bits off-chip. Given that the sampling state requires 12 ns and the shift state requires 8 µs, the TDC remains in the shift state over 99% of the time.

When all 400 bits have been shifted off-chip, the TDC has completed one sampling cycle. At this point, the control block asserts the Control signal, thus instructing the shift elements to load the values from the sampling elements. In this fashion, the TDC operates continually as long as the reference clock is supplied. To allow the entire 8 µs cycle to complete, the system is designed to issue a control pulse once every 10 µs, thus giving the TDC a sampling rate of 100 kHz, which is largely limited by the time needed to shift the data off-chip.

Figure 5-6: TDC timing diagram.
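The timing figures quoted in this section follow from four parameters: the 50 MHz reference clock, the 400 slices, the ~30 ps sampling step, and the divide-by-512 control clock. The sketch below recomputes them as a consistency check. It treats the sampling step as exactly 30 ps, which is the nominal value and not a measured one, and the function and key names are hypothetical.

```python
REF_CLOCK_HZ = 50e6      # off-chip reference clock
NUM_SLICES = 400         # sampling/shift elements per array
SAMPLE_STEP_S = 30e-12   # nominal inverter delay (assumed exact here)
DIVIDER = 512            # reference clock divider in the control block


def timing_budget() -> dict:
    """Derive the TDC sampling window, shift time, and cycle rate."""
    t_ref = 1.0 / REF_CLOCK_HZ
    sample_window = NUM_SLICES * SAMPLE_STEP_S   # time to capture all samples
    shift_time = NUM_SLICES * t_ref              # one bit per reference clock
    cycle_period = DIVIDER * t_ref               # interval between sampling cycles
    return {
        "sample_window_s": sample_window,
        "shift_time_s": shift_time,
        "cycle_period_s": cycle_period,
        "sampling_rate_hz": 1.0 / cycle_period,
        "shift_fraction": shift_time / (shift_time + sample_window),
    }


if __name__ == "__main__":
    budget = timing_budget()
    print(f"sampling window : {budget['sample_window_s'] * 1e9:.1f} ns")
    print(f"shift time      : {budget['shift_time_s'] * 1e6:.2f} us")
    print(f"cycle period    : {budget['cycle_period_s'] * 1e6:.2f} us")
    print(f"sampling rate   : {budget['sampling_rate_hz'] / 1e3:.1f} kHz")
    print(f"shift fraction  : {budget['shift_fraction'] * 100:.2f} %")
```

The exact divider output is a 10.24 µs period (about 97.7 kHz), which the text rounds to 10 µs and 100 kHz. The 12 ns window also spans at least one full period of the slowest sampled clock (10 ns at 100 MHz).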

A simplified timing diagram is shown in Fig. 5-6. The waveforms in this figure are from a TDC containing only six sampling elements, rather than the 400 in the actual implementation. In the 6-element TDC, the reference clock is divided by 16 to generate the delay line clock, rather than 512. After the six sampling phases have been created, the source is sampled by the DFF sampling elements, whose outputs are shown. As only elements 5 and 6 have high outputs, this indicates that the transition on the source input must have occurred between sampling phases 4 and 5. After each of the sampling phases has occurred, the control pulse causes the shift elements to obtain the input data from the sampling elements. For the next six cycles of the reference clock, data is shifted out of the shift registers in the form of a thermometer code where the data from the sampling elements is in reverse order, such that "110000" is received off-chip.

Although this example only used one source for simplicity, the output from a second source might be "111100", indicating that it transitioned between sampling phases 2 and 3. Therefore, the skew between the two sources was equal to two sampling quantization steps. Since each step is expected to be ~30 ps, the two units of phase difference in the example would translate to a relative skew of 60 ps, with an accuracy of ±15 ps.
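The decoding step in this example can be written out explicitly. The sketch below assumes the bit string is in the order received off-chip (last sampling element first), that the code is a valid thermometer code, and that the step size is the nominal 30 ps. The helper names are hypothetical.

```python
def transition_index(code_off_chip: str) -> int:
    """Return k such that the source rose between sampling phases k and k + 1.

    The off-chip bit stream is in reverse sampling order, so it is reversed
    before the run of leading 0's is counted.
    """
    samples = code_off_chip[::-1]
    zeros = samples.count("0")
    if samples != "0" * zeros + "1" * (len(samples) - zeros):
        raise ValueError(f"not a valid thermometer code: {code_off_chip!r}")
    return zeros


def skew_ps(code_a: str, code_b: str, step_ps: float = 30.0) -> float:
    """Skew of source A relative to source B, in picoseconds (positive if A is later)."""
    return (transition_index(code_a) - transition_index(code_b)) * step_ps


if __name__ == "__main__":
    print("source 1 transition after phase", transition_index("110000"))
    print("source 2 transition after phase", transition_index("111100"))
    print("relative skew:", skew_ps("110000", "111100"), "ps")
```

A code that is not a clean run of 0's followed by 1's (for example, one corrupted by metastability) is rejected here instead of being decoded, which is a design choice of this sketch.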


Table 5.1: TDC design summary

Frequency Range of Sampled Signals          100-500 MHz
TDC Operating Frequency (Shift Registers)   50 MHz
Nominal Resolution                          30 ps
Sampling Elements / Array                   400
Continuous Sampling Rate                    100 kHz
Clock source pairs                          3 (including test)
Area                                        470 µm x 370 µm

5.4.2 Resolution and Calibration

As implemented, the TDC requires no calibration. The resolution of the system is exactly equal to the propagation delay of a minimum-sized inverter, which is ~30 ps in the target process. Since this propagation delay can vary with process, voltage, and temperature variations, a reference oscillator consisting of five inverter stages is also included in the design. By measuring the operating frequency of the free-running oscillator, the propagation delay of a minimum-sized inverter, and the resolution of the TDC, can be determined.
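One way to turn the measured oscillator frequency into an inverter delay is the first-order ring-oscillator relation f = 1 / (2·N·t_pd), where N is the number of stages. The sketch below applies that textbook relation with N = 5. It assumes the oscillator inverters see the same loading as those in the delay line and that the frequency is observed without any on-chip division, neither of which is stated in the text.

```python
def oscillator_freq_hz(inverter_delay_s: float, num_stages: int = 5) -> float:
    """First-order ring-oscillator frequency: one period is 2 * N stage delays."""
    return 1.0 / (2 * num_stages * inverter_delay_s)


def inverter_delay_s(osc_freq_hz: float, num_stages: int = 5) -> float:
    """Estimate the per-stage delay (and hence TDC resolution) from the measured frequency."""
    return 1.0 / (2 * num_stages * osc_freq_hz)


if __name__ == "__main__":
    nominal = oscillator_freq_hz(30e-12)
    print(f"expected frequency at 30 ps per stage: {nominal / 1e9:.2f} GHz")
    # Hypothetical measurement 10% below nominal, e.g. at a slow corner.
    measured = 0.9 * nominal
    print(f"inferred resolution: {inverter_delay_s(measured) * 1e12:.1f} ps")
```

Under these assumptions a 30 ps stage delay corresponds to roughly 3.3 GHz, and a lower measured frequency maps directly to a coarser TDC resolution.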

5.5 Summary

A low-resolution time-to-digital converter was designed and included on the test chip for fabrication. The device was composed entirely of digital logic, which simplified the layout. To ensure uniform propagation delays, the source and clock signals required careful routing. For this reason, as well as the repetition inherent to the design, the converter was hand-placed and routed using standard cells. The summary of the design is shown in Table 5.1. Finally, a list of pins used by the TDC is shown in Table 5.2.

Table 5.2: TDC pins

Name        Direction   Type      Comment
Source_A    Input       Digital   Test source A
Source_B    Input       Digital   Test source B
S0          Input       Digital   Mux control
S1          Input       Digital   Mux control
ClkIn       Input       Digital   TDC clock input
Data_OutA   Output      Digital   Data output A (thermometer code)
Data_OutB   Output      Digital   Data output B (thermometer code)
Ref         Output      Digital   Reference oscillator output


Chapter 6

The Test Chip

A test chip was fabricated to verify the ODB architecture. It was designed in the TSMC 0.18 µm Logic process and fabricated through MOSIS. The Artisan Sage-X standard cell library was used for the digital logic blocks and a modified version of the Artisan digital I/O library was used for the pads and buffers. The results from the chip are not yet available for inclusion in this thesis. All quoted results are based on simulation data.

Three different instances of the architecture were included on the chip, each with a distinct testing objective. The Dual-Optical Deskew Buffer (DODB) was designed to simulate the deployment of the architecture on a commercial microprocessor, while the Closed-Loop Simulated Pulsing (CLSP) instance was included to verify basic functionality and to measure the characteristic skew of the phase detector. Finally, the Opened-Loop Simulated Pulsing (OLSP) version isolated the analog components of the local controller to verify their functionality and to characterize the phase detector.

The DODB and the CLSP instances contain complete versions of the architecture and are described in the following sections of this chapter. The OLSP instance, however, was intended to test the circuitry of the local controller and will be described in Appendix A, along with other details of the test chip implementation.

Figure 6-1: Dual-Optical Deskew Buffer (DODB) architecture.

6.1 Dual-Optical Deskew Buffers

This configuration of the architecture consists of two fully operational ODBs located in close proximity to each other. The local controller of each ODB receives the optical clock from a discrete photodetector and the global electrical clock from a source off-chip as shown in Fig. 6-1. Each ODB then attempts to synchronize the local clock output from its delay line to the optical clock input to its photodetector. If the systems are ideally matched and operational, the local clock outputs should have no relative phase difference between them. To simulate skew in the global clock, a delay line was inserted between the global clock and the input to the delay circuit of one of the ODBs. The control voltage for the delay line was ported off-chip, thus allowing the relative skew to be adjusted during testing. Since the delay line has a dynamic range of two periods, ODB performance under absolute worst-case skew can be observed and quantified.

Table 6.1: DODB pins

Name         Direction   Type      Comment
RBias        Input       Analog    Reverse bias voltage for photodetectors
Control_1    Output      Analog    Control voltage for delay line in ODB_1
Control_2    Output      Analog    Control voltage for delay line in ODB_2
Z            Output      Analog    Output of XOR phase detector
ClkIn        Input       Digital   Global clock input for ODB_1 and ODB_2
Clk_Out_d4   Output      Digital   Global clock ÷ 4
CompU_1      Output      Digital   Comparator_1 Up Signal
CompU_2      Output      Digital   Comparator_2 Up Signal
Turbo_1      Output      Digital   Mode of operation of ODB_1
Turbo_2      Output      Digital   Mode of operation of ODB_2
DL_Control   Input       Analog    Control voltage for skew-inducing delay line
RO_Control   Input       Analog    Control voltage for ring oscillator

To ensure that the global clock and the optical clock are matched in frequency, they must be generated from the same source. Since the same source that drives the laser must also drive the ODB delay circuits, the performance of the DODB test configuration will be frequency-limited by the bandwidth of the I/O pads, packaging, and photodetectors. Therefore, the maximum operating frequency for the DODB is expected to be ~100 MHz. To measure the performance of this test setup, the local clock from each ODB is connected to the XOR phase detector and to the time-to-digital converter. As explained in Chapter 3, the phase detector can only be used for detecting changes in the relative skew, not for quantifying the absolute skew. Accordingly, the output of the phase detector will settle to a constant voltage if the two local clock outputs from the ODBs are synchronized. As the skew between the local clocks approaches zero, so will the output of the phase detector.

To measure the absolute skew, the local clocks are also connected to a pair of inputs of the time-to-digital converter. With a sampling frequency of 100 kHz, the TDC will measure the relative skew between the local clocks once every 10 µs, thus providing snapshots of the skew as the ODBs attain lock.

A list of pins associated with the DODB test instance is shown in Table 6.1.

Figure 6-2: DODB simulation results showing the control voltages of each ODB. (Transient response of DODB.ControlA and DODB.ControlB, 0 to 1.0 V, plotted against time from 0 to 43 µs.)

6.1.1 DODB Results

The DODB instance was simulated to verify its functionality and estimate its performance. In this simulation, the off-chip clock was modeled as an ideal clock with 5 ns period and 50% duty-cycle. The initial skew between the two systems was minimized by setting DL_Control to 0 V. The optical current input from the photodetectors was modeled as a current source with a 5 ns period and 50% duty-cycle. Fig. 6-2 shows the control voltages of each ODB as a function of time. As the figure shows, both systems locked their respective local clocks to the reference clock after about 30 µs.

Figure 6-3: Normalized phase of each ODB local clock output. (Normalized phase plotted against time.)

Fig. 6-3 shows the relative phase of each ODB's local clock output as a function of time. By using the skew-emulation delay line, the electrical clock input to the first ODB was skewed by about 0.32 periods relative to that of the second to simulate the effect of skew on the global clock tree. As the plot shows, the phase difference between the local clocks remains nearly constant while both are in the acquisition mode. Once the first ODB attains lock, however, the phase of its local clock stabilizes. The other ODB continues to adjust the phase of its local clock until it is in lock with the reference clock. When both ODBs are in lock, the average phase difference between their local clock outputs is zero. Although the time-averaged phase difference is zero, there are still cycle-to-cycle phase differences caused by quantization jitter on the control voltage.

Figure 6-4: Closed-Loop Simulated Pulsing (CLSP) architecture.

6.2 Closed-Loop Simulated Pulsing

Whereas the DODB configuration aimed to test the architecture in a realistic implementation, the Closed-Loop Simulated Pulsing (CLSP) setup was designed to test the ODB architecture at higher frequencies. As shown in Fig. 6-4, the CLSP instance replaces the photodetector with the current pulse generator to achieve a higher frequency of operation. The clock source for the pulse generator is derived from the same global clock that drives the delay circuits. This global clock can either originate off-chip or be generated by the ring oscillator. To simulate the phase difference typically present between the global clock and the optical clock, a delay line is inserted between the global clock and the current pulse generator. The control voltage for this delay line is ported off-chip so that the delay can be adjusted during testing.

The primary objective of the CLSP instance is to verify the ODB's ability to lock the local clock to the pseudo-optical current pulses. Once the ability to lock has been confirmed, the secondary objective of the CLSP instance is to measure the characteristic skew between the pseudo-optical current pulses and the local clock output. Ideally, this skew will be constant and will not vary with the relative phase offset between the optical clock and the global clock. By adjusting the skew between the pseudo-optical clock and the global clock, the variation in the characteristic skew can be observed.

The CLSP instance contains two means of measuring the characteristic skew of the ODB architecture. Firstly, the XOR phase detector is used to determine whether the phase difference between the local clock and the pseudo-optical clock is constant or time-varying. Secondly, these same points are connected to the time-to-digital converter to obtain a quantitative value for the characteristic skew.
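The role of the XOR phase detector can be illustrated with a small behavioral sketch. It assumes ideal 50% duty-cycle square waves of equal frequency, in which case the time-averaged XOR output grows linearly with the skew between the two inputs up to half a period. The function names and the sample skews are hypothetical and are not taken from the test chip; only the 7.5 ns period comes from the CLSP simulation.

```python
def square_wave(t, period, delay=0.0):
    """Ideal 50% duty-cycle clock: 1 during the first half of each period, else 0."""
    return 1 if ((t - delay) % period) < period / 2 else 0


def xor_average(period, skew, samples_per_period=10000):
    """Time-averaged XOR of two equal-frequency clocks offset by `skew`."""
    total = 0
    for i in range(samples_per_period):
        # Sample at bin midpoints so that no sample lands exactly on a clock edge.
        t = (i + 0.5) * period / samples_per_period
        total += square_wave(t, period) ^ square_wave(t, period, delay=skew)
    return total / samples_per_period


PERIOD_NS = 7.5  # clock period used in the CLSP simulation
for skew_ns in (0.0, 0.9375, 1.875, 3.75):  # illustrative skews only
    avg = xor_average(PERIOD_NS, skew_ns)
    print(f"skew = {skew_ns:6.4f} ns -> average XOR output = {avg:.3f}")
```

A constant average therefore indicates a constant phase difference, while a drifting average indicates that the phase is still changing. Under these idealized assumptions the XOR output shows whether the skew is stable, and the time-to-digital converter is still needed to put a number on it.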

6.2.1 CLSP Results

The CLSP instance was simulated to verify functionality of the architecture. In this simulation, the multiplexer was set to accept an off-chip clock source, which was modeled as a clock with a period of 7.5 ns and a 50% duty cycle. The initial skew between the global and pseudo-optical clocks (the electrical clock supplied to the current pulse circuitry) was equal to the intrinsic delay of the delay line since DLControl was set to 0 V. Under these conditions, the ODB acquired lock in under 30 µs. Fig. 6-5 shows the control voltage of the CLSP instance as well as the output of the XOR phase detector. The ODB control voltage continually rises until lock is achieved, at which point it stabilizes. The output of the XOR phase detector stabilizes shortly thereafter.

[Plot: transient response of the CLSP control voltage and the XOR phase detector output (Z) versus time (s).]

Figure 6-5: CLSP system control voltage and XOR phase detector output.

The pseudo-optical reference clock and the local clock waveforms are shown in Fig. 6-6 during both acquisition and lock modes. The noticeable phase offset between the reference and local clocks during lock mode results from the quadrature-locking characteristic and the characteristic offset inherent to the phase detector.

[Plot: pseudo-optical reference clock and local clock waveforms versus time (s), in acquisition mode and in lock mode.]

Figure 6-6: CLSP reference clock and local clock phase relationship during acquisition and lock modes.

The CLSP instance was also used to measure the jitter on the output local clock induced by quantization noise on the control voltage. Fig. 6-7 shows a close-up of the control voltage after the ODB has obtained lock. Quantization noise arises from the fact that each Down pulse does not exactly cancel each Up pulse. Therefore, the control voltage tends to fluctuate around an equilibrium level, which causes the delay line to successively add and subtract delay from the global clock. In this simulation, the control voltage fluctuates ±1 mV about the steady-state level of M1, for a combined range of 2 mV. The control voltage vs. delay characteristic of the delay line shown in Fig. 3-14 indicates that a 2 mV variation in the control voltage at 830 mV results in ~6.4 ps of delay variation. Therefore, in this simulation, the peak-to-peak jitter on the local clock due to quantization noise on the control voltage was ~6.4 ps.
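The jitter estimate above is a small-signal calculation: the control-voltage ripple is multiplied by the local slope of the delay line's voltage-to-delay characteristic. The sketch below reproduces that arithmetic with the figures quoted in the text (2 mV of ripple giving about 6.4 ps of delay change, and the 7.5 ns clock period). It assumes the characteristic is locally linear around the operating point, and the function names are illustrative only.

```python
def delay_sensitivity_ps_per_mv(delta_delay_ps, delta_v_mv):
    """Local slope of the delay-line characteristic, assuming it is linear nearby."""
    return delta_delay_ps / delta_v_mv


def peak_to_peak_jitter_ps(ripple_pp_mv, sensitivity_ps_per_mv):
    """Peak-to-peak delay variation caused by peak-to-peak control-voltage ripple."""
    return ripple_pp_mv * sensitivity_ps_per_mv


# Figures quoted in the text: 2 mV of control-voltage change gives ~6.4 ps of delay.
SENSITIVITY = delay_sensitivity_ps_per_mv(6.4, 2.0)

RIPPLE_PP_MV = 2.0   # +/-1 mV about the steady-state level
PERIOD_PS = 7500.0   # 7.5 ns clock period used in the CLSP simulation

jitter = peak_to_peak_jitter_ps(RIPPLE_PP_MV, SENSITIVITY)
print(f"sensitivity  = {SENSITIVITY:.2f} ps/mV")
print(f"p-p jitter   = {jitter:.2f} ps")
print(f"share of period = {100.0 * jitter / PERIOD_PS:.3f} %")
```

The same two functions can be reused to estimate how much the jitter would shrink if the ripple were reduced, as long as the operating point stays where the linear approximation holds.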

[Plot: transient response of the CLSP control voltage versus time (s) while in lock.]

Figure 6-7: CLSP system control voltage while in lock, showing quantization noise.

[Plot: power consumption versus time (µs).]

Figure 6-8: CLSP power consumption.

Finally, the CLSP instance was used to obtain power estimates for the ODB architecture. As shown in Fig. 6-8, the CLSP configuration uses ~5.5 mW of power after the system has obtained lock. This measurement includes all of the power used during simulation of the CLSP configuration, including the biasing, control clock generation, and pseudo-optical current pulse generator. In an actual implementation of the architecture, the current pulse generator would be replaced by an off-chip laser source. The power required by the laser would be supplied by the system, rather than the chip, and is therefore not included in the power estimate. Likewise, the simulation did not account for the clock buffers used to drive the local clock grid since these would have existed even without a clock deskewing system.

A list of pins associated with the CLSP test instance is shown in Table 6.2.


Table 6.2: CLSP pins

Name        Direction  Type     Comment
Z           Output     Analog   Output of XOR phase detector
ClkIn       Input      Digital  Global clock input for ODB
ClkOutd4    Output     Digital  Global clock / 4
Clk-Out     Output     Digital  Pseudo-optical clock / 4
CompU       Output     Digital  Comparator Up signal
CompD       Output     Digital  Comparator Down signal
Turbo       Output     Digital  Mode of operation of the ODB
DLControl   Input      Analog   Control voltage for skew-inducing delay line
RO-Control  Input      Analog   Control voltage for ring oscillator
Control     Output     Analog   Control voltage for delay line in ODB

6.3 Summary

Since the DODB and CLSP instances each contained one or more complete instances of the ODB architecture, the results from these simulations provide a fair estimate of the system's performance. These results are summarized in Table 6.3. The figures reported are normalized for one ODB instance as would be found in each local clock domain.

Table 6.3: Simulated results

Power / ODB                                     5.5 mW
Local Clock Jitter (due to quantization noise)  6.6 ps
Average skew between clock domains              0
Area / ODB                                      300 µm x 500 µm
Acquisition Time (worst case observed)          30 µs

Chapter 7

Conclusions

This chapter will first present the results of the research undertaken in this thesis. Performance limitations of the ODB architecture will then be discussed in order to evaluate its viability in future designs. A summary of the work completed as part of this thesis will then be provided, as well as possible future directions for research in this area.

7.1 Results

A test chip demonstrating the proposed optical deskew buffer architecture was submitted to TSMC via the MOSIS fabrication service. Post-silicon test results are unavailable at the time of publication of this thesis.

Pre-silicon simulations of the design demonstrated the functionality of the architecture. In these simulations, the photodetectors were replaced with current sources. Results from these simulations showed that the ODB architecture was capable of synchronizing a global electrical clock to a pseudo-optical clock composed of current pulses. Furthermore, a simulation of the dual optical deskew buffer configuration confirmed that the system could synchronize the phases of adjacent clock domains in the presence of global clock skew.

Due to the difficulties in simulating optoelectronic circuits, simulations were not performed on the photodetectors. Their design was believed to be correct by construction. Likewise, the number of transistors in the time-to-digital converter precluded FET-level simulation of this device. A modified version with fewer sampling elements, and hence less dynamic range, was successfully simulated.

7.2 Performance Limitations of the ODB Architecture

Pre-silicon simulation data of the DODB architectural instance indicates that system performance will be limited by jitter caused by quantization noise on the control voltage. In this configuration, the skew between the local clock domains was small enough that it was negligible when compared to the jitter.

The results obtained from the DODB simulation represented ideal operating conditions for the architecture. When implemented on an actual chip, manufacturing effects will likely cause mismatch in the transistors, capacitors, and resistors of the circuits, as well as the photodetectors. This mismatch will manifest itself as nondeterministic skew between the local clock output and the reference clock input. Although each of the ODBs will still be able to obtain lock, the skew between their local clock outputs will not be zero because of the different amounts of skew imparted by the phase detector.

The performance of the ODB system will also be limited by its susceptibility to noise. The magnitude of the thermal noise from the capacitors of the phase detector and first amplifier will be significant when compared to the voltages expected to be output from the phase detector. Once the difference in the voltage outputs becomes small (meaning the system is approaching lock), the thermal noise from the passive devices will be of the same magnitude as the differential voltage, thus preventing the comparator from accurately determining the relative levels of the inputs. Wrong decisions by the comparator will be equivalent to a dead-band region of a conventional phase detector, since the comparator will randomly issue Up and Down signals until the phase of the local clock drifts enough that the output of the phase detector is no longer lost in the noise. Once the comparator can again make accurate decisions, the system will reapproach lock, repeating the cycle. The net result of this process will be jitter on the control voltage, which directly leads to jitter on the local clock output of the ODB. Although thermal noise can never be eliminated, it can be lowered by either decreasing the operating temperature of the system or by using large capacitors.
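The dependence on temperature and capacitor size follows from the standard expression for sampled thermal noise on a capacitor, v_rms = sqrt(kT/C). The sketch below evaluates it for a few capacitor values. These values are hypothetical and chosen only for illustration, since the actual capacitor sizes of the phase detector and first amplifier are not given in this section.

```python
import math

BOLTZMANN_J_PER_K = 1.380649e-23  # Boltzmann constant


def ktc_noise_vrms(capacitance_f, temperature_k=300.0):
    """RMS thermal (kT/C) noise voltage on a capacitor, in volts."""
    return math.sqrt(BOLTZMANN_J_PER_K * temperature_k / capacitance_f)


# Hypothetical capacitor sizes; the values used on the test chip are not stated here.
for cap_ff in (10, 100, 1000):
    noise_uv = ktc_noise_vrms(cap_ff * 1e-15) * 1e6
    print(f"C = {cap_ff:5d} fF -> kT/C noise = {noise_uv:7.1f} uV rms at 300 K")
```

Because the noise voltage scales with the square root of T/C, quadrupling the capacitance only halves the noise. This is one reason the noise is hard to remove when the phase detector signals themselves are small.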

Other sources of noise that will impact the performance include substrate noise and power supply noise. Substrate noise can be controlled by careful layout, such as isolating the critical analog circuitry from noisy digital components, surrounding the circuits with deep N-wells, and placing large numbers of substrate contacts to collect spurious carriers. Each of these techniques was implemented on the test chip (although standard N-wells were substituted for deep N-wells due to the features of the process) to reduce the effects of noise as much as possible. Likewise, power supply noise can be controlled by using separate supply buses for the critical circuits and designing the amplifiers to have high power-supply rejection. These techniques were also employed on the test chip.

Although the performance of the delay circuitry may impact the performance of the ODB architecture as implemented on the test chip fabricated for this thesis, it should not be considered a limitation to the architecture in general. Essentially, all that is required of the delay line circuitry is to shift the phase of the global clock as directed by the comparator. In the process, it should contribute as little jitter as possible to the clock, such that the jitter on the local clock output is no greater than the jitter on the global clock input. If this is accomplished, the delay line circuitry will have succeeded in minimizing the jitter of the local clock. While the delay circuitry implemented on the test chip is based predominantly on the architecture proposed by Maneatis, any type of delay line would be compatible with the local controller.

Finally, the photodetectors will most certainly limit the performance of the test chip since they are not expected to operate above 200 MHz. Like the delay line circuitry, however, the photodetectors can easily be replaced by more advanced optoelectronics on future versions, and should therefore not be considered a limitation of the ODB architecture in general.

7.3 Summary

This thesis has presented a system which uses an optical reference clock to deskew an electrical global clock. Referred to as the Optical Deskew Buffer, this architecture has the potential to significantly reduce the skew and jitter associated with traditional H-tree clock distribution topologies. While optical clocking has been discussed for over fifteen years, it has yet to be commercially demonstrated due to the difficulties in performing the optical-to-electrical conversion. The ODB architecture proposed in this thesis side-steps this conversion difficulty by using the optical reference clock to synchronize an electrical clock, rather than replace the electrical clock.

This thesis also presented a design for a low-resolution time-to-digital converter that is implementable in standard cell logic. By allowing the relative phase between two signals to be measured, this TDC will add significant measurement capabilities to the chip. Furthermore, since the design is synthesizable, it can be easily ported to future processes and included as a block on any chip where low-resolution timing measurements are needed.

Finally, this thesis explored the area of optoelectronics. Various photodetector technologies were examined in order to determine which best met the needs of the project. The lateral-PIN photodetector was selected for its compatibility with the standard CMOS process. While photodetectors of this type are known to have lower performance, the selected device was expected to meet the goal of the project, namely to demonstrate the functionality of the architecture.

7.4 Future Work

A variety of possibilities exist for continuing the work presented in this thesis. These areas include the optoelectronics, the circuits, and the measurement techniques, each of which will be discussed in this section.


7.4.1 Optoelectronics

The DODB instance on the test chip used a lateral-PIN photodetector implemented in a standard silicon process. Photodetectors of this type are known to have poor responsivity and a low maximum operating frequency. The proposed ODB architecture, however, is compatible with any type of photodetector. Future directions could therefore involve integrating Ge, InP or GaAs-based photodetectors onto a silicon substrate containing the ODB architecture. Although the post-processing necessarily increases the cost of production, this tends to be the approach currently favored by both academia and industry. Finally, further work in the optoelectronics area could involve integration of the waveguides needed to distribute the optical source to the local clock domains.

7.4.2 Circuitry

A substantial amount of circuitry was designed for this thesis. Without exception, the best-known topologies were selected for each component. Despite this, optimizations could be made to improve the system, particularly in the delay line circuits. Since the ODB system does not perform clock generation, it is constrained to transmit any input jitter directly to the output, in addition to the jitter added by the delay line itself. Although the Maneatis delay line is expected to contribute little jitter, the wide dynamic range required for the test system may degrade the performance in this area. Future versions of the architecture may not require the same wide dynamic range, and instead may be able to focus on jitter reduction.

Since the outputs of the local controller are digital signals, future revisions of the architecture could implement a digitally-controlled delay line based on capacitive loading. This would eliminate the charge pump and bias circuits needed to operate the delay line, as well as improve the steady-state performance by eliminating quantization noise associated with the analog control voltage. A digitally-controlled delay line would also simplify the startup condition associated with the proposed architecture.


The circuits of the local controller could be further improved as well. Since the phase detector, amplifiers, and comparator are all analog circuits, they are all highly susceptible to process variations, substrate noise, power supply noise, and temperature fluctuations. Perturbations due to any of these factors would be manifested as either random skew or phase error at the output of the system, and should therefore be avoided. Improvements in the design of these circuits could improve the immunity to these effects.

7.4.3 Testing

Testing of clock networks is becoming increasingly difficult. As clock cycle times decrease, every picosecond of jitter becomes significant. Yet, it is extremely difficult to obtain timing measurements which are accurate to within a few picoseconds. In this work, a low-resolution time-to-digital converter was implemented for on-chip measurement of skew. Since the expected operating frequencies of the signals to be measured were less than 500 MHz, a resolution of ~30 ps was adequate for this work. As future implementations of the architecture achieve higher frequencies, better measurement techniques will be required.
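The adequacy argument can be made concrete by expressing the converter resolution as a fraction of the clock period. The sketch below models an idealized truncating quantizer with a 30 ps step. This is an assumption made for illustration and is not a description of the converter implemented on the test chip, and all function names are hypothetical.

```python
def tdc_code(interval_ps, lsb_ps=30.0):
    """Idealized converter: number of whole LSB steps contained in the interval."""
    return int(interval_ps // lsb_ps)


def code_to_interval_ps(code, lsb_ps=30.0):
    """Lower bound of the interval represented by a code."""
    return code * lsb_ps


def resolution_fraction(lsb_ps, clock_mhz):
    """Resolution expressed as a fraction of one clock period."""
    period_ps = 1e6 / clock_mhz
    return lsb_ps / period_ps


code = tdc_code(100.0)  # a 100 ps interval, chosen arbitrarily
print(f"code = {code}, reported interval >= {code_to_interval_ps(code):.0f} ps")
for f_mhz in (200, 500, 2000):
    print(f"{f_mhz:5d} MHz: 30 ps is {100 * resolution_fraction(30.0, f_mhz):.1f} % of the period")
```

Under this model a 30 ps step is a small share of the period at 500 MHz and below, but the share grows in proportion to the clock frequency. This matches the observation that finer measurement techniques will be needed at higher frequencies.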


Appendix A

Test Chip Implementation Details

The test chip included three configurations of the architecture. The Dual-Optical Deskew Buffer (DODB) and Closed-Loop Simulated Pulsing (CLSP) contained the complete circuitry of the architecture and were intended to demonstrate functionality and performance of the entire system. The final configuration, Open-Loop Simulated Pulsing (OLSP), was included to test various circuit blocks of the local controller. This appendix will present this configuration as well as general topics related to the design of the test chip.

A.1 Open-Loop Simulated Pulsing

While the DODB and CLSP instances test the functionality of the ODB architecture, the Open-Loop Simulated Pulsing (OLSP) instance provides a means of verifying the analog circuit components of the local controller, specifically the phase detector, amplifiers, and latched comparator. As shown in Fig. A-1, the OLSP instance once again uses the current pulse generator in place of the photodetector, which was omitted in an effort to isolate the analog components of the loop controller from the optoelectronics. Since the OLSP instance is designed to operate in an open-loop configuration, the electrical clock inputs to its phase detector are connected to either an off-chip clock or to the output of the ring oscillator, rather than to the feedback clock from the delay line. Without the need for the feedback local clock, the delay circuitry is eliminated, further isolating the local controller components. As in the CLSP instance, the current pulse generator is clocked by a variable-delay version of the same electrical clock connected to the phase detector. With the control voltage for this delay line ported off-chip, the amount of delay inserted between the clocks can be varied during testing, thus simulating the skew between the global and the optical clocks.

Figure A-1: Open-Loop Simulated Pulsing (OLSP) architecture.

Measuring the performance of the phase detector is extremely difficult since its signals are in the sub-100 mV range. Therefore, the outputs of the second amplifier are low-pass filtered and ported off-chip instead. By varying the control voltage of the delay line, various phase offsets between the electrical and pseudo-optical clocks can be simulated, and the corresponding outputs of the second amplifier can be measured.

To further verify the operation of the phase detector, the outputs of the amplifier are connected to the control block, just as in the standard configuration. In the OLSP instance, however, the control block is hard-wired to always forward the Up/Down pulses directly from the latched comparator to the charge pump. Without the feedback provided by the local clock output of the delay line, the charge pump repeatedly issues Up or Down pulses regardless of the voltage of the loop capacitor. Therefore, the voltage on the capacitor will reach a steady-state value of GND or VDD depending on whether the charge pump is issuing Down or Up pulses, respectively. This control voltage is then connected to an inverter whose output is ported off-chip. By varying the control voltage of the delay line that feeds the current pulse generator, the point where the phase detector output transitions from Up to Down (or vice-versa) will be observable. Since the electrical clock and its delayed version are both connected to the XOR phase detector, their relative phase difference will also be known, and thus the characteristic plot of the phase detector can be generated.

A list of pins associated with the OLSP test instance is shown in Table A.1.

Table A.1: OLSP pins

Name        Direction  Type     Comment
Z           Output     Analog   Output of XOR phase detector
ClkOut      Output     Digital  Pseudo-optical clock / 4
Down/Up     Output     Digital  Thresholded and inverted control voltage
DL-Control  Input      Analog   Control voltage for skew-inducing delay line
RO-Control  Input      Analog   Control voltage for ring oscillator
Amp_Out_2A  Output     Analog   Low-pass filtered output of second amplifier
Amp_Out_2B  Output     Analog   Low-pass filtered output of second amplifier

A.1.1 OLSP Results

A simulation was performed to verify the OLSP configuration, and the results are shown in Fig. A-2. For this simulation, the ring oscillator was set to provide a global clock of 320 MHz. The delay line voltage was then varied so as to create a phase offset between the pseudo-optical current pulses and the electrical clock from the ring oscillator. The delay control voltage (DL-Control) was initially set to 0.7 V and was then increased to 0.9 V and 1.0 V before returning to 0.7 V. As the plot shows, with the delay control voltage set to 0.7 V, the phase detector issued Down pulses, thus keeping the system control voltage near zero. The control voltage remains near zero even as the delay control voltage is increased to 0.9 V. Once the delay control voltage is raised to 1.0 V, however, the phase detector begins to issue Up pulses, thus causing the system control voltage to rise. As soon as the system control voltage (OLSP-Control) increases beyond the switching threshold of the output buffer, the OLSP-ControlHiLo signal transitions high. Finally, after 150 µs, the delay control voltage is reset to 0.7 V and the charge pump begins to lower the system control voltage. Around 200 µs, the system control voltage passes the threshold of the buffer, and the OLSP-ControlHiLo signal returns to zero. The plot also shows the output of the phase detector, which indicates the phase relationship between the electrical and pseudo-optical inputs to the phase detector. As expected, this output is constant during the periods when the delay control voltage is unchanging.

[Plot: transient response of OLSP phase detector output, OLSP control voltage, OLSP-ControlHiLo, and OLSP delay-line control voltage versus time (s).]

Figure A-2: OLSP results.


A.2 Layout Techniques

Due to the analog nature of many of the circuits, great care was taken during the layout of these circuits. All capacitors in these blocks were implemented using spherical common-centroid layouts to render the design invariant to two-dimensional process gradients. Since the target process was intended for digital applications, neither Metal-insulator-Metal (MiM) nor Poly-Poly capacitors were available. Therefore, sandwich capacitors were implemented using various combinations of the six available metal layers. The same spherical common-centroid layout was also used for critical transistors in these blocks. To further increase transistor matching, dummy gates were added to the edges of all common-centroid layouts. Despite the use of these techniques, perfect matching cannot be guaranteed due to process variations and general manufacturability issues.
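The benefit of a common-centroid arrangement can be checked numerically. The sketch below places unit elements of two matched devices, A and B, on a 4 x 4 grid and sums a linearly varying per-unit value over each device. The grid patterns, the gradient coefficients, and the function names are hypothetical. They illustrate the general technique under a purely linear gradient and do not reproduce the layouts used on the test chip.

```python
def device_total(layout, device, value_at):
    """Sum the per-unit value over every grid cell assigned to `device`."""
    return sum(
        value_at(x, y)
        for y, row in enumerate(layout)
        for x, cell in enumerate(row)
        if cell == device
    )


def linear_gradient(x, y, base=1.0, gx=0.01, gy=0.02):
    """Unit value under an assumed linear two-dimensional process gradient."""
    return base + gx * x + gy * y


# Hypothetical unit-cell placements for two matched devices A and B.
COMMON_CENTROID = ["ABBA", "BAAB", "BAAB", "ABBA"]
SIDE_BY_SIDE = ["AABB", "AABB", "AABB", "AABB"]

for name, layout in (("common-centroid", COMMON_CENTROID), ("side-by-side", SIDE_BY_SIDE)):
    a = device_total(layout, "A", linear_gradient)
    b = device_total(layout, "B", linear_gradient)
    print(f"{name:16s}: A = {a:.3f}, B = {b:.3f}, mismatch = {abs(a - b):.3f}")
```

In the common-centroid pattern both devices share the same centroid, so the linear terms cancel and the totals agree. In the side-by-side pattern the horizontal gradient appears directly as mismatch. Higher-order gradients and random variations are not cancelled, which is consistent with the caveat that perfect matching cannot be guaranteed.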

A.3 Summary

To conserve area and I/O resources, the three instances shared a common control clock generation block as well as common amplifier bias circuitry. Although the input to the control clock generation block would typically be the global clock of the system, the test chip used an external source for this input to increase testability. Since the frequency of the control clocks dictates the feedback rate of the system, the ability to slow these clocks allowed for the stability of the architecture to be better characterized. Finally, a single time-to-digital converter was included, with its inputs multiplexed to accept multiple sources.

Packaging options for the chip were limited because of the requirement that the top side of the chip be accessible in order to stimulate the photodetectors. The Kyocera PGA84M package with taped lids was selected to meet this specification. The bandwidth of the pins associated with this package is expected to be ~100 MHz.

A summary of the design and packaging is shown in Table A.2. A plot of the test chip layout is shown in Fig. A-3. In this plot, the top-level metal layers that were used to shield the silicon from stray photons have been removed for clarity.

Table A.2: Test chip summary

Width                              2591 µm
Length                             2591 µm
Area                               6.7 mm²
FET Count (excluding I/O Buffers)  62,008
Pads                               84
Package Type                       Ceramic
Package                            PGA84M

Although most of the I/O pins of the chip have been presented with the associated architectural instances, several of the pins were shared by the entire chip. These pins are listed in Table A.3.

Table A.3: Other pins

Name   Direction    Type     Comment
Reset  Input        Digital  Shared reset signal
VDD_1  InputOutput  Analog   Core voltage supply
VDD_2  InputOutput  Analog   I/O voltage supply
S      Input        Digital  Mux-control for all three instances
Iref   Input        Analog   Reference current for bias generator
Clkin  Input        Digital  Clock source for generating control clocks
VDDc   InputOutput  Analog   Clean voltage supply for sensitive circuits


Figure A-3: Layout of the chip.


Appendix B

Bonding Diagram

The bonding diagram of the test chip is shown in Fig. B-1. The chip contained 84 pins, including three VDD rails and two GND planes. VDD1 supplied power to the core logic and pre-drivers of the I/O buffers while VDD2 isolated the supply of the I/O post-drivers. Due to the sensitivity of the analog circuits in the local controller, a separate power supply, VDDC, was used for the phase detector, amplifiers, and latched comparator. The wiring connections shown on the bonding diagram use the signal names as they appear in the layout. During the writing of this thesis, several signals were referred to differently to better reflect their purpose. The most significant case of signal renaming occurred with the DODB instance, which is referred to as RTOP in the layout. Additionally, the comparator outputs are referred to as CompU and CompD in this thesis, but appear as OutA and OutB in the layout.

[Bonding diagram: top view of the PGA84M package showing the pad-to-pin wiring.]

Figure B-1: Bonding diagram.


Bibliography

[1] T. Chen. Clocking solutions beyond 10 GHz. In ISSCC 2002, Microprocessor Design Workshop, Feb. 2002.

[2] B. Clymer and J. Goodman. Timing uncertainty for receivers in optical clock distribution for VLSI. Optical Engineering, pp. 944-954, Nov. 1988.

[3] B. Clymer and J. Goodman. Optical clock distribution to silicon chips. Optical Engineering, pp. 1103-1108, Oct. 1986.

[4] P. Restle and A. Deutsch. Designing the best clock distribution network. In Symposium on VLSI Circuits Digest of Technical Papers, pp. 2-5, 1998.

[5] P. Restle et al. A clock distribution network for microprocessors. In Symposium on VLSI Circuits Digest of Technical Papers, pp. 184-187, 2000.

[6] S. Tam et al. Clock generation and distribution for the first IA-64 microprocessor. IEEE Journal of Solid State Circuits, pp. 1545-1552, Nov. 2000.

[7] D. Harris and S. Naffziger. Statistical clock skew modeling with data delay variations. IEEE Transactions on VLSI Systems, pp. 888-898, Dec. 2001.

[8] S. Tewksbury and L. Hornak. Optical clock distribution in electronic systems. pp. 1-28.

[9] L. Kimerling. Department of Materials Science and Engineering, Massachusetts Institute of Technology, Cambridge, MA.

[10] S. Sam. Characterization of optical interconnects. MS Thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, May 2000.

[11] D. Boning et al. Variation aware design of on-chip optical interconnect. In The Interconnect Focus Center Workshop, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, Mar. 2000.

[12] S. Sam et al. Variation issues in on-chip optical clock distribution. In IEEE International Workshop on Statistic Methodology, pp. 64-67, 2001.

[13] J. Maneatis. Low-jitter process-independent DLL and PLL based on self-biased techniques. IEEE Journal of Solid State Circuits, pp. 1723-1732, Nov. 1996.

[14] J. Maneatis and M. Horowitz. Precise delay generation using coupled oscillators. IEEE Journal of Solid State Circuits, pp. 1273-1282, Dec. 1993.

[15] B. Streetman. Solid State Electronic Devices. Prentice Hall, 4th Edition, 1995.

[16] J. Schaub et al. A high speed Si photodiode grown on epitaxial lateral growth. In Lasers and Electro-Optics Society Annual Meeting, pp. 83-84, 1998.

[17] J. Schaub et al. Multi-Gbit/s, high-sensitivity all silicon 3.3V optical receivers using PIN lateral trench photodetector. In Optical Fiber Communication Conference and Exhibit, pp. PD19-01-3, 1999.

[18] D. Miller. Rationale and challenges for optical interconnects to electronic chips. Proceedings of the IEEE, pp. 728-749, Jun. 2000.

[19] M. Das. Optoelectronic detectors and receivers: Speed and sensitivity limits. In Optoelectronic and Microelectronic Materials Devices Proceedings, pp. 15-22, 1999.

[20] A. Baschirotto et al. 3ns resolution CMOS low-power time-to-voltage converter. Electronics Letters, pp. 614-615, Apr. 1998.

[21] N. Abaskharoun and G. Roberts. Circuits for on-chip sub-nanosecond signal capture and characterization. In CISC 2001, Digest of Technical Papers, pp. 251-253, 2001.

[22] N. Abaskharoun et al. Strategies for on-chip sub-nanosecond signal capture and timing measurements. In ISCAS 2001, Digest of Technical Papers, pp. 174-177, 2001.

[23] P. Dudek et al. A high-resolution CMOS time-to-digital converter utilizing a vernier delay line. IEEE Transactions on Solid State Circuits, pp. 240-247, 2000.

[24] E. Raisanen-Ruotsalainen et al. An integrated time-to-digital converter with 30ps single shot precision. IEEE Journal of Solid State Circuits, pp. 1507-1510, 2000.

[25] V. Gutnik and A. Chandrakasan. On-chip picosecond time measurement. In Symposium on VLSI Circuits Digest of Technical Papers, pp. 52-53, 2000.

[26] V. Gutnik. Analysis and characterization of random skew and jitter in a novel clock network. PhD Thesis, Department of Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, May 2000.
