Variation Aware Design of Data Receiver Circuits
for On-Chip Optical Interconnect
by
Michael James Mills
Submitted to the Department of Electrical Engineering
and Computer Science
in partial fulfillment of the requirements for the degree of
Master of Engineering in Electrical Engineering and Computer Science
at the
MASSACHUSETTS INSTITUTE OF TECHNOLOGY
June 2002
© Massachusetts Institute of Technology 2002. All rights reserved.
Author
Department of Electrical Engineering
and Computer Science
May 24, 2002
Certified by
Duane S. Boning
Associate Professor of Electrical Engineering and Computer Science
Thesis Supervisor
Accepted by
Arthur C. Smith
Chairman, Department Committee on Graduate Theses
Variation Aware Design of Data Receiver Circuits for
On-Chip Optical Interconnect
by
Michael James Mills
Submitted to the Department of Electrical Engineering
and Computer Science
on May 24, 2002, in partial fulfillment of the
requirements for the degree of
Master of Engineering in Electrical Engineering and Computer Science
Abstract
Optical transmission offers an attractive interconnect alternative because light has
small propagation delay and negligible crosstalk compared to electrical interconnect.
Optical interconnect integrates transmitters and receivers on the same chip, imposing
different constraints on receiver design than those faced in the telecommunications
industry. This thesis presents an optical data receiver circuit designed to handle
low signal levels and large photodetector capacitance. A focus is placed on variation
robustness and the key metric is propagation delay. Because thousands of receivers
might be integrated on a single chip, power and die area are minimized.
The receiver consists of a clocked sense amplifier operating in positive feedback.
A current mirror at the input isolates detector parasitics from the switching node, increasing evaluation speed. An extra bit is transmitted with each data bus to serve as
a reference. In 0.18 um CMOS simulations, the circuit evaluates correctly at 2.0 GHz
and produces acceptable output at more than 1.0 GHz. The circuit dissipates 310 uW
of power and consumes 130 um² of area. Variation analysis explores changes in evaluation speed for asymmetrical and uniform circuit variations. Asymmetrical variations
have a greater effect on performance, which makes circuit matching important. A test
chip using free space illumination for proof of concept was submitted for fabrication
in December, 2001.
Thesis Supervisor: Duane S. Boning
Title: Associate Professor of Electrical Engineering and Computer Science
Acknowledgments
I could fill more pages than are in this thesis with acknowledgments. Indeed, I could
ramble on for a year and a day about all the thousands of unbelievable people who
have made a difference in my life. However, in doing so I would detract from the two
people who deserve top billing. For all the wonderful people in my life, I will always
be grateful, but this work is for my mother and father.
Mom, Dad, I am humbled by your continuous, unconditional, and utterly selfless
dedication. In every decision, every day of my life, I think about what you would
want me to do, not because you taught me to behave a certain way, but because I
hope with each act I can become slightly more deserving of the remarkable treatment
you give me.
I am awed by the way you live your lives. Your compassion, your integrity, and
your sincere desire to do good things inspire me to become a better person. As I
journey forward, I can only hope to live my own life with as much dignity and honor.
Hopefully, at the end of the journey, I can look back and know that I've taken the
high road, that I've made you proud, and that I've fulfilled the tremendous debt I
feel, by making the most of every single opportunity afforded me.
This thesis represents the first step of my journey, and I believe it is a good start.
I have given it my best effort with the hope that you will see in this work not what I
have accomplished, but what you have accomplished, for everything in this work, and
everything I have done, is a testament to you as parents. Mom, Dad, I owe everything
to you.
Of course, this research would not have been possible without the support of
MARCO, DARPA, the Interconnect Focus Center, and a thesis advisor with a great
sense of humor. Thanks for everything, Duane.
Contents
1 Introduction
1.1 Issues
1.1.1 Clock vs. Data
1.1.2 Challenges of Monolithic Integration
1.2 Previous Work
1.2.1 Clock Receiver Circuits
1.2.2 Examples of Clocked Data Receiver Circuits
1.3 Data Receiver Overview
2 Design
2.1 Schaffer and Mitkas Cell
2.1.1 Upside Down or Right Side Up
2.1.2 Timing and Output
2.1.3 Charge Sharing
2.2 Current Mirror Input
2.2.1 Design
2.2.2 Analysis
2.2.3 Transient Issues
2.3 Reference Circuit
2.4 Summary
3 Process, Sizing, and Simulation
3.1 Process Overview
3.2 Sizing and DC Biasing
3.2.1 The Latch
3.2.2 Current Mirrors
3.2.3 Current Sources
3.2.4 Reference Circuit
3.3 Simulation Results
3.3.1 Test Waveform
3.3.2 Output Waveforms
3.3.3 Simulation Measurements
4 Variation Analysis
4.1 Receiver Variation Overview
4.2 Uniform Variation Results
4.2.1 Photodiode Capacitance
4.2.2 Photocurrent
4.2.3 Channel Length Variation
4.2.4 Temperature
4.2.5 Threshold Voltage
4.2.6 Supply Voltage
4.2.7 Summary
4.3 Differential Variation Results
4.3.1 The Latch
4.3.2 Input Transistors
4.3.3 Input Stage and Reference Circuit
4.3.4 Input and Reference Photocurrent
4.4 Conclusions
5 Test Chip
5.1 Building Blocks
5.2 Data Generator
5.3 Receiver Test Circuitry
5.4 Clock Distribution
5.5 PLL Design
5.5.1 Phase-Frequency Detector
5.5.2 Charge Pump
5.5.3 Voltage-Controlled Oscillator
5.5.4 Frequency Divider
5.5.5 Stabilizing the Loop
5.6 Testing Summary
6 Conclusion
6.1 Summary
6.2 Final Thoughts: Contributions
A TSMC 0.18 um Digital CMOS Process Characteristics
List of Figures
1-1 Guided wave approach to optical interconnect
1-2 Optical clock distribution scheme proposed by Lum [8]
1-3 Synchronous sense amplifier using positive feedback for amplification
1-4 Waveforms for sense amplifier in Fig. 1-3
1-5 Positive feedback sense amplifier by Ayadi, et al. [2]
1-6 Positive feedback sense amplifier by Schaffer and Mitkas [13]
2-1 Flipping Schaffer and Mitkas latch "upside down"
2-2 Using NFET's for series devices reduces transistor size
2-3 Both configurations establish a differential across the input nodes.
2-4 When CLK goes low, the NOR gate temporarily goes high until the signal propagates through the inverters.
2-5 Upside down Schaffer and Mitkas cell
2-6 Basic waveforms for Schaffer and Mitkas cell
2-7 Input waveforms including charge sharing
2-8 Output waveforms including charge sharing
2-9 Adding a current mirror isolates the diode capacitance (left). However, the current mirror needs bias current (middle), and this bias current must be subtracted to get the correct photocurrent out (right).
2-10 Input stage using current mirror
2-11 Full receiver circuit
2-12 Small signal model of current mirror input stage
2-13 Simplified small signal model of current mirror input stage
2-14 Approximate small signal model of current mirror input stage
2-15 VIN vs. VREF for fast time constant, τ
2-16 Transient response of VIN for an arbitrary bit pattern
2-17 VIN transients for period T = 0.2τ
2-18 VIN transients for period T = 0.4τ
2-19 VIN transients for period T = 0.8τ
2-20 VIN and VREF on a full scale voltage range
2-21 Averaging two inputs in the current domain
2-22 Identically sized current mirrors present the same impedance
2-23 A "zero bit" does not require a diode at all
2-24 Current averaging with photodiode input
2-25 Reference circuit shared by all bits in a data bus
2-26 Example of 128 bit optical data bus with a single reference circuit
2-27 Full receiver and reference circuit schematic
3-1 Test setup for finding fT
3-2 Layout of receiver circuit for a single data bit (12.6 um x 10.6 um)
3-3 Dimensions and transistor names for receiver circuit. Large, unlabeled transistors are MOS decoupling capacitors.
3-4 Layout of reference circuit (7.0 um x 10.6 um) (left), along with dimensions and transistor names (right). Large, unlabeled transistors are MOS decoupling capacitors.
3-5 Input test pattern (top) and corresponding VIN waveform (bottom)
3-6 Test circuit for waveforms in Fig. 3-5
3-7 Simulation waveforms for clock (grey) and input photocurrent (black)
3-8 Receiver CLK and corresponding RST waveform
3-9 Simulated VIN and VREF waveforms
3-10 Simulated IN1 and IN2 waveforms
3-11 Zoomed in plot for six cycles of IN1 and IN2
3-12 Simulated output waveforms, Q and /Q
3-13 Zoomed in plot of output waveforms corresponding to Fig. 3-11
3-14 Power dissipation of receiver circuit as a function of clock frequency
3-15 Average power dissipation at 1.0 GHz for data bus of varying size
3-16 Average die area per bit for data bus of varying size
3-17 Example of evaluation speed definition: Cycle A evaluates correctly but will not produce a satisfactory output. Cycle B does not evaluate correctly.
4-1 VIN vs. VREF with a slow time constant
4-2 Exponential VIN waveform discharging across VREF (top) and corresponding plot of total differential as a function of time (bottom)
4-3 Total differential for several different values of VREF
4-4 Total differential at three different evaluation times: before, on, and after the zero crossing (teval-min) of total differential
4-5 Evaluation speed as a function of photodiode capacitance
4-6 Evaluation speed as a function of detector photocurrent
4-7 Changes in evaluation speed as a function of channel length variation
4-8 Changes in evaluation speed as a function of temperature variation
4-9 Changes in evaluation speed as a function of threshold voltage variation (Numbers refer to magnitude of VTN and VTP)
4-10 Changes in evaluation speed as a function of supply voltage variation
4-11 Absolute value of changes in evaluation speed relative to source variation percentage, as given by regression models in Table 4.1
4-12 Changes in evaluation speed as a function of differential variation between input and reference side of latch
4-13 Changes in evaluation speed as a function of differential variation between left and right input transistors
4-14 Changes in evaluation speed as a function of differential variation between input stage and reference circuit
4-15 Changes in evaluation speed as a function of input and reference photocurrent
5-1 General testing strategy for data receiver circuit
5-2 Schematic of DFF used in testchip
5-3 Schematic of two input mux using transmission gates
5-4 Schematic of FIFO buffer
5-5 Data generator for driving off-chip laser source
5-6 Receiver test circuitry using FIFO buffer
5-7 Multiplexing external and internal clock signals
5-8 Clock distribution
5-9 Clock distribution waveforms (corresponding to Fig. 5-8)
5-10 Phase-locked loop block diagram
5-11 Phase-frequency detector using DFF's with reset capability
5-12 Flip-flop with reset signal for use in PFD
5-13 Schematic diagram of charge pump
5-14 Current-starved inverter with variable propagation delay
5-15 VCO consisting of five current-starved inverters
5-16 Control circuitry for VCO: IIN sets offset frequency, and VIN controls output frequency
5-17 VCO frequency as a function of control voltage for different offset levels
5-18 Frequency divider using toggle flip-flops
5-19 Lead-lag loop filter for PLL
5-20 Loop transmission bode plots using lead-lag loop filter
A-1 I-V Characteristics for NFET with W = 0.5 um, L = 0.18 um
A-2 I-V Characteristics for NFET with W = 1.0 um, L = 0.18 um
A-3 I-V Characteristics for NFET with W = 5.0 um, L = 0.18 um
A-4 I-V Characteristics for PFET with W = 0.5 um, L = 0.18 um
A-5 I-V Characteristics for PFET with W = 1.0 um, L = 0.18 um
A-6 I-V Characteristics for PFET with W = 5.0 um, L = 0.18 um
A-7 Current gain vs. frequency for 0.18 um TSMC process
A-8 fT crossover for minimum length NFET
A-9 fT crossover for minimum length PFET
A-10 Transconductance, gm, of NFET with L = 0.18 um
A-11 Transconductance, gm, of NFET with L = 0.36 um
A-12 Transconductance, gm, of PFET with L = 0.18 um
A-13 Transconductance, gm, of PFET with L = 0.36 um
List of Tables
1.1 Design requirements for optical clock and data receiver circuits
2.1 Equations for output of a low-pass filter driven by a pulse train
2.2 Net input current to latch for high and low bits
3.1 Summary of TSMC 0.18 um Digital Logic Process
3.2 Circuit performance as a function of bias current and geometry for input stage transistor, M10
3.3 Transistor sizes for receiver and reference circuit in 0.18 um technology
3.4 Summary of bias and performance characteristics in 0.18 um CMOS
4.1 Comparison of regression models for different variation sources
5.1 Example PLL values using lead-lag loop filter
Chapter 1
Introduction
This thesis describes the design and testing of an optical data receiver circuit, with
special emphasis on variation robustness. Optical transmission provides an attractive
interconnect alternative in modern VLSI chips because of its speed and lack of parasitics. The ideas presented herein, and in previous theses by Sam and Lum [12] [8],
differ from those long pursued in the telecommunications industry in two important
ways. First, the communication is intra-chip, rather than inter-chip. Second, the
receivers are designed with monolithic CMOS integration in mind. Before optical
transmission can serve as a viable substitute for metal interconnect, researchers must
find economical ways to integrate it into standard CMOS processes.
Figure 1-1: Guided wave approach to optical interconnect (labels: laser diode, input signal, polysilicon and silicon dioxide waveguide, wafer, photodiode, output current)
Figure 1-1 shows an optical interconnect scheme. Those familiar with optical interconnect call this the "guided wave" approach because waveguides physically
direct the optical signal across the chip, much like a fiber optic cable.
The transmission scheme in Fig. 1-1 works as follows. First, a laser diode generates
optical pulses. These pulses travel along a waveguide constructed from two materials
with different dielectric constants (and hence different refractive indices), such as polysilicon and silicon dioxide.
At the end of the waveguide, light strikes a photodetector, which converts photons
into current. A receiver circuit amplifies the small output current into a rail-to-rail
digital signal.
The following sections outline several issues concerning optical interconnect, as
well as some previous work in the area. Section 1.3 concludes with a roadmap for the
following chapters and a short summary of the design strategy.
1.1 Issues
Optical interconnect presents unique design challenges because it requires monolithic
integration of all components. Furthermore, different applications have different design criteria. This section describes the challenges and constraints for two specific
applications: optical clock distribution and optical data transmission. The subtle
differences between the two motivate several design decisions in Ch. 2.
1.1.1 Clock vs. Data
Clock distribution presents a peculiar engineering problem. The signal is periodic,
predictable, almost everything on the chip needs it, and delay does not matter, so long
as the clock arrives everywhere at the same time. Unfortunately, in real life the clock
does not reach all gates simultaneously, but rather the signals are skewed relative to
one another. Generally speaking, clock skew is the single most important metric for
clock distribution, while power consumption and area are secondary concerns (due to
the relatively small number of receivers).
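The point that skew, not absolute delay, is the metric that matters can be illustrated with a short sketch. It computes skew as the spread between the earliest and latest clock arrival at a set of receivers. The arrival times are hypothetical values chosen for illustration, not measurements from this work.

```python
def clock_skew(arrival_times_ps):
    """Skew: spread between the latest and earliest clock arrival (ps)."""
    if not arrival_times_ps:
        raise ValueError("need at least one arrival time")
    return max(arrival_times_ps) - min(arrival_times_ps)


# Hypothetical arrival times (ps) at four local clock networks.
arrivals = [101, 98, 104, 99]
print(clock_skew(arrivals))

# Adding the same global delay everywhere leaves skew unchanged,
# which is why absolute clock latency is a secondary concern.
delayed = [t + 500 for t in arrivals]
print(clock_skew(delayed))
```

Both calls report the same skew. A uniform shift in arrival time is invisible to the clocked logic.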
Figure 1-2 shows an optical clock distribution scheme [8]. This method distributes
the clock optically at a global level, but electrically at the local level where metal
interconnect causes little skew. Note that while either an on-chip or off-chip light
source can provide the signal, implementing the laser driver circuitry off-chip saves
real estate and power, and reduces noise.
Figure 1-2: Optical clock distribution scheme proposed by Lum [8] (labels: optical clock source, waveguides, local electrical clock network)
Data transmission involves a different set of challenges. The data need not arrive
at exactly the same time, only within a certain window. Latency becomes the key
factor, because reducing latency allows the circuit to run at higher clock frequencies.
Also, with possibly thousands of copies of the receiver on a single chip, area and
power consumption become important issues.
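Each receiver is replicated per bit, so its power cost multiplies across the chip. A first-order estimate simply scales the per-receiver figure linearly. The sketch below uses the 305.74 uW per-bit value quoted in Sec. 1.3. It deliberately ignores the shared reference circuit, clock distribution, and any dependence on clock frequency, so treat it as a rough estimate rather than a design number.

```python
PER_RECEIVER_UW = 305.74  # simulated per-bit dissipation quoted in Sec. 1.3


def total_receiver_power_mw(n_receivers, per_receiver_uw=PER_RECEIVER_UW):
    """Linear scaling estimate; ignores reference circuits and clock distribution."""
    if n_receivers < 0:
        raise ValueError("receiver count must be non-negative")
    return n_receivers * per_receiver_uw / 1000.0


for n in (128, 1000, 10000):
    print(f"{n:>6} receivers: {total_receiver_power_mw(n):8.1f} mW")
```

Under this estimate, ten thousand receivers already dissipate about 3 W. This is why power per bit is treated as a first-class constraint for data receivers.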
Despite these challenges, a designer has one important advantage when developing a topology for a data receiver circuit: a clock signal. The pre-existence of a
synchronous signal forms the basis for many data receiver designs, including the one
presented in this thesis.
Data Issues                                                        | Clock Issues
LATENCY is key metric                                              | SKEW biggest factor
Presence of clock signal allows more versatility in design         | Latency a non-issue
Will be replicated many times, must be low power and small in size | Relatively small number of receivers, less demanding on size and power
Table 1.1: Design requirements for optical clock and data receiver circuits
Table 1.1 summarizes the key issues pertaining to optical receiver design. An
effective clock receiver circuit minimizes skew, without much regard for power or size.
In comparison, a good data receiver minimizes latency while paying careful attention
to power consumption and real estate.
1.1.2 Challenges of Monolithic Integration
Monolithic integration means fabricating the transmitter, waveguide, detector, and
receiver circuit together in a single process. In order to provide an economical alternative to metal interconnect, these items should require as few process modifications
as possible. This set of constraints presents several design challenges.
First, since optical interconnect targets VLSI (CMOS) chips, the receiver circuit
should use MOS devices and minimize the number of passive components. Most
digital logic processes do not contain large built-in capacitors, limiting the designer
to parasitics and MOS capacitors. Furthermore, such processes often use silicide to
reduce polysilicon or diffusion resistance. With such low sheet resistance, a resistor
of useful value needs many squares of material and consumes a tremendous amount of area.
Second, CMOS integration can severely limit the quality of photodetectors. Silicon
detectors cannot be used with silicon waveguides because if one is transparent to a
certain wavelength of light, the other one will be as well. Thus, some sort of process
modification is inevitable. One particularly promising technology involves growing
germanium photodiodes directly on a silicon substrate [7].
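The transparency argument can be made quantitative with the standard band-edge relation. A semiconductor absorbs photons only when their energy exceeds its bandgap, which gives a cutoff wavelength λc = hc/Eg ≈ 1.24 um·eV / Eg.

The sketch below makes these assumptions:
- Textbook room-temperature bandgaps of about 1.12 eV for silicon and 0.66 eV for germanium.
- The weak absorption near the band edge of these indirect-gap materials is ignored.
- The 1.3 um test wavelength is purely illustrative.

```python
HC_EV_UM = 1.23984  # Planck constant times speed of light, in eV*um


def cutoff_wavelength_um(bandgap_ev):
    """Longest wavelength a material can absorb band-to-band."""
    return HC_EV_UM / bandgap_ev


def absorbs(bandgap_ev, wavelength_um):
    """Absorption needs photon energy >= bandgap, i.e. wavelength <= cutoff."""
    return wavelength_um <= cutoff_wavelength_um(bandgap_ev)


SI_EG, GE_EG = 1.12, 0.66  # eV, approximate room-temperature bandgaps
WAVELENGTH = 1.3           # um, illustrative wavelength between the cutoffs

print(f"Si cutoff {cutoff_wavelength_um(SI_EG):.3f} um, "
      f"Ge cutoff {cutoff_wavelength_um(GE_EG):.3f} um")
print("Si absorbs:", absorbs(SI_EG, WAVELENGTH))
print("Ge absorbs:", absorbs(GE_EG, WAVELENGTH))
```

Any wavelength between roughly 1.1 um and 1.9 um passes through a silicon waveguide but can still be absorbed by a germanium detector. This is consistent with the interest in germanium photodiodes grown on silicon.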
1.2 Previous Work
This section looks at previous work in the field of optical interconnect, for both data
and clock applications. The first section outlines a pair of clock receiver designs
that illustrate some of the nuances of on-chip optical communication. These designs
form a basis for the variation analysis in Ch. 4, and some of the variation aware
implementation decisions in Ch. 3.
Next, Section 1.2.2 gives an overview of two data receiver designs, both of which
use sense amplifiers operating in positive feedback. In particular, the second circuit
serves as a basis for the design in Ch. 2.
1.2.1 Clock Receiver Circuits
Sam's thesis discusses the design of an optical clock receiver circuit, and sources of
skew in optical clock distribution. During her treatment of skew, Sam maps out a
set of four process and environmental variation sources that serve as a basis for the
variation analysis in Ch. 4: power supply, temperature, channel length, and threshold
voltage. In Sam's circuit, these variation sources contribute to clock skew by causing
asymmetrical clipping in the amplifier [12].
At the last amplification stage of the clock receiver, a sine wave passes through an
inverter, turning it into a full scale square wave. If the input signal of the inverter is
not biased exactly at the switching threshold, the inverter clips asymmetrically. This
produces subtle differences in rise and fall times, which ultimately translate into skew
and duty cycle variations [12].
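A simple model shows how a bias error turns into a duty cycle error. Suppose a sine wave of amplitude A is centered a distance d above the inverter's switching threshold. The input then spends a fraction 1/2 + arcsin(d/A)/π of each period above the threshold, so the inverter output is no longer a 50% square wave.

The sketch below computes that fraction in closed form and cross-checks it by sampling. The 100 mV amplitude and 10 mV offset are hypothetical values, not figures from [12].

```python
import math


def time_above_threshold(offset, amplitude):
    """Fraction of each period that amplitude*sin(wt) + offset stays above 0.

    offset is the input bias minus the inverter switching threshold (V).
    """
    d = offset / amplitude
    if d >= 1.0:
        return 1.0
    if d <= -1.0:
        return 0.0
    return 0.5 + math.asin(d) / math.pi


def sampled_fraction(offset, amplitude, n=100_000):
    """Brute-force check: sample one period and count points above threshold."""
    above = sum(
        1 for k in range(n)
        if amplitude * math.sin(2 * math.pi * k / n) + offset > 0
    )
    return above / n


A, OFFSET = 0.100, 0.010  # hypothetical 100 mV sine, 10 mV bias error
frac = time_above_threshold(OFFSET, A)
print(f"input above threshold: {frac:.4f} of the period")
print(f"inverter output high:  {1 - frac:.4f} of the period")
print(f"sampled check:         {sampled_fraction(OFFSET, A):.4f}")
```

Even a 10% bias error relative to the signal amplitude moves the duty cycle by about three percentage points. The rising and falling edges shift in opposite directions, which shows up as skew.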
Lum builds upon Sam's work by designing specifically with variation robustness in
mind. Additional feedback biasing keeps signals centered around the switching threshold of the inverters, while a linear voltage regulator rejects power supply variations.
A bandgap reference biases the entire circuit in an effort to eliminate temperature
variation effects [8].
Lum succeeds in eliminating skew due to environmental variations (power supply
and temperature), but he has less success designing around process variations (channel
length and threshold voltage). In the end, process variation sources still present the
biggest headaches for designers [8].
1.2.2 Examples of Clocked Data Receiver Circuits
As Sec. 1.1.1 points out, a pre-existing clock signal provides a competitive advantage
in data receiver design. The two designs in this section highlight that concept, using
the principle illustrated in Fig. 1-3.
Figure 1-3: Synchronous sense amplifier using positive feedback for amplification (labels: power, LATCH, VDD, reference diode, input diode)
The circuit operates as follows. A low signal on LATCH turns off the inverters.
Light shines on the input photodiode, inducing a small amount of current. This
current causes the input node to drift relative to the reference node, building up a
small differential voltage. When LATCH goes high, the inverters turn on again and
positive feedback takes over, amplifying the small differential all the way to the rails.
Also, with a high value on LATCH the circuit holds state, so it doubles as a flip-flop
for part of the cycle - a desirable feature since most data buses end in a flip-flop.
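A common first-order way to see why positive feedback is effective is the small-signal regeneration model for a cross-coupled inverter pair. Once the latch turns on, the differential grows roughly as ΔV(t) = ΔV0·exp(t/τ) with τ ≈ C/gm. Resolving to full swing therefore takes about τ·ln(Vswing/ΔV0).

This is a generic textbook approximation, not the analysis used in this thesis. The node capacitance, transconductance, and 1.8 V swing below are assumed values for illustration.

```python
import math


def regeneration_time(dv0, v_swing, c_node, gm):
    """Time for a latch to grow an initial differential dv0 to v_swing.

    Assumes small-signal exponential growth dv(t) = dv0 * exp(t / tau),
    with tau = c_node / gm; large-signal effects near the rails are ignored.
    """
    if not 0 < dv0 < v_swing:
        raise ValueError("need 0 < dv0 < v_swing")
    tau = c_node / gm
    return tau * math.log(v_swing / dv0)


C_NODE = 10e-15  # F, hypothetical switching-node capacitance
GM = 0.5e-3      # S, hypothetical inverter transconductance
V_SWING = 1.8    # V, assumed rail-to-rail swing

t1 = regeneration_time(10e-3, V_SWING, C_NODE, GM)
t2 = regeneration_time(20e-3, V_SWING, C_NODE, GM)
print(f"10 mV start: {t1 * 1e12:.1f} ps; 20 mV start: {t2 * 1e12:.1f} ps")
```

Doubling the initial differential saves only τ·ln 2, so the latch copes well with small input signals. The node capacitance, by contrast, enters τ directly. This matches the observation in Sec. 1.3 that capacitance on the switching nodes limits speed.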
Figure 1-4: Waveforms for sense amplifier in Fig. 1-3 (traces: input node and reference node between GND and VDD, LATCH, light input)
Figure 1-4 illustrates waveforms for the circuit in Fig. 1-3. Both the input and
reference nodes start at some metastable point. Optical input causes current to flow
onto the input node, increasing the voltage slightly. Positive feedback amplifies this
small potential difference when LATCH goes high.
The first design using this concept, shown in Fig. 1-5, consists of two inverters
in positive feedback connected to virtual power supplies controlled by a signal called
STORE. The reset signal provides a way to precharge the input and reference nodes to
the same level [2].
Figure 1-5: Positive feedback sense amplifier by Ayadi, et al. [2] (labels: VDD, STORE, /STORE, RST, input diode, reference diode, Q, /Q)
Signal STORE starts out low, while RST goes high for a short period of time. This
precharges the input and reference nodes (Q and /Q) to the same (metastable) state.
Optical input causes node Q to drift above /Q,
building up a potential across the
inputs. As soon as STORE goes high, the inverters turn on and the circuit rapidly
amplifies the input differential using positive feedback.
Unfortunately, the "virtual rails" in this circuit store charge when STORE turns off.
The inverters can still function for short periods of time using this charge reservoir, so the internal nodes do not start each cycle from a predictable state.
Figure 1-6 shows the schematic diagram for a second design, which fixes the
charge storage problem by precharging all nodes to a predictable value. In this diagram, Q and /Q precharge to ground, while the input and reference nodes charge to
(VDD - VTP) [13].
Figure 1-6: Positive feedback sense amplifier by Schaffer and Mitkas [13] (labels: VDD, input, reference, /RST, /LAT)
A high signal on /LAT disconnects the NFET and PFET of each inverter while
precharging Q and /Q to ground. Meanwhile, a short low pulse on /RST resets the
input and reference nodes. Switching /LAT from high to low reconnects the inverters,
and charge redistributes between /Q and the input node (and between Q and the
reference node).
Since Q and /Q both precharge to the exact same voltage, they take the same
amount of charge off the input and reference nodes. Thus, although charge sharing
still occurs, it happens in equal proportions so as not to cause bit errors.
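The charge-sharing argument follows from charge conservation. Connecting a node at voltage V1 with capacitance C1 to a node at V2 with capacitance C2 leaves both at (C1·V1 + C2·V2)/(C1 + C2).

When Q and /Q are both precharged to ground and the two sides are matched, each side loses the same fraction of its voltage. The input-to-reference differential is then attenuated by C1/(C1 + C2) but keeps its sign. The sketch below uses hypothetical capacitances and voltages to show both the matched case and what an unmatched output capacitance could do.

```python
def share(c_a, v_a, c_b, v_b):
    """Common voltage after connecting two capacitors (charge conservation)."""
    return (c_a * v_a + c_b * v_b) / (c_a + c_b)


C_NODE = 20e-15   # F, hypothetical input/reference node capacitance
C_OUT = 5e-15     # F, hypothetical Q and /Q node capacitance
V_PRE = 0.0       # V, Q and /Q both precharged to ground
v_in, v_ref = 1.35, 1.30  # V, hypothetical node voltages before sharing

# Matched sides: each node loses the same fraction of its voltage.
v_in_after = share(C_NODE, v_in, C_OUT, V_PRE)
v_ref_after = share(C_NODE, v_ref, C_OUT, V_PRE)
print(f"before: {v_in - v_ref:.3f} V, after: {v_in_after - v_ref_after:.3f} V")

# Unmatched output capacitance (10 fF vs. 5 fF) removes unequal charge
# and, with these numbers, flips the sign of the differential.
v_in_mismatch = share(C_NODE, v_in, 10e-15, V_PRE)
print(f"mismatched: {v_in_mismatch - v_ref_after:.3f} V")
```

The matched case shrinks the 50 mV differential to 40 mV without a bit error. The mismatched case shows why symmetric precharge and layout matter.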
The development here centers around the evaluation of a "one" bit in response
to a high optical input. However, the circuit must also evaluate "zeroes" in response
to low (or absent) optical inputs. There are several ways to accomplish this. One
might design mismatch into the circuit, creating an unbalanced sense amplifier that
evaluates zeroes by default. Differential signaling provides another alternative because
light always shines on at least one of the diodes.
For the sake of variation robustness, this thesis pursues a third option. A specialized reference circuit biases the reference side halfway in between a high and low
input, so that the input triggers the circuit on "one" bits, and the reference triggers
the circuit on "zero" bits.
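Functionally, the reference scheme is a threshold decision with the threshold placed at the midpoint of the two input levels. The sketch below models it in the current domain, assuming a 10 uA photocurrent for a "one" (the value used in Sec. 1.3) and no light for a "zero". The function names are illustrative, not taken from the circuit. A midpoint threshold gives equal margin for both bit values.

```python
def reference_current(i_high, i_low):
    """Reference level halfway between the 'one' and 'zero' input currents."""
    return 0.5 * (i_high + i_low)


def evaluate(i_input, i_ref):
    """Input side wins on 'one' bits; reference side wins on 'zero' bits."""
    return 1 if i_input > i_ref else 0


I_HIGH, I_LOW = 10e-6, 0.0  # A: lit photodiode vs. no light
i_ref = reference_current(I_HIGH, I_LOW)

sent = [1, 0, 1, 1, 0, 0, 1]
received = [evaluate(I_HIGH if bit else I_LOW, i_ref) for bit in sent]
print(received == sent)

# Margin: a 'one' still evaluates correctly until its photocurrent falls
# below the 5 uA reference, i.e. up to a 50% loss of the high level.
print(evaluate(0.6 * I_HIGH, i_ref), evaluate(0.4 * I_HIGH, i_ref))
```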
1.3 Data Receiver Overview
The next chapter details the design of an optical data receiver circuit based on the
sense amplifier by Schaffer and Mitkas in Fig. 1-6. A few slight modifications improve
the performance of this circuit, but ultimately capacitance on the switching nodes
limits speed. Specifically, the photodiode adds 100 fF or more to nodes that normally
only have 5-10 fF.
This thesis proposes to isolate the large photodiode capacitance using a current
mirror. This technique reflects the small signal input current from the photodetector
to the input side of the sense amplifier, which remains a low capacitance node.
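A rough estimate shows the benefit of the current mirror. With a roughly constant photocurrent I charging a node of capacitance C, building a differential ΔV takes t = C·ΔV/I.

The sketch compares two cases:
- The 100 fF photodiode capacitance quoted above.
- A 10 fF node, the upper end of the 5-10 fF range.

Both cases use the 10 uA photocurrent from Sec. 1.3 and an assumed 50 mV target differential. The estimate ignores the mirror's own capacitance, bias current, and bandwidth, so the tenfold ratio is an idealized bound rather than a simulated result.

```python
def time_to_differential(c_node, dv, i_photo):
    """Constant-current charging: dv = i * t / C, so t = C * dv / i."""
    return c_node * dv / i_photo


I_PHOTO = 10e-6    # A, photocurrent used in Sec. 1.3
DV_TARGET = 50e-3  # V, assumed differential needed before the latch fires

t_direct = time_to_differential(100e-15, DV_TARGET, I_PHOTO)  # diode on switching node
t_mirror = time_to_differential(10e-15, DV_TARGET, I_PHOTO)   # isolated switching node
print(f"direct: {t_direct * 1e12:.0f} ps, mirrored: {t_mirror * 1e12:.0f} ps")
```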
A special reference circuit drives an identical current mirror on the other side of
the sense amplifier. This circuit "averages" low and high input levels to produce
a signal halfway in between.
On high bits, the input signal exceeds the reference
and the circuit evaluates a "one," while on low bits the reference exceeds the input,
resulting in evaluation of a "zero." This strategy requires that an additional reference
bit accompany each optical data bus.
Chapter 3 outlines implementation and simulation of the data receiver in 0.18 um
CMOS technology. Discussion focuses on choosing bias parameters to maximize speed
while maintaining low power, small size, and variation robustness.
In an effort to quantify circuit performance, Ch. 3 defines the term evaluation
speed. Simulation results show an evaluation speed of 2.0 GHz for 10 uA of input
photocurrent, a 100 fF photodiode capacitance, and 305.74 uW of power dissipation.
Each receiver bit occupies an area of 133.56 um², while the reference circuit takes up
74.20 um².
Using these parameters, Ch. 4 performs variation analysis on the receiver. The
chapter begins by attempting to quantify how circuit parameters determine evaluation
speed. Then it presents simulation results for both uniform and differential variation
sources.
Uniform variation refers to changes that affect the whole circuit, such as a uniform
increase in supply voltage. Changes in evaluation speed are plotted as a function of
photodiode capacitance, input current, and the four variation sources outlined by
Sam: power supply, temperature, channel length, and threshold voltage [12].
Differential variation describes how the circuit behaves when the input and reference sides vary asymmetrically. The treatment in Ch. 4 looks at changes in evaluation
speed as a function of differential channel length variation in three different parts of
the circuit, as well as mismatches between the optical power of the reference and
input signals.
Chapter 5 details the implementation of a test chip submitted for fabrication in
December 2001. The testing strategy uses an on-chip signal source to drive an off-chip laser, ensuring synchronous optical input signals. The chapter also discusses the
design of a phase-locked loop for on-chip clock multiplication, and explains how to
stabilize the loop using an external loop filter.
Finally, Ch. 6 reviews the problem statement and summarizes the analysis, implementation, and simulation of the receiver.
It also presents some final
thoughts on the contributions of this thesis, and how these contributions are generally useful to future designers and researchers.
Chapter 2
Design
This chapter outlines the design and operation of an optical data receiver circuit. The
design is based on a synchronous sense amplifier by Schaffer and Mitkas (Fig. 1-6,
page 22) [13].
Discussion focuses first on the Schaffer and Mitkas latch, and some slight modifications that improve performance. After that, Secs. 2.2 and 2.3 describe the two main
contributions of this thesis, namely isolating photodiode parasitics with a current
mirror, and creating a reference circuit using current domain arithmetic.
2.1 Schaffer and Mitkas Cell
The Schaffer and Mitkas latch provides a good starting point because of its small
size and power consumption. However, optimizing the cell for maximum performance
requires a more in-depth exploration of its operation. Note that this thesis uses the
terms "sense amplifier" and "latch" interchangeably, since the Schaffer and Mitkas
sense amplifier also holds state.
2.1.1 Upside Down or Right Side Up
Figure 2-1 proposes flipping the Schaffer and Mitkas latch "upside down." In the
original version, the charge up path of Q and /Q goes through two PFET's, whereas
the charge down path goes through a single NFET. In the flipped version the two
series devices are NFET's.
Figure 2-1: Flipping Schaffer and Mitkas latch "upside down"
Assuming the mobility of electrons is roughly twice that of holes, two series
PFET's would have to be about four times as wide as a single NFET to ensure
equal rise and fall times, as shown in Fig. 2-2. Unfortunately, capacitance scales with
geometry, and these large transistors limit switching speed, so designers prefer to use
NFET's when multiple devices must be placed in series.
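The sizing rule can be checked with a short first-order sketch. Beyond the text's assumption that electron mobility is about twice hole mobility, it assumes that drive strength scales as mobility × W/L and that n identical series devices act like one device of n times the length; all names are illustrative.

```python
# First-order sizing sketch (assumption): drive strength ~ mobility * W / L,
# and n identical devices in series behave like one device of n times the length.
MU_N = 2.0  # electron mobility, normalized (text assumes ~2x hole mobility)
MU_P = 1.0  # hole mobility, normalized
W = 1.0     # unit (minimum) device width, normalized


def stack_strength(width: float, n_series: int, mobility: float) -> float:
    """Relative drive strength of n identical series devices of the given width."""
    return mobility * width / n_series


# Original latch: single NFET pull-down, two series PFETs pull-up.
# Solve MU_P * wp / 2 == MU_N * W for the PFET width wp.
wp_original = 2 * MU_N * W / MU_P

# Flipped latch: two series NFETs pull-down, single PFET pull-up.
# Solve MU_P * wp == MU_N * W / 2 for the PFET width wp.
wp_flipped = MU_N * W / (2 * MU_P)

# Total gate width of the three devices in each configuration
total_original = 2 * wp_original + W
total_flipped = 2 * W + wp_flipped

print(f"original: PFETs {wp_original:.0f}W each, total {total_original:.0f}W")
print(f"flipped:  PFET {wp_flipped:.0f}W, total {total_flipped:.0f}W")
```

Under these assumptions the original arrangement needs 9W of total gate width against 3W for the flipped one. The same model also says the flipped pull-down is half as strong as the original single NFET, so the comparison concerns area at balanced rise and fall times rather than absolute drive.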
Figure 2-2: Using NFET's for series devices reduces transistor size
Note that flipping the "polarity" also requires inverting the control signals. In
Fig. 2-1, /LAT becomes LAT and /RST becomes RST. The inputs, on the other hand,
need not be reversed. Since it is only necessary to build up a differential across the
inputs, the input stage can source or sink current. In other words, both configurations
in Fig. 2-3 work equally well. This fact becomes important in Sec. 2.2 because an
n-channel current mirror is much faster than a p-channel one.
Figure 2-3: Both configurations establish a differential across the input nodes.
2.1.2 Timing and Output
In a synchronous system, a clock replaces the latch signal, and the reset signal is
generated from the clock. For example, the circuit in Fig. 2-4 creates a pulse on the
falling edge of the clock, where the propagation delay of the inverters determines the
width of the pulse. In simulation, three inverters usually produce a pulse long enough
to reset the latch.
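The pulse generator of Fig. 2-4 can be modeled with ideal logic as RST = NOR(CLK, CLK delayed through an odd inverter chain). The sketch below assumes a unit delay per inverter, a zero-delay NOR gate, and the hypothetical function name reset_pulse; it shows that the pulse width equals the total inverter delay.

```python
def reset_pulse(clk, n_inverters=3, inverter_delay=1):
    """Ideal-logic model of the Fig. 2-4 pulse generator (a sketch).

    RST = NOR(CLK, CLK delayed and inverted by an odd inverter chain).
    The NOR gate is treated as having zero delay, and CLK is assumed to
    hold its initial value before t = 0.
    """
    if n_inverters % 2 == 0:
        raise ValueError("an odd number of inverters is needed to invert CLK")
    delay = n_inverters * inverter_delay
    rst = []
    for t, clk_now in enumerate(clk):
        clk_delayed = clk[t - delay] if t >= delay else clk[0]
        clk_delayed_inv = 1 - clk_delayed  # odd chain inverts the signal
        rst.append(0 if (clk_now or clk_delayed_inv) else 1)  # NOR
    return rst


# CLK high for 5 steps, low for 8, then high again (one step = one inverter delay)
clk = [1] * 5 + [0] * 8 + [1] * 5
rst = reset_pulse(clk)
print("CLK:", "".join(map(str, clk)))
print("RST:", "".join(map(str, rst)))
```

With three inverters, RST is high for three time steps beginning at the falling edge of CLK and stays low at the rising edge; five inverters widen the pulse to five steps.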
The following discussion on timing refers to the diagram in Fig. 2-5, which shows
the "upside down" Schaffer and Mitkas latch with transistor names and control signals
relabeled. Figure 2-6 plots the associated waveforms for this circuit.
Bit evaluation consists of two phases. First, CLK goes low and RST pulses high.
This turns off M5 and M6 while simultaneously resetting the input nodes (IN1 and
IN2) to the same value, namely VTN. The voltage never drops any lower than this
point because M7 and M8 turn off.
Figure 2-4: When CLK goes low, the NOR gate temporarily goes high until the signal
propagates through the inverters.
Figure 2-5: Upside down Schaffer and Mitkas cell
After RST returns to zero, the cell begins building up a differential across the
inputs. During the first clock period in Fig. 2-6, the optical input (IPHOTO) is high.
This example uses photodiodes as current sinks, so a high optical pulse drains current
off IN1. The low optical input in the second clock period has no effect, so IN1 stays
constant.
In both clock cycles, IN2 drifts down slightly, indicating some sort of reference
input that sinks an intermediate amount of current, in between an optical one and
an optical zero. Section 2.3 discusses how to design such a reference.
Figure 2-6: Basic waveforms for Schaffer and Mitkas cell (traces: IPHOTO, CLK, RST, IN1 (Input), IN2 (Ref))
In the second phase of evaluation, CLK turns on but RST stays low, connecting the
latch like a pair of cross-coupled inverters. Positive feedback amplifies the differential
until IN1 and IN2 saturate.
For example, look at the first clock period. Although IN2 drifts down slightly, IN1
(the input node) drops by more. When M5 and M6 turn back on, the two inverters (M3
and M7 on the left and M4 and M8 on the right) quickly amplify the differential using
positive feedback.
The reverse happens in the second clock period. With no optical input, IN1 stays
constant, but IN2 still drifts down slightly by default. Positive feedback once again
amplifies the differential, but this time in the other direction.
Three points about these waveforms deserve mention. First, note that IN1 and IN2 saturate at (VDD - VTN) on the positive side,
and ground on the negative side. NFET's make good pull-down devices, but bad
pull-up devices because they stop providing current once the source gets within a
threshold voltage of the gate.
In this case, M5 to M8 are all NFET's, so the inputs swing between zero and
(VDD - VTN).
On the other hand, outputs Q and /Q charge up through PFET's and
charge down through NFET's, so they swing all the way from zero to VDD. If the
outputs did not swing rail to rail, then transistors in the next stage would always be
partially on, dissipating large amounts of static power.
Second, notice in Fig. 2-6 that optical input has no effect when CLK is high. The
circuit functions with or without return-to-zero signaling.
Finally, Sec. 1.2.2 claims that, although charge sharing does occur in the Schaffer
and Mitkas latch, it cannot cause bit errors. The next section describes this important
phenomenon in more detail.
2.1.3 Charge Sharing
The output nodes, Q and /Q, have the same capacitance, CQ, and precharge to the
same voltage, VDD. As a result, they both contain an amount of charge equal to
VDD·CQ. The input nodes, on the other hand, charge to different voltages. If IN2
charges to VIN2, then IN1 charges to (VIN2 + ΔV), where ΔV represents the differential
built up across IN1 and IN2 due to an optical input. Both input nodes have the same
capacitance, CIN.
When the clock signal goes high, M5 and M6 short the input and output nodes
together. The total charge on the input side becomes the sum of the charge on /Q
and IN1. Likewise, the charge on Q and IN2 pools together on the reference side.
Equations 2.1 and 2.2 express these relationships.
$$Q_{\mathrm{input}} = V_{DD}\,C_Q + \left(V_{IN2} + \Delta V\right) C_{IN} \tag{2.1}$$
$$Q_{\mathrm{ref}} = V_{DD}\,C_Q + V_{IN2}\,C_{IN} \tag{2.2}$$
The total charge on each side redistributes over the new capacitance, which is
effectively the sum of the capacitances on the individual nodes which are now shorted
together. Equations 2.3 and 2.4 provide expressions for the new voltages, V'input and
V'ref, on each side.
$$V'_{\mathrm{input}} = \frac{Q_{\mathrm{input}}}{C_{IN} + C_Q} \tag{2.3}$$
$$V'_{\mathrm{ref}} = \frac{Q_{\mathrm{ref}}}{C_{IN} + C_Q} \tag{2.4}$$
The difference between these new voltages represents the total differential across
the latch after charge sharing occurs. As shown in Eq. 2.7, the new differential, ΔV',
is smaller than the original differential by a factor of CIN/(CIN + CQ), but has the same sign. So
although charge sharing dilutes the magnitude of the differential, it does not change
the direction, and therefore cannot cause bit errors.
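$$\Delta V' = V'_{\mathrm{input}} - V'_{\mathrm{ref}} \tag{2.5}$$
$$\Delta V' = \frac{Q_{\mathrm{input}} - Q_{\mathrm{ref}}}{C_{IN} + C_Q} \tag{2.6}$$
$$\Delta V' = \Delta V \left(\frac{C_{IN}}{C_{IN} + C_Q}\right) \tag{2.7}$$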
Note that decreasing the magnitude of the differential can potentially decrease
speed. Luckily, the capacitance CIN tends to be larger than CQ, so CIN/(CIN + CQ) is usually
between 0.5 and 1.0.
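Equations 2.1 to 2.7 can be checked numerically by conserving charge on each side of the latch. The capacitances, the 0.5 V reset level standing in for VTN, and the 50 mV input differential below are hypothetical values chosen only for illustration; VDD = 1.8 V matches the process described in Ch. 3.

```python
def charge_share(v_dd, c_q, c_in, v_in2, delta_v):
    """Node voltages and differential after M5/M6 turn on (Eqs. 2.1 to 2.5)."""
    q_input = v_dd * c_q + (v_in2 + delta_v) * c_in  # Eq. 2.1: /Q plus IN1
    q_ref = v_dd * c_q + v_in2 * c_in                # Eq. 2.2: Q plus IN2
    v_input = q_input / (c_in + c_q)                 # Eq. 2.3
    v_ref = q_ref / (c_in + c_q)                     # Eq. 2.4
    return v_input, v_ref, v_input - v_ref           # Eq. 2.5


# Hypothetical values: 5 fF outputs, 10 fF inputs, IN2 reset to 0.5 V,
# and a 50 mV drop on IN1 caused by the optical input.
V_DD, C_Q, C_IN = 1.8, 5e-15, 10e-15
v_in, v_ref, dv_new = charge_share(V_DD, C_Q, C_IN, v_in2=0.5, delta_v=-0.05)
print(f"V'input = {v_in:.4f} V, V'ref = {v_ref:.4f} V, dV' = {dv_new * 1e3:.2f} mV")
print(f"dilution factor CIN/(CIN + CQ) = {C_IN / (C_IN + C_Q):.3f}")
```

Both input nodes jump up toward the output voltage, but the differential keeps its sign and is scaled by exactly CIN/(CIN + CQ), as Eq. 2.7 states.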
Figure 2-7: Input waveforms including charge sharing
Figure 2-7 updates the waveforms to account for charge sharing. When CLK goes
high, charge immediately redistributes from the output nodes to the input nodes, and
the voltage on nodes IN1 and IN2 jumps up to a new value.
Charge sharing also produces glitching on the output nodes, Q and /Q. In other
words, when the clock turns on, Q and /Q charge down, IN1 and IN2 charge up, and
they meet in the middle at their new values, V'input and V'ref. Figure 2-8 shows all
four waveforms superimposed upon one another.
Figure 2-8: Output waveforms including charge sharing
2.2 Current Mirror Input
Connecting a photodiode directly to the inputs of the sense amplifier places the
largest capacitance in the circuit, the junction capacitance of the photodiode, directly on the switching nodes, IN1 and IN2. Adding a current mirror solves this
problem by isolating the photodiode capacitance, while still reflecting the small signal
photocurrent to the input of the latch.
2.2.1 Design
Figure 2-9 shows the evolution of a current mirror input stage. The left diagram
demonstrates the basic idea of a current mirror. In reality, M10 needs some sort of
bias current, which also gets reflected, as shown in the middle diagram. Adding an
identical source of bias current on top of M11 subtracts this offset, leaving only
IPHOTO at the output (right diagram). In a real design, p-channel transistors replace
the ideal current sources. Figure 2-10 shows the final configuration.
The current mirror input stage attaches to IN1 of the sense amplifier. An identical
set of transistors (M14 and M15) must be connected to the other side to preserve
balance.
Figure 2-9: Adding a current mirror isolates the diode capacitance (left). However,
the current mirror needs bias current (middle), and this bias current must be subtracted to get the correct photocurrent out (right).
Figure 2-10: Input stage using current mirror
Figure 2-11 shows the new receiver circuit. The grey dotted line effectively outlines
a new sense amplifier, with inputs VIN and VREF. In this document, the term "sense
amplifier" or "latch" (or "Schaffer and Mitkas cell") refers to the basic sense amplifier
consisting of transistors M1 to M9. Transistors M11, M13, M14, and M15 are called the
"input transistors" because they directly drive the input nodes, IN1 and IN2. Finally,
the "input stage" refers to transistors M10 and M12, as well as the photodiode.
On a similar note, most analysis in this document deals with the voltages VIN and
VREF in Fig. 2-11. These nodes directly drive the input transistors, and therefore
control the input nodes, IN1 and IN2.
Figure 2-11: Full receiver circuit
2.2.2 Analysis
Figure 2-12 gives an equivalent small signal model for the current mirror input stage.
The photodiode consists of a small signal current source, IPHOTO, a junction capacitance, Cj, and a junction resistance, Rj.
Figure 2-12: Small signal model of current mirror input stage
Shorting the drain and gate of M10 (diode connected) causes the transconductance
source to look like a resistor with value 1/gm10. Grouping the resistors and capacitors
in parallel condenses the circuit to the model in Fig. 2-13.
Figure 2-13: Simplified small signal model of current mirror input stage
Reasonably sized integrated transistors usually have gate capacitances on the order
of 1-10 fF, whereas most photodiodes have junction capacitances of 100 fF or more.
So, the other capacitors are insignificant compared to Cj. Also, 1/gm10 will be small for
a well designed current mirror. Since the other resistors in parallel are quite large,
they can be ignored. These assumptions produce the approximate small signal model
shown in Fig. 2-14.
Figure 2-14: Approximate small signal model of current mirror input stage
As the input photocurrent pulses high and low, VIN charges and discharges with
a time constant given by Eq. 2.8.
$$\tau = \frac{C_j}{g_{m10}} \tag{2.8}$$
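As a rough feel for Eq. 2.8, the sketch below evaluates τ = Cj/gm10 for the 100 fF photodiode used in the simulations and reports how much of a full step VIN completes in one 1 ns clock period (the 1.0 GHz target of Ch. 3). The transconductance values are hypothetical, and the single-pole model of Fig. 2-14 is assumed.

```python
import math


def input_time_constant(c_j, g_m10):
    """Eq. 2.8: tau = Cj / gm10 for the approximate model of Fig. 2-14."""
    return c_j / g_m10


def settled_fraction(period, tau):
    """Fraction of a full step that a single-pole node completes in one period."""
    return 1.0 - math.exp(-period / tau)


C_J = 100e-15   # 100 fF photodiode capacitance (value used in the simulations)
PERIOD = 1e-9   # 1 ns, i.e. the 1.0 GHz target clock
for g_m10 in (50e-6, 200e-6, 1e-3):  # hypothetical transconductances, siemens
    tau = input_time_constant(C_J, g_m10)
    frac = settled_fraction(PERIOD, tau)
    print(f"gm10 = {g_m10 * 1e6:6.0f} uS  tau = {tau * 1e12:6.0f} ps  settled = {frac:.3f}")
```

Only when τ is several times shorter than the clock period does VIN come close to saturating on every cycle, as in Fig. 2-15.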
Figure 2-15 shows VIN charging up and down with a time constant fast enough for
VIN to saturate on every cycle. Unfortunately, a large photodiode capacitance often
makes the time constant too long for VIN to saturate on every cycle, which causes
the DC offset to drift around. This turns out to be one of the biggest
design challenges of all, so the next section takes an in depth look at the transient
behavior of VIN.
Figure 2-15: VIN vs. VREF for fast time constant, τ
2.2.3 Transient Issues
Define VLO as the steady state value of VIN with no optical input, and VHI as the
steady state value of VIN with maximum optical input. In other words, VLO is the
gate to source voltage on M10 required to sink the bias current, and VHI is the gate
to source voltage required to sink the bias current as well as the photocurrent. The
two voltages differ by an amount approximately equal to the photocurrent times the
small signal resistance, as given by Eq. 2.9.
$$V_{HI} \approx V_{LO} + \frac{I_{PHOTO}}{g_{m10}} \tag{2.9}$$
Comparing Eq. 2.9 to Eq. 2.8 reveals a tradeoff between voltage swing and speed,
since speed goes as the inverse of the time constant. Figure 2-16 displays a more
realistic set of waveforms for VIN in response to an arbitrary bit pattern. Notice that
VLO provides a lower bound, VHI an upper bound, and that the time constant is not
fast enough for VIN to swing the full distance between them in one clock period.
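The tradeoff can be made explicit: both the swing of Eq. 2.9 and the time constant of Eq. 2.8 scale as 1/gm10, so their ratio, IPHOTO/Cj (the initial slope of VIN), depends only on the photodiode. A minimal sketch, assuming the 10 uA and 100 fF values quoted in the text and hypothetical gm10 values:

```python
I_PHOTO = 10e-6  # 10 uA photocurrent, as in the text's simulations
C_J = 100e-15    # 100 fF photodiode junction capacitance


def swing_and_tau(i_photo, c_j, g_m10):
    """Return (VHI - VLO, tau) from Eqs. 2.9 and 2.8."""
    return i_photo / g_m10, c_j / g_m10


for g_m10 in (25e-6, 50e-6, 100e-6, 200e-6):  # hypothetical values, siemens
    swing, tau = swing_and_tau(I_PHOTO, C_J, g_m10)
    # swing / tau in V/s; multiply by 1e-6 to express it in mV/ns
    print(f"gm10 = {g_m10 * 1e6:5.0f} uS  swing = {swing * 1e3:6.1f} mV  "
          f"tau = {tau * 1e12:6.0f} ps  swing/tau = {swing / tau * 1e-6:.0f} mV/ns")
```

Raising gm10 speeds up VIN but shrinks its swing by the same factor; only a larger photocurrent or a smaller junction capacitance improves one without sacrificing the other.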
The dashed lines in Fig. 2-16 represent the "trajectory" of VIN, or rather the path
it would follow with an infinite clock period. The actual waveform always lies on one
of these paths, but jumps from curve to curve as the input switches back and forth.
In mathematical terms, the waveform in every period behaves as an exponential, only
time shifted and with a different set of initial conditions.
Figure 2-16: Transient response of VIN for an arbitrary bit pattern
Near VLO, the waveform charges up faster than it charges down, causing the
DC value of VIN to drift upwards. Likewise, near VHI the DC value tends to drift
downwards. Remarkably, one can show that this "low frequency" drift actually has
the same time constant, τ, as the dashed exponential traces in Fig. 2-16.
More formally, VIN is the output, vo, of a single pole system satisfying the differential equation in Eq. 2.10. For simplicity, treat the driving term, iIN·R, as a simple
voltage source, vs (Eq. 2.11).
$$RC\,\frac{dv_o}{dt} + v_o = i_{IN} R \tag{2.10}$$
$$RC\,\frac{dv_o}{dt} + v_o = v_s \tag{2.11}$$
The homogeneous solution to this differential equation is an exponential with time
constant τ = RC. Assume a square wave drives the system, such that vs = (VHI - VLO)
on odd periods, and vs = 0 on even periods. In both cases, a constant term equal
to vs provides the particular solution. Then Eqs. 2.12 (charge up) and 2.13 (charge
down) represent the total solutions, where A depends on the initial conditions.
$$v_o = v_s + A\,e^{-t/\tau} \tag{2.12}$$
$$v_o = A\,e^{-t/\tau} \tag{2.13}$$
For a given period length, T, let x = T/τ. Input vs starts by going high in the first
period, so solving the "charge up" equation with initial condition vo = 0 gives the
expression in Eq. 2.14 for vo during the interval (0 < t < T). At the end of the first
period (after time T), the value of v0 is given by Eq. 2.15.
$$v_o(t) = v_s\left(1 - e^{-t/\tau}\right) \tag{2.14}$$
$$v_o(t = T) = v_s\left(1 - e^{-x}\right) \tag{2.15}$$
Starting at time T, the input goes low, and vo begins discharging. Using the new
set of initial conditions from Eq. 2.15 and the "charge down" equation yields the
result in Eq. 2.16 for (T < t < 2T). Notice that the exponential is time shifted by
an amount T. Equation 2.17 gives the final value for this period, which can then be
used as an initial condition for the next time period, (2T < t < 3T).
$$v_o(t) = v_s\left(1 - e^{-x}\right) e^{-(t-T)/\tau} \tag{2.16}$$
$$v_o(t = 2T) = v_s\left(e^{-x} - e^{-2x}\right) \tag{2.17}$$
In this manner, one can derive the formula for vo at any arbitrary point in time.
Table 2.1 summarizes the first five periods.
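$$
\begin{array}{ll}
\text{Time Period} & v_o(t) \\
\hline
0 < t < T & v_s\left[1 - e^{-t/\tau}\right] \\
T < t < 2T & v_s\left(1 - e^{-x}\right) e^{-(t-T)/\tau} \\
2T < t < 3T & v_s\left[1 - \left(1 - e^{-x} + e^{-2x}\right) e^{-(t-2T)/\tau}\right] \\
3T < t < 4T & v_s\left(1 - e^{-x} + e^{-2x} - e^{-3x}\right) e^{-(t-3T)/\tau} \\
4T < t < 5T & v_s\left[1 - \left(1 - e^{-x} + e^{-2x} - e^{-3x} + e^{-4x}\right) e^{-(t-4T)/\tau}\right]
\end{array}
$$
Table 2.1: Equations for output of a low-pass filter driven by a pulse train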
The expressions in Table 2.1 follow a distinct pattern. Namely, the formula for
odd-numbered periods (first, third, fifth, ...) follows the "charge up" form in Eq. 2.18, and the formula for
even-numbered, "charge down" periods (second, fourth, sixth, ...) follows Eq. 2.19. Equation 2.20 defines α,
where the integer n ≥ 0 indexes the interval nT < t < (n + 1)T.
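$$v_o(t) = v_s\left(1 - \alpha\,e^{-(t-nT)/\tau}\right), \quad n = 0, 2, 4, \ldots \tag{2.18}$$
$$v_o(t) = v_s\,\alpha\,e^{-(t-nT)/\tau}, \quad n = 1, 3, 5, \ldots \tag{2.19}$$
$$\alpha = \sum_{k=0}^{n} \left(-e^{-x}\right)^k \tag{2.20}$$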
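The pattern in Table 2.1 and Eqs. 2.18 to 2.20 can be checked by stepping the exact single-pole response one period at a time and comparing it with the closed form. This is a sketch with normalized vs = 1 and a hypothetical x = T/τ = 0.4; the helper names are illustrative.

```python
import math


def simulate_boundaries(x, n_periods, v_s=1.0):
    """vo at t = 0, T, 2T, ... for a square wave that starts high.

    Each interval uses the exact single-pole solution: vo relaxes toward
    v_s (even n, charge up) or toward 0 (odd n, discharge) by e^{-x}.
    """
    v = 0.0
    values = [v]
    decay = math.exp(-x)
    for n in range(n_periods):
        target = v_s if n % 2 == 0 else 0.0
        v = target + (v - target) * decay
        values.append(v)
    return values


def alpha(n, x):
    """Eq. 2.20: partial sum of (-e^{-x})^k for k = 0..n."""
    return sum((-math.exp(-x)) ** k for k in range(n + 1))


def closed_form_end(n, x, v_s=1.0):
    """vo at t = (n+1)T from Eq. 2.18 (n even) or Eq. 2.19 (n odd), with t - nT = T."""
    if n % 2 == 0:
        return v_s * (1.0 - alpha(n, x) * math.exp(-x))
    return v_s * alpha(n, x) * math.exp(-x)


x = 0.4  # T / tau
sim = simulate_boundaries(x, 8)
for n in range(8):
    print(f"n = {n}  stepped = {sim[n + 1]:.6f}  closed form = {closed_form_end(n, x):.6f}")
```

The two columns agree at every boundary, including Eq. 2.15 (n = 0) and Eq. 2.17 (n = 1).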
Several other results follow naturally from these expressions. First, define β as
the final value of α as n goes to infinity (Eq. 2.21). Because 0 < e^(-x) < 1, this
alternating geometric series converges, and it reduces to the expression in Eq. 2.22.
$$\beta = \alpha(n \to \infty) = \sum_{k=0}^{\infty} \left(-e^{-x}\right)^k \tag{2.21}$$
$$\beta = \frac{1}{1 + e^{-x}} \tag{2.22}$$
In steady state, vo = vs(1 - β) at the beginning of any charge up cycle, and
vo = vsβ at the beginning of any discharge cycle. Equations 2.23 and 2.24 express
this mathematically. Note that these equations only apply in steady state for a square
wave input.
$$v_o(nT) = v_s(1 - \beta), \quad n \text{ even} \tag{2.23}$$
$$v_o(nT) = v_s\,\beta, \quad n \text{ odd} \tag{2.24}$$
From these two expressions, one can derive the size of the envelope, or rather the
fraction of the total voltage swing, (VHI - VLO), that VIN occupies. For example, look
forward at Figs. 2-17 to 2-19 (page 42). The two black lines on each graph outline
the envelope, and the actual signal can be seen by tracing the dotted exponential
"trajectories" back and forth between the upper and lower envelope.
Equation 2.26 shows that envelope size increases linearly with β, making β
an important design parameter. In an ideal world, β would be one, and VIN would
occupy 100% of the voltage range.
$$\text{envelope} = v_s\beta - v_s(1 - \beta) \tag{2.25}$$
$$\text{envelope} = v_s(2\beta - 1) \tag{2.26}$$
Finally, Eqs. 2.27 to 2.33 derive the time behavior of the envelope, or in other
words, the DC drift. Equation 2.27 gives the value of vo at the beginning of any
discharge cycle (n = 1, 3, 5, ...). Ultimately this approaches vsβ, which expands to
the geometric series in Eq. 2.28.
Equation 2.29 defines y as the difference between vo(nT) and its final value. Expanding this expression and factoring out e^(-(n+1)x) yields Eq. 2.31, where the series
of exponentials is simply β (Eq. 2.32). Finally, Eq. 2.33 makes the substitution
nT = t. As claimed, the DC drift depends only on the time constant τ.
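$$v_o(nT) = v_s\left(1 - e^{-x} + e^{-2x} - \cdots + e^{-(n-1)x} - e^{-nx}\right), \quad n = 1, 3, 5, \ldots \tag{2.27}$$
$$v_s\beta = v_s\left(1 - e^{-x} + e^{-2x} - \cdots\right) \tag{2.28}$$
$$y = v_s\beta - v_o(nT), \quad n = 1, 3, 5, \ldots \tag{2.29}$$
$$y = v_s\left(e^{-(n+1)x} - e^{-(n+2)x} + \cdots\right) \tag{2.30}$$
$$y = v_s\,e^{-(n+1)x}\left(1 - e^{-x} + e^{-2x} - \cdots\right) \tag{2.31}$$
$$y = v_s\beta\,e^{-nx} e^{-x} \tag{2.32}$$
$$y = v_s\beta\,e^{-x} e^{-t/\tau} \tag{2.33}$$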
If y represents the distance of the envelope from its final value as a function
of time, then (vsβ - y) represents the equation for the upper envelope. Similarly,
(vs(1 - β) - y) gives an expression for the bottom envelope. Equations 2.35 and 2.37
show expressions for the envelope as a function of time.
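$$v_{top} = v_s\beta - y \tag{2.34}$$
$$v_{top} = v_s\beta\left(1 - e^{-x} e^{-t/\tau}\right) \tag{2.35}$$
$$v_{bot} = v_s(1 - \beta) - y \tag{2.36}$$
$$v_{bot} = v_s - v_s\beta\left(1 + e^{-x} e^{-t/\tau}\right) \tag{2.37}$$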
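Sampling the waveform at the period boundaries shows that Eqs. 2.35 and 2.37 are exact there: the upper envelope matches vo(nT) for odd n and the lower envelope matches it for even n, with the gap to the final values shrinking as e^(-t/τ). A sketch with normalized vs = 1 and an illustrative x = 0.4; the function names are hypothetical.

```python
import math


def boundary_values(x, n_periods, v_s=1.0):
    """vo at t = nT for a square wave starting high (exact single-pole steps)."""
    v, out = 0.0, [0.0]
    for n in range(n_periods):
        target = v_s if n % 2 == 0 else 0.0
        v = target + (v - target) * math.exp(-x)
        out.append(v)
    return out


def v_top(t_over_tau, x, v_s=1.0):
    b = 1.0 / (1.0 + math.exp(-x))                                        # Eq. 2.22
    return v_s * b * (1.0 - math.exp(-x) * math.exp(-t_over_tau))        # Eq. 2.35


def v_bot(t_over_tau, x, v_s=1.0):
    b = 1.0 / (1.0 + math.exp(-x))                                        # Eq. 2.22
    return v_s - v_s * b * (1.0 + math.exp(-x) * math.exp(-t_over_tau))  # Eq. 2.37


x = 0.4
vals = boundary_values(x, 20)
for n, v in enumerate(vals):
    # odd n: start of a discharge (top); even n: start of a charge-up (bottom)
    env = v_top(n * x, x) if n % 2 else v_bot(n * x, x)
    print(f"t = {n * x:4.1f} tau  vo = {v:.5f}  envelope = {env:.5f}")
```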
Once again, Figs. 2-17 to 2-19 plot the exponential "trajectory" curves in dotted
lines with the envelope superimposed on top of them in solid lines. Each diagram
shows the waveforms over a span of 6τ, enough time for the exponential to reach
99.75% of its final value. A waveform like the one in Fig. 2-16 (page 37) can be seen
by tracing the exponentials back and forth between the upper and lower envelope,
starting at zero.
The key parameter in these diagrams is the ratio of the clock period, T, to the
time constant, τ. In Fig. 2-17, T is only a fraction of τ. This results in a narrow
envelope, and the system takes many clock periods to settle. The envelope takes up
vs(2β - 1) = 0.100vs, or 10% of the total possible voltage swing.
Figures 2-18 and 2-19 show what happens as the period becomes an increasingly
larger fraction of the time constant. It takes fewer clock periods to settle, and the
envelope broadens. The envelopes occupy 20% and 38% of the total voltage range in
Figs. 2-18 and 2-19, respectively.
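Equation 2.26 reduces to (1 - e^(-x))/(1 + e^(-x)) = tanh(x/2), so the envelope fraction depends only on x = T/τ. Evaluating it for x = 0.2, 0.4, and 0.8, the periods that appear to label Figs. 2-17 to 2-19, reproduces the 10%, 20%, and 38% figures quoted above. A minimal sketch:

```python
import math


def beta(x):
    """Eq. 2.22: steady-state value of alpha for x = T / tau."""
    return 1.0 / (1.0 + math.exp(-x))


def envelope_fraction(x):
    """Eq. 2.26 divided by vs: fraction of (VHI - VLO) that VIN spans."""
    return 2.0 * beta(x) - 1.0


for x in (0.2, 0.4, 0.8, 2.0, 6.0):
    print(f"T = {x:.1f} tau: beta = {beta(x):.3f}, "
          f"envelope = {100 * envelope_fraction(x):.0f}% of (VHI - VLO)")
```

Longer clock periods relative to τ push β toward one and the envelope toward the full swing, which is why the time constant should be small compared with the clock period.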
In fact, Fig. 2-19 exhibits a large enough swing for VIN to cross the 50% point on
every cycle. With VREF biased at 50%, this is a necessary but not sufficient feature for
correct evaluation. Chapters 3 and 4 discuss how the shape of VIN affects evaluation
speed and how to choose parameters for a specific process.
Figure 2-17: VIN transients for period T = 0.2τ (axes: 0 to 100 versus Time (Units of τ), 0 to 6)
Figure 2-18: VIN transients for period T = 0.4τ
Figure 2-19: VIN transients for period T = 0.8τ
2.3 Reference Circuit
A sense amplifier cannot evaluate "ones" and "zeroes" correctly without an appropriate reference. One strategy involves constructing an imbalanced sense amplifier so
that the circuit has a tendency to evaluate one way or the other. The work in this
thesis goes another direction, keeping the amplifier itself perfectly balanced, while
providing a reference input halfway between a one and a zero.
Figure 2-20: VIN and VREF on a full scale voltage range
Figure 2-20 shows an ideal reference. The input waveform, VIN, swings back and
forth between its minimum (VLO) and maximum (VHI) values, while the reference
voltage splits it right down the middle.
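$$
\begin{array}{lll}
 & \text{Low Bit} & \text{High Bit} \\
\hline
I_{IN1} & g_{m11}(V_{LO} - V_{LO}) = 0 & g_{m11}(V_{HI} - V_{LO}) = I_{PHOTO} \\
I_{IN2} & g_{m14}(V_{REF} - V_{LO}) = \tfrac{1}{2} I_{PHOTO} & g_{m14}(V_{REF} - V_{LO}) = \tfrac{1}{2} I_{PHOTO} \\
I_{IN1} - I_{IN2} & -\tfrac{1}{2} I_{PHOTO} & \tfrac{1}{2} I_{PHOTO}
\end{array}
$$
Table 2.2: Net input current to latch for high and low bits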
Table 2.2 explains the "ideal" choice of a reference voltage. When VIN = VLO,
no net current flows into IN1 of the latch. Small changes in VIN around this point
change the input current by gm11·ΔVIN. An input of VIN = VHI designates a full
power optical input, so IPHOTO flows into IN1. Voltage VREF, on the other hand, stays at
½(VHI + VLO), so ½IPHOTO always flows into IN2. The net current into the sense
amplifier, (IIN1 - IIN2), has exactly the same magnitude for high and low bits, but a
different sign.
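Table 2.2 can be reproduced in the linear small-signal model. The sketch assumes matched transconductances for M11 and M14 and uses hypothetical values for gm and VLO together with the 10 uA photocurrent from the text; net_latch_current is an illustrative name.

```python
G_M = 100e-6     # hypothetical transconductance of M11 and M14 (matched), siemens
I_PHOTO = 10e-6  # 10 uA full-power photocurrent
V_LO = 0.6       # hypothetical VIN with no optical input, volts
V_HI = V_LO + I_PHOTO / G_M    # Eq. 2.9
V_REF = 0.5 * (V_HI + V_LO)    # ideal reference halfway between VLO and VHI


def net_latch_current(v_in, g_m11=G_M, g_m14=G_M):
    """(IIN1 - IIN2) from Table 2.2, in the linear small-signal model."""
    i_in1 = g_m11 * (v_in - V_LO)
    i_in2 = g_m14 * (V_REF - V_LO)
    return i_in1 - i_in2


for label, v_in in (("low bit", V_LO), ("high bit", V_HI)):
    print(f"{label}: IIN1 - IIN2 = {net_latch_current(v_in) * 1e6:+.2f} uA")
```

Passing a mismatched g_m14 (for example 1.1 × G_M) gives +4.5 uA for a high bit and -5.5 uA for a low bit, so the symmetry breaks; this is the kind of imbalance Ch. 4 examines under differential variation.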
In conclusion, an ideal reference circuit should average VHI and VLO, or in the
current domain, average the "on" and "off" photocurrent. Unfortunately, this requires
knowledge of IPHOTO. Therefore, a single extra bit accompanies every optical data
bus to provide a DC reference of steady state optical power. This additional bit is
continually on, so the reference photodiode always sources a current equal to IPHOTO.
Given the linear relationship between current and voltage in small signal, creating
a reference voltage becomes trivial. Consider the scheme shown in Fig. 2-21.
Figure 2-21: Averaging two inputs in the current domain.
In the current domain, addition consists simply of connecting two wires together.
In Fig. 2-21, the currents for a "one bit" and a "zero bit" add linearly and split up
into the bottom two branches based on the relative terminating impedances. If the
two impedances are equal, then current splits half and half.
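The current-domain average can be sketched as a simple current divider. The sketch assumes linear, equal terminating impedances (as in Fig. 2-21), a hypothetical 20 uA bias current, and, anticipating Fig. 2-23, that a one bit carries IBIAS + IPHOTO while a zero bit carries IBIAS alone.

```python
def split_current(i_total, z1, z2):
    """Current divider: each branch's share is inversely proportional to its impedance."""
    i1 = i_total * z2 / (z1 + z2)
    i2 = i_total * z1 / (z1 + z2)
    return i1, i2


I_BIAS, I_PHOTO = 20e-6, 10e-6   # hypothetical bias current; 10 uA photocurrent
i_one_bit = I_BIAS + I_PHOTO     # a "one" carries bias plus photocurrent
i_zero_bit = I_BIAS              # a "zero" carries bias current only

# Connecting the two wires adds the currents; equal loads split the sum in half.
i_a, i_b = split_current(i_one_bit + i_zero_bit, z1=1.0, z2=1.0)
print(f"each branch: {i_a * 1e6:.1f} uA "
      f"(IBIAS + IPHOTO/2 = {(I_BIAS + I_PHOTO / 2) * 1e6:.1f} uA)")
```

Each branch receives IBIAS + IPHOTO/2. Diode-connected transistors are not linear impedances, but two identical devices at the same gate-source voltage still split the current equally by symmetry, which is the argument behind Fig. 2-22.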
Figure 2-22 replaces the impedances with current mirrors. Since M18 and M19 are
identical, they present equivalent loads and current still splits half and half. Both
transistors in Fig. 2-22 receive the desired reference current, so either one can provide
the reference voltage, VREF = ½(VHI + VLO).
Figure 2-22: Identically sized current mirrors present the same impedance
Figure 2-23 proposes schematics for the ideal current sources in Figs. 2-21 and
2-22. The input current for a high bit equals (IPHOTO + IBIAS). However,
a low bit consists only of bias current, so the photodiode can be omitted.
Figure 2-23: A "zero bit" does not require a diode at all
Replacing the ideal current sources with the virtual inputs from Fig. 2-23 produces
the circuit in Fig. 2-24. These transistors share many nodes, including
VREF.
Figure 2-25 shows a condensed version of the full reference circuit. The sizes
of transistors M16 to M19 must match their counterparts in the receiver circuit.
Figure 2-24: Current averaging with photodiode input
Figure 2-25: Reference circuit shared by all bits in a data bus
All bits in the data bus share this reference circuit. For instance, the 128 bit bus
in Fig. 2-26 consists of 129 optical signal lines. The extra signal always transmits a
high bit, providing a reference of steady state optical power for the reference circuit.
A designer might choose to use several optical reference lines to increase variation
robustness (see Chapter 4 on variation). For each additional signal, one simply adds
another copy of the circuit in Fig. 2-25 in parallel.
Figure 2-26: Example of 128 bit optical data bus with a single reference circuit
2.4 Summary
Inverting the polarity of the Schaffer and Mitkas sense amplifier ensures that the
two series devices are NFET's rather than PFET's. A clock signal replaces the latch
signal, and the circuit in Fig. 2-4 (page 28) generates a reset signal from the clock.
During the first clock phase, reset pulses high, precharging IN1 and IN2 to the
same value. A reference input always sinks ½IPHOTO of current from IN2, while the
optical input controls the amount of charge flowing into IN1.
After the clock goes high, charge redistributes from the output to input nodes,
causing an immediate increase in IN1 and IN2 and a decrease in Q and /Q. This charge
sharing dilutes the magnitude of the differential across the latch, but not the sign,
so the circuit still evaluates correctly using positive feedback, as shown in Fig. 2-8
(page 32).
Adding a current mirror on IN1 isolates the photodiode capacitance, allowing the
latch to switch faster. Equation 2.8 (page 35) expresses the time constant on the
input node in terms of the junction capacitance and the transconductance of the
diode connected transistor, M10.
Section 2.2.3 discusses the transient behavior of VIN due to an arbitrary input
waveform. Specifically, the DC offset of VIN tends to drift around. On any given
cycle, VIN charges up or down exponentially, occupying some fraction (Eq. 2.26,
page 40) of the total voltage swing between VLO and VHI (Eq. 2.9, page 36). To avoid
this effect, the time constant should be small compared to the clock period.
A reference circuit averages the photocurrent from a high and low optical signal
to create a reference voltage that is equivalent to an input of ½IPHOTO. This requires
an extra, reference optical path. All bits in the data bus can share the same reference
voltage.
Figure 2-27 shows the final schematic of the receiver and the reference circuit. All
references to transistor names and node names in this thesis refer to this diagram.
Figure 2-27: [schematic panels: Sense Amplifier, Current Mirror Input Stage, Shared Reference Circuit]
Chapter 3
Process, Sizing, and Simulation
This chapter describes implementation of the data receiver circuit in TSMC's 0.18 um
digital CMOS process, provided through the MOSIS prototyping service. Simulations
are done using Avant! Star-HSPICE® and device models provided by MOSIS. The
next few sections provide a brief description of the process and sizing considerations,
followed by simulation results. Simulations use a test photodiode providing 10 uA of
bias current with a junction capacitance of 100 fF.
3.1 Process Overview
TSMC's 0.18 um digital CMOS process contains one poly layer and six metal layers.
The physical gate length is 0.16 um with an oxide thickness of 32 Å for a 1.8 V supply.
The process also provides an additional set of transistors with a 70 Å gate dielectric
for I/O interface at 3.3 V. For digital design, TSMC claims densities of over 100,000
gates per mm², logic speeds of over 400 MHz, and a ring oscillator delay of 28 ps.
Table 3.1 summarizes some useful process characteristics [6].
Despite the relatively analog nature of receiver design, a digital process is the
appropriate testing platform because it demonstrates the feasibility of integrating
optical interconnect into VLSI logic chips. A digital process ordinarily imposes serious
constraints on the quality of passive components, but the design in Ch. 2 avoids
passive components altogether. These process constraints are a major reason for that choice.
Parameter               Value
Supply Voltage          1.8 V
Interconnect            6 Metal, 1 Poly
Drawn Gate Length       0.18 um
Physical Gate Length    0.16 um
Gate Oxide Thickness    32 Å
6T SRAM Cell Size       4.65 um²
Ring Oscillator Delay   28 ps
Leakage Current         0.1 nA/um
Salicide                CoSi2
Metal                   AlCu
Via Fill                Tungsten
Table 3.1: Summary of TSMC 0.18 um Digital Logic Process
Effective design requires an understanding of device performance, and how it
changes based on process parameters and bias conditions. In this case, information
about the current-voltage (I-V) characteristics, unity current gain frequency (fT),
and transconductance (gm) proves useful. TSMC does not provide information about
these factors, so they must be extracted through simulation. Appendix A compiles
graphs of these parameters.
Figures A-1 to A-3 on page 120 show the I-V characteristics of a minimum length
NFET as width varies from 0.5 um (minimum) to 5.0 um. Page 121 shows the same
graphs for a minimum length PFET. According to the graphs, a minimum width
PFET can comfortably deal with currents on the order of 50-100 uA, while an NFET
can sink almost three times as much.
Figure 3-1 shows the test setup for finding fT. A DC voltage source biases the gate through a very large inductor (LHUGE). This convenient trick effectively removes the voltage source from the circuit during AC simulation, because the inductor is a short circuit at DC but an open circuit at the frequencies of interest. The fT frequency occurs where the current gain crosses one. Figure A-7 (page 122) plots current gain versus frequency for transistor widths ranging from 0.5 um to 5.0 um. As expected, DC parameters such as width have little or no effect on fT, since transconductance and gate capacitance both scale with width. The crossover points in Figs. A-8 and A-9 (page 122) show an fT of roughly 50 GHz for an NFET, and about a factor of three less for a PFET (17-18 GHz).
Figure 3-1: Test setup for finding fT
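The width independence of fT can be illustrated with a first-order small-signal model in which the current gain of the device is gm/(2πf·Cgg). The sketch below is illustrative only: the per-micron values GM_PER_UM and CGG_PER_UM are hypothetical (not extracted from the TSMC models), chosen so the result lands near the roughly 50 GHz NFET value reported above, and Cgd feedforward is ignored. It bisects for the unity-gain crossing, the same quantity read off the curves produced by the Fig. 3-1 setup.

```python
import math

# Hypothetical per-micron small-signal parameters (NOT extracted from the
# TSMC 0.18 um models): both scale linearly with device width.
GM_PER_UM = 0.5e-3     # transconductance per um of width, S (assumed)
CGG_PER_UM = 1.6e-15   # total gate capacitance per um of width, F (assumed)

def current_gain(f_hz, width_um):
    """First-order |i_out / i_in| for an AC current driven into the gate:
    gm / (2*pi*f*Cgg). Cgd feedforward is ignored."""
    gm = GM_PER_UM * width_um
    cgg = CGG_PER_UM * width_um
    return gm / (2.0 * math.pi * f_hz * cgg)

def unity_gain_frequency(width_um, f_lo=1e6, f_hi=1e13, iters=200):
    """Bisect on a log scale for the frequency where current gain crosses one."""
    for _ in range(iters):
        f_mid = math.sqrt(f_lo * f_hi)
        if current_gain(f_mid, width_um) > 1.0:
            f_lo = f_mid   # still above unity: crossing lies higher
        else:
            f_hi = f_mid
    return math.sqrt(f_lo * f_hi)

for w in (0.5, 1.0, 2.0, 5.0):
    print(f"W = {w} um: fT = {unity_gain_frequency(w) / 1e9:.1f} GHz")
```

Because width multiplies both gm and Cgg, it cancels in the ratio, which is why every width in the loop reports the same fT.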
Finally, consider how the tradeoff between bias current and size affects gm. Figures A-10 to A-13 (pages 123 and 124) show how gm varies for a diode-connected transistor with lengths of 0.18 um and 0.36 um. As expected, doubling the length substantially reduces transconductance.
Also, notice that current tends to have more effect on gm in Fig. A-10, while geometry is more effective in Fig. A-12. The trick is figuring out which parameter gives a greater benefit relative to cost at the current operating point.
3.2 Sizing and DC Biasing
Extremely high performance chips operate at around a gigahertz in 0.18 um technology, so this implementation targets 1.0 GHz as its nominal operating frequency. The following sections choose DC parameters for the latch, current mirrors, current sources, and reference circuit.
Figure 3-2 shows the receiver layout for a single data bit, measuring 12.6 um wide
and 10.6 um tall. Figure 3-4 displays the layout of the reference circuit, measuring
7.0 um wide and 10.6 um tall. Sections 3.2.1 to 3.2.4 refer to these layouts.
Figure 3-2: Layout of receiver circuit for a single data bit (12.6 um x 10.6 um)
3.2.1 The Latch
In Ch. 4 (page 89), simulations show the latch to be fairly variation resistant unless transistor strengths are grossly mismatched. In general, as long as an appropriate differential builds up across the inputs, the latch evaluates correctly. Therefore, increasing geometries in the latch provides little benefit in terms of variation robustness.
On the other hand, larger geometries in the latch decrease speed: increasing transistor size increases capacitance, and the circuit requires more time to evaluate. After all, the goal is to minimize capacitance on the switching nodes.
Given this tradeoff, the design utilizes minimum sized transistors in the latch,
with efforts to minimize capacitance on key nodes whenever possible. For example,
refer to Fig. 3-3, which shows the names and sizes of the transistors from Fig. 3-2.
Figure 3-3: Dimensions and transistor names for receiver circuit. Large, unlabeled
transistors are MOS decoupling capacitors.
Devices M1 to M9 form the central latch. The four PFETs stack in series, with the non-shared (higher capacitance) nodes connected to VDD. Transistors M5, M6, and M9 are also laid out in series. Once again, the critical nodes, IN1 (between M5 and M9) and IN2 (between M9 and M6), occupy the shared nodes to minimize capacitance. All devices measure the minimum size of 0.5 um in width and 0.18 um in length.
3.2.2 Current Mirrors
Chapter 2 derives an approximate expression for the critical time constant in the receiver,

$$\tau \approx \frac{C}{g_{m10}}$$

(Eq. 2.8, page 35), where C is the photodiode capacitance on VIN. A smaller τ means faster transients and faster evaluation, with one catch. The product of the input current, IPHOTO, and the small-signal resistance on VIN ($1/g_{m10}$) determines the maximum voltage swing on VIN, as shown in Eq. 2.9 (page 36).
Increasing gm10 too much makes the voltage swing too small, and decreasing gm10 too much makes the input waveform too slow. Both of these effects can decrease the magnitude of the differential that builds up across IN1 and IN2.
Geometry and bias current both control gm. Increasing transistor size increases capacitance on IN1, making the latch switch more slowly, while increasing bias current increases static power dissipation. So both control knobs have tradeoffs.
In addition, the output waveforms of the latch affect dynamic (switching) power dissipation. As shown in Fig. 2-8 (page 32), Q and /Q experience charge sharing with IN1 and IN2. This glitching can cause unnecessary transitions in logic connected to the output nodes. As Eq. 2.7 (page 31) demonstrates, charge sharing depends on CIN, which is dominated by the capacitance of the current mirrors.
Performance Factor   Dependencies   Reasons
Power Dissipation    + IBIAS        Static power = VDD·IBIAS
                     + W, + L       ↑ CIN ⇒ more glitching at output ⇒ more dynamic power dissipation
Speed, Phase 1       + IBIAS        ↑ gm10 ⇒ ↓ τ
(clock is low)       + W, - L       ↑ gm10 ⇒ ↓ τ
                     - W, - L       ↑ capacitance on VIN ⇒ ↑ τ (negligible)
Speed, Phase 2       - IBIAS        ↑ gm10 ⇒ ↓ voltage swing on VIN ⇒ ↓ differential across IN1 and IN2
(clock is high)      - W, + L       ↑ gm10 ⇒ ↓ voltage swing on VIN ⇒ ↓ differential across IN1 and IN2
                     - W, - L       ↑ CIN ⇒ latch switches slower
(+ / -: increasing the parameter raises / lowers the performance factor.)
Table 3.2: Circuit performance as a function of bias current and geometry for input stage transistor, M10
Table 3.2 summarizes this complex set of dependencies. In essence, one must make
a tradeoff between speed and power dissipation, which requires considering operation
in both clock phases. For instance, in the first phase, a fast transient navigates VIN
around VREF quickly, but that means less voltage swing, which in turn means a smaller
differential for the sense amplifier to evaluate when the clock goes high. Furthermore,
all of these factors are functions of transistor size and bias current, making evaluation
speed a complicated, non-linear function. However, with the aid of simulation and
some educated guesses, a designer can converge on an acceptable solution.
First, recall the tradeoffs discussed in Sec. 3.1 for determining gm. Doubling the length greatly reduces the transconductance. Also, NFETs have higher transconductance than PFETs. Given this information, and the unfavorable set of dependencies for length in Table 3.2, the natural choice is minimum-length NFETs for the current mirrors. Increasing length would make the circuit more variation resistant, but the cost is too high.
Closer inspection of Fig. A-10 (page 123) reveals some interesting trends. As Sec. 3.1 mentions, depending on the bias point, sometimes current gives more control over gm, and sometimes geometry has more effect. For a minimum-length NFET, current has the bigger effect. Diminishing returns on width set in around one micron for low bias currents, and around two to three microns for higher bias currents.
A process of careful simulation finally converges on a value of 50 uA for IBIAS with a transistor width of 3.0 um. This operating point yields a respectable gm10 of 625 umho. With a photodiode capacitance of 100 fF, the time constant comes out to about 0.16 ns, allowing VIN a 16 mV swing with very little DC drift at a gigahertz. In fact, according to Eq. 2.26 (page 40), the envelope utilizes over 99% of the total voltage swing.
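The operating-point arithmetic above can be checked in a few lines. This is a sketch that assumes the forms τ ≈ C/gm10 and swing ≈ IPHOTO/gm10 from Eqs. 2.8 and 2.9. Eq. 2.26 is not reproduced here; as a hypothetical stand-in, envelope utilization for a "1010..." pattern is modeled with the standard first-order RC square-wave result tanh(Tbit/(2τ)), which may differ in form from Eq. 2.26.

```python
import math

C_PD = 100e-15     # photodiode junction capacitance on VIN, F
I_PHOTO = 10e-6    # photocurrent, A
G_M10 = 625e-6     # transconductance of M10, S (625 umho)
F_CLK = 1.0e9      # nominal clock frequency, Hz

tau = C_PD / G_M10         # time constant, tau ~ C / gm10 (Eq. 2.8 form)
swing = I_PHOTO / G_M10    # maximum swing on VIN, IPHOTO / gm10 (Eq. 2.9 form)
t_bit = 1.0 / F_CLK        # one bit per clock period

# Hypothetical stand-in for Eq. 2.26: steady-state peak-to-peak swing of a
# first-order RC node driven by a 1010... square wave, as a fraction of the
# full swing. With E = exp(-t_bit/tau), the high level b and low level a
# satisfy b = 1 - (1 - a)*E and a = b*E, so b - a = tanh(t_bit / (2*tau)).
utilization = math.tanh(t_bit / (2.0 * tau))

print(f"tau = {tau * 1e9:.3f} ns, swing = {swing * 1e3:.1f} mV, "
      f"utilization = {utilization * 100:.2f} %")
```

Under this stand-in model the utilization comes out above 99%, consistent with the figure quoted in the text.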
These choices maximize differential build-up speed in the first clock phase, while
still providing a nice balance with evaluation in the second phase because the 3.0 um
wide current mirror does not load down the inputs too much. Power dissipation stays
just under 300 uW, which seems like a reasonable budget for a single bit. Of course,
power and speed can always be traded off for one another.
Figure 3-3 shows that capacitance on IN1 can be further reduced by folding the
input transistors. Voltage VIN occupies the drain node of M10, which effectively has
the capacitance of a transistor only 1.5 um wide. Likewise, IN1 occupies the shared
node of M11. A centroid style layout could reduce this capacitance even more.
3.2.3 Current Sources
The current sources, consisting of transistors M12, M13, M15, M16, and M17, supply
bias current to the current mirrors. As seen in the previous section, bias current
plays a large role in setting the DC operating point. Therefore, it would be nice to
scale up the geometries of these transistors, making them more variation resistant.
Unfortunately, capacitance scales with geometry.
This tradeoff can be partially circumvented using clever layout. As seen in Fig. 3-3, folding a transistor doubles the effective width without increasing the size of the drain. In fact, the shared node has a channel on both sides, so folding a transistor actually decreases total drain capacitance by one unit of sidewall capacitance. Doubling the length, on the other hand, approximately doubles the capacitance.
In conclusion, scaling the current-source transistors up to L = 0.36 um (with W = 1.0 um, as listed in Table 3.3) improves variation robustness while increasing the capacitance by a factor of less than two. Considering the current sources contribute very little capacitance on the input nodes to start with, this turns out to be a reasonable tradeoff.
3.2.4 Reference Circuit
In order to function as an appropriate sense amplifier, it is imperative that the circuit
be balanced on both sides. Furthermore, all of the analysis done in Ch. 2 relies on
the assumption that IN1 and IN2 have the same capacitance. Therefore, M14 and M15
must match M11 and M13 exactly. Likewise, the reference circuit (M16 to M19) must
match its set of input transistors (M14 and M15), or the reference voltage becomes
useless.
Figure 3-4 shows the layout of the reference circuit on the left, and the names,
locations, and sizes of the transistors on the right. Note that this layout makes every
effort to match the transistors perfectly, even folding transistors the same way as in
the receiver layout. Chapter 4 discusses the importance of symmetrical layout in a
sense amplifier.
The extra, unlabeled transistors in Figs. 3-2, 3-3, and 3-4 are MOS capacitors used for local decoupling of power supplies. Adding these capacitors wherever there is free space reduces high-frequency supply noise.
Figure 3-4: Layout of reference circuit (7.0 um x 10.6 um) (left), along with dimensions and transistor names (right). Large, unlabeled transistors are MOS decoupling capacitors.
In a real design, the photodiode has its own layout, which can vary tremendously in size. Typical sizes range from 10 um by 10 um to 100 um by 100 um.
Table 3.3 (page 60) summarizes transistor sizes for both the receiver and reference circuits, according to the labeling conventions in Fig. 2-27 (page 49). Note that adding more bits only replicates transistors M1 to M15. A single reference circuit (M16 to M19) can service the whole bus.
3.3 Simulation Results
Simulations use Avant! Star-HSPICE®. Due to the nonlinear nature of circuit operation, AC analysis is not possible, so the next section focuses on finding an appropriate test waveform for transient analysis. The following sections present simulated output waveforms and results. All simulations use a photodiode that supplies 10 uA of photocurrent with a parasitic junction capacitance of 100 fF.

Transistor   Width    Length
M1           0.5 um   0.18 um
M2           0.5 um   0.18 um
M3           0.5 um   0.18 um
M4           0.5 um   0.18 um
M5           0.5 um   0.18 um
M6           0.5 um   0.18 um
M7           0.5 um   0.18 um
M8           0.5 um   0.18 um
M9           0.5 um   0.18 um
M10          3.0 um   0.18 um
M11          3.0 um   0.18 um
M12          1.0 um   0.36 um
M13          1.0 um   0.36 um
M14          3.0 um   0.18 um
M15          1.0 um   0.36 um
M16          1.0 um   0.36 um
M17          1.0 um   0.36 um
M18          3.0 um   0.18 um
M19          3.0 um   0.18 um
Table 3.3: Transistor sizes for receiver and reference circuit in 0.18 um technology
3.3.1 Test Waveform
A good test pattern should exercise all the worst case scenarios, and hopefully exhibit
a 50% bit density to allow some representative power measurements.
Figure 3-5 proposes an input waveform and plots the corresponding response of VIN, as per the test circuit in Fig. 3-6. As will be seen, this waveform indeed exercises all three "worst case scenarios," namely a high bit after a series of low bits, a low bit after a series of high bits, and an alternating sequence of ones and zeroes.
After a long series of high bits, VIN settles to its maximum value, VHI. A subsequent low bit requires the input node to discharge all the way down from its maximum
value, making this the most difficult low bit to evaluate. Usually, VIN starts the cycle
somewhere below VHI, so it does not have to discharge as far to evaluate an incoming
zero.
A similar situation occurs when a high bit follows a long string of low bits. The
input settles to VLO, the minimum possible value, making this combination the most
difficult high bit to evaluate.
Finally, the circuit must behave predictably in steady state, even if the input asks it to flip back and forth repeatedly. The circuit should never fail on an alternating pattern if it does not fail for the two previously mentioned patterns, because VIN never reaches VLO or VHI. Nonetheless, a "1010..." pattern adds some redundancy to the tests and shows off the DC drift behavior.
Figure 3-5: Input test pattern (top) and corresponding VIN waveform (bottom)
Figure 3-6: Test circuit for waveforms in Fig. 3-5
As claimed, the waveform shown in Fig. 3-5 exercises all three test cases, and has
approximately a 50% bit density as well. The input starts with a zero, followed by
20 consecutive ones, 20 zeroes, and five alternating "10" sequences.
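The test pattern and its worst-case starting points can be sketched as follows. The bit sequence comes from the description above; the VIN model is a hypothetical first-order exponential, normalized so that VLO = 0 and VHI = 1, with τ = 0.16 ns and 1 ns bits. It is a sketch of the intended behavior, not the SPICE result of Fig. 3-5.

```python
import math

def make_test_pattern():
    """Bit sequence of Fig. 3-5: one 0, twenty 1s, twenty 0s, five "10" pairs."""
    return [0] + [1] * 20 + [0] * 20 + [1, 0] * 5

def vin_at_bit_starts(bits, t_bit, tau, v0=0.0):
    """Hypothetical first-order model of normalized VIN (VLO = 0, VHI = 1):
    during each bit VIN relaxes exponentially toward the bit value.
    Returns VIN at the start of every bit."""
    decay = math.exp(-t_bit / tau)
    v, starts = v0, []
    for b in bits:
        starts.append(v)
        v = b + (v - b) * decay
    return starts

bits = make_test_pattern()
density = sum(bits) / len(bits)
starts = vin_at_bit_starts(bits, t_bit=1.0e-9, tau=0.16e-9)

print(f"{len(bits)} bits, bit density {density:.1%}")
print(f"low bit after 20 highs starts at VIN = {starts[21]:.4f}")  # ~VHI: worst-case low
print(f"high bit after 20 lows starts at VIN = {starts[41]:.4f}")  # ~VLO: worst-case high
print(f"alternating section stays within "
      f"[{min(starts[42:]):.4f}, {max(starts[42:]):.4f}]")
```

In this model the alternating section never starts a bit at VLO or VHI, which is why it cannot be harder than the two long-run transitions.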
3.3.2 Output Waveforms
As in Fig. 3-6 (page 61), a capacitor and a current source replace the photodiode.
Figure 3-7 shows the clock and input waveforms, as generated by Spice, and Fig. 3-8
shows the reset waveform plotted against the clock. The clock runs at one gigahertz,
and all sizing corresponds to that derived in Sec. 3.2.
Figure 3-7: Simulation waveforms for clock (grey) and input photocurrent (black)
Figure 3-8: Receiver CLK and corresponding RST waveform
The simulated input and reference waveforms in Fig. 3-9 look similar to those developed in Ch. 2, except with more noise due to capacitive feedthrough from the switching nodes. Two important properties of these waveforms merit discussion.
Figure 3-9: Simulated VIN and VREF waveforms
First, the voltage range between VLO and VHI spans only 14.58 mV instead of
the predicted 16 mV. This discrepancy arises because gm10 changes over time as a
function of current. Plus, the output resistances of M10 and M12 have a small effect
on the total small signal resistance. Taking the other resistances into account and
averaging gm10 between the two operating points gives a more accurate predicted
swing of 14.76 mV.
Second, because they are so small, the waveforms are particularly susceptible to voltage spikes that couple onto the node through parasitic capacitors. These spikes constitute generic switching noise from CLK, RST, and even the evaluation of IN1 and IN2.
Luckily, only the time average of VIN over the clock cycle matters, because that determines the total amount of charge placed on IN1 and IN2. This time average matches the ideal value closely for the strings of high and low bits, but the alternating sequence has some destructive interference that causes it to utilize less of the total voltage swing than expected (the calculations on page 57 predict over 99% using Eq. 2.26).
These results justify the alternating pattern of the test waveform, as well as the decision to increase gm10 and push envelope utilization over 99%. Aiming for a utilization close to one hundred percent provides margin, so that the actual waveform, which falls somewhat short of the prediction, still meets a slightly lower specification.
Figure 3-10: Simulated IN1 and IN2 waveforms
Figure 3-11: Zoomed in plot for six cycles of IN1 and IN2
Figure 3-10 shows the waveforms on IN1 and IN2 for the entire test pattern, and Fig. 3-11 shows an enlarged portion covering six cycles where the input switches from a sequence of high bits to a sequence of low bits. These plots are the simulation counterpart of Fig. 2-7 (page 31).
Figures 3-10 and 3-11 clearly demonstrate four things. First, IN1 and IN2 do not go all the way to VDD. As Ch. 2 explains, M5 and M6 make bad pull-up devices, so the output should be taken off Q and /Q.
Second, the "gain" from VIN to IN1 is inverting. A high optical signal (first three cycles in Fig. 3-11) causes VIN to increase, which turns on M11 and drains charge off IN1, causing it to go down.
Third, Fig. 3-11 clearly shows the differential building up. As expected, the smallest one occurs on the fourth cycle when the input switches from high to low. It takes
some amount of time for VIN to discharge below VREF in Fig. 3-9. As a result, an
incorrect differential initially builds up, at least until VIN crosses VREF and begins
erasing it, ultimately building up the correct differential. In the other five cycles, VIN
starts in the right place, so a correct differential begins building up right away (as
soon as the reset signal turns off).
Finally, look at the spikes at the beginning of each evaluation phase in Fig. 3-11.
These result from the charge sharing discussed in Sec. 2.1.3 (page 30). When CLK goes
high, M5 and M6 short the outputs and inputs of the latch together. They immediately
redistribute charge and assume the same value, causing a spike in IN1 and IN2, and
a drop in Q and /Q.
Next, look at the output waveforms, shown in Fig. 3-12 for the entire test pattern,
and in Fig. 3-13 for the same six enlarged cycles as Fig. 3-11. Consider the following
two points about these waveforms.
First, Q and /Q exhibit the same charge sharing behavior just discussed for the input nodes. Specifically, each time the clock goes high, Q and /Q discharge to meet IN1 and IN2. This glitching can cause unnecessary transitions in combinational logic attached to the output, resulting in excess power dissipation or, in some cases, logical errors. For instance, dynamic logic requires a glitch-free input.
Second, notice that the output itself is a type of dynamic logic. Both outputs start high, and one selectively discharges low. This also means that the signal only stays latched for half the clock period. During the other half, both outputs are high.
One could possibly take advantage of this "dynamic logic" behavior to asynchronously signal data acquisition. For example, an XOR gate connected to Q and /Q
only evaluates high after the receiver successfully acquires a signal. This might allow
some sort of asynchronous optimization inside a single clock period, but the circuit
itself relies on a synchronous environment to function.
Figure 3-12: Simulated output waveforms, Q and /Q
Figure 3-13: Zoomed in plot of output waveforms corresponding to Fig. 3-11
3.3.3 Simulation Measurements
The last section presented and explained the simulation output waveforms. This section looks at how the circuit performs in terms of power, speed, and size.
Total power consumption breaks down into two parts. Static power refers to power
dissipated continuously due to DC bias currents. Dynamic, or switching, power occurs
due to the charging and discharging of capacitors during operation, so it depends on
frequency (Eq. 3.2). The sum of the two gives total power dissipation, as shown in
Eq. 3.1.
$$P_{total} = P_{static} + P_{dynamic} \qquad (3.1)$$

$$P_{total} = \alpha + \beta f_{CLK} \qquad (3.2)$$
Static power dissipation can be calculated by multiplying the supply voltage by the total DC bias current. The receiver uses three legs of bias current, each supplying the same amount of current, IBIAS. The reference circuit, on the other hand, uses two units of bias current plus one unit of photocurrent. Remember that light continually shines on the photodiode. This produces a DC current, so it must be included in static power calculations. Equations 3.4 and 3.6 give calculated values for static power consumption.

$$P_{receiver\text{-}static} = V_{DD}\,(3 I_{BIAS}) \qquad (3.3)$$

$$P_{receiver\text{-}static} = 270\ \mu\text{W} \qquad (3.4)$$

$$P_{reference\text{-}static} = V_{DD}\,(2 I_{BIAS} + I_{PHOTO}) \qquad (3.5)$$

$$P_{reference\text{-}static} = 198\ \mu\text{W} \qquad (3.6)$$
Figure 3-14 plots simulated values of power dissipation for frequencies ranging from 750 MHz to 1.25 GHz. As expected, power increases linearly with frequency. A least-squares linear regression gives the dotted line in Fig. 3-14 and values for α and β in Eq. 3.2. The reference circuit dissipates only static power, so it does not depend on frequency. Equations 3.7 and 3.8 give extracted values for power dissipation in the receiver and reference circuit as a function of frequency.
Figure 3-14: Power dissipation of receiver circuit as a function of clock frequency
$$P_{receiver}\ (\mu\text{W}) = 279.456 + 26.433\, f_{CLK}\ (\text{GHz}) \qquad (3.7)$$

$$P_{reference}\ (\mu\text{W}) = 196.454 \qquad (3.8)$$
Simulated values for static power dissipation match closely with the calculated
values. The constant term in Eq. 3.7 differs from the calculated value by 9 uW, while
the measured value in Eq. 3.8 differs from the calculated value by only 2 uW.
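A quick check of the static-power arithmetic in Eqs. 3.3 to 3.6 against the fitted constants of Eqs. 3.7 and 3.8 follows; it is a sketch, and the variable and function names are illustrative only.

```python
V_DD = 1.8        # supply voltage, V
I_BIAS = 50e-6    # bias current per leg, A
I_PHOTO = 10e-6   # DC photocurrent drawn by the reference, A

p_rx_static_uw = V_DD * 3 * I_BIAS * 1e6               # Eq. 3.3, in uW
p_ref_static_uw = V_DD * (2 * I_BIAS + I_PHOTO) * 1e6  # Eq. 3.5, in uW

def p_receiver_fit_uw(f_clk_ghz):
    """Fitted receiver power, Eq. 3.7: alpha + beta * f_CLK (uW, f in GHz)."""
    return 279.456 + 26.433 * f_clk_ghz

P_REFERENCE_FIT_UW = 196.454  # Eq. 3.8, frequency independent

print(f"receiver static:  {p_rx_static_uw:.1f} uW calculated, "
      f"{p_receiver_fit_uw(0.0):.3f} uW fitted constant")
print(f"reference static: {p_ref_static_uw:.1f} uW calculated, "
      f"{P_REFERENCE_FIT_UW:.3f} uW fitted")
print(f"receiver total at 1.0 GHz (fit): {p_receiver_fit_uw(1.0):.3f} uW")
```

The fitted line gives about 305.9 uW at 1.0 GHz, close to the simulated 305.74 uW listed in Table 3.4.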
It helps to know how much the reference circuit affects power and area consumption for a whole data bus. The average power dissipation per bit, as shown in Fig. 3-15,
consists of the power dissipation of the receiver circuits plus the power dissipation of
a single reference circuit averaged over the number of bits. As the number of bits
becomes large, the average power dissipation approaches the power dissipation of a
single receiver circuit.
Figure 3-16 shows a similar plot for receiver and reference circuit areas. Each individual receiver circuit takes up an area of 133.56 um², and each reference circuit takes up 74.20 um². However, as the size of the data bus increases, the cost of the reference is averaged out over all the bits.
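The curves in Figs. 3-15 and 3-16 follow from amortizing one reference circuit over N receiver bits. Below is a minimal sketch using the 1.0 GHz receiver power, the reference power, and the layout dimensions quoted in this chapter; the n_refs parameter is a hypothetical extension for buses that use several reference circuits.

```python
P_RX_UW = 305.74         # receiver power at 1.0 GHz, uW
P_REF_UW = 196.45        # reference power, uW
A_RX_UM2 = 12.6 * 10.6   # receiver layout area, um^2 (133.56)
A_REF_UM2 = 7.0 * 10.6   # reference layout area, um^2 (74.20)

def per_bit_average(receiver, reference, n_bits, n_refs=1):
    """Average cost per bit when n_refs reference circuits serve n_bits receivers."""
    if n_bits < 1 or n_refs < 1:
        raise ValueError("need at least one bit and one reference circuit")
    return receiver + n_refs * reference / n_bits

for n in (1, 8, 32, 64):
    p = per_bit_average(P_RX_UW, P_REF_UW, n)
    a = per_bit_average(A_RX_UM2, A_REF_UM2, n)
    print(f"{n:3d} bits: {p:7.2f} uW/bit, {a:7.2f} um^2/bit")
```

As N grows, the per-bit figures approach the single-receiver values, matching the trend in both figures.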
Figure 3-15: Average power dissipation at 1.0 GHz for data bus of varying size
Figure 3-16: Average die area per bit for data bus of varying size
For variation robustness, a designer might choose to have multiple reference circuits for a large data bus. In this case, Figs. 3-15 and 3-16 become very important
because they clearly demonstrate the average cost of a reference circuit given a certain
number of bits.
The final, and most important, parameter of interest is evaluation speed. Variation analysis in Ch. 4 deals predominantly with how variation affects the waveforms on VIN and VREF, the most sensitive nodes in the circuit. Hence, the most useful definition of evaluation speed should measure whether the circuit accurately compares VIN and VREF, not whether the circuit produces glitch-free output.
Therefore, evaluation speed is defined as the maximum clock frequency for which
all bits are evaluated correctly. In other words, the circuit makes a correct decision.
The nodes of interest in this measurement are IN1 and IN2.
Figure 3-17: Example of evaluation speed definition: Cycle A evaluates correctly but
will not produce a satisfactory output. Cycle B does not evaluate correctly.
For example, take the plot in Fig. 3-17. Despite the messy waveforms, cycle
A "evaluates" correctly. Nodes Q and /Q will not display satisfactory outputs, but
the circuit begins evaluating in the right direction, indicating that VIN and VREF
established a correct differential across the inputs of the sense amplifier. Therefore,
A is considered a correct evaluation. Cycle B, on the other hand, evaluates the wrong
way, indicating an incorrect differential. Evaluation speed is the maximum clock
frequency for which no errors (like cycle B) occur on any bit.
Under the operating point established in Sec. 3.2, the receiver circuit reaches an
evaluation speed of 2.000 GHz. Table 3.4 summarizes the bias point and performance
parameters discussed in this chapter.
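The evaluation-speed definition can be expressed as a frequency sweep. The sketch below is not the SPICE procedure used here; it substitutes a hypothetical first-order behavioral model in which normalized VIN relaxes exponentially (τ = 0.16 ns) toward 1 or 0, VREF sits at mid-swing, and a bit counts as correct when the integral of (VIN - VREF) over the clock-low half period has the bit's sign. Under these assumptions the sweep should stop near 1.96 GHz, somewhat below the simulated 2.000 GHz, since the model ignores every second-order effect.

```python
import math

def bit_pattern():
    """Test pattern of Sec. 3.3.1: one 0, twenty 1s, twenty 0s, five "10" pairs."""
    return [0] + [1] * 20 + [0] * 20 + [1, 0] * 5

def all_bits_correct(f_clk_ghz, tau_ns=0.16, x=0.5):
    """Hypothetical behavioral check (not the transistor netlist).
    Normalized VIN relaxes toward 1 (high bit) or 0 (low bit) with time
    constant tau_ns; VREF = x. A bit evaluates correctly when the integral
    of (VIN - VREF) over the clock-low half period has the sign of the bit."""
    t_clk = 1.0 / f_clk_ghz                  # clock period, ns
    half = t_clk / 2.0                       # differential builds while CLK is low
    frac_half = -math.expm1(-half / tau_ns)  # 1 - exp(-half / tau)
    decay_full = math.exp(-t_clk / tau_ns)
    v = 0.0                                  # start settled at VLO
    for b in bit_pattern():
        target = float(b)
        # Closed-form integral of [target + (v - target) e^(-t/tau) - x] dt over [0, half]
        diff = (target - x) * half + (v - target) * tau_ns * frac_half
        correct = diff > 0 if b == 1 else diff < 0
        if not correct:
            return False
        v = target + (v - target) * decay_full  # VIN at the start of the next bit
    return True

def max_evaluation_speed(f_start=0.5, f_stop=4.0, step=0.001):
    """Scan upward (GHz) and return the highest frequency reached before the
    first clock rate at which any bit evaluates incorrectly."""
    best = None
    for i in range(int(round((f_stop - f_start) / step)) + 1):
        f = f_start + i * step
        if not all_bits_correct(f):
            break
        best = f
    return best

print(f"estimated evaluation speed: {max_evaluation_speed():.3f} GHz")
```

The first failure in this model is the low bit that follows the run of twenty ones, the worst case identified in Sec. 3.3.1.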
Bias Parameters
  Latch W/L                               0.5 um / 0.18 um
  Current Mirrors W/L                     3.0 um / 0.18 um
  Current Sources W/L                     1.0 um / 0.36 um
  IBIAS                                   50 uA
  Simulation IPHOTO                       10 uA
  Simulation C                            100 fF
Simulation Results
  Receiver Power Dissipation (1.0 GHz)    305.74 uW
  Reference Power Dissipation             196.45 uW
  Receiver Area                           133.56 um²
  Reference Area                          74.20 um²
  Maximum Evaluation Speed                2.000 GHz
Table 3.4: Summary of bias and performance characteristics in 0.18 um CMOS
Chapter 4
Variation Analysis
A practical design must not only function correctly, but do so despite a myriad of
variations in the environment and the process.
There are two types of variation. First, uniform variations can occur throughout the chip for a variety of reasons. Resistive drops in the package can decrease
the effective internal supply voltage. Slight process errors might increase all drawn
dimensions by a small amount. The temperature of the operating environment can
vary tremendously from the test environment, and so on.
Second, differential variation can seriously affect circuits which rely heavily on
transistor matching, including the receiver circuit presented in this thesis. If the
two sides of a sense amplifier vary non-uniformly, the circuit becomes mismatched,
possibly causing incorrect operation.
This chapter begins by discussing how variation affects circuit operation, followed
by a presentation of simulated results for uniform and differential variation, respectively. The last section summarizes key points, and suggests some ways to design
around variation.
4.1 Receiver Variation Overview
Chapter 2 discusses the benefits of a fast time constant on VIN. This section expands on that by trying to quantify the differential across IN1 and IN2 as a function of the time constant, τ, the clock period, T, and the reference voltage level, x.
Voltage VIN controls the gate of M11, and therefore the current into IN1. Likewise,
VREF supplies the gate voltage of M14, and determines the current into IN2. The total
differential across IN1 and IN2 equals the net charge divided by the capacitance.
Since current expresses the rate of charge flow, the change in differential, ΔVdiff, is proportional to the difference in currents (Eq. 4.1), which in turn is proportional to the difference between VIN and VREF (Eq. 4.2).
$$\Delta V_{diff} \propto (I_{IN1} - I_{IN2}) \qquad (4.1)$$

$$\Delta V_{diff} \propto (V_{IN} - V_{REF}) \qquad (4.2)$$

$$V_{diff} \propto \int (V_{IN} - V_{REF})\, dt \qquad (4.3)$$
In other words, the quantity (VIN - VREF) determines the rate and sign of the net
charge flow onto the input nodes. Integrating this quantity over time gives the total
differential, as shown in Eq. 4.3.
Figure 4-1: VIN vs. VREF with a slow time constant
For example, look at the second clock cycle in Fig. 4-1. The integral across the
whole clock period equals the area of region B minus the area of region A. In this case,
the integral equals zero, so no differential builds up across the input nodes. However,
with a faster time constant (or longer clock period), region B would be bigger than
region A, resulting in a net positive differential.
Figure 4-2 shows a more realistic plot of VIN over a span of 4τ. The voltage swing between VLO and VHI is normalized to 1.0, such that VLO = 0.0, VREF = 0.5, and VHI = 1.0. The "normalized differential" in the bottom graph refers to the geometric area of the dark shaded region in the top graph minus that of the light shaded region.
Figure 4-2: Exponential VIN waveform discharging across VREF (top) and corresponding plot of total differential as a function of time (bottom)
The graphs in Fig. 4-2 operate as follows. At some point in time, the input switches low after a large number of high bits. As Ch. 3 explains, this constitutes the worst case scenario for evaluating a low bit.
Voltage VIN has previously settled to VHI, but now begins discharging exponentially with time constant τ. Since (VIN - VREF) > 0, a positive differential accumulates on the input nodes. This differential increases until (VIN - VREF) = 0, at which point the total differential curve in Fig. 4-2 reaches a local maximum.
After VIN crosses VREF, the differential begins decreasing because (VIN - VREF) < 0. At some point, the differential returns to zero, indicating that the (incorrect) positive differential has been entirely erased. After that point, VIN remains below VREF and the differential continues decreasing.
Chapter 3 defines evaluation speed as the maximum clock frequency for which all bits are evaluated correctly (page 70). This requires only that the circuit make a correct decision, not produce usable output. However, defining evaluation speed this way makes it useful for measuring the calibration between VIN and VREF, a critical metric for variation analysis.
In fact, making a correct decision only requires that a correct differential build up
across IN1 and IN2. In Fig. 4-2, the circuit can evaluate correctly any time after the
zero crossing of total differential. Therefore, by definition, the zero crossing of the
total differential curve determines evaluation speed.
A differential builds up during the first half of the clock period, and the sense amplifier evaluates it during the second half. So one half of the clock period, T/2, must be greater than teval-min, the minimum time required to establish a correct differential. Of course, teval-min is the zero crossing of Vdiff(t), the total differential as a function of time (Eq. 4.4). This relationship between clock period and teval-min translates into the expression for maximum evaluation speed in Eq. 4.5.
$$V_{diff}(t_{eval\text{-}min}) = 0 \qquad (4.4)$$

$$f_{eval\text{-}max} = \frac{1}{2\, t_{eval\text{-}min}} \qquad (4.5)$$

Integrating (VIN - VREF) over time provides an analytic expression for Vdiff(t), which can be solved to find teval-min and feval-max. The following derivations continue to ignore constant terms, instead normalizing the voltages to one. Equations 4.6 and 4.7 give expressions for VIN and VREF as a function of time for a low optical input (assuming VIN starts at VHI).

$$V_{IN} = e^{-t/\tau} \qquad (4.6)$$

$$V_{REF} = x \qquad (0 < x < 1) \qquad (4.7)$$
Equation 4.8 expresses total differential as an integral of (VIN - VREF), ignoring constant terms. Substituting Eqs. 4.6 and 4.7 into this expression and integrating from 0 to t gives the expression for Vdiff(t) in Eq. 4.11.

$$V_{diff}(t) = \int_0^t \left( V_{IN}(t') - V_{REF}(t') \right) dt' \tag{4.8}$$

$$V_{diff}(t) = \int_0^t \left( e^{-t'/\tau} - x \right) dt' \tag{4.9}$$

$$V_{diff}(t) = \Big[ -\tau e^{-t'/\tau} - x t' \Big]_0^t \tag{4.10}$$

$$V_{diff}(t) = \tau \left( 1 - e^{-t/\tau} \right) - x t \tag{4.11}$$
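As a quick numerical check on the step from Eq. 4.8 to Eq. 4.11, the sketch below integrates the normalized waveforms of Eqs. 4.6 and 4.7 with the trapezoidal rule and compares the result against the closed form. The function names and the step count are illustrative choices, and τ = 0.16 ns is the approximate Chapter 3 value used later in this section.

```python
import math

TAU_NS = 0.16  # approximate time constant from Ch. 3, in ns


def v_in(t, tau=TAU_NS):
    """Normalized input voltage for a low optical input (Eq. 4.6)."""
    return math.exp(-t / tau)


def v_diff_closed_form(t, x, tau=TAU_NS):
    """Total differential from Eq. 4.11: tau*(1 - exp(-t/tau)) - x*t."""
    return tau * (1.0 - math.exp(-t / tau)) - x * t


def v_diff_numeric(t, x, tau=TAU_NS, steps=10_000):
    """Trapezoidal-rule integral of (VIN - VREF) from 0 to t (Eq. 4.8)."""
    h = t / steps
    total = 0.0
    for k in range(steps):
        a = k * h
        b = a + h
        total += 0.5 * h * ((v_in(a, tau) - x) + (v_in(b, tau) - x))
    return total


if __name__ == "__main__":
    for t in (0.1, 0.2, 0.3):
        print(f"t = {t} ns: closed form {v_diff_closed_form(t, 0.5):+.8f}, "
              f"numeric {v_diff_numeric(t, 0.5):+.8f}")
```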
Notice that Eq. 4.11 expresses Vdiff in terms of x, the value of the reference voltage. Ideally, x = 0.5, corresponding to a value of (VHI + VLO)/2. However, in real life, VREF can move around due to variations, making either high or low bits evaluate more slowly. Figure 4-3 plots total differential as a function of time on high to low transitions for VREF levels ranging from 0.25 ((VHI + VLO)/4) to 0.75 (3(VHI + VLO)/4).
[Plot: total differential versus time (ns) on a high to low transition, one curve per VREF value from 0.25 to 0.75]
Figure 4-3: Total differential for several different values of VREF
Finally, given the time constant, one can set Eq. 4.11 equal to zero and find teval-min. Chapter 3 calculates a τ of about 0.16 ns (page 57). Solving Eq. 4.4 with this value (and x = 0.5) gives teval-min = 0.2546 ns, corresponding to a maximum evaluation speed of 1.964 GHz. This compares favorably with the simulated value of 2.000 GHz from Ch. 3.
Note that these calculations come out correctly without the constant scalars. Multiplying Eq. 4.11 by a constant changes its magnitude, but not the location of its roots. So, normalizing everything to one does not affect the outcome of the calculations.
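Eq. 4.11 has no elementary closed-form root, so teval-min has to be found numerically. The sketch below is one way to do it, not necessarily the method used in the thesis: it substitutes u = t/τ, which turns Eq. 4.11 into τ(1 - e^{-u} - xu), and bisects on the dimensionless factor. The leading τ, like any constant scalar, drops out of the root, which is the normalization argument made above. The function names are hypothetical.

```python
import math


def zero_crossing_u(x, tol=1e-12):
    """Nonzero root of h(u) = 1 - exp(-u) - x*u for 0 < x < 1, with u = t/tau.

    h(0) = 0 and h rises until u = -ln(x), so h > 0 on (0, -ln(x)];
    h(1/x) = -exp(-1/x) < 0, so [-ln(x), 1/x] brackets the root.
    """
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")

    def h(u):
        return 1.0 - math.exp(-u) - x * u

    lo, hi = -math.log(x), 1.0 / x
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if h(mid) > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def t_eval_min(tau_ns, x=0.5):
    """Zero crossing of Vdiff(t) (Eq. 4.4), in ns."""
    return tau_ns * zero_crossing_u(x)


def f_eval_max(tau_ns, x=0.5):
    """Maximum evaluation speed from Eq. 4.5, in GHz when tau is in ns."""
    return 1.0 / (2.0 * t_eval_min(tau_ns, x))


if __name__ == "__main__":
    print(f"t_eval_min = {t_eval_min(0.16):.4f} ns")
    print(f"f_eval_max = {f_eval_max(0.16):.3f} GHz")
```

With τ set to exactly 0.16 ns, this gives teval-min of about 0.255 ns and feval-max of about 1.961 GHz. The 0.2546 ns quoted above corresponds to τ of roughly 0.1598 ns, consistent with the Chapter 3 value being rounded to "about 0.16 ns".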
As a final illustration, Fig. 4-4 plots VIN, VREF, and total differential using the value of τ calculated in Ch. 3. The term teval represents three possible times at which the circuit could evaluate. In other words, the clock goes high at teval.
In the top panel, teval comes before the zero crossing, so Vdiff(teval) > 0. The circuit (incorrectly) evaluates a high bit. In the second panel, teval occurs exactly at the zero crossing of Vdiff(t), namely 0.2546 ns. In this case, no differential exists; the circuit produces an indeterminate (and unpredictable) result. In the third panel, teval occurs after the zero crossing. The circuit correctly evaluates a low bit.
The results in this section apply to both high and low input transitions. Due to
the symmetric nature of the charge up and charge down waveforms, teval-min is the
same in both cases, assuming x = 0.5.
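The symmetry argument can be checked numerically, and extended to x ≠ 0.5, with a short sketch. It assumes that a low to high transition is the mirror image of Eq. 4.6, VIN(t) = 1 - e^{-t/τ}; integrating (VIN - x) then gives (1 - x)t - τ(1 - e^{-t/τ}), whose zero crossing is the high to low root with x replaced by 1 - x. Since every bit must evaluate correctly, the slower direction sets feval-max. The helper names are hypothetical.

```python
import math


def zero_crossing_u(x, tol=1e-12):
    """Nonzero root of 1 - exp(-u) - x*u = 0 for 0 < x < 1 (u = t/tau)."""
    if not 0.0 < x < 1.0:
        raise ValueError("x must lie strictly between 0 and 1")
    lo, hi = -math.log(x), 1.0 / x  # the function is > 0 at lo and < 0 at hi
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 1.0 - math.exp(-mid) - x * mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def worst_case_f_ghz(tau_ns, x):
    """feval-max limited by the slower of the two transition directions."""
    u_low_bit = zero_crossing_u(x)         # high to low: VIN falls past VREF = x
    u_high_bit = zero_crossing_u(1.0 - x)  # low to high: mirror image, x -> 1 - x
    return 1.0 / (2.0 * tau_ns * max(u_low_bit, u_high_bit))


if __name__ == "__main__":
    for x in (0.25, 0.3, 0.4, 0.5, 0.6, 0.7, 0.75):
        print(f"x = {x:.2f}: worst-case feval-max = {worst_case_f_ghz(0.16, x):.3f} GHz")
```

The worst-case speed peaks at x = 0.5 and falls off symmetrically on either side, which is the sense in which a misplaced VREF makes either high or low bits evaluate more slowly.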
[Plot: three panels of VIN, VREF, and total differential versus time, each marking a different evaluation time teval]
Figure 4-4: Total differential at three different evaluation times, before, at, and after the zero crossing of total differential
4.2 Uniform Variation Results
Uniform variation means the same parameter changes by the same amount everywhere in the circuit. This section first discusses how evaluation speed changes with
photocurrent and photodiode capacitance. These parameters determine speed over a
much larger frequency range, so they act more like design parameters than variation
sources.
After discussing photocurrent and capacitance, Secs. 4.2.3 to 4.2.6 focus on the
four variation sources originally outlined by Sam: supply voltage, channel length,
temperature, and threshold voltage [12].
4.2.1 Photodiode Capacitance
Photodiode capacitance determines the time constant on the input node. Increasing the capacitance by a factor of ten increases τ by a factor of ten, which should decrease evaluation speed by a factor of ten.
[Plot: evaluation speed versus photodiode capacitance (fF)]
Figure 4-5: Evaluation speed as a function of photodiode capacitance
Figure 4-5 shows the simulated frequency versus capacitance graph for values of Cj ranging from 25 fF to 250 fF. The curve is slightly convex to the origin because of the inverse relationship between capacitance and evaluation speed. However, at lower values of Cj, parasitic capacitance in the transistors becomes significant, and the inverse relationship breaks down.
4.2.2 Photocurrent
Changes in input current only scale the magnitude of the input waveforms, not the
time dependence. So although scaling down the input photocurrent degrades overall
circuit performance due to smaller differentials, it should not affect evaluation speed,
which only requires making a correct decision.
However, in reality photocurrent does affect evaluation speed, for two reasons.
First, the changing photocurrent causes small fluctuations in gm10, and therefore the
time constant.
More importantly, the simulated waveforms have a lot of switching noise on them
(see Fig. 3-9, page 63). Increasing optical power enlarges the voltage swing on VIN,
making noise less significant.
[Plot: evaluation speed versus input and reference photocurrent (uA)]
Figure 4-6: Evaluation speed as a function of detector photocurrent
Figure 4-6 shows how evaluation speed changes with photocurrent. At high current
levels, additional increases provide only a marginal advantage (by increasing gm10).
At low current levels, evaluation speed starts dropping off quickly because switching
noise begins to dominate.
Remember that this analysis assumes the reference circuit and data bits both
receive the same amount of photocurrent. Section 4.3.4 looks at what happens when
the currents are mismatched.
4.2.3 Channel Length Variation
In general, designers use the minimum value for channel length. This makes length
particularly susceptible to variation because it is generally the smallest drawn dimension on the chip.
Figure 4-7 plots percent changes in evaluation speed for increases in length ranging
from 0% to 40%. Unfortunately, TSMC does not provide models for lengths less than
0.18 um, restricting simulations to one-sided variation. Note that a 5% increase in
length corresponds to an increase of 0.009 um in all transistors, even those with
lengths greater than 0.18 um.
[Plot: % change in evaluation speed versus % change in channel length (based on minimum L = 0.18 um); measured values and fitted line]
Figure 4-7: Changes in evaluation speed as a function of channel length variation
Increasing gate length has two effects.
First, the transconductance decreases,
effectively making transistors slower. Second, it increases gate capacitance, which
also tends to make circuits slower. For these reasons, the speed of digital circuits
usually depends heavily on gate length.
Data receiver performance, on the other hand, is limited by the photodiode capacitance, not parasitics. So increasing gate length in a uniform manner causes only slight performance degradation, mostly because gm10 goes down.
Frequency and channel length exhibit an approximately linear relationship. A
least squares regression produces the dashed line in Fig. 4-7, and the expression in
Eq. 4.12.
$$\Delta f_{eval\text{-}max}(\%) = -1.4670 - 0.1592\,\Delta L(\%) \tag{4.12}$$

4.2.4 Temperature
During normal operation, environmental temperature can vary over a wide range.
Furthermore, depending on the kind of circuitry surrounding the receiver, local temperature can greatly exceed that introduced by the environment alone. Modern chips
give off a tremendous amount of heat, and dense clusters of logic running at high
speeds can exhibit hot spots in excess of 75 degrees Celsius [3].
[Plot: % change in evaluation speed versus % change in temperature (from 300 K); measured values and fitted line]
Figure 4-8: Changes in evaluation speed as a function of temperature variation
Therefore, it is extremely desirable that the circuit be resistant to temperature variation over a wide range. Unlike channel length and other sources, where a 20% or 40% variation is unlikely to ever occur, a circuit might reasonably be expected to perform over the entire range of temperatures in Fig. 4-8 (ranging from -33.15 to 86.85 degrees Celsius).
Changing temperature affects circuit performance in a variety of ways. Most notably, transconductance depends strongly on temperature. As temperature decreases, gm10 increases, and τ decreases, boosting evaluation speed. Evaluation speed varies approximately linearly with temperature, as given by the regression in Eq. 4.13.

$$\Delta f_{eval\text{-}max}(\%) = -1.1952 - 0.3531\,\Delta T(\%) \tag{4.13}$$

4.2.5 Threshold Voltage
For most digital circuits, increasing threshold voltage slows down the logic because it reduces the effective gate overdrive, (VGS - VT). In analog circuits, bias currents usually set the small signal parameters, making performance more or less independent of threshold voltage.
However, Fig. 4-9 shows a rather peculiar relationship between frequency and
threshold voltage. Note that an "increase" in threshold voltage really refers to an
increase in magnitude, since VTP is negative for PMOS devices. These measurements
sweep ΔVT from -0.1 V to +0.1 V, which comes out to approximately plus or minus
20% for VT on the order of 0.5 V.
Two things stand out in Fig. 4-9. First, the relationship is positive rather than
negative. Increasing threshold voltage actually increases speed. Second, the relationship is quite strong. Evaluation speed changes by nearly 30% as threshold voltage
sweeps a range of 40%.
Although many factors might contribute to this strange relationship, it appears to occur mainly because of reduced swing on IN1 and IN2. For instance, if threshold voltage increases to 0.6 V, then IN1 and IN2 precharge to 0.6 V rather than 0.5 V. Likewise, when they charge up to (VDD - VTN), they only charge up to 1.2 V rather than 1.3 V.¹

¹The actual values are somewhat lower due to the backgate effect.
[Plot: % change in evaluation speed versus approximate % change in |VTN| and |VTP|; measured values and fitted line]
Figure 4-9: Changes in evaluation speed as a function of threshold voltage variation (numbers refer to magnitude of VTN and VTP)
The reduced swing on IN1 and IN2 translates into less switching noise on VIN and VREF, and thus improved performance. Equation 4.14 gives a linear regression model for evaluation speed as a function of threshold voltage variation.

$$\Delta f_{eval\text{-}max}(\%) = -2.9991 + 0.6995\,\Delta|V_{TN,P}|(\%) \tag{4.14}$$

4.2.6 Supply Voltage
Supply voltage varies for any number of reasons. For example, external power supplies
might not provide exactly 1.8 V, or the value might vary with temperature. Furthermore, IR drops in the package and interconnect can lead to differences between the
VDD the circuit observes and that measured on the outside.
In most digital circuits, increasing supply voltage increases speed. A high logical
bit takes on the value of VDD, which acts as the VGS of the next logic gate. Higher
VDD therefore translates into more gate overdrive and more speed. However, once
again the receiver circuit displays a counterintuitive trend. In Fig. 4-10, evaluation
speed exhibits a strong negative dependence on supply voltage.
[Plot: % change in evaluation speed versus % change in VDD (from 1.8 V); measured values and fitted line]
Figure 4-10: Changes in evaluation speed as a function of supply voltage variation
The reasoning here closely parallels that described on page 84 for threshold voltage, with one slight twist. Changes in supply voltage affect not only IN1 and IN2,
but the clock and reset signals as well.
An increase in supply voltage increases the voltage swing on IN1, IN2, CLK, and
RST, all of which contribute additional switching noise to VIN. This accounts for the
larger slope in Fig. 4-10 than in Fig. 4-9. Equation 4.15 gives a linear regression
model for supply voltage variation.
$$\Delta f_{eval\text{-}max}(\%) = -1.4056 - 1.1583\,\Delta V_{DD}(\%) \tag{4.15}$$

4.2.7 Summary
Table 4.1 summarizes the linear regression models fitted to the variation parameters
in the four previous sections. The key term is the linear term, which indicates by what
percent evaluation speed changes for a one percent change in the variation parameter.
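The models in Table 4.1 are least-squares fits to the simulated points in Figs. 4-7 to 4-10. Those points are not tabulated in the text, so the sketch below only illustrates the ordinary least-squares formulas for a line, Δf = a + b·Δp, where b is the linear term and a is the constant term. The example points are generated from Eq. 4.12 itself purely to show that the fit recovers known coefficients; they are not the thesis data, and the function name is hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least-squares fit of ys = a + b*xs.

    Returns (a, b): the constant term and the linear term (percent change in
    evaluation speed per percent change in the variation source).
    """
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need at least two (x, y) pairs of equal length")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    if sxx == 0.0:
        raise ValueError("x values must not all be equal")
    b = sxy / sxx
    a = mean_y - b * mean_x
    return a, b


if __name__ == "__main__":
    # Illustration only: points generated from Eq. 4.12, not simulated data.
    d_length = [0.0, 5.0, 10.0, 20.0, 30.0, 40.0]
    d_speed = [-1.4670 - 0.1592 * v for v in d_length]
    print(fit_line(d_length, d_speed))
```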
Obviously, supply and threshold voltage variations affect performance the most (largest linear terms). These factors directly affect the amount of switching noise on VIN and VREF, which can disrupt circuit evaluation.
Channel length and temperature, on the other hand, do not seem to affect circuit performance much at all. The circuit's ability to function depends on its ability to compare VIN and VREF, upon which channel length and temperature have little effect.
| Variation Source | Linear Term | Constant Term |
|---|---|---|
| Channel Length | -0.1592 | -1.4670 |
| Temperature | -0.3531 | -1.1952 |
| Threshold Voltage | +0.6995 | -2.9991 |
| Supply Voltage | -1.1583 | -1.4056 |
Table 4.1: Comparison of regression models for different variation sources
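The table is small enough to query directly. The sketch below encodes the four models from Table 4.1 (the dictionary and function names are hypothetical) and ranks the sources by the magnitude of their linear terms, which reproduces the ordering shown in Fig. 4-11. The 23% and 3% figures quoted in Sec. 4.3 for 20% supply voltage and channel length variation correspond to the linear term times 20; including the constant term adds roughly another 1.4 to 1.5 percentage points in both cases.

```python
# Linear regression models from Table 4.1:
#   delta_f(%) = constant + linear * delta_source(%)
MODELS = {
    "Channel Length": (-0.1592, -1.4670),
    "Temperature": (-0.3531, -1.1952),
    "Threshold Voltage": (+0.6995, -2.9991),
    "Supply Voltage": (-1.1583, -1.4056),
}


def predicted_change(source, delta_pct):
    """Predicted % change in evaluation speed for a % change in one source."""
    linear, constant = MODELS[source]
    return constant + linear * delta_pct


def rank_by_sensitivity():
    """Sources ordered by |linear term|, most sensitive first."""
    return sorted(MODELS, key=lambda s: abs(MODELS[s][0]), reverse=True)


if __name__ == "__main__":
    for source in rank_by_sensitivity():
        linear, _ = MODELS[source]
        print(f"{source:18s} slope-only change at 20%: {abs(linear) * 20:5.1f}%   "
              f"full model at +20%: {predicted_change(source, 20.0):+6.2f}%")
```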
Note that Figs. 4-7 to 4-10 all maintain a constant aspect ratio, where the y-axis
always has twice the range of the x-axis. This allows one to visually compare the
slope of each variation effect across graphs.
[Plot: fitted VDD, threshold, temperature, and length variation models; change in evaluation speed versus % change in variation source]
Figure 4-11: Absolute value of changes in evaluation speed relative to source variation
percentage, as given by regression models in Table 4.1
As an additional aid, Fig. 4-11 plots all four models from Table 4.1 on the same
graph. The variation source on the x-axis ranges from -40% to +40%, with changes in
evaluation speed on the y-axis. Clearly, supply voltage variations impact circuit performance the most, followed by threshold voltage, temperature, and channel length.
4.3 Differential Variation Results
Differential variation means that components on the input and reference side of the
circuit change in different ways. These variations have a much more pronounced effect
on performance of the optical data receiver than uniform variations.
In Sec. 4.2, a 20% variation in supply voltage results in a 23% change in evaluation
speed, whereas channel length only produces a 3% change for the same amount of
source variation. In comparison, a differential variation of 20% in channel length can
cause up to a 50% decrease in evaluation speed.
In this section, channel length serves as a tool to explore how differential variation
affects circuit performance. Channel length makes a good barometer of differential
variation effects for four reasons. First, the simulations are easy. Second, channel
length affects circuit performance in an intuitive way. Third, it allows simulation
over variation ranges similar to those used in Sec. 4.2, whereas the circuit simply fails
for a 20% differential variation in supply voltage. Finally, some parameters, such as
temperature and supply voltage, only change gradually. Temperature simply cannot
change by 50 degrees from one side of the receiver to the other (a distance of 13.6 um).
In these cases, uniform variation is more relevant.
Chapter 2 describes naming conventions for the different parts of the receiver
circuit (page 33). Corresponding to those conventions, the next three sections present
simulation results for differential variation between transistors in the latch, between
the input transistors, and between the input stage and reference circuit. In addition,
the fourth section looks at what happens when the input and reference photocurrent
do not match.
During the following discussions, keep in mind the 3% decrease in evaluation
speed due to uniform channel length variation (of 20%).
This number serves as
a useful benchmark for comparing uniform and differential variation. In almost all
cases, differential variation effects far exceed the minor degradation caused by uniform
variation.
4.3.1 The Latch
The discussion on sizing in Sec. 3.2 claims that differential variation inside the latch
has little effect on circuit performance. This remains true compared to differential
variation in other parts of the circuit, but evaluation speed can still decrease by 15%
or 16% for a 20% asymmetrical variation, compared to only 3% for uniform variation.
[Surface plot: % change in evaluation speed versus % change in L on the input side and on the reference side of the latch]
Figure 4-12: Changes in evaluation speed as a function of differential variation between input and reference side of latch
Figure 4-12 plots changes in evaluation speed on the z-axis, with the two variation
sources on the x-axis and y-axis. The "input" side of the latch consists of all the
transistors on the input side (M1, M3, M5, and M7), and the "reference" side of the
latch consists of all the transistors on the reference side (M2, M4, M6, and M8). The
reset transistor, M9, does not change.
Uniform variation occurs along the diagonal from (0%, 0%) to (+20%, +20%), and
has little effect on evaluation speed. Moving away from this line increases differential
variation, as lengths change in different proportions. Maximum differential variation
occurs at the corners ((0%, +20%) and (+20%, 0%)), where the circuit exhibits
decreases in speed due to mismatch.
Although the magnitude of the differential across IN1 and IN2 remains the most
important ingredient for correct evaluation, mismatch in the latch does create a
propensity for evaluating one way or the other. Overcoming this propensity requires
a larger differential, which takes more time and slows down the circuit.
4.3.2 Input Transistors
As in Ch. 2, the "input transistors" refer to the four transistors connected directly to
the inputs of the sense amplifier. Namely, the "left" input transistors, M11 and M13,
connect to IN1, and the "right" input transistors, M14 and M15, connect to IN2.
[Surface plot: % change in evaluation speed versus % change in L of M11 and M13 and of M14 and M15]
Figure 4-13: Changes in evaluation speed as a function of differential variation between left and right input transistors
In Fig. 4-13, evaluation speed decreases roughly 12% along the diagonal as the
lengths vary uniformly from 0% to 20%, but drops by nearly 44% at the corners.
This happens because differential variation changes the transconductance of the input
transistors. As shown in Eqs. 4.16 and 4.17, this changes the scalar terms relating
the input currents to VIN and VREF. This has a similar effect to simply scaling VIN
and VREF themselves.
$$I_{IN1} \propto g_{m11} V_{IN} \tag{4.16}$$

$$I_{IN2} \propto g_{m14} V_{REF} \tag{4.17}$$

Increases in length on the reference side make M14 weaker, diluting the effectiveness of VREF. Since VREF is responsible for evaluating low bits, this causes errors on low transitions of the input. Likewise, increases in length on the input side make the input signal weaker, and the circuit fails on high bits.
Along the diagonal, the circuit fails on low and high bits at the same speed,
indicating that VREF has been appropriately chosen to maximize evaluation speed for
both high and low bits.
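Eqs. 4.16 and 4.17 suggest a first-order way to see why a weaker M14 hurts low bits while a weaker M11 hurts high bits. The sketch below assumes, as a simplification, that length mismatch only changes the ratio k = gm14/gm11 and leaves τ unchanged. Dividing the differential by gm11, the reference term in Eq. 4.11 becomes kx, so low bits (high to low transitions) see an effective reference of kx, while high bits, by the symmetry of the charge up and charge down waveforms, behave like a reference of 1 - kx. The ratio k and the function names are hypothetical and are not quantities extracted from the simulations in Fig. 4-13.

```python
import math


def zero_crossing_u(x, tol=1e-12):
    """Nonzero root of 1 - exp(-u) - x*u = 0 for 0 < x < 1 (u = t/tau)."""
    if not 0.0 < x < 1.0:
        raise ValueError("effective reference must lie strictly between 0 and 1")
    lo, hi = -math.log(x), 1.0 / x
    while hi - lo > tol:
        mid = 0.5 * (lo + hi)
        if 1.0 - math.exp(-mid) - x * mid > 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


def limiting_speeds(tau_ns, k, x=0.5):
    """(f_low_bits, f_high_bits) in GHz for a hypothetical ratio k = gm14/gm11.

    k < 1 models a weaker M14 (longer reference-side input transistor);
    k > 1 models a weaker M11 (longer input-side input transistor).
    """
    x_eff = k * x  # reference term of Eq. 4.11 scaled by gm14/gm11
    t_low = tau_ns * zero_crossing_u(x_eff)         # high to low transition
    t_high = tau_ns * zero_crossing_u(1.0 - x_eff)  # low to high transition
    return 1.0 / (2.0 * t_low), 1.0 / (2.0 * t_high)


if __name__ == "__main__":
    for k in (0.8, 0.9, 1.0, 1.1, 1.2):
        f_low, f_high = limiting_speeds(0.16, k)
        print(f"k = {k:.1f}: low bits {f_low:.3f} GHz, high bits {f_high:.3f} GHz")
```

Under this model, k < 1 slows low bits while speeding up high bits, and k > 1 does the reverse; the clock limit follows the slower of the two, and at k = 1 both directions fail at the same speed.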
4.3.3 Input Stage and Reference Circuit
The left axis in Fig. 4-14 designates variation in the input stage, namely M10 and M12.
Recall that gm10 determines the time constant on VIN, and M12 supplies bias current.
The reference circuit refers to transistors M16 to M19.
Uniform length variations along the diagonal of Fig. 4-14 decrease evaluation speed
by less than 8%, but differential variation at the corners drops evaluation speed by
nearly 50%, or rather a factor of two.
The circuit goes from 2.00 GHz with no
variation to 1.02 GHz with 20% variation in the input stage relative to the reference
circuit.
Differential variation between the input stage and reference circuit hurts for two
reasons. First, the variation effectively shifts VIN and VREF relative to one another.
Second, the increase in length of M10 changes gm10. This increases the time constant,
which also slows down the circuit.
[Surface plot: % change in evaluation speed versus % change in L of M10 and M12 and % change in L in the reference circuit]
Figure 4-14: Changes in evaluation speed as a function of differential variation between input stage and reference circuit
4.3.4 Input and Reference Photocurrent
Depending on the location of optical transmitters and receivers, and the nature of
the optical signal paths in between, the optical power delivered to each photodiode
can vary. The reference circuit operates under the assumption that all bits within the
same bus follow similar optical paths and deliver the same amount of power. This
section looks at what happens when variations disturb the match.
The term "input photocurrent" refers to the maximum steady state current produced by the photodiodes for a high optical signal. In the reference circuit, the diode
always produces this maximum value. The input photodiode, on the other hand,
swings back and forth between zero and the maximum value, depending on the input. Simulations in Ch. 3 assume a photocurrent of 10 uA. Figure 4-15 shows how
evaluation speed changes for small perturbations around this point.
Recall that the reference circuit essentially averages the maximum and minimum
current values. Thus, changing the reference photocurrent causes significant movement in VREF.
[Surface plot: % change in evaluation speed versus maximum input photocurrent (uA) and reference photocurrent (uA)]
Figure 4-15: Changes in evaluation speed as a function of input and reference photocurrent
On the other hand, changing the input photocurrent only affects the voltage swing
between VLO and VHI. This makes changes in VREF more or less significant compared
to the size of the input waveform. For example, in Fig. 4-15, changes in reference
photocurrent are more significant with a small input current, because of the smaller
voltage swing on VIN. This accounts for the much lower dip on the right-hand side of
the diagram.
4.4 Conclusions
Of the uniform variations, temperature presents possibly the greatest danger. Specifications might require the chip to perform over huge temperature ranges, whereas
20% variations in other parameters are somewhat unlikely to occur.
After temperature, any uniform variation that introduces switching noise on the
critical nodes can cause problems. Supply and threshold voltage variation introduce
this kind of noise by enlarging the "digital" waveforms in the circuit. Reduced swing
on the clock and reset signals could potentially alleviate some of this problem, but at
the cost of speed or possibly power. Of course, increasing photocurrent or decreasing
diode capacitance can always offset the effects of uniform variation.
In terms of differential variation, the input stage and reference circuit present
the biggest problems, along with mismatches in photocurrent between the input and
reference photodiodes. Variation between the input transistors also causes a fairly
serious degradation in speed, while variation in the latch has little effect.
Obviously, differential variation presents the greatest challenge in the design of
an optical data receiver circuit. However, this is not actually new information. Designers have always been aware of the need to accurately match components in sense
amplifiers.
Good layout can go a long way to improve transistor matching. Beyond that,
scaling up transistor dimensions makes them more resistant to geometric variations.
If the distance between the receiver and reference circuits becomes a problem, then
introducing multiple reference bits might help. For instance, using a reference bit for
every bank of 32 data bits ensures that no receiver is farther than 16 bits away from
a reference circuit.
Alternatively, if current matching becomes an issue, a designer could install multiple reference circuits in parallel. Averaging several optical reference signals, instead
of just one, reduces deviation from the true mean.
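Under the simplest possible model, the benefit of averaging can be checked with a short Monte Carlo experiment. The sketch assumes each reference photodiode's current deviates independently from the true mean with the same Gaussian spread; under that assumption the spread of an N-way average falls as 1/√N. The 10 uA mean matches the Ch. 3 photocurrent, but the 0.5 uA spread is an arbitrary illustrative value, and the function name is hypothetical. Correlated deviations, for example from a shared optical path, would average out more slowly.

```python
import random
import statistics


def averaged_reference_spread(n_refs, sigma_ua, mean_ua=10.0, trials=20_000, seed=1):
    """Standard deviation of the average of n_refs reference photocurrents.

    Illustrative model only: each current is drawn independently from a
    Gaussian with mean mean_ua and standard deviation sigma_ua (both in uA).
    """
    rng = random.Random(seed)
    averages = [
        statistics.fmean(rng.gauss(mean_ua, sigma_ua) for _ in range(n_refs))
        for _ in range(trials)
    ]
    return statistics.stdev(averages)


if __name__ == "__main__":
    sigma = 0.5  # hypothetical 5% spread around the 10 uA nominal current
    for n in (1, 2, 4, 8):
        print(f"{n} reference(s): spread of average = "
              f"{averaged_reference_spread(n, sigma):.3f} uA")
```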
In the end, a designer must determine an acceptable set of performance specifications, and make the sacrifices necessary to achieve them. Luckily, sense amplifiers
have been around for a long time, so a wealth of information on dealing with them
already exists.
Chapter 5
Test Chip
In order for the data receiver circuit to function properly, incoming optical data
must be synchronized with the receiver clock. In a completely integrated solution,
the same clock drives optical transmitters and receivers, automatically enforcing this
synchronicity requirement.
However, the test chip design uses free space illumination from an external laser to
test functionality of the receiver alone. This presents the challenge of synchronizing
output from an external laser with the internal test chip clock. Figure 5-1 shows a
testing strategy to overcome this constraint.
An on-chip "data generator" stores test data to drive the laser. Using the two
input signals, Program and Data In, one can manually program a test pattern into the
generator. Driving both the receiver and the generator with the same clock circuitry
ensures synchronization between the two.
This chapter discusses operation of the components in Fig. 5-1 one by one, starting
with general digital building blocks, the data generator, and the receiver circuitry.
Section 5.4 discusses clock distribution hardware, which includes a phase-locked loop
(PLL). Stabilizing this loop can prove difficult, so Sec. 5.5 outlines PLL design in
more detail. The last section summarizes the testing strategy, and discusses a test
chip submitted for fabrication in December, 2001.
[Block diagram: on-chip data generator (inputs Program and Data In), clock circuitry (inputs CLK and CLK Select), and receiver circuitry (inputs IPHOTOREF and IBIAS, output Data); the data generator drives an off-chip laser that illuminates the receiver]
Figure 5-1: General testing strategy for data receiver circuit
5.1 Building Blocks
This section describes several key digital components.
First, Fig. 5-2 shows the
schematic of a "D" flip-flop (DFF). This common implementation for a DFF dissipates little power, allows for simple layout, and runs comfortably at speeds in the
gigahertz range. Note that Q resolves when CLK goes high, making this a rising edge
flip-flop.
[Schematic: DFF with inputs D, CLK, and /CLK, and output Q]
Figure 5-2: Schematic of DFF used in test chip
Next, look at the two-input multiplexor (MUX) in Fig. 5-3. A high input on SEL
turns on the top transmission gate, and a low input turns on the bottom one. The
use of both an NFET and a PFET in the transmission gate ensures rail-to-rail swing
on the output.
[Schematic: transmission gates pass IN1 or IN0 to OUT under control of SEL]
Figure 5-3: Schematic of two input mux using transmission gates
Finally, Fig. 5-4 shows the schematic of a first-in, first-out (FIFO) buffer consisting
of 63 "D" flip-flops. The data generator and receiver circuit both use this buffer to
store sequential data. Section 5.4 explains the choice of 63 bits during the discussion
of clock distribution.
[Schematic: 63 DFFs, numbered 0 to 62, connected in series and clocked by CLK]
Figure 5-4: Schematic of FIFO buffer
5.2 Data Generator
The data generator stores arbitrary bit patterns programmed by the user, and generates a synchronized signal to drive the laser. As shown in Fig. 5-5, building such a
structure requires only a MUX and a FIFO buffer.
[Schematic: MUX controlled by PRG (Program) selects Data In or the fed-back output; flip-flops 0 to 62 in series drive Data Out, all clocked by CLK]
Figure 5-5: Data generator for driving off-chip laser source
Setting PRG high breaks the loop, and the buffer begins accepting external data.
This constitutes the programming phase. Each time CLK goes high, the first flip-flop
stores the value on Data In, and all the flip-flops shift their values to the right by
one. Doing this 63 times programs the entire array.
A low value on PRG closes the loop, causing the bit pattern to continually cycle.
Thus, the data generator drives the off-chip laser with a periodic sequence of 63
pre-programmed data bits. This synchronizes optical input with the on-chip clock
because the same clock signal drives both the data generator and the receiver circuit.
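The programming and cycling behavior can be captured in a cycle-level model. The sketch below treats the MUX and the 63 flip-flops of Fig. 5-5 as a shift register with a feedback path; it models logic values only, not circuit timing, and the class and method names are hypothetical. As read from Fig. 5-5, data enters at the first flip-flop and shifts right on each rising CLK edge, and the last flip-flop drives both Data Out and the feedback input of the MUX.

```python
from collections import deque

N_BITS = 63  # FIFO depth on the test chip


class DataGenerator:
    """Cycle-level model of the MUX plus 63-bit FIFO buffer (logic values only)."""

    def __init__(self, n_bits=N_BITS):
        # regs[0] is the first flip-flop; regs[-1] drives Data Out and the feedback.
        self.regs = deque([0] * n_bits, maxlen=n_bits)

    @property
    def data_out(self):
        return self.regs[-1]

    def clock(self, prg, data_in=0):
        """Apply one rising CLK edge and return the new Data Out.

        prg=1: the MUX passes Data In (programming phase, loop broken).
        prg=0: the MUX passes the last flip-flop's value (loop closed).
        """
        d = data_in if prg else self.regs[-1]
        # appendleft on a full deque drops regs[-1]: every flip-flop shifts right.
        self.regs.appendleft(d)
        return self.data_out


if __name__ == "__main__":
    gen = DataGenerator()
    pattern = [1, 0, 1, 1, 0] + [0] * (N_BITS - 5)
    for bit in pattern:  # 63 clocks with PRG high program the array
        gen.clock(prg=1, data_in=bit)
    run = [gen.clock(prg=0) for _ in range(2 * N_BITS)]
    print(run[:N_BITS] == run[N_BITS:])  # the stored pattern repeats every 63 cycles
```

When PRG goes low, Data Out already shows the first programmed bit; each subsequent edge advances one position around the loop, so the output is periodic with a period of 63 cycles.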
5.3 Receiver Test Circuitry
The receiver test circuitry captures and stores receiver output for later use. In other
words, a tester can program an arbitrary bit pattern into the data generator at low
speed, run the chip at high speed, and then read the results later at low speed.
This storage mechanism also allows a person to take multiple output samples and
calculate bit error rates. For example, consider a bit error that occurs 10% of the
time. To the naked eye, the waveforms appear correct because 90% of the transitions
look right, but one out of ten individual samples should exhibit the bit error. By
storing many different output samples for the same input, one can estimate how often these errors occur.
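The bookkeeping behind such an estimate is simple. The sketch below assumes each stored run has been read back at low speed as a list of bits aligned with the programmed pattern (the alignment step and the function name are assumptions for illustration) and counts, for each bit position, the fraction of runs in which it disagrees with the pattern.

```python
def error_rates(expected, samples):
    """Per-position and overall bit error rates from repeated captures.

    expected: programmed bit pattern (list of 0/1).
    samples:  list of captured runs, each aligned with `expected`.
    """
    if not samples:
        raise ValueError("need at least one captured sample")
    n = len(expected)
    if any(len(run) != n for run in samples):
        raise ValueError("every sample must have the same length as the pattern")
    per_bit = [
        sum(run[i] != expected[i] for run in samples) / len(samples)
        for i in range(n)
    ]
    overall = sum(per_bit) / n
    return per_bit, overall


if __name__ == "__main__":
    expected = [1, 0, 1, 1]
    samples = [list(expected) for _ in range(10)]
    samples[3][2] = 0  # one bad capture out of ten at position 2
    print(error_rates(expected, samples))
```

In the example from the text, a bit that fails once in ten captured runs shows up with a per-position error rate of 0.1, even though nine of the ten waveforms look correct.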
[Schematic: data receiver (optical diode input, CLK, RST) and reference circuit (IBIAS, VP, IPHOTOREF, VREF); the receiver output feeds a 63-bit FIFO buffer through a RUN/HOLD MUX, producing Data Out]
Figure 5-6: Receiver test circuitry using FIFO buffer
As shown in Fig. 5-6, this data storage mechanism closely resembles the data
generator from Sec. 5.2. To understand this diagram, first look at the receiver and
reference circuit blocks. An off-chip current source, IBIAS, sinks current from M1,
which biases Vp. A second off-chip current source, IPHOTOREF, provides reference
photocurrent in lieu of a second photodiode.
A current input replaces the reference photodiode for two reasons. First, one
cannot expect the same kind of match between two discrete, off-chip laser sources as
from two laser diodes built next to each other on the same chip. A reference optical
path works because it mirrors a data bit exactly. However, when the lasers exist in
completely different packages, this mirroring breaks down, and the reference optical
path no longer makes sense.
Second, an off-chip photocurrent reference allows greater versatility in testing. Subtle adjustments in the reference current can tweak circuit performance and provide insight into circuit operation inside the chip. Also, by finding the reference current level that maximizes evaluation speed for high and low transitions, one can estimate how much current the input photodiode produces.¹ In addition, a tester can conduct primitive variation analysis by varying the reference current up and down by small amounts while measuring evaluation speed.

¹The optimum reference current should be exactly half of the maximum input photocurrent.
Next, look at the outputs of the data receiver. Only one of the outputs is needed
to verify functionality, but both must see the exact same load. Once again, any
mismatch between the sides of the receiver can cause bit errors, so if Q drives a flip-flop, then /Q must drive a flip-flop as well. One of these flip-flops hangs unconnected, and the other drives a buffer similar to the data generator.
A high select signal on the MUX causes the circuit to "run," and data streams from
the receiver into the FIFO buffer. When the select signal goes low, the MUX switches
to feedback and no new data flows into the buffer. Instead, it cycles repeatedly
through 63 previously stored bits, effectively "holding" the data.
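The run/hold behavior of the buffer can be captured in a few lines. The sketch below is a hypothetical behavioral model (the class and method names are not from the test chip design): a 63-entry shift register whose input comes either from the receiver or, through the MUX feedback path, from its own oldest entry.

```python
from collections import deque


class ReceiverFifo:
    """Hypothetical behavioral model of the 63-bit receiver buffer and its MUX."""

    def __init__(self, depth=63):
        # Index 0 holds the newest bit; index -1 holds the oldest.
        self.bits = deque([0] * depth, maxlen=depth)

    def clock(self, select, rx_bit=0):
        # select = 1 ("run"): the MUX passes the receiver output into the buffer.
        # select = 0 ("hold"): the MUX feeds the oldest bit back to the input.
        new_bit = rx_bit if select else self.bits[-1]
        # appendleft on a full deque drops the oldest bit from the right end.
        self.bits.appendleft(new_bit)


fifo = ReceiverFifo()
for k in range(63):
    fifo.clock(select=1, rx_bit=int(k % 3 == 0))  # arbitrary test pattern

snapshot = list(fifo.bits)
for _ in range(63):
    fifo.clock(select=0)  # hold: the stored pattern only rotates
print(list(fifo.bits) == snapshot)
```

Because hold mode only rotates the stored pattern, the contents return to their original order every 63 clock cycles, so the bits can be read out at low speed without losing any of them.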
5.4 Clock Distribution
Figure 5-7 illustrates a means for supplying both low and high frequency clock signals
to the test chip. Setting CLK Select low allows high frequency clock multiplication
using a phase-locked loop (PLL), while setting it high passes the external clock input
directly to the on-chip components.
Figure 5-7: Multiplexing external and internal clock signals
Normally, one uses the low-frequency clock for programming the data generator and reading information from the receiver buffer at low speeds. However, some off-chip clock sources might be able to drive the internal chip circuitry directly. In other words, while the PLL can provide only a high-frequency clock, the external input can potentially provide both. The rest of this section discusses issues related to clock distribution, while Sec. 5.5 describes the design of the PLL.
Figure 5-8 illustrates the clock distribution scheme. The output of the multiplexor from Fig. 5-7 drives the input of a three-level tree. Each level of the tree branches four times, for a total of 4^3, or 64, leaves.
[Tree diagram: /CLK (Input) feeds the tree; 63 leaf nodes drive 63 flip-flops (CLK (Flip-flop)), and the 64th node, "A", drives the receiver (/CLK (Receiver) and RST)]
Figure 5-8: Clock distribution
Sixty-three leaves of the tree drive flip-flops, while the last node drives the receiver
circuit itself. As a result, the flip-flops latch at the same time or right before the
receiver circuit resets. This ensures that the flip-flops latch the evaluated signal from
the previous cycle, not the reset value of the receiver. Figure 5-9 illustrates these
timing constraints more clearly.
[Waveforms: /CLK (Input), CLK (Flip-flop), Node "A", /CLK (Receiver), and RST, annotated with one inverter delay and the NOR gate delay (2x inverter delay)]
Figure 5-9: Clock distribution waveforms (corresponding to Fig. 5-8)
Each unit of time in Fig. 5-9 represents one inverter delay, while the NOR gate
takes approximately two inverter delays. Three delays after the input goes low, the
flip-flop clock goes high, followed by node "A" going low. Two time periods after
that, the receiver clock and reset signals switch at about the same time. This gives
the flip-flops approximately three inverter delays to latch a signal before the receiver
begins resetting. In simulation, this provides sufficient time.
Notice that an inverted clock signal (/CLK) drives the receiver circuit. The receiver resets and begins a new evaluation cycle on the falling edge of its input clock,
essentially making it a "falling edge" element. However, the rest of the test circuitry
operates on the rising edge of the clock. To reconcile this, the clock tree drives the
receiver with an inverted clock signal.
5.5 PLL Design
A phase-locked loop provides an on-chip means of multiplying the input clock frequency. In particular, the test chip uses the PLL architecture in Fig. 5-10, which multiplies the input clock signal by 64. Essentially, the loop tries to lock the phases of fIN and fFB, which requires locking their frequencies as well. This forces fOUT to run 64 times faster than fFB and the input, effectively multiplying the frequency.
The loop operates as follows. A phase-frequency detector (PFD) senses the phase
difference between the input and feedback signals, and encodes the result in a pair of
pulse trains on UP and /DN. These signals drive a charge pump, which sinks or sources
current into the loop filter, Z(s). The output voltage of this filter represents the
relative phase difference between the input and feedback signals, which then drives a
voltage-controlled oscillator (VCO).
For example, consider an increase in the phase of the input signal. The PFD detects the change in phase and increases the density of the "up" train relative to the "down" train. The charge pump turns this into a current, and the loop filter takes the average value. Due to the increased density of "up" pulses, the average value increases, and the VCO frequency rises slightly, allowing the feedback signal to catch up to the input in phase.
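[Block diagram: fIN and fFB drive the PFD, whose outputs drive the charge pump into the off-chip loop filter Z(s); the filter voltage drives the VCO, which produces fOUT, and a divide-by-64 stage returns fFB]
Figure 5-10: Phase-locked loop block diagram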
The test chip implementation uses an off-chip loop filter, along with external pins
to control charge pump gain and VCO offset. This allows a tremendous amount of
versatility, but also introduces a lot of variables. The following four sections describe
the implementation of each element in the loop and introduce their transfer characteristics. Section 5.5.5 discusses stability concerns, and gives an example of how to
stabilize the PLL for operation at a gigahertz.
5.5.1 Phase-Frequency Detector
A phase-frequency detector (PFD) differs from a simple phase detector in that it
can track frequencies over a wide range as well as measuring phase difference for
signals of similar frequencies. For example, an XOR gate can detect phase for two
signals at the same frequency, but a great number of different frequency combinations
produce the same output patterns. Thus an XOR gate is a phase detector, but not a
phase-frequency detector.
On the other hand, consider the PFD in Fig. 5-11. When fFB < fIN, only the "up" signal turns on, and when fFB > fIN, only the "down" signal turns on. As a result, the loop always gravitates toward the frequency at which fFB equals fIN, where the PFD acts predominantly as a phase detector [11].
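To make the "only up" and "only down" behavior concrete, the following sketch models the PFD of Fig. 5-11 at the behavioral level. It assumes ideal clock edges and an instantaneous reset (a real flip-flop has a finite reset delay, which produces short overlapping pulses that this model ignores), and the function names are hypothetical. A rising edge of fIN sets UP, a rising edge of fFB sets DN, and as soon as both are set they are cleared together.

```python
def edges(freq, count, phase=0.0):
    """Rising-edge times of an ideal clock (arbitrary time units)."""
    return [phase + k / freq for k in range(count)]


def pfd_up_dn_time(in_edges, fb_edges, t_end):
    """Total time UP and DN are high between t = 0 and t_end.

    Behavioral model of a two-flip-flop PFD with instantaneous reset.
    """
    events = sorted([(t, "IN") for t in in_edges] + [(t, "FB") for t in fb_edges])
    up = dn = 0
    up_time = dn_time = 0.0
    t_prev = 0.0
    for t, source in events:
        if t > t_end:
            break
        up_time += up * (t - t_prev)
        dn_time += dn * (t - t_prev)
        t_prev = t
        if source == "IN":
            up = 1
        else:
            dn = 1
        if up and dn:  # both flip-flops set: the shared reset clears them
            up = dn = 0
    up_time += up * (t_end - t_prev)
    dn_time += dn * (t_end - t_prev)
    return up_time, dn_time


# fFB slower than fIN: only UP is ever held high.
print(pfd_up_dn_time(edges(1.0, 100), edges(0.8, 80, phase=0.1), t_end=100.0))
# fFB faster than fIN: only DN is ever held high.
print(pfd_up_dn_time(edges(0.8, 80, phase=0.1), edges(1.0, 100), t_end=100.0))
```

With equal frequencies, the same model reduces to a phase detector: if fIN leads fFB by a fixed fraction of a period, UP is high for exactly that fraction of each cycle.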
[Schematic: fIN clocks one DFF with reset (output UP) and fFB clocks a second DFF with reset (output DN); both flip-flops share the RST signal]
Figure 5-11: Phase-frequency detector using DFFs with reset capability
Figure 5-11 does not show inputs for the "D" flip-flops because they are always connected to VDD. In other words, the flip-flops have only two states. When they reset, Q goes to zero, and when the clock goes high, Q goes to one and stays high until reset. Figure 5-12 shows a circuit that implements this behavior [11].
Figure 5-12: Flip-flop with reset signal for use in PFD
Many sources characterize a PLL in terms of phase input and phase output, in which case the phase detector merely normalizes the phase difference to one. Since phase ranges from zero to 2π, this means dividing by 2π [11] [15].
However, the loop in Fig. 5-10 characterizes the PLL in terms of frequency. In
this case, the PFD not only normalizes the phase, but also converts from frequency to
phase, equivalent to an integral in the time domain and a pole at zero in the frequency
domain.
Equation 5.1 first multiplies by 2π to convert from hertz to radians per second, and then divides by s to integrate frequency into phase. However, the PFD normalizes phase to one, so dividing Eq. 5.1 by 2π gives the transfer function for the PFD, as shown in Eq. 5.2.

$$\Phi(\text{radians}) = \frac{2\pi}{s}\, f(\text{Hz}) \tag{5.1}$$

$$G_{PFD}(s) = \frac{1}{s} \tag{5.2}$$

5.5.2 Charge Pump
Every time the "up" signal from the PFD turns on (whenever /UP goes low), the
charge pump sources current, and every time the "down" signal turns on, the charge
pump sinks current, such that the net current out of the charge pump is proportional
to the difference between the two pulse trains. The average value of this output
current represents the phase difference between fIN and fFB. Figure 5-13 shows the
charge pump schematic.
In this diagram, inverters I1 and I2 act as power supplies for two current sources, M7 and M8. A low input on /UP "turns on" M7 by connecting it to the positive rail, and a high input on DN "turns on" M8 by connecting it to ground.

Transistors M3 to M5 bias the gates of M7 and M8 so they provide an amount of current equal to ICP. However, due to a finite voltage drop in I1, the source of M7 never goes all the way to VDD. Adding transistors M1 and M2 compensates for this non-ideality. By mimicking the pull-up PFET inside I1, they ensure that all three PFETs (M3, M4, and M7) have exactly the same gate-to-source voltage. Transistor M6 provides a similar service for the NFET current mirror. These extra transistors ensure extremely close matching between ICP and the output current.
Figure 5-13: Schematic diagram of charge pump
Equation 5.3 expresses the output current, IOUT, in terms of UP and DN, the output signals of the PFD. Mathematically, these signals take on a value of either zero or one (where /UP takes on the opposite), consistent with the claim that the PFD normalizes phase difference to one. Therefore, as shown in Eq. 5.4, the charge pump merely converts the output of the PFD into a current, making the gain ICP.

$$I_{OUT} = I_{CP}\,(UP - DN) \tag{5.3}$$

$$G_{CP}(s) = I_{CP} \tag{5.4}$$

5.5.3 Voltage-Controlled Oscillator
Voltage-controlled oscillators tend to introduce non-linearity and high gain into the PLL, making loop stabilization difficult. These constraints have motivated some advanced PLL architectures [11]. Luckily, the test chip can get by with a fairly low-performance ring-oscillator architecture. Varying the propagation delay of individual inverters in the ring changes the frequency of oscillation. For example, consider the current-starved inverter shown in Fig. 5-14.
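[Schematic: inverter M3/M4 (input IN, output OUT) in series with current-source transistors M1 (gate VP, to VDD) and M2 (gate VN, to ground)]
Figure 5-14: Current-starved inverter with variable propagation delay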
This "inverter" cannot source or sink more current than M1 and M2 provide. Changing the voltages VN and VP changes the current, and therefore the propagation delay through the inverter. Hooking an odd number of these stages together forms a ring oscillator, as shown in Fig. 5-15.
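The link between starving current and oscillation frequency can be sketched numerically. The ring relation f = 1/(2 N t_d) is the standard one for N identical inverting stages (an edge must travel around the ring twice per period). The delay model, in which the limiting current slews the load capacitance through half the swing, is a first-order assumption that is not taken from this thesis, and the 10 fF and 90 uA values are hypothetical.

```python
def ring_oscillator_frequency(n_stages, stage_delay_s):
    """Oscillation frequency of a ring of identical inverting stages.

    One period requires an edge to travel around the ring twice (once
    rising, once falling), so f = 1 / (2 * N * t_d).
    """
    if n_stages < 3 or n_stages % 2 == 0:
        raise ValueError("a single-ended ring needs an odd number of stages (>= 3)")
    return 1.0 / (2 * n_stages * stage_delay_s)


def starved_stage_delay(c_load_f, v_swing_v, i_limit_a):
    """First-order delay estimate (assumption): the time for the current
    allowed by M1/M2 to slew the load capacitance by half the swing."""
    return c_load_f * (v_swing_v / 2) / i_limit_a


# Hypothetical values: 10 fF load per stage, 1.8 V swing, 90 uA limit.
t_d = starved_stage_delay(10e-15, 1.8, 90e-6)
print(ring_oscillator_frequency(5, t_d))
```

Under this model, frequency is proportional to the starving current, which is why the control circuitry below adjusts the current through M1 and M2 rather than the supply voltage.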
[Diagram: five current-starved inverters (C-S INV) connected in a ring, all sharing VP and VN; the ring output is fOUT]
Figure 5-15: VCO consisting of five current-starved inverters
All inverters in the ring share the same control voltages, VN and VP. The circuitry shown in Fig. 5-16 generates VN and VP based on an input voltage, VIN, and an offset current, IIN.
Figure 5-16: Control circuitry for VCO: IIN sets offset frequency, and VIN controls output frequency
In Fig. 5-16, M1 provides between zero and 50 uA of current as VIN varies between 0.6 V and 1.8 V. For input voltages below this range, M1 enters the subthreshold region and behaves in a highly non-linear way. In comparison, IIN sinks between 200 uA and 300 uA under normal operation.
In other words, IIN sinks most of the current, while VIN causes only small changes. This keeps the gain from VIN to fOUT low, while at the same time allowing a large frequency range of operation by changing the offset, IIN. External tuning also ensures that the VCO can be tweaked to operate at the desired frequency despite process variations (which affect this VCO architecture quite a bit). Figure 5-17 plots VCO frequency as a function of VIN for several different values of the offset, IIN.
The dashed lines in Fig. 5-17 represent fitted linear regressions using a least squares
method. Equation 5.5 gives the equations for these lines.
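$$
f_{OUT}(\text{GHz}) =
\begin{cases}
0.7565 + 0.1032\,V_{IN}, & I_{IN} = 200\ \mu\text{A} \\
0.9010 + 0.0802\,V_{IN}, & I_{IN} = 250\ \mu\text{A} \\
1.0166 + 0.0590\,V_{IN}, & I_{IN} = 300\ \mu\text{A}
\end{cases}
\tag{5.5}
$$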
[Plot: fOUT versus VIN (V) from 0.6 V to 1.8 V, showing measured values and fitted lines for several offset currents, including IIN = 250 uA]
Figure 5-17: VCO frequency as a function of control voltage for different offset levels
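Inverting the fits in Eq. 5.5 shows why the offset current matters for a particular target frequency. The sketch below solves each line for the VIN needed to reach 1 GHz and checks it against the 0.6 V to 1.8 V range over which M1 behaves roughly linearly. The dictionary layout and function name are illustrative only.

```python
# Least-squares fits from Eq. 5.5: f_OUT (GHz) = intercept + slope * V_IN,
# keyed by the offset current I_IN in uA.
VCO_FITS = {200: (0.7565, 0.1032), 250: (0.9010, 0.0802), 300: (1.0166, 0.0590)}
V_IN_MIN, V_IN_MAX = 0.6, 1.8  # usable control-voltage range from the text


def vin_for_frequency(f_target_ghz, i_in_ua):
    """Invert the linear fit; also report whether V_IN is in the usable range."""
    intercept, slope = VCO_FITS[i_in_ua]
    v_in = (f_target_ghz - intercept) / slope
    return v_in, V_IN_MIN <= v_in <= V_IN_MAX


for i_in in sorted(VCO_FITS):
    v_in, usable = vin_for_frequency(1.0, i_in)
    print(f"I_IN = {i_in} uA: V_IN = {v_in:.3f} V, usable: {usable}")
```

Only the 250 uA fit reaches 1 GHz inside the usable window, at roughly 1.23 V. The 200 uA line would need about 2.36 V and the 300 uA line a negative voltage; both values are extrapolations beyond the fitted range.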
Once again, many sources model the PLL in terms of phase, in which case the VCO acts as an integrator [11] [15]. However, with a frequency output, like the block diagram in Fig. 5-10, the VCO acts as a simple gain stage, and the integrator belongs in the PFD. Equation 5.6 expresses the transfer function of the VCO as a linear gain from VIN to fOUT. Varying IIN changes the gain slightly, but it remains on the order of 10^8 Hz/V.

$$
G_{VCO}(s) =
\begin{cases}
0.1032\ \text{GHz/V}, & I_{IN} = 200\ \mu\text{A} \\
0.0802\ \text{GHz/V}, & I_{IN} = 250\ \mu\text{A} \\
0.0590\ \text{GHz/V}, & I_{IN} = 300\ \mu\text{A}
\end{cases}
\tag{5.6}
$$

5.5.4 Frequency Divider
The frequency divider in the feedback path takes fOUT from the VCO as an input, and produces fFB as an output. Figure 5-18 shows how to construct a frequency divider using six "toggle" flip-flops. A toggle flip-flop consists of a DFF with inverter feedback, so that the output inverts each time the clock goes high.
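The ripple structure can be checked behaviorally before looking at the schematic. The sketch below (a hypothetical model, with each stage clocked by the previous stage's output) toggles each flip-flop on a rising edge at its clock input and records the input edges on which the final output rises.

```python
def ripple_divider(n_input_edges, stages=6):
    """Cascade of toggle flip-flops; each stage is clocked by the previous output.

    Returns the indices of the input rising edges on which the last stage's
    output rises, so the spacing between entries is the division ratio.
    """
    q = [0] * stages
    output_rises = []
    for k in range(n_input_edges):
        rising = True  # stage 0 always sees a rising edge of fOUT
        for i in range(stages):
            if not rising:
                break  # no clock edge reaches this stage or later ones
            q[i] ^= 1  # toggle: D is tied to /Q
            rising = q[i] == 1
        if rising:
            output_rises.append(k)
    return output_rises


rises = ripple_divider(640)
print(rises[:3], len(rises))
```

With six stages, 640 input edges produce 10 output rising edges spaced 64 input edges apart, which is the divide-by-64 behavior the loop relies on.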
[Diagram: six DFFs, each with its output fed back to D through an inverter, cascaded from fOUT to fFB]
Figure 5-18: Frequency divider using toggle flip-flops
It takes two clock periods for the output of a toggle flip-flop to cycle from one to zero and back to one again. Thus, each toggle flip-flop divides the frequency by a factor of two, and six stages divide the input frequency by 2^6, or 64. Equation 5.8 expresses the transfer function of the frequency divider.

$$f_{FB} = \frac{f_{OUT}}{64} \tag{5.7}$$

$$G_{DIV64}(s) = \frac{1}{64} \tag{5.8}$$

5.5.5 Stabilizing the Loop
The previous sections summarized all of the loop components except the loop filter. The output current from the charge pump, IOUT, flows into the loop filter, creating a voltage to drive the VCO. Thus, the impedance of the loop filter, Z(s), represents the "gain" between IOUT of the charge pump and VIN of the VCO (Eq. 5.10).

$$V_{IN}(\text{VCO}) = I_{OUT}(\text{CP})\, Z(s) \tag{5.9}$$

$$G_{LF}(s) = Z(s) \tag{5.10}$$
Combining the gain of the loop filter with the gains of the other four stages gives the loop transmission, as shown in Eq. 5.11. To remain stable, the PLL loop transmission must have sufficient phase margin at crossover. The following discussion shows how to stabilize the loop for operation at a gigahertz with approximately 60 degrees of phase margin.

$$L(s) = G_{PFD}(s)\, G_{CP}(s)\, Z(s)\, G_{VCO}(s)\, G_{DIV64}(s) \tag{5.11}$$

$$L(s) = \frac{(0.0802\ \text{GHz/V})\, I_{CP}\, Z(s)}{64\, s} \tag{5.12}$$
Stabilizing the loop consists of five steps. First, choose the VCO offset current.
The frequency plots in Fig. 5-17 show that an offset current of 250 uA puts the center
frequency right in the middle of the useful input voltage range. From Eq. 5.6, this
gives a VCO transfer function of 0.0802 GHz/V. Equation 5.12 combines this value
with the transfer functions of the other four elements.
Second, find a topology for the loop filter. For example, Fig. 5-19 shows a lead-lag filter commonly used in phase-locked loops. This circuit has a pole at the origin, which, together with the integrating pole contributed by the PFD, means that L(s) starts with -180 degrees of phase at DC. However, a zero "leads" in before crossover to boost the phase, which goes back down after the "lag" pole kicks in.
[Schematic: the charge pump output, IOUT (CP), drives the VIN (VCO) node through a network of R, C1, and C2]
Figure 5-19: Lead-lag loop filter for PLL
Equations 5.13 and 5.14 give expressions for the frequency of the lead zero and the lag pole in radians per second for the circuit in Fig. 5-19.

$$\omega_{lead\text{-}zero} = \frac{1}{R\,(C_1 + C_2)} \tag{5.13}$$

$$\omega_{lag\text{-}pole} = \frac{1}{R\,C_2} \tag{5.14}$$
Third, decide on a crossover point for the loop. For fOUT running at a gigahertz, the loop operates 64 times slower (the speed of fFB and fIN), or in other words floop ≈ 15.6 MHz. In order to minimize jitter, the crossover of the loop should be at least 100 times slower than floop. Plotting the loop transmission from Eq. 5.12 with a charge pump gain of 20 uA gives a crossover frequency around 10^6 rad/s, or about 160 kHz.
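The frequency plan in this step is simple arithmetic, sketched below with hypothetical helper names. The separation factor of 100 is the guideline stated above, and the conversion from rad/s to Hz divides by 2π.

```python
import math


def loop_frequency_plan(f_out_hz, n_div=64, separation=100.0):
    """Return (f_loop, crossover guideline) in Hz for a divide-by-n PLL."""
    f_loop = f_out_hz / n_div  # fIN = fFB = fOUT / 64 when locked
    return f_loop, f_loop / separation


def rad_to_hz(w_rad_s):
    return w_rad_s / (2 * math.pi)


f_loop, f_cross_max = loop_frequency_plan(1.0e9)
print(f"f_loop = {f_loop / 1e6:.3f} MHz, crossover guideline = {f_cross_max / 1e3:.2f} kHz")
print(f"10^6 rad/s = {rad_to_hz(1.0e6) / 1e3:.1f} kHz")
```

For a 1 GHz output, floop is 15.625 MHz, the guideline places crossover near 156 kHz, and 10^6 rad/s corresponds to about 159 kHz.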
Fourth, calculate component values for the loop filter. In this case, choosing C1 = 1000 pF, C2 = 100 pF, and R = 2.7 kΩ places the pole and zero a factor of eleven apart, with a maximum phase boost of 56.4 degrees occurring at 1.1167 Mrad/s.
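These numbers follow directly from Eqs. 5.13 and 5.14. The sketch below computes the zero, the pole, their geometric mean, and the peak phase boost, using the standard result that a zero and pole separated by a ratio a give a maximum boost of asin((a - 1)/(a + 1)) at their geometric mean. Function names are illustrative.

```python
import math


def lead_lag(r_ohm, c1_f, c2_f):
    """Return (w_zero, w_pole, w_peak) in rad/s and the peak boost in degrees.

    w_zero and w_pole follow Eqs. 5.13 and 5.14; the peak phase boost of a
    zero/pole pair separated by a ratio a occurs at their geometric mean and
    equals asin((a - 1) / (a + 1)).
    """
    w_zero = 1.0 / (r_ohm * (c1_f + c2_f))
    w_pole = 1.0 / (r_ohm * c2_f)
    w_peak = math.sqrt(w_zero * w_pole)
    a = w_pole / w_zero  # equals (C1 + C2) / C2
    boost_deg = math.degrees(math.asin((a - 1) / (a + 1)))
    return w_zero, w_pole, w_peak, boost_deg


def phase_boost(w, w_zero, w_pole):
    """Phase (degrees) contributed by the lead zero and lag pole at frequency w."""
    return math.degrees(math.atan(w / w_zero) - math.atan(w / w_pole))


w_z, w_p, w_pk, boost = lead_lag(2.7e3, 1000e-12, 100e-12)
print(f"zero {w_z / (2 * math.pi) / 1e3:.1f} kHz, pole {w_p / (2 * math.pi) / 1e3:.1f} kHz")
print(f"peak boost {boost:.2f} deg at {w_pk / 1e6:.4f} Mrad/s")
print(f"boost at 1.1164 Mrad/s: {phase_boost(1.1164e6, w_z, w_p):.2f} deg")
```

This gives a zero near 53.6 kHz, a pole near 589.5 kHz, a ratio of exactly 11, and a peak boost of about 56.44 degrees at about 1.1167 Mrad/s. Assuming the two integrators set the phase at -180 degrees and no other poles matter near crossover, the phase margin is simply the filter's boost at crossover. At 1.1164 Mrad/s this is also about 56.44 degrees, consistent with the Pm reported in Fig. 5-20.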
Finally, adjust the DC gain elements in the loop to fine tune crossover and achieve
maximum phase margin. In this example, increasing the charge pump gain, Icp, to
26.7 uA moves crossover to 1.1164 Mrad/s. Figure 5-20 shows bode plots for the
final parameter values.
[Bode magnitude (dB) and phase (deg) of L(s) versus frequency (rad/sec); Gm = 84.855 dB (at 2.69e+08 rad/sec), Pm = 56.442 deg (at 1.1164e+06 rad/sec)]
Figure 5-20: Loop transmission bode plots using lead-lag loop filter
Table 5.1 compares the important frequencies for this set of parameters. Note that the lead zero occurs a factor of √11 lower than crossover, and the lag pole kicks in a factor of √11 higher. In other words, the maximum phase boost occurs at the geometric mean of the pole and zero.
Parameter    Value       Frequency        Value
R            2.7 kΩ      flead-zero       53.6 kHz
C1           1000 pF     fcrossover       177.8 kHz
C2           100 pF      flag-pole        589.8 kHz
ICP          26.7 uA     fIN and floop    15.6 MHz
                         fOUT             1.0 GHz

Table 5.1: Example PLL values using lead-lag loop filter
This discussion represents just one example of a stable PLL configuration. With
an externally variable charge pump gain, VCO offset, and loop filter, the PLL can be
customized to lock over a wide range of target frequencies.
5.6 Testing Summary
Figure 5-1 (page 96) gives an overview of the testing strategy. The clock circuitry
accepts an external clock signal and either multiplies it using a PLL or feeds it directly
through to the internal chip circuitry. This clock network drives both the receiver
and the data generator.
The data generator cycles through a pre-programmed test pattern to drive the
laser. Driving the laser with an on-chip component ensures synchronization between
incoming optical data and the local receiver clock.
The receiver circuitry consists of the receiver circuit and a buffer. As optical
data streams in from the laser, the receiver evaluates each bit and stores the result
in a buffer. Flipping a control signal causes the buffer to hold its values for later
inspection. An external "photocurrent reference" replaces the reference photodiode
for convenience and testing versatility.
On December 3, 2001, a test chip was submitted for fabrication. In addition to the optical testing scheme described here, the chip also contains two other types of test circuits. One takes an electrical input, and the other takes a manual (external) input.
The "electrical input" comes from the data generator. In addition to driving an off-chip laser, the data generator also controls a charge pump. Based on an external bias current, the charge pump supplies input current to a receiver. On high bits it sources current, and on low bits it does nothing. This input source, combined with a 160 fF capacitor,² simulates a virtual photodiode input.
The manually driven circuit represents a "bare bones" implementation. All inputs
come from external pins, and all outputs go to external pins. The on-chip clock drives
this circuit, but everything else comes from outside.
All three circuits share the same clock signal, power grids, and bias currents, but
each receiver uses its own photocurrent reference.
² Expected capacitance of test photodiodes
Chapter 6
Conclusion
This chapter begins by summarizing the contributions of the thesis and then moves
to a discussion of how these contributions benefit future designers of optical receiver
circuits and other applications.
6.1 Summary
This thesis targets the design of a variation robust data receiver circuit for on-chip
optical interconnect. However, monolithic integration in a digital CMOS technology
presents several unique design challenges.
Monolithic integration limits photodetector design to the materials and doping
levels available in the process. This can lead to lower efficiency and larger parasitics.
Laser diode integration limits the amount of available optical power, which often
means small current levels at the photodetector.
An effective data receiver circuit circumvents these shortcomings to provide the
fastest propagation delay possible while still maintaining low power and area costs.
Furthermore, the design must function amidst numerous, inevitable variations in the
process and the environment.
Luckily, monolithic integration also means synchronization between the transmitter and receiver. Thus, a designer can leverage the presence of a local clock signal as
a powerful design tool.
This thesis draws upon a previous set of designs that address similar specifications.
Many designs of this sort use a latching sense amplifier because it takes advantage
of the clock and holds state at the end of each cycle. Making a few modifications
to such a circuit significantly enhances performance in the face of large photodiode
parasitics.
Specifically, adding a current mirror between the photodiode and the sense amplifier input node isolates the large photodiode capacitance from the switching nodes.
A specialized reference circuit drives the second input of the sense amplifier so that
it correctly evaluates both high and low bits.
One extra optical path accompanies each set of data bits to serve as a measure of
steady state optical power for the reference circuit. This reference averages the steady
state current of a high and low bit to produce a reference voltage that is exactly half
of the voltage swing on the photodiode input node. This minimizes propagation delay
for both high and low input transitions.
A 0.18 um digital CMOS technology provides a testbed for circuit implementation.
Increasing transistor width in the current mirrors increases transconductance and
speeds up photodiode transients. Transistors acting as current sources are sized at
twice the minimum width and length to add bias point stability in the face of process
variation. The latch itself exhibits surprising robustness, and operates handily at
more than a gigahertz using minimum sized transistors.
Effective quantification of receiver performance requires defining "evaluation speed,"
the maximum frequency for which all input bits are evaluated correctly. This definition measures only whether the sense amplifier makes an accurate logical decision,
not whether it produces acceptable logic levels at the output. Therefore, evaluation
speed measures mostly the functionality of the input stage and reference circuit.
With a photodiode capacitance of 100 fF and an input photocurrent of 10 uA, the circuit achieves an evaluation speed of 2.0 GHz, and provides practical output at frequencies beyond 1.0 GHz. At a gigahertz, each receiver bit dissipates 305.74 uW of power and occupies 133.56 um², while the reference circuit dissipates 196.45 uW of power and occupies 74.20 um². Process and environmental variations can drastically change these numbers.
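Because one reference circuit serves a whole set of data bits, its cost is amortized as the set grows. The sketch below totals power and area for n receiver bits sharing one reference, using the 1 GHz figures above, and converts power into energy per bit. Sharing a single reference per set and delivering one bit per receiver per cycle are assumptions of this illustration, and the function name is hypothetical.

```python
# Per-circuit figures reported above, at a 1 GHz clock.
BIT_POWER_UW = 305.74  # one receiver bit
BIT_AREA_UM2 = 133.56
REF_POWER_UW = 196.45  # one reference circuit
REF_AREA_UM2 = 74.20
F_CLK_HZ = 1.0e9


def link_cost(n_bits):
    """Power (uW), area (um^2), and energy per bit (fJ) for n_bits receivers
    sharing one reference circuit (assumed one bit per receiver per cycle)."""
    if n_bits < 1:
        raise ValueError("need at least one data bit")
    power_uw = n_bits * BIT_POWER_UW + REF_POWER_UW
    area_um2 = n_bits * BIT_AREA_UM2 + REF_AREA_UM2
    # Energy per bit = total power / (bits delivered per second).
    energy_fj = (power_uw * 1e-6) / (n_bits * F_CLK_HZ) * 1e15
    return power_uw, area_um2, energy_fj


for n in (1, 8, 32):
    p, a, e = link_cost(n)
    print(f"{n:2d} bits: {p:8.2f} uW, {a:8.2f} um^2, {e:6.1f} fJ/bit")
```

Under these assumptions, the energy per bit is 305.74 fJ plus 196.45/n fJ for the shared reference, so it approaches the single-receiver figure as more bits share one reference.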
Uniform channel length and temperature variations over wide ranges produce little
effect. Threshold voltage and power supply variations, on the other hand, cause larger
changes in evaluation speed, mainly due to the introduction of extra switching noise
at the photodiode input.
Differential variation degrades performance more than uniform variation. While uniform length variations only decrease evaluation speed by about 3%, asymmetrical variations between the input stage and reference circuit can decrease speed by as much as 50%, cutting it in half.
The receiver's variation sensitivity makes matching an important concern. Good
layout techniques, larger transistors, and multiple reference circuits can all help reduce
the impact of process and environmental variations.
In the end, the tradeoff is between cost and performance. Adding more hardware
increases variation robustness, but consumes chip real estate. Similarly, better photodiodes increase evaluation speed, but require costly process modifications. A designer
must balance these tradeoffs to achieve the desired performance given a certain budget
constraint.
6.2 Final Thoughts: Contributions
This thesis makes two specific contributions, namely using a current mirror to isolate
capacitance, and current domain arithmetic to construct a reference circuit. These
ideas should be thought of not just in terms of optical receivers, but as generally
useful circuit tools.
Most digital circuits rely on the ability to switch nodes quickly. The current mirror
technique in this thesis provides a means of coupling high capacitance sources into
critical nodes without decreasing switching speed. For example, one might use this
technique to improve the performance of sense amplifiers in random access memories,
where high capacitance bit lines greatly resemble photodiodes.
Likewise, current domain arithmetic provides a useful and powerful tool for circuit analysis. This thesis merely uses the technique to average two signals, but Sec. 4.4 hints at the true potential when it suggests averaging more inputs. In reality, one can construct virtually arbitrary expressions consisting of addition, subtraction, and division using only current mirrors and wires.
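A behavioral sketch of these primitives is shown below, assuming ideal devices: wires sum currents by Kirchhoff's current law, and a mirror copies a current scaled by its device size ratio. Subtraction assumes a complementary mirror pair reverses the direction of one current so that it is drawn out of the node. Mismatch and finite output resistance, which the variation analysis shows matter a great deal, are ignored. All function names are hypothetical, and currents are exact fractions in uA.

```python
from fractions import Fraction


def mirror(i_in, ratio=Fraction(1)):
    """Ideal current mirror: copies i_in scaled by the device size ratio."""
    return ratio * i_in


def junction(*currents):
    """Kirchhoff's current law at a wire: currents entering a node add."""
    return sum(currents, Fraction(0))


def average(currents):
    """Average N currents: sum them on one wire, then mirror with a 1:N ratio."""
    currents = list(currents)
    return mirror(junction(*currents), Fraction(1, len(currents)))


def difference(i_a, i_b):
    """i_a minus i_b: i_b is mirrored so that it is drawn out of the node."""
    return junction(i_a, -mirror(i_b))


# Reference current from one high bit (10 uA) and one low bit (0 uA).
print(average([Fraction(10), Fraction(0)]))
```

Averaging a 10 uA high-bit current with a 0 uA low-bit current yields 5 uA, the half-swing reference described in the summary; passing more inputs to `average` corresponds to the multi-input averaging suggested in Sec. 4.4.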
Great things are accomplished in small steps, and the most important things to
take away from any work like this are the small contributions that designers can
store away in their toolbox and one day use to build something great. Hopefully,
the reader can take away from this thesis not only an enhanced knowledge of optical
data receivers, but an assemblage of ideas that can transcend the field of optical
interconnect given a little ingenuity.
Appendix A
TSMC 0.18 um Digital CMOS Process Characteristics
[Plot: I-V curves versus VDS (V), one per VGS value from 0.6 V to 1.8 V]
Figure A-1: I-V Characteristics for NFET with W = 0.5 um, L = 0.18 um
[Plot: I-V curves versus VDS (V), one per VGS value from 0.6 V to 1.8 V]
Figure A-2: I-V Characteristics for NFET with W = 1.0 um, L = 0.18 um
[Plot: I-V curves versus VDS (V), one per VGS value from 0.6 V to 1.8 V]
Figure A-3: I-V Characteristics for NFET with W = 5.0 um, L = 0.18 um
[Plot: I-V curves versus VSD (V), one per VSG value up to 1.8 V]
Figure A-4: I-V Characteristics for PFET with W = 0.5 um, L = 0.18 um
[Plot: I-V curves versus VSD (V), one per VSG value up to 1.8 V]
Figure A-5: I-V Characteristics for PFET with W = 1.0 um, L = 0.18 um
[Plot: I-V curves versus VSD (V), one per VSG value up to 1.8 V]
Figure A-6: I-V Characteristics for PFET with W = 5.0 um, L = 0.18 um
[Plot: current gain versus frequency (Hz) for NFET and PFET]
Figure A-7: Current gain vs. frequency for 0.18 um TSMC process
[Plot: crossover region versus frequency (GHz), 44 GHz to 54 GHz, for W = 0.5 um, 1.0 um, and 5.0 um]
Figure A-8: fT crossover for minimum length NFET
[Plot: crossover region versus frequency (GHz), 13 GHz to 23 GHz, for W = 0.5 um, 1.0 um, and 5.0 um]
Figure A-9: fT crossover for minimum length PFET
[Surface plot: gm versus bias current (uA) and transistor width (um)]
Figure A-10: Transconductance, gm, of NFET with L = 0.18 um
[Surface plot: gm versus bias current (uA) and transistor width (um)]
Figure A-11: Transconductance, gm, of NFET with L = 0.36 um
[Surface plot: gm versus bias current (uA) and transistor width (um)]
Figure A-12: Transconductance, gm, of PFET with L = 0.18 um
[Surface plot: gm versus bias current (uA) and transistor width (um)]
Figure A-13: Transconductance, gm, of PFET with L = 0.36 um
Bibliography
[1] S. B. Alexander. Optical Communication Receiver Design. SPIE - The International Society for Optical Engineering Press, Bellingham, 1997.
[2] K. Ayadi, M. Kuijk, P. Heremans, G. Bickel, G. Borghs, and R. Vounckx. A monolithic optoelectronic receiver in standard 0.7 um CMOS operating at 180 MHz and 176 fJ light input energy. IEEE Photonics Technology Letters, 9(1):88-90, 1997.
[3] A. P. Chandrakasan. 6.374: Analysis and Design of Digital Integrated Circuits
Lecture Notes, Fall 2000. 6.374 is a graduate class at the Massachusetts Institute
of Technology.
[4] J. A. del Alamo. Integrated Microelectronic Devices: Physics and Modeling.
Lecture notes for 6.720J/3.43J at Massachusetts Institute of Technology, August
2000.
[5] Taiwan Semiconductor Manufacturing Co., LTD. TSMC 0.18um Logic 1P6M
Salicide 1.8V/3.3V Design Rule. Correspondence with MOSIS and TSMC, March
1999. Intellectual Property of TSMC.
[6] Taiwan Semiconductor Manufacturing Co., LTD. Technology and manufacturing
- 0.18 micron. World Wide Web, April 2002. Information freely available on
TSMC's website.
[7] H. C. Luan. Ge Photodetectors for Si Microphotonics. PhD thesis, Massachusetts Institute of Technology, 2001.
[8] A. Lum. An On Chip Low Skew Optical Clock Receiver. Master of Engineering
thesis, Massachusetts Institute of Technology, Department of Electrical Engineering and Computer Science, 2001.
[9] D. A. B. Miller. Optical Interconnects to Silicon. IEEE Journal on Selected
Topics in Quantum Electronics, 6(6):1312-1317, 2000.
[10] R. Ram. Personal communication, 2001. Ram is an Associate Professor of Electrical Engineering and Computer Science at the Massachusetts Institute of Technology.
[11] B. Razavi. RF Microelectronics. Prentice Hall PTR, Upper Saddle River, New Jersey, 1998.
[12] S. L. Sam. Characterization of Optical Interconnects. Master of Science thesis,
Massachusetts Institute of Technology, Department of Electrical Engineering and
Computer Science, 2000.
[13] M. E. Schaffer and P. A. Mitkas. Smart photodetector array for page-oriented optical memory in 0.35 um CMOS. IEEE Photonics Technology Letters, 10(6):866-868, 1998.
[14] S. M. Sze. Physics of Semiconductor Devices. John Wiley and Sons, New York,
1981.
[15] M. H. Perrott, M. D. Trott, and C. G. Sodini. A general PLL modeling approach for Σ-Δ frequency synthesizers. Correspondence: Charles Sodini or Michael Perrott, Massachusetts Institute of Technology, Cambridge, MA.
[16] H. Zimmermann. Integrated Silicon Opto-electronics. Springer, Berlin, 2000.