Fundamental Building Blocks for a Compact Optoelectronic Neural Network Processor

by Benjamin Franklin Ruedlinger

B.S., Physics, Case Western Reserve University (1999)
M.S., Computer Engineering, Case Western Reserve University (1999)

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Doctor of Philosophy at the Massachusetts Institute of Technology

May 2003

© Massachusetts Institute of Technology 2003. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, May 19, 2003

Certified by: Cardinal Warde, Professor, Thesis Supervisor

Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students

Fundamental Building Blocks for a Compact Optoelectronic Neural Network Processor

by Benjamin Franklin Ruedlinger

Submitted to the Department of Electrical Engineering and Computer Science on May 19, 2003, in partial fulfillment of the requirements for the degree of Doctor of Philosophy

Abstract

The focus of this thesis is interconnects within the Compact Optoelectronic Neural Network Processor. The goal of the Compact Optoelectronic Neural Network Processor Project (CONNPP) is to build a small, rugged neural network co-processing unit. This processor will be optimized for solving various signal processing problems, such as image segmentation or facial recognition, which represent a class of problems for which traditional logic-based architectures are not well suited. The CONNPP combines the processing power of traditional electronic integrated circuits with the communications benefits of optoelectronic interconnects to provide a three-dimensionally scalable architecture. The topic of interconnects includes both optoelectronic interconnects between processing planes and wire-based interconnects within the processing planes.
The optoelectronic inter-plane interconnects allow the CONNPP to achieve three-dimensional scalability. These interconnects employ an array of holograms designed and fabricated using an analytic model that has been developed. This analytic model takes into account reading and writing the hologram at different wavelengths as well as non-idealities of the optoelectronic devices being used. The analytic model was tested and agreed with experiment to within 10% of the calculated values. The photodetectors used for the testing of this system have been designed within standard process technologies using a lateral PiN structure and a novel lateral BJT structure. In addition, highly linear transimpedance amplifiers were designed to condition the signal from the photodetector for further processing. Finally, a new chip-level interconnect architecture has been proposed for the CONNPP that utilizes a system bus to provide global connectivity between the neurons within the plane of the chip. Software models were built to simulate various chip-level connectivity schemes. These simulations show the significant potential benefits of the proposed global bus-based chip-level architecture.

Thesis Supervisor: Cardinal Warde
Title: Professor

Acknowledgments

First, I would like to thank Professor Cardinal Warde for his guidance and mentoring over the past four years. Without his vision for the Compact Optoelectronic Neural Network Processor Project, the work presented here would not be possible. In addition, I would like to thank Professor Clifton Fonstad. I have worked with Professor Fonstad over the past four years, and he has aided in the design and testing of the circuits and devices produced during my tenure at MIT. To Dr. Thomas Knight, thank you for serving on my doctoral committee and providing very useful feedback. Your treks over to Building 13 have been much appreciated over the past year and a half.
Thank you to my wife, Emily Nelson, whom I both met and married at MIT. She deserves the largest thanks, as she has put up with me for the past four years of graduate school. Without her love and support, I don't know if I could have completed this journey. Travis Simpkins has been very instrumental as both a colleague and a friend over the past several years. As suitemates at CWRU, I doubt we could have conceived of the fact that years later we would be sharing an office at MIT as Ph.D. candidates. Thank you for all your time spent over the last couple of years on both a professional and a personal level. Thank you also to my parents, Richard and Jeanne Ruedlinger. Without them, of course, absolutely none of this would be possible. Over the past twenty-seven years, they have done a great job of supporting me in all my endeavours. For this reason, it is only appropriate that they also share in the accomplishment.

Contents

1 Introduction
  1.1 Why Neural Networks?
  1.2 Electrical vs. Optical Interconnects
  1.3 The Compact Optoelectronic Neural Network Processor Project
    1.3.1 High-level overview of the CONNPP
    1.3.2 Low-level overview of the CONNPP
  1.4 Thesis Summary
2 Holographic Interconnects
  2.1 Holographic Interconnect Overview
    2.1.1 What are holograms?
    2.1.2 How are holograms made?
    2.1.3 Holographic interconnects explained
  2.2 Holographic Interconnections for the Compact Optoelectronic Neural Network Processor Project
    2.2.1 Vertical cavity surface emitting lasers (VCSELs)
    2.2.2 Spacing between emitter planes and holographic elements
    2.2.3 Uniformity and linearity of VCSELs
    2.2.4 Photodetectors
    2.2.5 Holographic element specifications
  2.3 Modelling Holographic Interconnections
    2.3.1 Interference theory
    2.3.2 Recording holograms
    2.3.3 Bragg theory
    2.3.4 Writing and reading holograms with the same wavelengths
    2.3.5 Writing and reading holograms with different wavelengths
    2.3.6 How to write a hologram with a divergent beam input and a convergent wave output
    2.3.7 How to write a hologram with a divergent beam input and a plane wave output
    2.3.8 Volume aspects of holograms
  2.4 Hologram Writing Systems
    2.4.1 System using a microscope objective
    2.4.2 Single lens system
    2.4.3 Computer controlled system with beam splitter
    2.4.4 Model verification using the beam splitter setup
    2.4.5 Suggestions for future work
3 Devices and Circuits
  3.1 Background
  3.2 Photonic Systems Group Chip #1
    3.2.1 PSG1 Photodetectors
    3.2.2 PSG1 Amplifiers
    3.2.3 PSG1 testing
  3.3 Photonic Systems Group Chip #2
    3.3.1 PSG2 Amplifier
    3.3.2 PSG2 Photodetectors
  3.4 Future Work
4 Neural Network Models and Architectures
  4.1 Specification For the CONNPP Hardware Implementation
  4.2 Proposal For a CONNP With a Global Bus-based Architecture
  4.3 Neural Network Models and Simulations
    4.3.1 Background on MATLAB Neural Network Toolbox
    4.3.2 Model of the CONNPP hardware using MATLAB
    4.3.3 Simulations and Results
  4.4 Suggestions For Future Work

List of Figures

1-1 Illustration of one model of an artificial neuron
1-2 Layer structure of an artificial neural network
1-3 CPU and system bus speed as a function of time
1-4 A simple example of the layer structure of the CONNP
1-5 Single neuron view of the CONNP planes
1-6 One-dimensional cross-section of a single neuron communicating with multiple neurons in the next layer
2-1 A plane wave reflects off an object and the reflected wave then contains information about the structure of the object
2-2 A plane wave is incident on the hologram and the reconstructed object wavefront results
2-3 The object and reference waves interfere at the holographic emulsion to record a hologram
2-4 A simple holographic interconnect with a free-space propagation medium
2-5 (a) Emitters and detectors without any light steering element (b) Mirrors as a light steering element; notice that detectors and emitters are now mapped correctly (c) A single hologram as the light steering element
2-6 Edge emitting semiconductor laser
2-7 Structure of a vertical cavity surface emitting laser (VCSEL)
2-8 Beam profiles of a single VCSEL at different operating powers
2-9 Two beams from the same origin point with different divergence angles hitting a hologram
2-10 How distance of VCSEL from hologram affects crosstalk
2-11 Optical power emitted from three different VCSELs on a single chip as a function of input current
2-12 Two beams with the same energy, but different sizes, hitting a photodetector
2-13 Single holographic interconnect
2-14 Interference pattern of two plane waves being recorded
2-15 Playback of the thin emulsion hologram with wave E2
2-16 (a) Two plane waves interfering inside a thick emulsion while a hologram is being recorded (b) The same thick emulsion after chemical processing
2-17 The diagram illustrates the derivation of the Bragg Condition
2-18 Desired readout geometry for hologram
2-19 Calculated write geometry for hologram
2-20 Illustration of angles in hologram read out
2-21 Illustration of the derivation of the method to write a hologram at wavelength λw to be read at wavelength λr
2-22 Hologram with divergent source as input and convergent beam as output
2-23 Read geometries for the extreme plane waves in the divergent source problem
2-24 (a) Read geometry for example (b) Calculated write geometry for example
2-25 (a) Read geometry for plane wave output (b) Calculated write geometry for plane wave output for example
2-26 Read and write geometries for a cross-sectional plane of the hologram
2-27 Angle of the partially silvered mirror planes as a function of z for the hologram discussed in Figure 2-26
2-28 Read and write geometries for a single beam propagating through a thick hologram
2-29 Angle of the partially silvered mirror planes for a single beam propagating through the hologram
2-30 (a) A cylindrical divergent beam and cylindrical convergent beam interfering within a holographic emulsion (b) The holographic grating after processing; the grating forms partial ellipses with foci O1 and O2
2-31 Holographic grating formed by a divergent on-axis source and a convergent off-axis source
2-32 Setup for the microscope objective system
2-33 Readout of the hologram with an 850nm VCSEL source
2-34 (a) Beam at large angle hits emulsion, partially blocked by objective (b) Beam at smaller angle
2-35 Setup for the single lens hologram writing system
2-36 How divergence angle is affected by beam size
2-37 (a) Output of microscope objective system hologram (b) Output of single lens system hologram
2-38 The optical setup for the computer controlled beam splitter system
2-39 A photograph of the computer controlled beam splitter system
2-40 An example of a holographic element array
2-41 A close-up photograph of the computer controlled stage
2-42 Desired read geometry
2-43 Output of hologram
2-44 Actual read geometry compared to desired read geometry
2-45 Tolerance of hologram to deviation in angle
2-46 (a) Close-up of divergent beam near the emulsion (b) How to measure distance D with a flipper mirror
3-1 Illustration of how the material covered in the chapters fits in the architecture
3-2 Cross-section of lateral PiN unit cell
3-3 Depletion width as a function of reverse bias voltage
3-4 Layout of the original lateral PiN unit cell
3-5 Layout of the 75 µm × 75 µm lateral PiN
3-6 (a) Unit cell for the minimum HGaAs5 geometry (b) Full 75 µm × 75 µm detector
3-7 (a) Unit cell for the strip PiN photodetector (b) Full 72 µm × 72 µm strip detector
3-8 Amplifier topology for PSG1
3-9 Transimpedance amplifier schematic
3-10 Transimpedance amplifier layout
3-11 Differential amplifier schematic
3-12 Differential amplifier layout
3-13 Balanced-to-unbalanced circuit schematic
3-14 Complete 3-stage amplifier circuit schematic and layout
3-15 Layout of the full PSG1 chip
3-16 Table showing input voltages on four different chips with input currents set to 0 A
3-17 (a) GaAs chip with two-sided processing (b) UTSi SOS chip with all devices on a single side
3-18 Top-level diagram of op-amp in transimpedance configuration
3-19 Schematic for the two-stage op-amp with negative resistive feedback
3-20 Transfer characteristic for PSG2 amplifier (current input, voltage output)
3-21 Derivative of the transfer function shown in Figure 3-20
3-22 Close-up of derivative of the transfer function
3-23 Layout of PSG2 transimpedance amplifier
3-24 UTSi optoelectronic device
3-25 Lateral BJT photodetector layout
4-1 Block-level diagram of single neuron in the CONNP
4-2 The detector unit of a single neuron
4-3 2-dimensional array of neurons with 4-way in-plane nearest-neighbor connections
4-4 Block-level diagram of new proposed neuron structure
4-5 Bus-based architecture with bus control block and individual neurons
4-6 Simplified block-level diagram of logic within neuron
4-7 MATLAB model of a neuron
4-8 MATLAB model of a layer of neurons
4-9 Simplified MATLAB model of a layer of neurons reflecting matrix properties
4-10 A 6 × 6 layer of neurons with each element numbered
4-11 MATLAB model of a single CONNPP hardware layer
4-12 Tansig function
4-13 Example of the hardware system to be simulated in MATLAB
4-14 Training patterns for the in-plane connection model simulation
4-15 Results for training simulation using six training patterns
4-16 Training cycles required to store different numbers of patterns to a MSE of 1 × 10^-4

Chapter 1

Introduction

As an introduction to the main topic of this thesis, basic background on neural networks will be given, along with a discussion of the benefits of neural networks over traditional logic-based computing for certain classes of problems. Next, the issue of computing interconnects will be examined. Traditionally, electrical wires of some sort have been used to connect computing nodes. However, as computing speeds increase, electrical interconnects become potentially less effective. Are optical interconnects a viable alternative to electrical interconnects in some cases? Optical interconnects have already taken the lead in transcontinental communication (fiber optics). They may well prove to be the next step in chip-level interconnects as well. After brief discussions of neural networks and optical interconnects, an overview of the Compact Optoelectronic Neural Network Processor Project (CONNPP) [1] will be given. The goals for this project, as well as implementation details, will be examined in order to provide the context for the work discussed in this thesis.
Finally, a brief overview of the body of work completed and of the following chapters of this document will be given.

1.1 Why Neural Networks?

The human brain is in some ways one of the most complex and accomplished processors in existence. Yet, in other ways, it is inefficient, unwieldy, and slow. Humans are extremely adept at pattern recognition, visual navigation, and making inferences with incomplete data sets. However, a simple computational task such as multiplying three-digit numbers in our heads can confound even the most brilliant person. The fact of the matter is that our brains are not wired for mathematical computation. As an evolving species, mathematical computation was not a feature that helped humans survive. It was much more important to be able to recognize friend from foe than it was to multiply three-digit numbers. In order to compensate for this lack of computational prowess, people have sought to design and build machines to help them overcome this weakness. These machines, which started as simple calculators with a single accumulator, have grown into extremely complex general-purpose computers with multiple processing units and vast amounts of memory. Although these processors have increased their capabilities considerably¹, they all still use the same basic theory of operation. By converting all data within the processor to binary format, Boolean algebra (AND, OR, NOT, etc.) is then used to implement all the complex operations within these traditional logic-based processors. This approach has been extremely successful and has produced applications which could not even be conceived of when the pioneers were designing the first calculator-like processors. Although these logic-based processors can be used for a variety of applications, there are several classes of problems with which traditional processors have trouble. These problems have not been reducible to a set of simple logic-based rules that can be used to solve them.
A few of these problems include speaker-independent voice recognition, natural language processing, and orientation- and scale-independent visual object recognition. While computer scientists have been trying to reduce these problems to logic-based rules for over thirty years, the results in many cases are still unacceptable. Ironically, these classes of problems with which logic-based processors have difficulty are some of the same problems for which the human brain is seemingly optimized. A perfect example of this is speaker-independent voice recognition. This task consists of taking strings of phonetic sounds and parsing them into the words to which they correspond. The task is further complicated by the variation in dialect and individual speech patterns between speakers. A particular word might sound very different when spoken by a person from the United States than when spoken by someone from the United Kingdom. As English speakers, we are able to understand these speech pattern differences and still recognize the words that are being spoken. Logic-based computers, however, have a very difficult time with this task. Although millions (if not billions) of dollars of funding and research have gone into solving this problem with traditional logic-based processors, the results have not been encouraging. After decades of research on this specific topic, no logic-based processor system has yet been produced that can even come close to matching the speaker-independent voice recognition capabilities of a child. In light of this fact, researchers began looking at the human brain for inspiration in creating processors to handle these types of problems.

¹Over the past fifty years, traditional logic-based processors have increased their computational power by almost seven orders of magnitude. ENIAC could carry out 357 floating point operations per second (FLOPS), while today's Intel Pentium 4 processor can do over 2 × 10^9 FLOPS.
The human brain is an extremely complex biological system which is far from being completely understood. Even so, it is still possible to use those workings of the brain which are understood to design and build models replicating a very limited subset of its functionality. Artificial neural networks (ANNs) seek to model the low-level functionality of the brain either programmatically or through hardware. The basic processing unit of the brain is a single-celled structure called a neuron. Each neuron has many inputs from other neurons and also many outputs to other neurons. These inputs and outputs allow the neuron to communicate with other neurons through chemical interactions. Communication occurs by chemical reactions within the neuron that slightly alter the electrical potential of the inputs and outputs connecting to different neurons.

The basic model of a neuron consists of several parts. In the first step, weighted inputs are presented to the neuron in the form of potentials. The neuron sums these potentials presented to the inputs. It then uses this summed value as the input to a function (f(x)). This function can take many forms, such as a sigmoid function or a linear function. The neuron then sets the potential of its outputs to reflect the result of this function. The output lines (as well as the input lines) have weights which determine how strong the connection is between the neurons. An artificial neuron seeks to replicate this functionality either in software or hardware. A diagram of such an artificial neuron can be seen in Figure 1-1.

Figure 1-1: Illustration of one model of an artificial neuron

In this diagram, xi represents the ith input to the neuron and wi represents the weight of the connection between these neurons. Each input (xi) is multiplied by its connection weight (wi), and these values are summed. This sum is used as the argument for the transfer function f(x).
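The weighted-sum-plus-transfer-function computation just described is easy to state in code. The sketch below is purely illustrative (it is not the thesis hardware or its MATLAB model); the sigmoid transfer function and the sample input and weight values are arbitrary choices:

```python
import math

def neuron(inputs, weights, f):
    """One artificial neuron: the weighted sum of the inputs passed through f."""
    s = sum(x * w for x, w in zip(inputs, weights))
    return f(s)

# A sigmoid is one common choice for the transfer function f(x).
def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

# Example: three inputs x_i with connection weights w_i (arbitrary values).
out = neuron([0.5, -1.0, 2.0], [0.8, 0.3, 0.1], sigmoid)
print(round(out, 3))  # weighted sum is 0.3, sigmoid(0.3) ~ 0.574
```

The transfer function is passed in as a parameter because, as noted above, different layers (or different network designs) may use different functions, such as a purely linear f(x) = x.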
The output of this transfer function (fi) is put on the outputs, where it will be multiplied by the weights of those connections (uj). These neurons, which act as independent processing units, are then connected together in a dense network. With each neuron able to communicate with a large number of other neurons, information can propagate through the network efficiently. Furthermore, the connection weights allow each neuron to know how important each piece of information it receives is. With relatively simple processing units, this dense interconnection network provides much of the power of both biological and artificial neural networks.

In artificial neural networks, these artificial neurons are often connected together in layered structures. Each layer might have a different transfer function or a different interconnection scheme. When a layer structure is present, there are three distinct types of layers: an input layer, hidden layers, and an output layer. The input signals to the neural network are connected directly to the input layer neurons. The input layer is then connected to the first hidden layer. The first hidden layer is in turn connected to the second hidden layer. This proceeds until the last hidden layer is connected to the output layer. The output layer then presents the output of its neurons as the result of the neural network's computation. An example of the layer structure can be seen in Figure 1-2.

Figure 1-2: Layer structure of an artificial neural network

One of the most important characteristics of neural networks is their ability to learn. While traditional logic-based processors are programmed with explicit instructions on how to accomplish a particular task, neural networks are trained. To do this, a selection is made of inputs for which the outputs are known. These inputs are then presented to the input layer of the neural network.
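The layered feed-forward structure amounts to applying the single-neuron rule once per neuron, layer by layer, with the output of each layer becoming the input of the next. A minimal sketch, with layer sizes and weight values invented for illustration:

```python
import math

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def layer(inputs, weight_matrix):
    """Each row of weight_matrix holds one neuron's input weights."""
    return [sigmoid(sum(x * w for x, w in zip(inputs, row)))
            for row in weight_matrix]

def network(inputs, layers):
    """Propagate the input signal through each layer in turn."""
    signal = inputs
    for weight_matrix in layers:
        signal = layer(signal, weight_matrix)
    return signal

# 2 inputs -> hidden layer of 3 neurons -> output layer of 1 neuron.
hidden = [[0.5, -0.4], [0.9, 0.2], [-0.3, 0.7]]
output = [[1.0, -1.0, 0.5]]
print(network([1.0, 0.0], [hidden, output]))
```

Representing each layer as a weight matrix mirrors the matrix-based layer models described later for the MATLAB simulations in Chapter 4.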
When the input propagates through the network and a result is available at the output, it is then compared to the desired output. If the actual output exactly matches the desired output, nothing is changed within the neural network. In the more likely event that the actual output does not exactly match the desired output, the error between the actual output and the desired output is calculated. This information is then used with a particular training algorithm to slightly change the connection weights. The connection weights are modified in a manner that makes it more likely the desired output will be produced the next time that input is presented to the neural network. The network is generally trained repeatedly with different input/output data sets. The goal of the training is to minimize the error between the actual output and the desired output. When an acceptable level of error has been achieved, the network can then be presented with inputs that have unknown outputs. If the training data is sufficiently representative of the total data set and if the network has been effectively trained, the neural network will produce a correct output when presented with an unknown or incomplete input. This ability of neural networks to operate effectively with noisy, slightly different, or incomplete data is extremely important. It is one of the main features that distinguish neural networks from traditional logic-based processors. Logic-based processors must use strict sets of incredibly complicated mathematical and statistical rules in order to work with data input that is not presented as expected. Through their numerous simple processing units, a dense network of connections, and the ability to learn, neural networks are much more adept at solving problems which require some amount of inference.
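This train-compare-adjust cycle can be made concrete with one of the simplest weight-update schemes, a delta rule applied to a single linear neuron. The learning rate, epoch count, and training data below are invented for the example; the thesis's own simulations, described in Chapter 4, use the MATLAB Neural Network Toolbox rather than this sketch:

```python
# Delta-rule sketch: nudge each weight in proportion to the output error,
# repeating over the training set until the error is acceptably small.
def train(samples, weights, rate=0.1, epochs=200):
    for _ in range(epochs):
        for inputs, desired in samples:
            actual = sum(x * w for x, w in zip(inputs, weights))
            error = desired - actual
            weights = [w + rate * error * x for w, x in zip(weights, inputs)]
    return weights

# Teach a 2-input linear neuron the mapping y = 2*x1 - x2.
samples = [([1, 0], 2), ([0, 1], -1), ([1, 1], 1), ([2, 1], 3)]
w = train(samples, [0.0, 0.0])
print([round(v, 2) for v in w])  # approaches [2.0, -1.0]
```

Note that the update only ever uses the difference between actual and desired output, which is exactly the error signal described in the text; more sophisticated algorithms (such as backpropagation) generalize this idea to multi-layer networks.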
The problems discussed previously, such as speaker-independent voice recognition and visual navigation, are problematic for logic-based processors precisely because they do require some level of inference. As a result, solving these classes of problems is exactly the goal of artificial neural network research.

1.2 Electrical vs. Optical Interconnects

Put most simply, interconnects transfer information between computing nodes within a system. These computing nodes may be logical blocks within a microprocessor, the CPU and system memory, or even remote systems. As more and more nodes, running at faster and faster speeds, are fit into smaller spaces, the electrical interconnects currently in use have significant trouble scaling in both density and speed. One example of this is the system bus connecting the CPU and system memory in server and desktop computing systems. The current state of the art has the system bus running at less than 20% of the core CPU speed (Figure 1-3) [2]. This represents a major bottleneck in transferring data within a PC.

Figure 1-3: CPU and system bus speed as a function of time

The fact of the matter is that there are significant limitations to electrical interconnects that are starting to be approached. One of the most critical of these limitations is the aspect ratio of electrical interconnects. It is well known that all non-superconducting wires have both resistance and capacitance determined by the cross-sectional area and length of the wire. The resulting RC time constant determines the fastest speed at which the interconnect can be run. For a typical interconnect made of aluminum or copper at room temperature, the bandwidth limitation (B) in bits/s is approximately

B ≈ 10^16 · A/l^2 bits/s
(1.1)

where A is the cross-sectional area of the wire and l is the length of the wire [3]. This may not seem so bad until it is considered that the cross-sectional areas of on-chip wires are on the order of micrometers squared, while the lengths of the interconnects can be as long as centimeters. With CPU clock speeds extending into the gigahertz range and process technologies shrinking into the nanometer range, these aspect ratio considerations represent a formidable problem. Timing and synchronization within microprocessors represent another area where electrical interconnects are running into issues. System clock skew is defined as the difference in arrival time of the system clock signal between two registers. This difference in arrival times of the system clock can result in race conditions and malfunctioning logic [4]. Even though much care is given to designing microprocessors such that nodes are equidistant from clock drivers (using hierarchical H-trees), clock skew still remains a problem. Much of this clock skew is caused by thermal effects. The resistivity of the electrical interconnects varies significantly with temperature. As different logical blocks are switched on and off, different interconnects are heated up, causing potential clock skew [5]. Industry standards traditionally dictate that the clock skew remain less than 10% of the total clock cycle [6]. As the speed of microprocessors increases, the clock period becomes smaller. As a result, clock skew becomes more of a problem, as it represents an increasing portion of the clock cycle.
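Plugging representative numbers into the aspect-ratio limit of Equation 1.1 shows why it matters. The calculation below assumes a prefactor of order 10^16 bit/s for a room-temperature metal line, together with the micrometer-scale cross-section and centimeter-scale length quoted above; all three numbers should be read as order-of-magnitude figures, not measured values:

```python
# Order-of-magnitude evaluation of B ~ 1e16 * A / l**2 bits/s (Eq. 1.1).
A = 1e-12   # cross-sectional area: 1 square micrometer, in m^2
l = 1e-2    # interconnect length: 1 cm, in m
B = 1e16 * A / l**2
print(B)    # ~1e8 bits/s, i.e. about 100 Mb/s for a cross-chip wire
```

A bandwidth ceiling on the order of 100 Mb/s for a single cross-chip wire is precisely the kind of bottleneck the text describes once core clocks reach the gigahertz range.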
This power causes significant heating of the microprocessor, which can lead to thermally induced resistivity changes and transistor threshold variation. Once again, as process technology scales down, designers are able to pack more and more logical blocks on a chip, all of which need to be synchronized with a system clock. As a result, the power consumption of clock generation and distribution is a continuing problem. Optical interconnects represent a complete paradigm shift that can reduce or eliminate many of the previously discussed issues faced by electrical interconnects. Optical interconnects, in the most general sense, use light instead of electrical wires to transfer information between two computing nodes [7]. The information is usually encoded as an intensity modulation. The simplest optical interconnect consists of a light emitter, the medium through which the light propagates, and a receiver to detect the optical signal. The emitting device is usually a laser or LED, though technically it can be any light-emitting device. The light can be modulated either directly by the emitter (i.e., by modulating the drive current of a diode laser) or by an external modulation device. The propagation medium can be either free space or a dielectric. Many times, a device will be used to direct the light to a particular place through the propagation medium. This can be done with devices such as waveguides, MEMS mirrors, or holograms. Finally, the receiver of the optical interconnect is most generally a photodetector, which converts the intensity-modulated optical signal into an electrical current modulation. Because optical interconnects use fundamentally different mechanisms of operation, they are able to overcome some of the inherent shortfalls of electrical interconnects. For instance, the bandwidth-limiting aspect ratio problem in electrical interconnects discussed previously does not exist in optical interconnect systems.
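To see how severe the electrical limit is, Equation 1.1 can be evaluated for a representative chip-scale wire. The following sketch uses illustrative dimensions (a 1 µm² cross-section and a 1 cm length); these numbers are assumptions for the sake of the example, not measurements from this work:

```python
# Sketch: aspect-ratio bandwidth limit for electrical interconnects (Eq. 1.1).
# B <= B0 * A / l^2, with B0 ~ 1e16 bits/s for resistive-capacitive lines [3].
# The wire dimensions below are illustrative assumptions.

B0 = 1e16  # bits/s, empirical prefactor

def bandwidth_limit(area_m2, length_m):
    """Upper bound on bit rate for a wire of cross-section A and length l."""
    return B0 * area_m2 / length_m**2

# A typical on-chip global wire: ~1 um^2 cross-section, ~1 cm long.
A = (1e-6) ** 2   # 1 um x 1 um, in m^2
l = 1e-2          # 1 cm, in m
print(f"Limit: {bandwidth_limit(A, l):.2e} bits/s")  # ~1e8 bits/s (~100 Mb/s)
```

With micrometer-scale cross-sections and centimeter-scale lengths, the bound falls to roughly 100 Mb/s per wire, far below gigahertz clock rates, which is the aspect-ratio problem the text describes.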
In fact, for well-engineered optical materials, there is negligible loss or distortion of the optical signal over the distances in question. Consequently, as process technologies yield smaller features, the potential bandwidth of optical interconnects remains unaffected. The timing and synchronization issues faced by electrical interconnects can also be improved upon with the use of optical interconnects. As stated above, even though microprocessors are generally designed such that clocked registers are equidistant from the clock drivers, thermal effects can still cause impedance mismatches that result in large clock skews. While the index of refraction of the propagation material (somewhat analogous to the resistivity) may change with temperature, it does not cause nearly the same magnitude of clock skew as would be seen in an electrical interconnect system [5]. Furthermore, optical interconnects may be able to utilize more exotic geometries that would almost completely eliminate the non-uniform heating of the propagation medium. Currently, it is difficult to get high-speed signals in and out of a microprocessor due to the large capacitances of the bond pads, bond wires, and package leads. As a result, the clock must be generated on-chip for modern microprocessors. This directly contributes to the large power consumption discussed above. Optical interconnects allow the possibility of generating a high-speed clock signal off-chip and bringing it on-chip using light. In this scenario, the clock speed is only limited by the response of the emitting and detecting devices. Additionally, moving clock generation and distribution off-chip frees up significant amounts of valuable chip real estate. In electrical clock distribution, significant redesigns are necessary when new logical blocks are added or a process shrink is done.
These redesigns are needed in order to accommodate impedance matching and signal reflections so that synchronization is performed correctly [5]. Because these effects are absent in an optical implementation, the optical system may be able to operate at either 5 MHz or 5 GHz without any redesign. While optical interconnects do have many benefits over electrical interconnects, there are still issues to be resolved before widespread use of the technology is seen. The largest hurdle in the adoption of optical interconnects is incompatibility with silicon-based processes. Silicon is an inherently poor optoelectronic material due to its indirect bandgap structure. While silicon photodetectors perform moderately well, silicon light emitters are in their infancy. As a result, other optoelectronic devices must be grown on or bonded onto chips. Widespread acceptance of optical interconnects would require retooling of fabrication facilities and major process flow changes for the major foundries such as TSMC, IBM, or Intel. Other issues confronting optical interconnects include the quality of optoelectronic devices and the lack of optomechanical interface hardware. 1.3 The Compact Optoelectronic Neural Network Processor Project The goal of the Compact Optoelectronic Neural Network Processor Project (CONNPP) is to produce a small, rugged co-processing unit that can be used in conjunction with a standard PC. The PC will be able to offload appropriate tasks such as pattern recognition, visual navigation, or sensor fusion to this co-processing unit. Furthermore, the co-processing unit will be optimized, using neural network architectures, to solve problems of this nature. First, some of the operational characteristics of this co-processing unit will be discussed. After that, the specifications and implementation of this processor will be examined in more detail. The idea of optoelectronic neural network processors is by no means a new one.
There have been many proposed and implemented optical neural network architectures over the past twenty years [8-10]. One characteristic that these implementations share is that they take up the better part of an optical table. While this size is perfectly acceptable for research purposes, it is less than practical for commercial systems. For this reason, one of the goals of the CONNPP is to make its optoelectronic neural network processor the size of a CD-ROM drive. This will be accomplished through the use of monolithically fabricated optoelectronic integrated circuits (OEICs). Advances in the integration of optoelectronic devices with standard CMOS electronics have only recently made this level of integration possible. MIT Professor Clifton Fonstad, along with his group, has pioneered a process known as Aligned Pillar Bonding (APB) that allows optoelectronic devices to be grown on separate substrates and subsequently bonded onto standard CMOS chips [11-13]. This, as will be seen, opens up a realm of possibilities in terms of new and innovative systems that can be built. One of the key features of the OEICs designed and built for the CONNPP is that they will accept optical input on one side of the chip and provide optical output from the other side of the chip. This allows the OEICs to be cascaded in the third dimension. For the past thirty years, microprocessors have been fabricated on a single 2-dimensional substrate and connected via system buses to memory and peripherals. In the last several years, researchers have begun to think about cascading processors in the third dimension. Groups in both academia and industry are actively investigating the possibility of interconnecting these 3-dimensionally cascaded processors either electrically or optically [14].
The Compact Optoelectronic Neural Network Processor Project seeks to leverage the benefits of optical interconnects in order to create a very dense system of inter-chip interconnects (almost 15,000 channels/cm²). Neural networks are a prime candidate for 3-dimensional cascading because of their interconnection density requirements. In order for neural networks to work effectively, the neural nodes must be able to communicate with a significant number of other neurons [15]. The most effective neural network is a globally connected network in which every neuron is connected to every other neuron. While global interconnects are very effective, they are also quite impractical: the number of connections needed grows as n², where n is the number of neurons in the network. As a result, a less dense interconnect scheme for neurons is necessary to make the systems more practical. In the human brain, for example, each neuron connects to approximately 10,000 other neurons. While this may seem like a large number, it is actually only 0.0001% of the 10 billion neurons in the brain. Interconnection schemes between neurons, both within the plane of the OEIC and between OEIC planes, are a major consideration for the CONNPP. This is one of the central topics addressed by this thesis. 1.3.1 High-level overview of the CONNPP As discussed previously, optics can offer significant benefits over electronics in some cases when it comes to interconnect technology. Processing and computation, however, are a different matter. Over the past thirty years, millions of man-hours and billions of dollars have gone into integrated circuit design and research. This investment has increased the processing power per square centimeter of integrated circuits by many orders of magnitude.
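Returning briefly to the interconnect-scaling point made earlier in this section, the n² cost of global connectivity versus the linear cost of a fixed fan-out can be sketched in a few lines. The network sizes used here are arbitrary illustrations:

```python
# Sketch: connection counts for globally connected vs. fixed-fanout networks.
# Global connectivity needs ~n^2 links; a fixed-fanout scheme (the brain's
# ~10,000 connections per neuron, or the CONNPP's nearest-neighbor fan-out
# of 9) needs only k*n, which is what makes large networks practical.

def global_connections(n):
    """Directed links in a fully connected network of n neurons."""
    return n * (n - 1)

def fixed_fanout_connections(n, k):
    """Links when each neuron connects to a fixed k other neurons."""
    return n * k

for n in (100, 10_000, 1_000_000):
    print(n, global_connections(n), fixed_fanout_connections(n, 9))

# The brain's sparsity: ~10,000 connections out of ~10 billion neurons
print(10_000 / 10e9 * 100, "%")  # ~0.0001 % of all neurons
```

Even at a million neurons, the fixed-fanout scheme needs nine million links where global connectivity would need nearly a trillion.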
While many optical neural network architectures and implementations have used optics for both processing and communication [9,16,17], it was decided to use the power of electrical processing units coupled with the benefits of optical interconnects, thereby utilizing the best of both worlds. Artificial neurons like those described in Section 1.1 will be created on a 2-dimensional OEIC chip using electronic circuits. Neurons within this 2-dimensional array are electrically connected to one another. Because the chips have been fabricated to accept optical input on one side and produce optical output on the other side, they can be cascaded in the third dimension. This allows the neurons in one OEIC plane to communicate optically with neurons in the next OEIC plane. Holographic elements are used to steer the light between inter-plane neurons. Figure 1-4 shows a simplified version of the cascaded processor. Figure 1-4: A simple example of the layer structure of the CONNP In this figure, the layered structure of the Compact Optoelectronic Neural Network Processor (CONNP) is seen. A version of the processor with only nine neurons per layer (3 x 3) is shown for visual simplicity. Each "layer" of the processor consists of four planes: a detector plane, an emitter plane, a holographic element plane, and an optically neutral spacer. The detector and emitter planes are fabricated on a single substrate and can be grouped together as the optoelectronic integrated circuit (OEIC). After the emitter plane, there is a holographic element plane that directs and structures the light beams from the emitters. Finally, there is an optically neutral spacer that allows the beams passing through the holograms to propagate at the correct angles to reach their destinations on the following detector plane. This four-plane "layer" can then be cascaded. Figure 1-4 shows three such layers.
As was mentioned previously, in addition to being compact, ruggedness is also a requirement for the CONNPP. It would be very undesirable for the processor to come out of alignment when moved or shipped. In order to make the processor rugged, the layers will be precisely aligned and then cemented into place using an optically neutral glue or epoxy. Once all the desired layers are cemented into place, the processor will no longer be subject to possible misalignment when it is bumped or jarred. In addition to the optical interconnects between layers, there are also electrical interconnects to the OEICs. These electrical interconnects allow the CONNP to interact with the PC controlling it. In addition to presenting input to the CONNP, the PC can potentially be responsible for training the neural network. During a training cycle, the input will be presented to the CONNP and it will produce an output. This output can then be evaluated by the PC and the corrections to the connection weights can be calculated. These updated weights can then be scanned into the CONNP for the next training cycle. Additionally, this allows the PC to store connection weights for the CONNP after it has been fully trained. By doing this, the collection of weights produced for a particular training session can be loaded just as a program is loaded by a traditional processor. 1.3.2 Low-level overview of the CONNPP Once the basic structure of the CONNP is understood, it can be examined at a lower level to uncover some of the challenges faced in its design. It is useful to examine a single neuron in order to better understand what is happening. In Figure 1-5, a single-neuron view of each plane is shown. The detector plane for a single neuron consists of a single photodetector.
Figure 1-5: Single neuron view of the CONNP planes The emitter plane, on the other hand, has nine vertical cavity surface emitting lasers (VCSELs). The beams of these nine VCSELs then pass through nine separate holograms in the holographic element layer. The number nine comes about due to the optical interconnect scheme used. In order to fix the number of inter-plane optical interconnects while still taking advantage of optical density benefits, an interconnect scheme known as "nearest-neighbors" has been chosen for the CONNPP. This means that a neuron in one layer can communicate with its nearest-neighbor neurons in the next layer. As the neurons are laid out in a grid structure, this allows a neuron in layer 'A' to communicate with nine neurons in layer 'B'. Illustrating this concept, Figure 1-6 shows a one-dimensional cross-section of a single neuron communicating with multiple neurons in the next layer. Extending this one-dimensional symmetry to two dimensions, the 1:9 fan-out ratio can be seen. Another important characteristic of the individual neuron to discuss is its size. The specification for the CONNPP calls for the artificial neurons on the OEIC to be 250µm x 250µm. This leads to a neuron density of about 1600 neurons/cm². As mentioned, each neuron contains nine VCSELs, a photodetector, and electronic circuitry. While the size of the VCSELs has not yet been determined, the size of the photodetector will be between 40µm x 40µm and 75µm x 75µm. Figure 1-6: One-dimensional cross-section of a single neuron communicating with multiple neurons in the next layer Furthermore, each neuron also has nine individually written holograms associated with it. This means that the size of the individual holograms is approximately 83µm x 83µm.
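The geometric figures quoted above follow directly from the 250µm neuron pitch and the 1:9 fan-out. This short sketch simply cross-checks the stated specifications; it introduces no new parameters:

```python
# Sketch: geometry implied by the CONNPP specifications.
pitch_um = 250.0          # neuron pitch (250 um x 250 um neurons)
vcsels_per_neuron = 9     # nearest-neighbor 1:9 fan-out

# Neurons per square centimeter (1 cm = 10,000 um)
neurons_per_cm2 = (10_000 / pitch_um) ** 2
print(neurons_per_cm2)            # 1600.0

# Optical channels per square centimeter (one VCSEL per channel)
channels_per_cm2 = neurons_per_cm2 * vcsels_per_neuron
print(channels_per_cm2)           # 14400.0 -- "almost 15,000 channels/cm^2"

# Nine holograms tile one neuron cell in a 3 x 3 grid
hologram_um = pitch_um / 3
print(round(hologram_um, 1))      # 83.3 -- approx. 83 um x 83 um holograms
```

The 14,400 channels/cm² figure recovers the "almost 15,000 channels/cm²" density quoted in Section 1.3.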
1.4 Thesis Summary The remainder of this thesis consists of three chapters encompassing the body of work completed. Chapter 2 explores holographic interconnects for the Compact Optoelectronic Neural Network Processor. Background on holograms is given, analytical hologram-writing models are developed, and several systems are tested. Chapter 3 focuses on circuits and optoelectronic devices for the CONNPP. In this chapter, the two chips designed and their constituent components will be discussed. Finally, Chapter 4 examines a newly proposed hardware connection model for the artificial neurons in the CONNP. The suggested hardware architectural changes will be discussed along with neural network simulations showing the impact of these changes. In all chapters, suggestions for future work on each of these topics will also be included. Chapter 2 Holographic Interconnects As discussed in Chapter 1, micro-fabricated wires have traditionally been used as interconnects within microprocessors; however, as the clock speed and complexity of microprocessors increase exponentially, these wire interconnects have a difficult time keeping up. The parasitic capacitance and resistance of the wires limit the clock speed at which these wire interconnects can be driven. Likewise, the large distances that the interconnects must travel, relative to the clock period and propagation velocity of the signal, can cause very large skews in the arrival times of signals. Holographic interconnects allow for the possible elimination of many of the challenges that plague traditional wire interconnects. Section 2.1 will give the reader the necessary background on holograms and how they are fabricated. The research presented here fits within the larger context of the Compact Optoelectronic Neural Network Processor Project, which relies heavily on holographic interconnects.
Section 2.2 will build on the CONNPP overview given in Chapter 1 and show how the holographic interconnects are implemented. Using some of the implementation-based assumptions made in Section 2.2, a model will be developed in Section 2.3 that determines the conditions needed to fabricate the holographic interconnections for the Compact Optoelectronic Neural Network Processor Project. Three different setups were tested for fabricating holographic interconnections. Section 2.4 will present these different setups as well as show results from the holograms created. Finally, Section 2.5 will synthesize the lessons from the previous sections into suggestions for future work. 2.1 Holographic Interconnect Overview Holograms are the fundamental element of holographic interconnects. As such, a brief introduction to holograms and their fabrication will be given. Various systems in which holographic interconnects are useful will also be shown. 2.1.1 What are holograms? When an optical wavefront interacts with an object, the wavefront's amplitude and phase are changed based on the physical structure of the object. The complex wavefront resulting from the wavefront-object interaction contains a vast amount of information about the structure of the object (Figure 2-1). It is this structural information encoded on the wavefront that the human eye is able to interpret as the visual characteristics of an object (i.e., shape, texture, relative size, etc.). Figure 2-1: A plane wave reflects off an object, and the reflected wave then contains information about the structure of the object. It is important to note that the human eye is only able to detect the intensity of the wavefront, not the phase.
Also, because of its small size, the eye is only able to capture a very small part of the total wavefront intensity at once; hence, it only receives a very small amount of the total information contained in the wavefront about the object at any given moment. When the eye moves to a different position, however, it intercepts a different portion of the wavefront and receives other information about the object. Having two eyes allows humans to obtain two separate portions of the wavefront simultaneously. The brain is then able to fuse the information from each eye into a single scene with more information than either of the constituent images. Three-dimensional perception is a direct result of the ability to fuse the two sets of wavefront information captured by the eyes. A camera is analogous to a single eye. Like the eye, it has a very small aperture through which part of the wavefront can pass. Therefore, the camera captures a small portion of the wavefront at a single moment in time and records the intensity of that portion of the wavefront on film (film, like the eye, can detect intensity, not phase). As a result, only that particular view, from that particular moment, can be recovered from the photograph, no matter where the eye moves with respect to the photograph. A hologram, on the other hand, is fundamentally different from a photograph. Whereas the photograph contains only intensity information, a hologram contains both intensity and phase information about an object. For a photograph, it was said that because only a small portion of the wavefront passed through the aperture of the camera, only a very small portion of the information could be recorded. Because holograms are recorded in a fundamentally different way, the hologram is able to record a large part of the information contained in the wavefront. Another way to think about holography is as wavefront reconstruction.
If an optical wavefront were produced that was identical to the reflected optical wavefront in Figure 2-1, it would appear to the eye (or any recording medium) as if the object were in the position shown in Figure 2-1. This is precisely what a hologram does. It takes an optical wavefront as its input and outputs a reconstructed wavefront with properties identical to the object's reflected optical wavefront (Figure 2-2). Therefore, just like the object, the hologram is able to produce an image that has 3-dimensional qualities such as depth of field and parallax. Figure 2-2: A plane wave is incident on the hologram and the reconstructed object wavefront results 2.1.2 How are holograms made? As stated previously, holograms and photographs are made in fundamentally different ways. Photography relies on ray optics and the simple imaging properties of lenses. Holography, on the other hand, relies on wave optics along with interference and diffraction effects. This is why holograms, unlike photographs, are able to record both intensity and phase information. For now, a very high-level explanation of how holograms are made will be given. A more rigorous treatment dealing with the theory and mathematics behind making holograms will be given in Section 2.3. To make a hologram, two mutually coherent optical waves, a reference wave and an object wave, are interfered in the plane of an intensity-sensitive emulsion (Figure 2-3). The reference wave is usually a plane wave, while the object wave is a wave containing phase and amplitude information about an object (i.e., the reflected optical wavefront from Figure 2-1). As discussed before, photographic and holographic emulsions are not able to directly record phase information.
However, by interfering the reference wave and the object wave, the phase information contained in the object wave is transformed into amplitude information and is able to be recorded in the emulsion. Because the emulsions are extremely light sensitive, all light except the object and reference waves must be eliminated from the system. As a result, holograms are usually recorded in a dark room. Figure 2-3: The object and reference waves interfere at the holographic emulsion to record a hologram. When the room is completely dark, the light source providing the coherent waves (usually a laser) is turned on and the emulsion is exposed to the interference pattern. When the photographic emulsion has been exposed for the proper amount of time and chemically processed in the correct manner, the interference pattern will be recorded in the emulsion. To play back the hologram, the reference wave is input into the emulsion. If the hologram were an ideal Bragg transmission hologram made with a thick emulsion (discussed further in Section 2.3), the object wave would be present on the opposite side of the emulsion. This is the situation depicted in Figure 2-2. 2.1.3 Holographic interconnects explained Holographic interconnects are a specific implementation of optical interconnects. The same basic emitter, propagation medium, and detector structure that defines optical interconnects in general is also used for holographic interconnects. The characteristic that distinguishes them as their own class of optical interconnect is a holographic element in the propagation-medium portion of the system. This holographic element can be used to direct the light wave, change the structure of the light wave, or encode additional information in the light wave. These operations are quite important to optical interconnects as a whole.
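The recording step described above, interference of a reference wave and an object wave at the emulsion, can be illustrated with a minimal numerical sketch. Here both waves are taken as ideal plane waves tilted by ±θ (an assumption for simplicity; a real object wave is far more complex), so the recorded intensity reduces to sinusoidal fringes. The wavelength and angle are illustrative values:

```python
# Sketch: intensity recorded when two mutually coherent plane waves interfere.
# The emulsion records I(x) = |R(x) + O(x)|^2; the cross terms convert the
# relative phase into a recordable amplitude (fringe) pattern.
import cmath
import math

wavelength = 633e-9          # assumed HeNe recording wavelength, m
theta = math.radians(10.0)   # assumed tilt of each beam from the normal
k = 2 * math.pi / wavelength

def recorded_intensity(x):
    """|R + O|^2 for unit plane waves tilted +/- theta, at position x."""
    R = cmath.exp(1j * k * math.sin(theta) * x)   # reference wave
    O = cmath.exp(-1j * k * math.sin(theta) * x)  # "object" wave
    return abs(R + O) ** 2   # = 2 + 2*cos(2 k sin(theta) x)

# Fringe period for symmetric beams: Lambda = lambda / (2 sin(theta))
fringe_period = wavelength / (2 * math.sin(theta))
print(f"fringe period: {fringe_period * 1e6:.2f} um")  # ~1.82 um
print(recorded_intensity(0.0))                 # 4.0 (bright fringe)
print(recorded_intensity(fringe_period / 2))   # ~0.0 (dark fringe)
```

The micron-scale fringe period is why the emulsion must resolve extremely fine structure, and why stray light and vibration must be eliminated during exposure.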
Figure 2-4: A simple holographic interconnect with a free-space propagation medium Changing the direction of a light wave is one of the most crucial functions for optical interconnects in general. While there are many different ways to do this, holograms provide some of the most elegant solutions. It is well known that light (a plane wave, for example) travels in a straight line unless acted upon by an outside optical element. For instance, a mirror uses the simple law of reflection (angle of incidence equals angle of reflection) to send the beam in the desired direction. While a mirror is the most basic element used to change the direction of a beam of light, it is not always the most practical element for the job. Imagine a situation where there is a significant density of beams to be sent in different directions, as in Figure 2-5(a). This is a situation where the emitters are on one chip and the detectors are on another chip. The goal is to direct the light from emitter "A" to detector "A", from emitter "B" to detector "B", etc. While it might be possible to place mirrors in the beam paths to correctly map the emitters to the detectors (Figure 2-5(b)), it would most likely be impractical. If the separation of the beams is small (tens or hundreds of microns), placing and aligning the four mirrors in the individual beams would be an extremely difficult and time-consuming task. Another alternative is the use of holograms. A hologram can be made, using the previously discussed techniques, to direct the different beams in different directions. It can then be placed in the paths of the beams and aligned as a single element (Figure 2-5(c)). Figure 2-5: (a) Emitters and detectors without any light steering element. (b) Mirrors as a light steering element; notice that detectors and emitters are now mapped correctly.
(c) A single hologram as the light steering element Because holograms affect the phase of an incoming light wave, it is possible to cause structural changes in the wave. These structural changes are known as beam conditioning or wavefront shaping [18]. For instance, if an optical signal were being broadcast to a large number of nodes, it might be useful to adjust the divergence of the beam in order to reach all the nodes with a single beam. A beam-conditioning hologram would be able to accomplish this. Because holograms are able to encode both phase and amplitude information, it is possible to use them to modulate the amplitude of the resulting wave. The limitation is that the modulation imparted to the optical wave is static, i.e., not changing with time. Once a hologram is written, it will only be able to amplitude-modulate an incoming beam in one particular way. While this might not seem useful at first glance, one can imagine situations where the interconnection hologram could represent a cryptographic hardware key. Without the correct combination of amplitude modulations for the various interconnects, the device in question would not be able to function. Additionally, holograms are able to perform the functions described above simultaneously. This broadens the class of problems that holographic interconnects are suited to solve. In the next section, a system for which holographic interconnects are ideally suited will be explored. 2.2 Holographic Interconnections for the Compact Optoelectronic Neural Network Processor Project In Chapter 1, an overview of the Compact Optoelectronic Neural Network Processor Project (CONNPP) was given. This section will give the reader a more in-depth view of the holographic interconnect elements described in that overview. Specific challenges facing the holographic interconnect elements for the CONNPP (i.e., non-idealities) will also be discussed.
Later sections will show ways in which to overcome some of these challenges. Having seen the basics of the Compact Optoelectronic Neural Network Processor Project architecture, some of the components that make up the holographic interconnect (VCSEL, hologram, and detector) can be examined in more detail. 2.2.1 Vertical cavity surface emitting lasers (VCSELs) While extremely small semiconductor lasers have been available for many years, they have suffered from some distinct shortcomings. One of the most significant shortcomings of these devices is the fact that they are edge-emitting. This edge-emitting characteristic impacts the traditional semiconductor laser in several ways. First, the fact that the coherent light is emitted from the edge of the structure (parallel to the substrate) means that the devices must be cleaved from the wafer [19] (Figure 2-6). This limits the ability to produce two-dimensional arrays of the devices. Secondly, because of the geometry of the laser cavity, these lasers generally produce an elliptical beam that must be conditioned with non-spherical optics (i.e., cylindrical lenses). Figure 2-6: Edge-emitting semiconductor laser Vertical cavity surface emitting lasers are designed in such a way that these problems are eliminated. First, as the name implies, the light is emitted in the vertical direction (normal to the substrate) rather than from the edge of the substrate. This can be seen in Figure 2-7. The fact that the light is emitted vertically allows two-dimensional arrays of VCSELs to be easily fabricated. Also, because the contact/aperture of the VCSEL is patterned on the top of the structure via photolithographic means, the output beam can be circular.
Figure 2-7: Structure of a vertical cavity surface emitting laser (VCSEL) While some of the problems associated with edge-emitting lasers have been solved by VCSELs, some still remain. One such problem is beam divergence. Because these devices have extremely small apertures (on the order of 1-10µm) [19], diffraction effects come into play. As is known from diffraction theory, as the aperture size decreases, the divergence increases [18]. For the initial development phase of the CONNPP, two VCSEL arrays were obtained from the U.S.-Japan Joint Optoelectronics Project (JOP) run by the Optoelectronics Industry Development Association (OIDA). These VCSEL arrays contained 64 individually addressable VCSEL structures in an 8 x 8 configuration on a 250µm pitch, operating at 850nm. Also received from JOP were several laser driver chips designed especially for VCSELs. One of the most critical aspects of VCSEL performance for holographic interconnects is the beam profile. The beam profile takes into account both the divergence of the beam and the mode structure of the beam. Figure 2-8 shows the beam profile from one of the VCSELs in the OIDA/JOP 8 x 8 array. Each of the five pictures shown is the same VCSEL with a different input current. The power intensity gradually increases from Figure 2-8(a) to Figure 2-8(e). In looking at Figure 2-8, the most noticeable characteristic is that the mode structure of the VCSEL changes as the power is increased. In Figure 2-8(a), there is only a central bright spot, which looks similar to a Gaussian distribution. As the power is increased, the mode structure evolves to a two-lobed pattern in Figure 2-8(b), and eventually, in Figure 2-8(e), to a ring-like pattern with a local minimum in the center. While the change in modes can be recognized in Figure 2-8, the effect was even more pronounced when viewed in person.
Also of note is the fact that the transition between the different modes shown in Figure 2-8 is not completely smooth. As the current into the VCSEL is gradually increased, the pattern will change subtly. At some points, however, the VCSEL appears to "snap" to a completely different mode pattern. The most likely explanation for the mode changing in general, and the mode "snapping" in particular, is that as the current into the VCSEL is increased, the VCSEL heats up. This heating causes the laser cavity to change shape, supporting different modes. As the device heats up and expands, when a particular mode cutoff is passed, the VCSEL has the potential to "snap" into a different mode pattern.

Figure 2-8: Beam profiles of a single VCSEL at different operating powers

This changing mode structure of these VCSELs could have a very different impact on different types of systems. If these VCSELs were to be used in an imaging system that required them to be dynamically modulated, the changing mode structure could seriously distort the images produced. The application examined here, holographic interconnects, might be somewhat more forgiving. As long as the light emitted from the VCSEL hits the corresponding detector, the holographic interconnect accomplishes its goal. If the holograms that direct and shape the light are fabricated in such a way that they direct as much light as possible to the detectors, the changing mode structure should not have much effect on the performance of the holographic interconnects.

The other part of the beam profile to examine is the divergence angle. Looking at Figure 2-8, it appears as if the divergence increases with power. In this situation, the divergence and the mode structure seem to be somewhat coupled. In Figure 2-8(a), the beam appears to be a simple Gaussian. As a result, the intensity of the beam trails off significantly as the angle from the center of the beam increases.
On the other hand, examining Figure 2-8(e), it can be seen that there is a local minimum near the center of the beam and most of the energy is concentrated towards the outside edge of the beam. This modal change in intensity distribution will result in a change in observed divergence angle. The question becomes how this observed change in divergence angle affects how the holograms are designed.

Figure 2-9: Two beams from the same origin point with different divergence angles hitting a hologram

If the position of the VCSEL is fixed in the system and the hologram is fabricated to accommodate the largest observed divergence, a change in divergence angle should not affect the system. This situation is illustrated in Figure 2-9. Two diverging beams ("A" and "B") with the same point of origin but different divergence angles are shown hitting the hologram. As can be seen in the figure, the spatial frequency components that make up beam "B" are a subset of the spatial frequency components that make up beam "A". As a result, if the hologram works effectively for beam "A", it must also work effectively for beam "B".

The beam with the greatest divergence (Figure 2-8(e)) is used as the divergence of the VCSEL. A half-angle divergence of ~7.3° was measured. This value was calculated by determining the width of the beam at the 1/e² point. Therefore, according to the previous analysis, if the hologram is created to accommodate a read beam with a half-angle divergence of ~7.3°, the hologram should work well for the various modes and divergence angles encountered.

2.2.2 Spacing between emitter planes and holographic elements

Because of the non-zero divergence angle of the VCSELs used in the CONNPP, there will be definite limitations on the spacing between the emitter plane and the holographic element plane of the processor. The CONNPP specifies that the pitch of the neurons for the processor will be 250 µm.
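The crosstalk-free spacing implied by this pitch and divergence can be sketched numerically. A minimal Python check, assuming each 250 µm pixel is divided into a 3 x 3 grid of holograms (giving the ~83 µm hologram width used below) and treating the VCSEL as a point source at the 1/e² half-angle:

```python
import math

def max_plane_spacing(hologram_width_um: float, half_angle_deg: float) -> float:
    """Largest emitter-to-hologram spacing for which the 1/e^2 beam footprint
    of a point-like VCSEL stays within a single hologram."""
    return (hologram_width_um / 2.0) / math.tan(math.radians(half_angle_deg))

# 250 um pixel / 3 holograms per side ~= 83 um, with the ~7.3 deg half-angle:
print(round(max_plane_spacing(83.0, 7.3)))  # ~324 (um)
```

Tightening the confinement requirement (using a wider beam-width criterion than 1/e²) shrinks this allowed spacing proportionally.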
From the previous discussion of the architecture, it is known that each of these 250 µm x 250 µm neurons will contain nine VCSEL emitters. These nine VCSELs will then be directed and conditioned by nine individual holographic elements. These are the same nine holographic elements per pixel that are shown in Figure 1-5. Since the full pixel dimension is 250 µm x 250 µm, this means that the individual hologram dimensions must be 83 µm x 83 µm.

As the VCSEL is moved further away from the holographic element plane, the footprint of the VCSEL beam on the hologram will increase. This is shown in Figure 2-10. Notice that when the VCSEL is positioned at z1, the beam footprint fits fully within the middle of the three holograms. When moved to z2, the VCSEL beam now hits not only the middle hologram, but also the neighboring holograms. This will result in crosstalk between the channels. Given that the VCSELs discussed in the previous section had a half-angle divergence of ~7.3°, the maximum allowed distance between the emitter plane and the holographic element plane is about 324 µm, or about 1/3 mm.

Figure 2-10: How the distance of the VCSEL from the hologram affects crosstalk

Please note that the ~7.3° measurement was the half-angle at the 1/e² point. If more of the beam needs to be confined to the 83 µm hologram, the distance between planes without crosstalk will be less than 324 µm.

2.2.3 Uniformity and linearity of VCSELs

As the VCSELs will be used as analog outputs from one layer and analog inputs to another layer, it is important to examine the linearity and uniformity of the devices. To do this, an experiment was set up to measure the optical power as a function of current through the VCSEL. This experiment was done for three different VCSELs on a single 8 x 8 array chip. For each device the current was increased to its maximum value. The beam from the VCSEL was then focused through a lens onto a power meter.
By aligning the detector at its maximum intensity (i.e. maximum beam size), it is ensured that the maximum amount of light will hit the detector at all intensities. The current was then reduced to 0 mA and swept to its maximum of 16 mA. The results can be seen in Figure 2-11.

Figure 2-11: Optical power emitted from three different VCSELs on a single chip as a function of input current

While the three different VCSEL curves look reasonably close together, there are potential issues due to uniformity and linearity. In terms of uniformity, ideally all three curves would be identical and have a region that was perfectly linear. As can be seen in the graph, this is not the case. Raw errors of 0.32 mA can be observed between different devices. By averaging the three power values for each current, a curve is obtained that best fits the data. Even in this case, errors as large as 0.18 mW can be seen. This means that if these three devices were being used in a system, a maximum resolution of 0.18 mW could be achieved. Due to uniformity issues alone, then, the VCSELs are limited to 5-bit resolution in the linear region between 6 mA and 16 mA.

In order to look at the linearity of the devices, the data from 6 mA to 16 mA is again examined. This time, a linear fit is done for each curve. The standard error between the linear fit and the actual data is then calculated. For the three curves shown, the standard errors from their linear fits are 12.4%, 4.7%, and 2.7%. Please keep in mind that this is a best-case scenario because a linear fit was done for each curve. If instead the power values are averaged at each point and the linear fit is then applied to this averaged curve, the standard errors for the three original curves will increase. Because of the 12.4% standard error in the linearity, the resolution is limited to about 3 bits.
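The bit-resolution estimates above follow from treating the worst-case error as a fraction of full scale. A small sketch of that bookkeeping (the floor(log2(1/error)) mapping is an assumed reading of how error translates to bits, not a formula stated in the text):

```python
import math

def resolution_bits(relative_error: float) -> int:
    """Usable bits when the worst-case deviation is a fixed fraction of
    full scale: the number of distinguishable levels is 1/error, so the
    resolution is floor(log2(1/error))."""
    return math.floor(math.log2(1.0 / relative_error))

# Standard errors of the three linear fits reported above:
print([resolution_bits(e) for e in (0.124, 0.047, 0.027)])  # [3, 4, 5]
```

Under this reading, the worst device (12.4% error) is the one that limits the array to about 3 bits, consistent with the conclusion above.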
Because of these uniformity- and linearity-based resolution issues, it is important to keep very tight control over the performance of the VCSEL devices. The VCSELs tested here are older (1997) devices provided through a government program. As a result, they are no longer state-of-the-art. The VCSEL devices for the CONNPP will be fabricated in house. It will be important for the parameters discussed here to be tracked with the VCSELs fabricated in-house in order to optimize the performance of the holographic interconnects. As was discussed in Chapter 1, neural networks offer the benefit of being marginally fault tolerant. This fact may allow the CONNPP to overcome some of the VCSEL accuracy limitations discussed above. A much more thorough neural network architecture study will be needed to determine which parameters the system is particularly sensitive to and which can be ignored.

2.2.4 Photodetectors

As discussed previously, the detector plane of the holographic interconnect contains a single photodetector for each neuron in the plane. In the final CONNPP implementation, the photodetector is specified to be a gallium arsenide (GaAs) PiN diode. Just like the VCSELs, the photodetectors will be fabricated on a substrate separate from the neural network circuits. Once the devices are fabricated, a process known as Aligned Pillar Bonding will be used to transfer the devices from the GaAs substrate to the silicon-on-sapphire substrate that contains the neural network circuits. The size of the photodetectors is not explicitly stated in the specifications of the CONNPP. There is a definite tradeoff that can be made between the size of the photodetectors and the alignability of the beams on the photodetectors. An important fact to remember is that, according to the architecture, the beams from the nine nearest neighbors in the previous plane must be aligned through the holographic element and then onto the single photodetector. If the photodetectors are made large (i.e.
100 µm x 100 µm), it will be much easier to align the nine beams onto the single detector. A bigger target is easier to hit. On the down side, making the detectors large also takes up valuable chip real estate. With 100 µm x 100 µm photodetectors and 250 µm x 250 µm neurons, the photodetectors consume 16% of the total neuron area. This does not leave much room for VCSELs and neural circuitry. If the detectors are made very small (25 µm x 25 µm), they consume only 1% of the total neuron area, but they will be much more difficult to align. Once a test system is built after the next iteration of chips, a study will be done to assess the alignment accuracy achievable. This will in large part determine the necessary photodetector size.

2.2.5 Holographic element specifications

The holographic interconnect system consists of three elements: the emitter, the holographic element, and the detector. Now that the characteristics of the emitters (VCSELs) and detectors (GaAs PiN devices) are known, the holographic element specifications can be derived. The goal of each holographic element is to direct as much light as possible from its corresponding VCSEL to its intended photodetector. The emitter plane and holographic element plane are aligned such that they are parallel. As the VCSELs emit light vertically, the beams from the VCSELs will be normal to the holographic interconnect plane. As can be seen in Figure 1-6, most of the beams exiting the hologram are not normal to the plane. This means that the hologram must change the direction of the VCSEL beam.

Figure 2-12: Two beams with the same energy, but different sizes, hitting a photodetector

It was shown previously that light emitted from the VCSELs can have up to a ~15° full-angle divergence. It was also discussed that the photodetector sizes will be somewhere between 25 µm x 25 µm and 100 µm x 100 µm.
However, the beam must propagate a minimum distance in order to travel laterally to hit the nearest-neighbor photodetector. As a result, if nothing is done to condition the beam, the divergence will cause the beam to be significantly larger than the photodetector. In some cases, this can result not only in a degraded signal at the photodetector, but also crosstalk among the channels. For this reason, it is important to also use the hologram to correct the divergence of the beam. While it might work to only correct the divergence (i.e. plane wave output), it might prove useful to make the beam convergent. By focusing the beam onto the detector plane, maximum alignment flexibility is achieved. This is illustrated in Figure 2-12. In Figure 2-12(a), a larger beam is seen incident on the photodetector. The larger beam is misaligned by a distance D. As a result, a significant portion of the beam does not hit the detector. In Figure 2-12(b), the same amount of energy is incident on the detector plane, but it is focused to its minimum size. Once again the beam is misaligned by a distance D. Because the beam is so much smaller, all of the incident light still hits the photodetector. For this reason, it is best to have the beam focused in the plane of the photodetector for the production system.

Figure 2-13: Single holographic interconnect

With this information, a complete picture of a single holographic interconnect can be developed. The VCSEL will emit a divergent beam normal to the plane of the hologram. The hologram will then redirect the angle of the beam and focus it onto the plane of the detector. This can be seen in Figure 2-13. Now that a specification for the holographic interconnect has been developed, a model can be designed to determine exactly how to write holograms like this.
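The alignment argument of Figure 2-12 can be made quantitative with a small numeric sketch (all dimensions hypothetical): the fraction of a uniform circular beam's energy landing on a square detector, for the same misalignment D, with and without focusing.

```python
import numpy as np

def captured_fraction(beam_radius_um: float, det_half_um: float,
                      offset_um: float, n: int = 2001) -> float:
    """Fraction of a uniform circular beam's energy that lands on a square
    photodetector (half-width det_half_um, centered at the origin) when the
    beam center is misaligned by offset_um along x. Grid approximation."""
    span = beam_radius_um + abs(offset_um) + det_half_um
    xs = np.linspace(-span, span, n)
    X, Y = np.meshgrid(xs, xs)
    beam = (X - offset_um) ** 2 + Y ** 2 <= beam_radius_um ** 2
    det = (np.abs(X) <= det_half_um) & (np.abs(Y) <= det_half_um)
    return float((beam & det).sum() / beam.sum())

# Hypothetical 50 um detector (half-width 25 um), 10 um misalignment:
print(round(captured_fraction(60.0, 25.0, 10.0), 2))  # unfocused: most energy missed
print(round(captured_fraction(5.0, 25.0, 10.0), 2))   # focused spot: all captured
```

The focused spot tolerates the full misalignment, which is the point of Figure 2-12(b).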
2.3 Modelling Holographic Interconnections

Modelling holographic interconnections begins with understanding the physics and math behind interference theory. From that, it can be understood mathematically how simple plane wave holograms are created. The Bragg condition and how it relates to both writing and reading holograms will be discussed. From there, a simple model will be developed to write holograms at one wavelength and read them at a different wavelength. This simple model will be developed further to one where the hologram produced can take the divergent input beam and produce a focused output beam. Volume aspects of the holograms will also be considered.

2.3.1 Interference theory

The first step toward creating holograms is understanding interference. Interference is observed when two or more mutually coherent optical waves are present in the same space. The wave resulting from the interference is the algebraic sum of the individual waves [20].

U_tot(r) = U_1(r) + U_2(r)    (2.1)

where U(r) is a complex, monochromatic wave. When this wave is detected, its intensity is the quantity that will be recorded. We know that

I ∝ |U|^2    (2.2)

where I is the intensity. So the intensity for U_tot(r) would be

I_tot = |U_1 + U_2|^2 = |U_1|^2 + |U_2|^2 + U_1(r)U_2^*(r) + U_1^*(r)U_2(r)    (2.3)

This intensity, I_tot, is what would be detected or recorded. Notice that the intensity of the resultant wave, I_tot, is significantly different from the intensities of the individual waves, |U_1|^2 and |U_2|^2. The first two terms of the resulting intensity are only magnitudes; they have no complex phase term. These terms are what we would expect to see as the intensities of U_1(r) and U_2(r) individually. The last two terms in the intensity equation, U_1(r)U_2^*(r) and U_1^*(r)U_2(r), are complex waves. To illustrate what is happening, consider a very simple example where two coherent plane waves at different angles are incident on a recording medium.
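The role of the cross terms in Equation 2.3 can be seen numerically. A minimal sketch with two plane waves of (assumed) amplitudes 1 and 0.5 sampled along x:

```python
import numpy as np

x = np.linspace(0.0, 10.0, 1000)
kx = 2.0 * np.pi                     # transverse spatial frequency of wave 1
U1 = 1.0 * np.exp(1j * kx * x)       # tilted plane wave, amplitude 1
U2 = 0.5 * np.ones_like(x)           # on-axis plane wave, amplitude 0.5

I_tot = np.abs(U1 + U2) ** 2                 # Eq. 2.3: coherent sum
I_sum = np.abs(U1) ** 2 + np.abs(U2) ** 2    # incoherent sum, no cross terms

# The cross terms produce fringes: I_tot swings between (1 - 0.5)^2 = 0.25
# and (1 + 0.5)^2 = 2.25, while the incoherent sum is a constant 1.25.
print(round(float(I_tot.min()), 2), round(float(I_tot.max()), 2),
      round(float(I_sum.mean()), 2))
```

The sinusoidal fringe pattern carried by the cross terms is exactly what the recording medium captures.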
Figure 2-14: Interference pattern of two plane waves being recorded in the recording medium

As seen in Figure 2-14, the plane waves are of the form

E_1 = E_1 e^{-j(k_z z - k_x x)}    (2.4)

and

E_2 = E_2 e^{-j k_z z}    (2.5)

At the recording medium, the wave resulting from the interference of E_1 and E_2 is

E_tot = E_1 e^{-j(k_z z - k_x x)} + E_2 e^{-j k_z z}    (2.6)

From Equation 2.3, the resulting intensity at the recording medium is

I_tot = |E_1|^2 + |E_2|^2 + E_1 E_2^* + E_1^* E_2    (2.7)

I_tot = |E_1|^2 + |E_2|^2 + E_1 E_2 e^{-j k_x x} + E_1 E_2 e^{+j k_x x}    (2.8)

I_tot = |E_1|^2 + |E_2|^2 + E_1 E_2 (e^{-j k_x x} + e^{+j k_x x})    (2.9)

This is the intensity pattern that will be present at the plane of the recording medium.

2.3.2 Recording holograms

It is at this point that the type of recording medium being used must be considered. For the purposes described here, there are two different types of recording media: thin film emulsions and thick film emulsions. A thin film emulsion has a thickness on the order of the period of the interference pattern being recorded. When a thin film emulsion is used, the interference pattern at the surface of the emulsion is what is recorded. It can basically be thought of as a two-dimensional surface on which the intensity pattern is being recorded. A thick emulsion, on the other hand, has a thickness that is much greater than the period of the interference pattern being recorded. Whereas the thin film emulsion only records the interference pattern in a single plane, the thick emulsion records the interference pattern at each plane throughout the volume of the emulsion.

It is also useful to consider the mechanism by which the interference pattern is recorded by the emulsion. As discussed in Section 2.1, photosensitive materials respond to the intensity (amplitude squared) of a wave. Some photosensitive materials, semiconductors for instance, respond to intensity by generating an electrical current. Other photosensitive materials, like photo-polymers and photo-resist, respond to incident light by polymerizing.
Photographic emulsions respond by a chemical reaction that involves converting silver halide into silver. The emulsion can then be bleached, converting the silver into a transparent substance with a high index of refraction [21]. This means that by exposing a photographic emulsion to light and chemically processing it, the light intensity distribution is recorded as a variation in the thickness of a high-index-of-refraction material along the emulsion. If a thick film emulsion is used, high-index-of-refraction planes will result in the material. Traditionally, high-density photographic emulsions are used for holography. These emulsions may be either thick or thin film emulsions. Photo-polymers which result in an index of refraction modulation will also be considered in later sections. These materials are of particular interest due to their high diffraction efficiency and easy processing.

Returning to Equation 2.9, it is seen that the intensity pattern at the plane of the recording medium consists of a constant term, |E_1|^2 + |E_2|^2, and two terms with a phase component, E_1 E_2 e^{-j k_x x} and E_1 E_2 e^{+j k_x x}. This is the pattern that will be physically recorded in the holographic recording medium. The transmission function of the hologram will then be

t_holo(x, z) = A {(|E_1|^2 + |E_2|^2) + E_1 E_2 e^{-j k_x x} + E_1 E_2 e^{+j k_x x}}    (2.10)

where t_holo(x, z) is the transmission function and A is a constant that takes into account the response of the emulsion and the exposure time. When a thin film emulsion is used, this pattern will be recorded at the surface of the emulsion. If the hologram is then played back with a plane wave identical to E_2, the result will be

t_holo(x, z) E_2 = A {E_2 (|E_1|^2 + |E_2|^2) e^{-j k_z z} + E_1 |E_2|^2 e^{-j(k_z z - k_x x)} + E_1 |E_2|^2 e^{-j(k_z z + k_x x)}}    (2.11)

Equation 2.11 has three distinct terms. The first term is a plane wave propagating in only the +z direction. The second term is a plane wave propagating in the +z and -x directions.
Finally, the third term is a plane wave propagating in the +z and +x directions. This can be rewritten as

A (|E_1|^2 + |E_2|^2) E_2 + A |E_2|^2 E_1 + A |E_2|^2 E_1^*    (2.12)

From Equation 2.12 it can be seen that the first term is an amplitude-modulated version of the input wave E_2. The second term is a replica of the original E_1 with a modulated amplitude. Finally, the third term is the conjugate of E_1, also with a modulated amplitude. This is illustrated in Figure 2-15.

Figure 2-15: Playback of the thin emulsion hologram with wave E_2

In the example shown above, only plane waves were used to write the hologram. If one of the plane wave sources is replaced by a wave with structure from an object, similar results will be seen when the hologram is played back with the reference plane wave. There will be an undiffracted plane wave with an amplitude modulation, a modulated object wave (virtual image), and a conjugated object wave (real image), all propagating in different directions.

While thin film holograms might be useful for some holographic interconnect applications, they are not appropriate for the implementation discussed in the previous section. The reason for this is that when the thin film hologram is played back, three beams result at the output. In a dense, point-to-point interconnection scheme, this will cause massive cross-talk between the various nodes. A more desirable situation would be if one beam were input, and a single beam with the correct properties were output. For this application, thick film holographic emulsions are most appropriate. Thick film holograms are written in exactly the same way as thin film holograms; the difference is in what is recorded. As discussed earlier, while thin film emulsions only record the interference pattern at the surface, thick film emulsions record the interference pattern at each plane for the entire thickness of the emulsion.
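The three playback terms of Equation 2.11 can be checked numerically: record the fringe pattern of Equation 2.9 as a transmission function, multiply by the reference wave, and look at the angular (Fourier) spectrum. A sketch with arbitrary real amplitudes:

```python
import numpy as np

N = 1024
x = np.arange(N)                  # transverse sample index
fx = 64 / N                       # transverse spatial frequency of E1 (cycles/sample)
kx = 2.0 * np.pi * fx
E1, E2 = 1.0, 0.7                 # real plane-wave amplitudes

# Recorded intensity (Eq. 2.9) used as the transmission function (Eq. 2.10):
t = E1**2 + E2**2 + E1 * E2 * (np.exp(-1j * kx * x) + np.exp(+1j * kx * x))

# Playback with the on-axis reference E2 (Eq. 2.11): the spectrum contains
# exactly three components -- undiffracted, replica, and conjugate orders.
spectrum = np.abs(np.fft.fft(t * E2))
orders = sorted(float(np.fft.fftfreq(N)[i]) for i in np.argsort(spectrum)[-3:])
print(orders)  # [-0.0625, 0.0, 0.0625]
```

The two symmetric side orders are precisely the cross-talk beams that make thin-film holograms unattractive for dense point-to-point interconnects.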
Thick film holograms rely heavily on the Bragg theory discussed next.

2.3.3 Bragg theory

Before getting into the specifics of how to write holograms with the properties seen in Figure 2-13, it is useful to quickly review the interference conditions for writing holograms and the Bragg condition for reading out the holograms. To write holograms, two mutually coherent waves are interfered and the intensity of their interference pattern is recorded in the holographic emulsion. In the case of two mutually coherent plane waves of wavelength λ separated by an angle θ, a sinusoidal interference pattern with a period Λ is produced. The period Λ (also known as the grating spacing) is given by [20]

Λ = λ / (2 sin(θ/2))    (2.13)

One important point to note is that θ is the angle between the beams inside the emulsion. Generally, the index of refraction of the holographic emulsion is larger than the index of refraction of free space. As the interference pattern is recorded inside the emulsion and the write beams are given external to the emulsion, this difference in index of refraction between free space and the emulsion must be accounted for. This is accomplished by simply applying Snell's Law (n_1 sin θ_1 = n_2 sin θ_2) to the beams external to the emulsion. In this brief review of Bragg theory, it will be assumed that the angles shown are internal angles (i.e. the angles inside the emulsion).

The intensity of this sinusoidal interference pattern with grating spacing Λ is what is recorded in the holographic emulsion. If the emulsion is a thin film emulsion, this pattern is basically only recorded in the plane of the surface. In this case a hologram is produced that has three output terms. For the holographic interconnects specified in the previous section, it was desirable to have only a single output term. This can be achieved with thick film holograms. Thick film holograms have an emulsion thickness much greater than the grating spacing Λ.
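Equation 2.13 translates directly into a small helper (wavelength and angle both taken inside the emulsion, per the convention just stated; the numbers are hypothetical):

```python
import math

def grating_spacing(wavelength: float, theta_int_deg: float) -> float:
    """Eq. 2.13: grating period for two coherent beams of the given
    wavelength separated by the internal angle theta_int_deg, with both
    quantities measured inside the emulsion."""
    return wavelength / (2.0 * math.sin(math.radians(theta_int_deg) / 2.0))

# e.g. 532 nm beams separated by 20 degrees inside the emulsion:
print(round(grating_spacing(532.0, 20.0), 1))  # period in nm
```

As the angle between the beams shrinks, the fringe period grows without bound, which is why shallow-angle geometries produce coarse gratings.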
Thick film holograms allow the interference pattern to be recorded at each plane throughout the thickness of the hologram. When the holographic emulsion is processed, the index of refraction of the material changes as a function of the intensity of the light. Regions that received more optical energy during exposure have a higher index of refraction. This results in the hologram having planes of high-index-of-refraction material. These planes, in turn, can act as partially silvered mirrors to incoming light. Figure 2-16 shows two plane waves interfering inside a thick emulsion and also the emulsion after processing. The slightly angled parallel lines in the processed hologram represent the partial mirror planes.

Figure 2-16: (a) Two plane waves interfering inside a thick emulsion while a hologram is being recorded (b) The same thick emulsion after chemical processing

When a beam of light hits one of these partial mirrors, a small portion of the beam will be reflected, but most will pass directly through the weak mirror. As an important note for later analysis, the angle of inclination of these partial mirrors is the bisector of the two recording beams. This can be proven by examining the interference pattern with respect to the propagation vectors of the waves.

Bragg theory was a result of research into X-ray diffraction to determine crystal structure. In this case, the crystal planes acted as partial reflectors, introducing multiple-beam interference effects [22]. While developed for the X-ray region of the electromagnetic spectrum, Bragg theory can be just as easily applied to the visible and infrared regions as well. Because the thick hologram is made up of many thin layers of dielectric media, the phenomenon of multiple-beam interference will come into play. Looking at Figure 2-17, two of the partial mirrors (n and n+1) are seen separated by a distance Λ.
Two beams are seen entering at an angle α with respect to the partial mirrors.

Figure 2-17: The diagram illustrates the derivation of the Bragg condition

The first beam is reflected off of partial mirror n. The second beam passes through partial mirror n and is subsequently reflected by partial mirror n+1. Assuming these two beams are coherent to begin with, they are in phase up until the dashed line at point A. Beyond that point, the second beam travels a longer distance than the first beam. As a result of travelling a longer distance, the second beam also accumulates more phase. If the two beams are to interfere constructively when they exit the hologram (to obtain the maximum signal), it is necessary that they still be in phase. For this to happen, the extra distance travelled by the second beam must be equal to an integral number of wavelengths (mλ) of the beam. This means that for the case where m = 1,

AB + BC = λ    (2.14)

Examining the diagram,

sin(α) = AB/Λ = BC/Λ    (2.15)

which means that

AB = BC = Λ sin(α)    (2.16)

Substituting back into Equation 2.14,

2Λ sin(α) = λ    (2.17)

Equation 2.17 is known as the Bragg condition. The Bragg condition says that given a hologram with partial mirror (grating) spacing Λ and an input beam of wavelength λ, the output signal (diffracted beam) will be a maximum when the internal angle with respect to the grating is α. When the holograms are written, an interference pattern is produced with the grating spacing given by Equation 2.13. This grating spacing is then used in the Bragg condition (Equation 2.17) to determine the internal readout angle α that gives the maximum diffracted signal. While the equation for the interference pattern and the Bragg condition are essentially the same, it is important to think about them separately, as conditions may change between writing and reading (i.e. different write and read wavelengths).
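Equation 2.17, solved for the readout angle, gives the internal angle that maximizes the diffracted signal for a known grating. A quick sketch with hypothetical numbers:

```python
import math

def bragg_angle(wavelength: float, grating_spacing: float) -> float:
    """Eq. 2.17 solved for alpha: the internal readout angle (degrees),
    measured from the grating planes, that maximizes the m = 1 diffracted
    order for the given wavelength and grating spacing."""
    return math.degrees(math.asin(wavelength / (2.0 * grating_spacing)))

# e.g. an 850 nm read beam on a grating with a 1500 nm period:
print(round(bragg_angle(850.0, 1500.0), 1))  # internal Bragg angle in degrees
```

Note that a longer read wavelength on the same grating demands a steeper readout angle, which is the root of the wavelength-mismatch problem treated in Section 2.3.5.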
2.3.4 Writing and reading holograms with the same wavelength

When making holograms for holographic interconnections, the input beam angle (θ_in) and output beam angle (θ_out), along with the write and read wavelengths, are usually known. Knowing the desired result (the read geometry), the geometry for writing must be calculated. In the simplest case, the write and read wavelengths will be the same. To develop equations for this simple case, assume that a hologram is needed with the geometry shown in Figure 2-18. It must then be determined what the correct writing geometry is to produce a hologram with these characteristics.

Figure 2-18: Desired readout geometry for hologram

For this basic example, assume the hologram is being recorded and played back with the same wavelength (λ_0) of light. In this case, to determine the write geometry, the output beam is simply moved to the input side of the hologram such that it intersects with the input beam. These angles external to the emulsion must be converted to angles internal to the emulsion. This is done by applying Snell's Law. The resulting internal angles are denoted with primes.

θ'_in = sin^{-1}(sin θ_in / n_e)    (2.18)

and

θ'_out = sin^{-1}(sin θ_out / n_e)    (2.19)

where n_e is the index of refraction of the emulsion. This write geometry will produce a hologram with a grating spacing given by

Λ = λ_0 / (2 sin((θ'_in - θ'_out)/2))    (2.20)

where the normal to the hologram is defined as 0°. To prove that this works, the inclination angle of the grating (partial mirrors) must also be determined. As mentioned before, the angle of inclination of the grating (γ) is the perpendicular bisector of the two internal write beams. This is given by

γ = (θ'_in + θ'_out)/2    (2.21)

The write geometry, along with the parameters Λ and γ, is illustrated in Figure 2-19.
Figure 2-19: Calculated write geometry for hologram

To show that this write geometry produces a hologram with the readout geometry of Figure 2-18, the Bragg condition (Equation 2.17) is used. It is known that the hologram produced has a grating spacing of Λ and a grating inclination of γ. Earlier, α (the Bragg angle) was defined as the internal angle between the readout beam and the grating that produces the maximum output signal. If the hologram works as desired, the input angle with respect to the hologram that produces a maximum output will then be the angle of grating inclination (γ) plus the angle of the input beam with respect to the grating (α). This is illustrated in Figure 2-20 below.

Figure 2-20: Illustration of angles in hologram readout

As seen in the figure above,

θ_Bragg = γ + α    (2.22)

Therefore, if the hologram works, θ'_in for the read case will equal θ_Bragg for the write case. The Bragg condition is

2Λ sin(α) = λ_0    (2.23)

Substituting for Λ,

(λ_0 / (2 sin((θ'_in - θ'_out)/2))) · 2 sin(α) = λ_0    (2.24)

Solving for α,

α = (θ'_in - θ'_out)/2    (2.25)

Substituting α and γ into Equation 2.22,

θ_Bragg = (θ'_in + θ'_out)/2 + (θ'_in - θ'_out)/2    (2.26)

Therefore,

θ_Bragg = θ'_in    (2.27)

and the hologram works as predicted.

2.3.5 Writing and reading holograms with different wavelengths

When designing holographic interconnects, operating wavelengths for both reading and writing must be chosen. For reading, semiconductor diode lasers that can be directly integrated on chip are available in a variety of wavelengths from around 800 nm up to about 1500 nm. These devices have been widely exploited in fiber optic and telecom systems. Holographic emulsion materials, on the other hand, are generally not particularly sensitive to the near-infrared radiation produced by the semiconductor lasers. The holographic materials typically have their best sensitivity from the blue-green to red wavelengths.
As a result, it is often necessary to write holograms for holographic interconnection systems at one wavelength and read the holograms with a different wavelength. This difference in read and write wavelengths must be accounted for when determining the correct writing geometry for a particular hologram. The hologram will be read out with light of wavelength λ_r and written with light of wavelength λ_w. Assume that the desired read geometry is the same as in Figure 2-18. The only difference is that the hologram will be read out with light of wavelength λ_r.

A read geometry is specified by θ_in, θ_out, and λ_r. These external angles (θ_in and θ_out) must first be converted to internal angles with respect to the emulsion, as previously discussed. This transforms the external read angles to

θ'_in = sin^{-1}(sin θ_in / n_e)    (2.28)

and

θ'_out = sin^{-1}(sin θ_out / n_e)    (2.29)

In order to achieve the read geometry specified by θ_in, θ_out, and λ_r, a particular grating spacing (Λ) and grating inclination (γ) are needed. These parameters can be calculated directly from the internal read geometry (θ'_in and θ'_out). Then, with Λ, γ, and the write wavelength (λ_w), the internal write geometry can be calculated. This write geometry inside the emulsion is then transformed to a geometry external to the hologram.

First, the grating spacing (Λ) is calculated just like in the previous example. The output beam is moved to the input side of the hologram as in Figure 2-19. The angle between the two beams inside the emulsion (θ'_r) determines the grating spacing (Λ) required for θ_in to satisfy the Bragg condition. The angle between the beams (θ'_r) is given by

θ'_r = θ'_in - θ'_out    (2.30)

This means that the desired grating spacing is

Λ = λ_r / (2 sin(θ'_r / 2))    (2.31)

In order to produce a hologram with this external read geometry (θ_in, θ_out, and λ_r), it is necessary to write the hologram such that it has a grating spacing of Λ given by Equation 2.31 and a grating inclination of γ given by Equation 2.21.
The grating spacing of the hologram being written is

Λ_w = λ_w / (2 sin(θ'_w / 2)) (2.32)

where θ'_w is the angle between the write beams inside the emulsion. If the desired grating spacing (Λ_r) is to be the same as the written grating spacing (Λ_w), Equations 2.31 and 2.32 can be set equal:

λ_r / (2 sin(θ'_r / 2)) = λ_w / (2 sin(θ'_w / 2)) (2.33)

Solving for θ'_w,

θ'_w = 2 sin⁻¹[ (λ_w / λ_r) sin(θ'_r / 2) ] (2.34)

Knowing the angle between the write beams inside the emulsion and the grating inclination, the angles of the two write beams inside the emulsion can be calculated:

θ'_w1 = γ + θ'_w/2 = (θ'_in + θ'_out)/2 + sin⁻¹[ (λ_w / λ_r) sin((θ'_in − θ'_out)/2) ] (2.35)

and

θ'_w2 = γ − θ'_w/2 = (θ'_in + θ'_out)/2 − sin⁻¹[ (λ_w / λ_r) sin((θ'_in − θ'_out)/2) ] (2.36)

These angles inside the emulsion must then be converted to external angles in free space. This gives the correct write geometry for the hologram:

θ_w1 = sin⁻¹{ n_e sin[ (θ'_in + θ'_out)/2 + sin⁻¹( (λ_w / λ_r) sin((θ'_in − θ'_out)/2) ) ] } (2.37)

and

θ_w2 = sin⁻¹{ n_e sin[ (θ'_in + θ'_out)/2 − sin⁻¹( (λ_w / λ_r) sin((θ'_in − θ'_out)/2) ) ] } (2.38)

where θ_w1 and θ_w2 are the angles with respect to the normal of the two beams at wavelength λ_w used to write the hologram. Notice that Equations 2.37 and 2.38 depend on n_e; this is especially true when it is considered that θ'_in and θ'_out are also dependent on n_e. Generally, the write wavelengths are in the visible spectrum while the read wavelengths are in the near infrared, so λ_r > λ_w. Likewise, due to the form of the equations above, θ'_r > θ'_w.

As discussed previously in this section, when writing holograms the partial-mirror planes are formed at the angle that is the bisector of the internal write angles. It was also discussed that the read geometry required the angle of the partial mirrors to be γ from Equation 2.21. If both of these statements are true, the bisector of the internal write beams must be at an angle γ with respect to the normal to the hologram. Figure 2-21 illustrates the derivation of the λ_w write geometry. The read geometry with λ_r is shown as a dashed line.
The solid line shows the write geometry after it has been transformed to the write wavelength λ_w. The λ_w write beams are also angled such that the bisector of the internal beams is at angle γ with respect to the normal. This write geometry (λ_w, θ_w1, θ_w2, and γ) will produce a hologram with the desired read geometry (θ_in, θ_out, λ_r).

Figure 2-21: Illustration of the method to write a hologram at wavelength λ_w to be read at wavelength λ_r

2.3.6 How to write a hologram with a divergent beam input and a convergent wave output

The ultimate goal is to determine how to write a hologram with the specifications set forth by the CONNPP. To do this, the hologram must take a divergent beam as an input and output a convergent beam at a given angle. This is illustrated in Figure 2-22. In Figure 2-22, D_r1 is the distance between the source and the hologram, D_r2 is the distance in the z direction between the hologram and the plane of the detector, X_r1 is the distance in the x direction between the divergent source and the photodetector, and θ_rdiv is the half-angle divergence of the source.

Figure 2-22: Hologram with divergent source as input and convergent beam as output

It can be noted that the divergent wave from the point source is not a plane wave; its wavefront is a curved surface that grows larger the further it propagates from the source. Fortunately, Fourier optics allows an arbitrary wave to be written as the sum of plane waves with different spatial frequencies [20]. For two optical waves of the same wavelength, spatial frequency is simply a measure of angle. The simple divergent wave can be written as the weighted sum of plane waves with angles from +θ_rdiv to −θ_rdiv. This means that the plane waves at ±θ_rdiv represent the extremes of the readout case shown in Figure 2-22.
To simplify the problem, these two beams (at ±θ_rdiv) will be used to develop a write geometry for the hologram. Once the problem has been reduced to the extreme cases, it can be further broken down by examining each of the two extremes. In Figure 2-23, the +θ_rdiv and −θ_rdiv cases have been separated. In Figure 2-23(a), the +θ_rdiv case is shown. From this diagram, it can be seen that

tan(θ_out1) = (X_r1 − D_r1 tan(θ_rdiv)) / D_r2 (2.39)

and hence

θ_out1 = tan⁻¹[ (X_r1 − D_r1 tan(θ_rdiv)) / D_r2 ] (2.40)

Figure 2-23: Read geometries for the extreme plane waves in the divergent source problem

Analogously, looking at Figure 2-23(b),

θ_out2 = tan⁻¹[ (X_r1 + D_r1 tan(θ_rdiv)) / D_r2 ] (2.41)

Now, in order to figure out how these holograms should be written, the approach developed in the previous section will be used. First, the angles must be converted to angles internal to the emulsion, denoted in the following equations with a prime. Next, the grating inclinations (γ) and grating spacings (Λ) must be found. From Equation 2.21,

γ₁ = (θ'_rdiv + θ'_out1) / 2 (2.42)

and

γ₂ = (−θ'_rdiv + θ'_out2) / 2 (2.43)

Now, for Λ,

Λ₁ = λ_r / (2 sin((θ'_out1 − θ'_rdiv) / 2)) (2.44)

and

Λ₂ = λ_r / (2 sin((θ'_out2 + θ'_rdiv) / 2)) (2.45)

Next, for the last step of the process, the external write angles are determined using Equations 2.37 and 2.38. The +θ_rdiv case will be denoted as case 'a' and the −θ_rdiv case as case 'b'. For case 'a',

θ_wa1 = sin⁻¹{ n_e sin[ γ₁ + sin⁻¹( (λ_w/λ_r) sin((θ'_rdiv − θ'_out1)/2) ) ] } (2.46)

and

θ_wa2 = sin⁻¹{ n_e sin[ γ₁ − sin⁻¹( (λ_w/λ_r) sin((θ'_rdiv − θ'_out1)/2) ) ] } (2.47)

For case 'b',

θ_wb1 = sin⁻¹{ n_e sin[ γ₂ + sin⁻¹( (λ_w/λ_r) sin((−θ'_rdiv − θ'_out2)/2) ) ] } (2.48)

and

θ_wb2 = sin⁻¹{ n_e sin[ γ₂ − sin⁻¹( (λ_w/λ_r) sin((−θ'_rdiv − θ'_out2)/2) ) ] } (2.49)

This means that the write geometries have the write beams separated by |θ_wa1 − θ_wa2| and |θ_wb1 − θ_wb2| and the grating inclinations at γ₁ and γ₂, respectively. While the equations are needed, they are not particularly illustrative of what is happening. For this, an example is necessary.
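The chain of Equations 2.40 through 2.47 can also be sketched numerically. The sketch below uses the same parameters as the worked example in this section (850 nm read, 514 nm write, 5° half-angle divergence, 1 mm distances, 0.5 mm offset); the emulsion index n_e is not stated in the text, so n_e = 1.4 is an assumed value, chosen because it approximately reproduces the quoted case-'a' angles.

```python
import math

# Sketch of Eqs. 2.40-2.47 for case 'a' (+theta_rdiv); n_e = 1.4 is assumed.
lam_r, lam_w = 850e-9, 514e-9      # read and write wavelengths
theta_rdiv = math.radians(5.0)     # VCSEL half-angle divergence
D_r1, D_r2, X_r1 = 1.0, 1.0, 0.5   # geometry, mm
n_e = 1.4                          # emulsion index (assumed)

# Output angles for the two extreme plane waves (Eqs. 2.40, 2.41).
theta_out1 = math.atan((X_r1 - D_r1 * math.tan(theta_rdiv)) / D_r2)
theta_out2 = math.atan((X_r1 + D_r1 * math.tan(theta_rdiv)) / D_r2)

def internal(theta):
    """External free-space angle -> internal emulsion angle (Snell's law)."""
    return math.asin(math.sin(theta) / n_e)

# Case 'a': grating inclination (Eq. 2.42) and write angles (Eqs. 2.46, 2.47).
ti, to = internal(theta_rdiv), internal(theta_out1)
gamma1 = (ti + to) / 2
half_w = math.asin((lam_w / lam_r) * math.sin((ti - to) / 2))
theta_wa1 = math.degrees(math.asin(n_e * math.sin(gamma1 + half_w)))
theta_wa2 = math.degrees(math.asin(n_e * math.sin(gamma1 - half_w)))

print(math.degrees(theta_out1))  # ~22.4 deg, as quoted in the text
print(math.degrees(theta_out2))  # ~30.4 deg, as quoted in the text
print(theta_wa1, theta_wa2)      # close to the quoted 8.36 and 18.89 deg
```

Case 'b' follows the same pattern with −θ_rdiv as the input angle. The residual few-hundredths-of-a-degree differences from the quoted values reflect the assumed n_e and rounding in the text.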
Assume that the hologram is being read out by an 850 nm VCSEL (λ_r) with a half-angle divergence (θ_rdiv) of 5°. The VCSEL is placed 1 mm behind the hologram (D_r1). The photodetector sits 1 mm in front of the hologram (D_r2) and is displaced by 0.5 mm in the x direction (X_r1). The hologram is to be written with a 514 nm argon-ion laser (λ_w). Using Equations 2.40 and 2.41, the output angles are

θ_out1 = 22.4° and θ_out2 = 30.43° (2.50)

From Equations 2.42 and 2.43, the grating inclinations γ₁ and γ₂ are

γ₁ = 9.675° and γ₂ = 8.25° (2.51)

Finally, the angles for the 514 nm write beams are, for case 'a',

θ_wa1 = 8.36° and θ_wa2 = 18.89° (2.52)

and for case 'b',

θ_wb1 = 1.11° and θ_wb2 = 22.27° (2.53)

Figure 2-24(a) shows the read geometry using an 850 nm divergent source and Figure 2-24(b) shows the write geometry using a 514 nm source. Both figures are drawn to scale, and they help to give a better understanding of what is happening. Although the problem was broken down into two separate problems to solve it, the solutions can now be combined to yield even more information. When both solutions are put together as in Figure 2-24(b), the first thing to notice is that one of the write beams is a divergent beam from a point source. This point source has been rotated such that the bisector of the divergent beam makes a nonzero angle with respect to the hologram. The other write beam is a slightly convergent beam that would focus on the other side of the emulsion; the point where this beam would focus can be seen on the right-hand side of the emulsion.

Figure 2-24: (a) Read geometry for the example, with the 850 nm source at (−1 mm, 0) and the detector at (1 mm, 0.5 mm). (b) Calculated 514 nm write geometry, with a point source at (−1.56 mm, −0.15 mm) and a focus at (2.0 mm, 0.77 mm).

2.3.7 How to write a hologram with a divergent beam input and a plane wave output

Might the write geometry calculated above be simplified if the desired output of the hologram were a plane wave instead of a convergent wave?
To answer this question, the equations from the previous section can be applied directly to yield a write geometry for a plane-wave output. All that has to be done to make the above equations produce a plane-wave output is to set

θ_out1 = θ_out2 (2.54)

When the output angles of the hologram are equal, the output is a plane wave. To illustrate what happens, assume the same conditions as in the previous example (λ_r = 850 nm, λ_w = 514 nm, θ_rdiv = 5°, and D_r1 = 1 mm). In addition, assume that the desired output angle of the plane wave is 15°. Using the equations in the previous section, the angles are calculated and the results are plotted to scale in Figure 2-25.

Unlike the previous example, the write geometry is fundamentally different at the different wavelengths. If the hologram were to be written using λ_r, a divergent beam and a plane wave would be used. As seen in Figure 2-25, when λ_w is used to write the hologram, two divergent sources are used. This example shows that when different read and write wavelengths are used, normal intuition about hologram geometry does not necessarily hold.

Figure 2-25: (a) Read geometry for plane-wave output (b) Calculated write geometry for plane-wave output

2.3.8 Volume aspects of holograms

As discussed previously, the holograms used for the holographic interconnects here are thick holograms, meaning that their thickness is not negligible. In some of the examples, a source distance of 1 mm was specified, while some of the photopolymer emulsions tested are as thick as 200 µm, or one fifth of the source distance. As a result, it is useful to understand what is happening with the fringes inside the hologram. If two plane waves are used to create the thick hologram, the structure inside the hologram consists of flat planes of high-index material, as in Figure 2-16(b).
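For the two-plane-wave case, the fringe-plane orientation and spacing follow directly from the bisector and Bragg relations used earlier in this section. The short sketch below computes both for a pair of illustrative internal beam angles (the angles and wavelength are arbitrary choices, not values from the text):

```python
import math

# Fringe planes formed by two interfering plane waves (cf. Figure 2-16(b)):
# the planes lie along the bisector of the beams, with spacing
# lambda / (2 sin(half the angle between the beams)), as in Eq. 2.32.
lam = 0.514                                  # wavelength, um (illustrative)
t1, t2 = math.radians(15.0), math.radians(-5.0)  # internal beam angles

gamma = math.degrees((t1 + t2) / 2)          # plane inclination, degrees
spacing = lam / (2 * math.sin(abs(t1 - t2) / 2))  # grating spacing, um
print(round(gamma, 1), round(spacing, 3))    # prints: 5.0 1.48
```

With non-planar write beams, as discussed next, both quantities vary from point to point inside the emulsion instead of being constants.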
When the reference beam and object beam are not plane waves, the structure of the hologram becomes more complicated. In the case of the holographic interconnects modelled in this section, a divergent and a convergent beam are used to write the hologram. For the sake of simplicity and understanding, the same wavelength will be used for reading and writing in this example. In Figure 2-26, a portion of both the read and write geometries is shown. In the diagram, a cross-sectional plane (x₀) of the hologram was chosen, and points were evenly spaced along the cross-section. Rays from the source were then drawn to the points. If the hologram is to work correctly, the rays at each point from the source must be directed to the detector. Finally, by propagating the rays from the detector back (dashed lines), the second write beam (in addition to the source) is shown. In other words, writing a hologram with write beam #1 and write beam #2 will result in a hologram that directs the beams from the source to the detector as desired. Note that the propagation angles of the beams change as they enter and exit the emulsion due to the change in index of refraction; this is also illustrated in Figure 2-26.

Figure 2-26: Read and write geometries for a cross-sectional plane of the hologram

As can be seen in Figure 2-26, as z increases (with x remaining the same), the angle from the source to the cross-sectional plane changes. Likewise, the angle of write beam #2 also changes as the cross-sectional plane is followed through the hologram. Because the source and detector can be at any position, in general it can be said that as z increases, both the angle between the beams (θ_w) and the inclination of the grating (γ) will change. Figure 2-27 shows how the grating inclination of the partially silvered mirrors (the bisector of the write beams, γ) changes with z for the hologram discussed in Figure 2-26.
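This depth dependence can be sketched numerically. The snippet below takes points at three depths on a cross-sectional plane x = x₀, forms the incoming ray from the source and the outgoing ray to the detector, and reports their bisector; the positions are arbitrary illustrative values, and refraction at the emulsion faces is ignored to keep the sketch short (the text notes it must be included in a full treatment).

```python
import math

# Mirror-plane inclination gamma vs. depth z (cf. Figures 2-26 and 2-27).
src = (-1.0, 0.0)   # (z, x) of the divergent source, mm (illustrative)
det = (1.2, 0.5)    # (z, x) of the detector, mm (illustrative)
x0 = 0.1            # cross-sectional plane of interest, mm
T = 0.2             # emulsion thickness, mm (e.g. a 200 um photopolymer)

gammas = []
for z in (0.0, T / 2, T):
    th_in = math.atan2(x0 - src[1], z - src[0])    # ray angle from source
    th_out = math.atan2(det[1] - x0, det[0] - z)   # ray angle to detector
    gammas.append(math.degrees((th_in + th_out) / 2))  # bisector = gamma
print(gammas)  # the inclination changes monotonically with depth z
```

For this geometry the inclination grows with z, echoing the rotation of the partially silvered mirror planes shown in Figure 2-27.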
Figure 2-27: Angle of the partially silvered mirror planes as a function of z for the hologram discussed in Figure 2-26

Another interesting aspect of the volumetric nature of thick holograms is how a single beam propagating through the emulsion is treated. In this case, the angle of the input beam with respect to the hologram does not change as it advances through the hologram; the output angle needed to hit the detector, however, does change. Once again, the change in index of refraction as the beams propagate into different media is taken into account. This can be seen in Figure 2-28. In the figure, the single beam can be seen propagating from the source through the hologram. At each point notated on the source beam, the partial-mirror structures must be angled such that the reflected beam hits the detector. The reflected beams are projected back (dotted lines) to show the write beam that would produce this result. As in the previous example, this leads to a situation where both the angle between the write beams (θ_w) and the grating inclination angle (γ) change as the beam propagates through the hologram. Also like the previous example, the partially silvered mirror planes rotate through the hologram. This can be seen in Figure 2-29.

The two previous examples should help illustrate the internal structure of thick-film holograms. Earlier in this section, models were developed that showed how to design write geometries from a desired read geometry. If the writing of the hologram has been done according to these write geometries, every point within the hologram will partially reflect an incident beam from the source to the detector.
As a result, most of the light from the source will be focused on the detector, creating a high-quality holographic interconnect.

Figure 2-28: Read and write geometries for a single beam propagating through a thick hologram

These expected geometries are in turn confirmed by Reference [23]. In this source, the case of a divergent and a convergent cylindrical beam, incident from the same side and interfering in an emulsion, is examined. It is shown that this case produces gratings within the emulsion that are parts of ellipses with foci at the two focal points of the beams. This is illustrated in Figure 2-30. Figure 2-30 shows the situation when both of the foci are aligned with the x axis. In the case of the holograms for the CONNPP, the divergent source will be on axis, but the convergent source will be off axis. The resulting grating can be seen in Figure 2-31. Notice that this agrees reasonably well with Figure 2-27 (i.e., the angle with respect to the normal of the holographic grating increases as x increases).

Figure 2-29: Angle of the partially silvered mirror planes for a single beam propagating through the hologram

Figure 2-30: (a) A cylindrical divergent beam and a cylindrical convergent beam interfering within a holographic emulsion (b) The holographic grating after processing. The grating forms partial ellipses with foci O₁ and O₂.

2.4 Hologram Writing Systems

Several hologram writing systems were designed, built, and tested in order to determine the best way to write the holographic elements discussed previously. The goal of the system fabrication was to determine the best way to create a production system capable of making holographic element arrays easily. The first system examined was a very simple setup using a microscope objective as the beam-shaping element.
The next system used a single large lens for both the reference and object beams. The last system investigated was a computer-controlled system using a beam splitter to bring both beams together in close proximity.

Figure 2-31: The holographic grating for a divergent on-axis source and a convergent off-axis source

2.4.1 System using a microscope objective

The system using a microscope objective was the first proof-of-concept system built to show that holograms could be made that take a divergent wave as an input and produce a convergent wave as the output. The system (illustrated in Figure 2-32) begins with a very basic holography setup. The laser source is passed through a microscope objective/spatial filter combination to expand the beam. A convex lens is then used to collimate the beam. Next, the beam is split (approximately 50-50) into the two different interference legs. One of the interference legs goes directly into a microscope objective. The holographic plate is placed slightly past the focus of the beam, at a point where it has expanded to the desired hologram size. The other leg is brought in through a shallow convex lens in order to make it slightly convergent. Notice that one of the write beams is convergent and one is divergent; this is exactly the geometry outlined in the last section to produce a convergent output beam upon readout. The holographic plate is mounted on a rotation stage to allow the plate to be rotated with respect to the write beams. This is done because it is much easier to rotate the holographic plate than all of the optical elements associated with each of the write beams. A close-up of the beams at the surface of the hologram can be seen on the right side of Figure 2-32. The divergent beam is drawn as a solid line and the convergent beam as a dashed line.
Figure 2-32: Setup for the microscope objective system

The system was built and operated with a 632.8 nm helium-neon laser as the laser source. As this was a proof-of-concept setup, the more expensive blue-green-sensitive Aprilis photopolymer holographic plates were not used. Instead, cheaper red-sensitive PFG-01 silver halide emulsions from Slavich were used. These silver halide emulsions are significantly thinner (7-8 µm) than the photopolymer emulsions (200 µm). This leads to worse diffraction efficiency as a result of a weaker Bragg effect within the emulsion. Also, the PFG-01 plates require chemical processing, which can lead to situations where it is difficult to diagnose whether problems are related to the geometry/optics or to the chemical processing of the emulsion. This is a distinct advantage of the UV-cured photopolymer emulsions.

The goal of the system was to write a hologram using 632.8 nm light that could be read out at normal incidence using an 850 nm divergent source (VCSEL) and produce a convergent beam output. This goal was effectively accomplished. To show this, the hologram that was written was read out with an 850 nm VCSEL normal to the plane of the hologram. The output of the hologram was then projected onto a diffuser, and the image on the diffuser was captured with an infrared-sensitive CCD camera. This can be seen in Figure 2-33.

Figure 2-33: Readout of the hologram with an 850 nm VCSEL source

Notice in the upper portion of the image recorded from the diffuser that there is a small spot with greater intensity than the undiffracted beam. This convergent beam is the desired output of the hologram.
The large, fuzzy circular structure in the middle of the image recorded from the diffuser is the undiffracted divergent beam. As was mentioned before, the PFG-01 plates have a relatively low maximum diffraction efficiency (~50%), which explains why there is a significant amount of energy in the undiffracted beam. Thick holograms put all the light into one of the 1st-order beams and the 0th-order (undiffracted) beam. Because the PFG-01 emulsion is fairly thin, however, the light is actually diffracted into multiple orders. Looking at the figure, only the 0th and +1st orders can be seen. This is because the higher orders (±2nd, ±3rd, etc.) do not have enough energy in them to be visible, and the −1st order cannot be seen because it is even more divergent than the 0th order and its intensity is below the sensitivity of the CCD camera.

Although this system did accomplish its proof-of-concept goal, it has several problems that prevent it from being a production system that can create arrays of holographic elements easily. The first such problem can be seen by examining Figure 2-32: the angle in the figure between the two separate legs of the write geometry is quite large (almost 45°). In most cases, it would be desirable to have the angle between the two write beams be significantly smaller (~5-20°). However, as the angle between the write beams is decreased, the convergent beam begins to hit the barrel of the microscope objective. Because one of the goals is to create very small holograms, the microscope objective must be very close to the emulsion. This leads to the problem of the convergent beam hitting the barrel of the microscope objective at small angles, as illustrated in Figure 2-34.

Figure 2-34: (a) Beam at a large angle hits the emulsion (b) Beam at a smaller angle is partially blocked by the objective
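The clearance constraint in Figure 2-34 amounts to simple geometry: a second beam aimed at the write spot clears the objective barrel only above a minimum angle set by the barrel radius and the objective-to-emulsion distance. The sketch below uses assumed dimensions (not measurements from the actual setup) to show why small inter-beam angles are impossible with the objective close to the emulsion:

```python
import math

# Approximate clearance condition for Figure 2-34: a beam aimed at the write
# spot, at angle theta from the objective axis, clears a barrel of radius r
# sitting at distance d from the emulsion roughly when theta > atan(r / d).
# r and d below are assumed illustrative values.
r = 10.0   # objective barrel radius, mm (assumed)
d = 15.0   # objective-to-emulsion distance, mm (assumed)

min_angle = math.degrees(math.atan(r / d))
print(round(min_angle, 1))  # prints: 33.7 -- smaller angles hit the barrel
```

With numbers of this order, the desired 5-20° inter-beam angles are blocked, which is consistent with the nearly 45° geometry that had to be used.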
Another problem with the setup relates to the quality of the microscope objective. In all, about seven different microscope objectives were tested, and all showed significant imperfections. These imperfections manifested themselves as high-spatial-frequency components in the output beam. Usually this is counteracted with a spatial filter to remove the high-spatial-frequency components; unfortunately, because of the small distances involved in the setup, it was not feasible to integrate a spatial filter. This resulted in the divergent beam having many high-spatial-frequency artifacts, which can have a significant negative impact on the quality of the holograms produced.

2.4.2 Single lens system

Taking into account some of the issues encountered with the microscope objective setup, a new system (illustrated in Figure 2-35) was designed to overcome them. This system is the same as the microscope system up to the beam splitter element: just as in the microscope system, the beam from the laser source is expanded and collimated. After the beam splitter, one of the beams goes directly into a large lens and is subsequently focused onto the holographic emulsion. This is the divergent beam of the desired write geometry. The other beam is reflected off of a mirror so that it is at the correct angle with respect to the diverging beam. This collimated beam is then put through a concave lens, which causes the previously collimated beam to diverge slightly. After that, the now slightly divergent beam is put through the same convex lens as the first beam, but near the edge of the lens. If the focal length of the diverging lens is larger than that of the converging lens, the beam will converge to a point behind the plane of the holographic emulsion.
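The last statement can be checked with the thin-lens equation. In the sketch below, the concave lens makes the collimated beam appear to come from a virtual source one (negative) focal length behind it, and the large convex lens then images that virtual source; all focal lengths and separations are assumed illustrative values, not measurements from the setup.

```python
# Thin-lens sketch of the convergent leg of the single lens system.
# The diverging lens has the larger focal-length magnitude, as the text
# requires, so the focus lands past the convex lens's own focal plane.
f_d = -200.0   # diverging (concave) lens focal length, mm (assumed)
f_c = 100.0    # large converging lens focal length, mm (assumed)
s = 50.0       # separation between the two lenses, mm (assumed)

u = -f_d + s               # virtual-source (object) distance from the convex lens
v = 1 / (1 / f_c - 1 / u)  # thin-lens equation: image distance past the lens
print(round(v, 1))         # prints: 166.7 -- beyond the 100 mm focal plane
```

Since the divergent write beam focuses near the convex lens's focal plane (where the emulsion sits), a focus at 166.7 mm is indeed behind the plane of the holographic emulsion.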
Figure 2-35: Setup for the single lens hologram writing system

In this system, the divergence angle of the beam normal to the large lens can be adjusted by changing the size of the beam entering the lens. If a collimated beam enters the large lens, it will theoretically focus to a point regardless of its cross-sectional area. Therefore, by increasing the cross-sectional area of the incident beam, the divergence angle of the beam after it crosses the focus will also increase. This can be seen in Figure 2-36.

Figure 2-36: How divergence angle is affected by beam size

Another benefit of this system is that it allows very small holograms to be produced. In the microscope objective setup, making the hologram smaller required decreasing the distance between the objective and the emulsion, which in turn required an even larger, undesirable angle for the second beam. In the single lens setup, however, the holographic emulsion can be positioned arbitrarily close to the beam focus in order to make very small holograms. As discussed previously, small holograms are a necessity for the CONNPP.

The problem with this system is that the quality of the holograms that can be written is severely limited both by the fact that the second beam does not pass through the center of the lens and by the quality of the large lens used in the setup. Because the second beam passes through the large lens at a position that deviates significantly from the center of the lens, the phase of the second beam is not transformed symmetrically about its center. This explains the non-circular shape of the beam at the plane of the holographic emulsion. A possible way to correct this would be to put the second beam through the edge of a compensating lens before the large lens.
By transforming the beam's phase profile non-symmetrically twice, in opposite directions, this effect could potentially be compensated for. However, the logistics involved in doing this would likely outweigh the benefits. Because the phase profile of the second beam was transformed non-symmetrically, the holograms that were produced performed very poorly. When attempting to read the holograms with a divergent VCSEL as in Figure 2-33, very low diffraction efficiency was observed (i.e., the undiffracted beam was much brighter than the diffracted beam). A comparison of the outputs of holograms produced with the microscope objective system and the single lens system can be seen in Figure 2-37.

Figure 2-37: (a) Output of microscope objective system hologram (b) Output of single lens system hologram

It is important to notice the difference in the outputs of the holograms produced by the two systems. The undiffracted beam in the hologram produced by the single lens system is much brighter than the undiffracted beam in the hologram produced by the microscope objective system. As discussed earlier, the goal of the holographic interconnects for the CONNPP is to put as much of the light as possible from the VCSEL into the diffracted beam. While the microscope objective system does have severe limitations in terms of usable geometries, the holograms it produced clearly outperform those made with the single lens system.

2.4.3 Computer controlled system with beam splitter

The third and final system examined was originally designed and built by Milos Komarcevic [24]. The system was further optimized by Marta Ruiz Llata in the fall of 2002 [25]. While the first two systems examined were for proof of concept, this system was intended to be a production system. As such, it was designed to be almost completely computer controlled.
First, an overview of the optical aspects of the system will be given. Next, the mechanical layout of the system and the computer controls will be described. Finally, results will be given showing holograms produced by the system.

Figure 2-38 shows the basic optical setup for the computer controlled beam splitter system. In this system, the laser source first passes through the microscope objective and spatial filter to expand the beam. Immediately after being expanded, the beam is split into the two separate legs. For visual simplicity, the beam through one of the legs is shown with solid lines and the other with a dashed line.

Figure 2-38: The optical setup for the computer controlled beam splitter system

The beam represented by the solid line passes through two lenses to give it the correct divergence characteristics. The beam represented by the dashed line is put through a single lens so that it is convergent at the holographic emulsion. After the lens, the dashed beam is reflected off of a mirror that directs it towards the emulsion at the correct angle for the particular write geometry. Both beams are then put through a beam splitter. A photograph of the actual setup can be seen in Figure 2-39. This arrangement allows arbitrary angles between the converging and diverging beams without having to be concerned about whether the non-normal beam will hit any of the optical elements in the other beam path. As may be recalled, the non-normal beam hitting the microscope objective barrel was the main problem with the microscope objective system. This was solved in the single lens system by putting both beams through one optical element; however, that solution also compromised the quality of the holograms produced. By putting both beams through a single beam splitter, half of the power of each of the beams is wasted, thereby increasing the exposure time.
However, the benefit of arbitrary angular geometry, without worrying about lens corrections or blocking optical elements, potentially outweighs the downside of longer exposure time. The longer exposure can also be compensated for by using a higher-power laser.

Figure 2-39: A photograph of the computer controlled beam splitter system

Adjustment of the lenses allows the system to be set up for arbitrary convergence and divergence angles of the write beams. Likewise, by moving and angling the second mirror in the dashed path, the angle between the beams can be changed. This provides the user with several knobs to set up write geometries determined by the equations from the previous section.

The mechanics of this system are more complicated than those of the previous two systems. In the other two systems, the beams were confined to a single plane of space parallel to the optical table. The original embodiment of the computer controlled beam splitter system, however, employed a 40 kilogram mechanical rotary stage which required the beams to propagate perpendicular to the plane of the optical table top. This, in turn, required the optical elements to be mounted on an optical breadboard that is vertically suspended over the rotary stage (Figure 2-39). When the system was overhauled by Marta Ruiz Llata, the 40 kilogram rotary stage was eliminated in favor of a somewhat less bulky computer controlled rotary stage.

As was mentioned previously, the goal of this system was to produce a production system capable of writing holograms for the CONNPP. One of the requirements for a system of this type is the ability to write large arrays of the individual holographic elements. In Figure 1-5, a single-neuron view of the hologram plane was given that included nine separate holograms. This set of nine holograms is then repeated in an array structure over the holographic emulsion.
An example of the resulting array containing 81 individually written holograms can be seen in Figure 2-40.

Figure 2-40: An example of a holographic element array

This holographic element array consists of only three basic hologram geometries. In the diagram, these three different holograms are represented by horizontal and vertical lines, angled lines, and circles. By using the rotary stage described above, the system can use these three basic geometries to produce the nine different holograms present in each neuron. In addition to the computer controlled rotary stage, there are two computer controlled linear stages that allow the holographic emulsion to be translated in the x and y directions. The holographic element array shown in Figure 2-40 is made by stepping through the whole array element by element and exposing the holographic emulsion at each step. A close-up photograph of the stage setup can be seen in Figure 2-41. The system also has a computer controlled shutter that allows the user to set the exposure time for the hologram writing. This is extremely useful when doing characterization to determine the optimum exposure time for a particular holographic material. All three stages and the shutter are connected to a PC and controlled via LabView.

Figure 2-41: A close-up photograph of the computer controlled stage

The builders of the system started by testing it with the relatively inexpensive red-sensitive Slavich PFG-01 holographic emulsions. After testing and refinement of the system, the users began testing a blue-green-sensitive photopolymer material made by Aprilis (formerly made by Polaroid). Whereas the Slavich PFG-01 material has a maximum diffraction efficiency of less than 50% with perfect writing and perfect chemical processing, the Aprilis photopolymer material has up to 90% diffraction efficiency when optimized properly. In addition to higher diffraction efficiency, there is no chemical processing.
To fix the hologram, the photopolymer is simply exposed to an ultraviolet light source. This allows the user to concentrate on getting the geometry correct instead of worrying about chemical processing.

2.4.4 Model verification using the beam splitter setup

The goal of the experiments performed was to show that a hologram could be written to produce a desired read geometry using the models developed in the last section. The test holograms were written using a 514 nm argon ion laser as the write source. The holographic material used was Aprilis photopolymer (formerly Polaroid) with a thickness of 200 µm. The desired readout wavelength was 633 nm (HeNe). An arbitrary read geometry was chosen and can be seen in Figure 2-42.

Figure 2-42: Desired read geometry (dimensions: 11.75 mm, 14.4 mm, 48 mm)

The write geometry was calculated from the models in the previous section and the Hologram Writer was set up with that write geometry. One of the shortcomings of the Hologram Writer is that, because of the vertically suspended optics, it is very difficult to accurately measure distances and angles between the optical elements. Improvements will be made to the system, and optical elements will be added specifically to help the user quantify the distances and angles for the write geometry. After the system was set up as close to the calculated write geometry as possible, the hologram was written. After writing, it was exposed to UV light and then characterized. Shown in Figure 2-43 is the output of the hologram using a 633 nm HeNe laser.

Figure 2-43: Output of hologram

The read geometry was then measured and compared to the desired read geometry. The results can be seen in Figure 2-44. While the actual read geometry did not match the desired read geometry exactly, all dimensions were within 10% of their target values. This 10% error can readily be accounted for by the limited accuracy of the measurements for the write geometry.
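The within-10% comparison above amounts to a worst-case relative-error check over the three dimensions of the read geometry. As a minimal sketch, the check can be written out numerically; the measured values and the dimension labels below are hypothetical placeholders, only the desired dimensions (11.75 mm, 14.4 mm, 48 mm from Figure 2-42) come from the text.

```python
# Tolerance check comparing a measured read geometry against the desired
# one from Figure 2-42. MEASURED_MM values and dimension names are
# hypothetical placeholders, not the actual Figure 2-44 measurements.

DESIRED_MM = {"dim_a": 11.75, "dim_b": 14.4, "dim_c": 48.0}
MEASURED_MM = {"dim_a": 12.5, "dim_b": 13.8, "dim_c": 46.0}  # hypothetical

worst = max(abs(MEASURED_MM[k] - DESIRED_MM[k]) / DESIRED_MM[k]
            for k in DESIRED_MM)
print(f"worst-case dimensional error: {worst:.1%}")
```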
As future work, the Hologram Writer system will be upgraded for significantly increased measurement accuracy. The diffraction efficiency for the holograms produced was 35%. Continued optimization of the key parameters will improve the diffraction efficiency of the holograms produced.

Figure 2-44: Actual read geometry compared to desired read geometry

As is known from the previous sections, the Bragg angle must be matched in order for the waves to constructively interfere and produce a maximum output. Measurements were taken to determine the tolerance of the system to deviations from the Bragg angle. Figure 2-45 shows the normalized diffraction efficiency as a function of deviation from the Bragg angle. A sinc function has been fitted to the data, as this is the theoretically predicted form of the curve [20].

Figure 2-45: Tolerance of hologram to deviation in angle (normalized diffraction efficiency vs. deviation from the Bragg angle, 0 to 6 degrees)

2.4.5 Suggestions for future work

After working with the three different systems and observing their benefits and deficiencies, it is useful to draw some conclusions that could lead to an easier-to-use system that reliably produces good quality holograms. Of the three basic systems (microscope objective, single lens, and computer controlled beam splitter), the beam splitter setup offers the greatest flexibility and should be pursued further. It allows the write beams to be set at arbitrary angles without the problems that plagued both the microscope objective system and the single lens system. The most drastic change that needs to be made to the computer controlled beam splitter system is its orientation. As described earlier, the bulk of the optical elements are mounted on an optical breadboard vertically suspended over the rotary stage.
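The sinc-shaped angular selectivity of Figure 2-45 can be sketched numerically: coupled-wave theory predicts that the normalized diffraction efficiency falls off as the square of a sinc of the off-Bragg deviation. The first-null deviation used below is a hypothetical fit parameter for illustration, not a value extracted from the measured data.

```python
import math

# Sketch of the sinc^2 angular-selectivity model fitted in Figure 2-45.
# delta_null_deg (deviation at the first null) is a hypothetical fit value.

def normalized_efficiency(dtheta_deg, delta_null_deg=2.0):
    """Normalized diffraction efficiency vs. deviation from the Bragg angle."""
    x = dtheta_deg / delta_null_deg   # first null falls at delta_null_deg
    if x == 0.0:
        return 1.0
    return (math.sin(math.pi * x) / (math.pi * x)) ** 2

# Sample the 0-6 degree range covered by the Figure 2-45 measurement and
# find where the efficiency drops to half its on-Bragg value.
half_deg = next(d / 100.0 for d in range(0, 601)
                if normalized_efficiency(d / 100.0) < 0.5)
print(f"efficiency falls to 50% about {half_deg:.2f} degrees off Bragg")
```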
This leads to many potential problems. Because the optics and mounts are vertically suspended, they are subject to the force of gravity without support from underneath. As in all holographic writing systems, movements of components by fractions of a wavelength can result in significant degradation of the quality of the holograms produced [21]. Gravity acting on these components in a direction in which they are not supported can cause such movements. In one instance, after the setup had not been touched for days, a lens fell out of a mount and broke on the optical table. The fact that this lens fell means that it had been moving for some time. As noted above, the reason for this awkward vertically suspended setup was the 40 kilogram rotary stage, which has since been replaced with a somewhat smaller rotary stage. The better solution, however, is to move to a motorized rotary optical mount such as the New Focus Model 8401 or the Oriel Model 13049. These mounts are usually used for waveplates and polarizers but would also work well for this application. In addition to the motorized rotary mount, a motorized vertical translation stage is also needed. Using these two stages in conjunction with one of the existing motorized horizontal translation stages would allow the entire system to lie in a single plane parallel to the table. In general, moving to a single parallel-plane setup makes the system much easier to work with and align. In addition to ease of use, this new setup allows for easier initial characterization of the holograms produced. The easiest way to test the diffraction efficiency of a hologram is to put it back into the system it was written with and block one of the two write beams. Measuring the output of the hologram (power in the diffracted and undiffracted beams) gives an upper limit for the diffraction efficiency of the hologram.
This eliminates the potential alignment problems that arise when the hologram is put into another system to characterize its maximum diffraction efficiency. Users of the current system are unable to do this because the write beams are stopped directly on the other side of the holographic plate by the base of the rotary stage. In addition, reflections of the write beams from the rotary stage base in the current setup could also be interfering and causing a decrease in diffraction efficiency. Once the problems due to the vertically suspended optics are solved, effort should be directed towards increasing the accuracy and precision of the setup. The final version of the CONNPP calls for holograms that are 83 µm × 83 µm and that need to direct a beam to a detector that may be as small as 25 µm × 25 µm. With the specifications laid out for the CONNPP, if the angle of the output beam is off by even 1°, it will not hit its detector correctly. In the current system, it is extremely difficult to position the beams with such precision. Currently, to determine the angles, several measurements must be taken that incorporate estimations. The magnitude of these estimations is more than enough to cause inaccuracies of more than 1°. The measurements needed for the write setup are the distance of the divergent source from the emulsion, the sizes of both the divergent and convergent beams at the emulsion, and the position of the convergent beam focus on the opposite side of the emulsion. As an example of how to alter the system to provide one of the needed measurements, an improvement to measure the distance from the source of the divergent beam to the emulsion is discussed here. Figure 2-46(a) shows the part of the setup containing the divergent beam in the region near the emulsion. The distance D is the measurement that needs to be obtained.
Figure 2-46: (a) Close-up of divergent beam near the emulsion (b) How to measure distance D with a flipper mirror

Figure 2-46(b) shows a possible way to measure the distance D without having to insert rulers or other undesirable objects into the aligned beam path. This solution consists of a flipper mirror (made by New Focus Corporation) and a screen on a ruled rail system. First, the system must be calibrated so that it is known which ruler mark on the rail system corresponds to the holographic emulsion. This only needs to be done once, when initially setting up the system. By then moving the screen along the rail until the beam has a minimal spot size and noting the measurement on the ruled rail, the distance from the focus to the emulsion can easily be calculated. Furthermore, by moving the screen to the mark made for the emulsion position, the size of the beam at the emulsion can also be measured with a ruler without having to worry about touching any of the optics. Once the measurements have been taken, the mirror is simply flipped out of the beam path so that the beam propagates to the emulsion. Another area that bears investigation is the use of multiplexed holograms. By having each neuron use a single multiplexed hologram and a single VCSEL, the size required for a single neuron could be reduced by the area of eight VCSELs and eight VCSEL drivers. This could easily comprise 40% of the total chip area necessary for a neuron. If the VCSELs are able to produce 1 mW of power, this allows over 100 µW per channel. As the photodetectors and amplifiers are designed for inputs between 0 and 10 µA, using a single VCSEL per neuron should be feasible from an optical power perspective. This scheme does have architectural ramifications, however. In the original CONNPP design, each of the nine lasers was able to independently modulate its power.
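The single-VCSEL power budget asserted above can be checked with a back-of-envelope calculation. The detector responsivity and the loss terms below are assumptions for illustration (the 35% diffraction efficiency reported earlier and a 52% fill factor are borrowed from elsewhere in this thesis); the text itself fixes only the 1 mW VCSEL power, the nine-way fan-out, and the 0 to 10 µA amplifier input range.

```python
# Back-of-envelope check of the single-VCSEL-per-neuron power budget.
# Responsivity and loss factors are hypothetical illustration values.

VCSEL_POWER_W = 1e-3        # stated VCSEL output power
N_CHANNELS = 9              # multiplexed hologram fans out to 9 neurons
DIFFRACTION_EFF = 0.35      # efficiency reported for the test holograms
FILL_FACTOR = 0.52          # assumed detector fill factor
RESPONSIVITY_A_PER_W = 0.3  # hypothetical detector responsivity

power_per_channel_w = VCSEL_POWER_W / N_CHANNELS
detected_power_w = power_per_channel_w * DIFFRACTION_EFF * FILL_FACTOR
photocurrent_ua = detected_power_w * RESPONSIVITY_A_PER_W * 1e6

print(f"{power_per_channel_w * 1e6:.0f} uW per channel")
print(f"~{photocurrent_ua:.1f} uA photocurrent (amplifier range: 0-10 uA)")
```

Under these assumptions the photocurrent lands comfortably inside the amplifier's design range, consistent with the feasibility claim above.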
This allowed each connection from a neuron in plane n to have a different optical intensity (i.e., weight) to each of the nine neurons it connects to in plane n + 1. Using a single VCSEL for each neuron makes the intensity, and hence the neural weights, to the nine neurons in plane n + 1 identical. Further architectural neural network simulations are necessary to determine whether multiplexing the holograms at the expense of the connection model is an acceptable solution.

Chapter 3

Devices and Circuits

While the holographic elements discussed in the previous chapter are an integral part of the holographic interconnect structure, devices and circuits also serve a crucial role. As the VCSELs used were obtained from a third party without any control over their characteristics or manufacture, they were discussed briefly in the last chapter. The photodetectors and the amplifiers for these photodetectors, on the other hand, are covered in this chapter. Two full custom chips have been designed thus far for the Compact Optoelectronic Neural Network Processor Project. One of these chips has been fabricated and tested. The other is in the design phase and is expected to be fabricated during the summer of 2003. In this chapter, the design, fabrication, and testing of the first chip are discussed. The key learnings from this chip were then incorporated into the second chip, whose design and simulated performance are also shown.

3.1 Background

The Compact Optoelectronic Neural Network Processor consists of four separate subsystems: the holograms, the detectors, the logic and electronics, and the emitters. The last chapter focused on the holographic interconnects, which dealt with three of these subsystems (holograms, detectors, and emitters), with the emphasis on the holograms. This chapter focuses on the detectors and the first part of the electronics block that interfaces the detectors with the rest of the electronics.
This chapter deals with the conversion of the optical signal to an electrical signal, along with the conditioning of the resulting electrical signal.

Figure 3-1: Illustration of how the material covered in the chapters fits in the architecture (Chapter 2: emitters, holograms, detectors; Chapter 3: detectors, electronics)

In the actual system, the light conditioned by the hologram is focused onto a photodetector. The photodetector converts a portion of the incident photons into electron-hole pairs. Once these electrons and holes are created, they are immediately subjected to the fields within the photodetector device. If the device is reverse biased, the electron-hole pairs are pulled apart and collected at the positive and negative terminals of the device. This flow of electrons and holes results in a current. At this point, the signal has effectively been converted from an optical signal to an electrical current signal. However, most logic and decision circuitry uses voltage inputs rather than current inputs. As a result, the electrical current must be converted to a voltage. This can be done by a transimpedance amplifier, which takes a current as its input and produces a voltage proportional to that input as its output. Generally, the current measured in the photodetector will be very small. So, in addition to converting the current to a voltage, the signal will also need to be amplified. This amplification is usually done after the signal has been converted to a voltage. In order to design the photodetectors and amplifier circuits, it had to be determined which design parameters to optimize. The first metric to consider is speed. The goal of the CONNPP is not to rival the speeds obtained by leading edge microprocessors. Rather, the CONNPP seeks to tackle classes of problems that traditional microprocessors are inherently bad at (i.e., recognition, classification, etc.).
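The optical-to-electrical conversion chain just described (a photocurrent proportional to incident power, followed by a transimpedance stage producing a proportional voltage) can be sketched in two functions. The responsivity is a hypothetical value; the 1800 Ω transimpedance is the simulated PSG1 figure quoted later in this chapter.

```python
# Minimal sketch of the detector -> TIA signal chain described above.
# Responsivity is hypothetical; 1800 ohms is the simulated PSG1 value.

def photocurrent_a(optical_power_w, responsivity_a_per_w=0.3):
    """Photodetector: converts incident optical power to current."""
    return optical_power_w * responsivity_a_per_w

def tia_output_v(current_a, transimpedance_ohm=1800.0):
    """Transimpedance amplifier: output voltage proportional to input current."""
    return current_a * transimpedance_ohm

i_a = photocurrent_a(20e-6)                 # 20 uW incident (example value)
print(f"{i_a * 1e6:.1f} uA -> {tia_output_v(i_a) * 1e3:.1f} mV")
```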
As a result, the speed of the devices and circuits was not paramount to their design. After all, the human brain works at speeds on the order of milliseconds, yet it routinely solves problems that computer scientists have been struggling with for 30 years using traditional microprocessors. The next metric to examine is size. This takes into account both the size of the photodetector and the size of the amplifier circuits. With the CONNPP specification setting the size of a single pixel at 250 µm × 250 µm, size should be minimized as much as possible. As a result, the amplifier circuits should be made to take up as little chip real estate as possible while still performing their function well. The photodetectors, on the other hand, are a different matter. The size of the photodetectors is related to the input beam size and the alignment characteristics of the system. By designing the holograms to focus the light on the detector, the size of the detectors can be somewhat decreased. As the size of the photodetectors is decreased, however, they become more difficult to align with the beams. For this reason, a more conservative (i.e., larger) size should be chosen at the beginning of the project. As the micro-optic aspects of the system are developed further, it should become apparent what the minimum size of the photodetectors should be. The last optimization parameter to examine is performance and functionality. For the photodetectors, the performance and functionality metrics are mostly tied to process parameters that cannot be changed. Factors that can be changed, such as geometry, will be explored. For the amplifier circuits, the most important performance metric is linearity. The linearity of the amplifier circuits directly affects the maximum bit-level resolution that can be achieved by the system. A secondary consideration for the performance of the amplifiers is voltage swing.
A larger voltage swing will potentially allow the electronics following the amplifier circuitry to have less strict design parameters.

3.2 Photonic Systems Group Chip #1

It is well known that silicon, while cheap, is a less than ideal semiconductor for optoelectronic applications due to its indirect bandgap structure. For this reason, the initial specification for the Compact Optoelectronic Neural Network Processor Project called for the circuits and optoelectronic devices to be fabricated in gallium arsenide (GaAs). Furthermore, the commercial Vitesse HGaAs5 process was decided on due to its availability to the Photonic Systems Group. When the Photonic Systems Group Chip #1 (PSG1) was designed and fabricated, HGaAs5 was still in the developmental stages. This led to some significant issues which will be discussed later in this section. This section describes the design and testing of both the photodetectors and the amplifier circuits for PSG1.

3.2.1 PSG1 Photodetectors

Beginning with the Vitesse HGaAs4 process, p-contacts were introduced to allow a fixed backgate potential. While the p-contacts were introduced to help with issues such as drain lag and optical cross-talk between devices, they also provide the opportunity for new devices. Up until the HGaAs4 process, metal-semiconductor-metal (MSM) photodetectors were primarily used. While the MSM devices performed well, they required costly extra processing steps. Photodetectors that fit within the standard process flow are much more desirable. As a result, a lateral pin photodetector was proposed and designed by Joseph Ahadian [26]. Without p-contacts, these devices could not be built in the pre-HGaAs4 processes. The basic structure of the lateral pin photodetector can be seen in Figure 3-2. Traditionally, pin photodetectors use heavily doped p and n regions with the material between remaining close to intrinsic doping.
In many situations this results in the whole intrinsic region being depleted even at zero bias. Unfortunately, because of process constraints, having a close-to-intrinsic region between the two contacts was not an option. In the case of the HGaAs5 process, there is a difference in doping of about 10³ between the contacts and the p− region. If the device were being engineered from scratch without the constraints of the HGaAs5 process, a doping difference of about 10¹⁰ would be normal. In the original layout geometry, the spacing between the n-type and p-contact material is 2.0 µm. In an unbiased state, the depletion region extends into the p− region by about 0.4 µm. A graph of depletion width as a function of reverse bias can be seen in Figure 3-3. As the material parameters do not change with the geometry, the curve in Figure 3-3 holds for all the devices discussed in this section.

Figure 3-2: Cross-section of lateral pin unit cell

Figure 3-3: Depletion width as a function of reverse bias voltage (0 to 30 V)

As can be seen in Figure 3-2, there is a p− layer on top of the substrate. This causes a small depletion region between the substrate and the p− layer due to the doping mismatch. Next, the source/drain implant (n-type doping) can be seen just below the surface of the wafer. Once again, due to the doping mismatch between the n-type and p− material, a natural depletion region is formed. The black rectangle on top of the n-type material is the metal contact. As a side note, the metal contacts on both the n-type and p-contact regions do block the incoming light. The light is incident on the surface of the wafer and enters the device everywhere there is no metal blocking it. While Figure 3-2 shows what is happening conceptually, the actual layout is somewhat more complicated.
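The curve of Figure 3-3 follows from the standard one-sided step-junction depletion model. The sketch below evaluates it with a hypothetical p− doping and built-in voltage chosen to reproduce the numbers quoted in the text (about 0.4 µm at zero bias, full depletion of the 0.9 µm gap at 5 V); the actual HGaAs5 doping profile is proprietary.

```python
import math

# One-sided step-junction model behind Figure 3-3: depletion into the
# lightly doped p- region vs. reverse bias. NA and VBI are hypothetical
# values chosen to match the quoted ~0.4 um (0 V) and ~0.9 um (5 V) points.

Q = 1.602e-19                 # electron charge, C
EPS_GAAS = 12.9 * 8.854e-12   # GaAs permittivity, F/m
NA = 1.1e22                   # hypothetical p- doping, m^-3 (~1.1e16 cm^-3)
VBI = 1.2                     # hypothetical built-in voltage, V

def depletion_width_um(v_reverse):
    """Depletion width W = sqrt(2*eps*(Vbi + Vr) / (q*NA)), in microns."""
    w_m = math.sqrt(2 * EPS_GAAS * (VBI + v_reverse) / (Q * NA))
    return w_m * 1e6

for v in (0, 5, 30):
    print(f"{v:2d} V reverse bias: W = {depletion_width_um(v):.2f} um")
```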
The layout for a 6.4 µm × 6.4 µm pin unit cell can be seen in Figure 3-4. This unit cell is then tiled to the desired detector size. All the p-contacts are connected to one another, as are the active area n-type regions.

Figure 3-4: Layout of the original lateral pin unit cell

The capacitance for this device can be estimated from the geometry of the device along with the junction capacitance for the process. In a 4–6 V reverse bias condition, the junction capacitance should be approximately 0.05 fF per µm of active area edge. The unit cell in this initial geometry has an active area edge of 20 µm. This results in a capacitance of about 1 fF per unit cell. In Figure 3-5 the unit cell has been tiled to create a 75 µm × 75 µm photodetector. This particular detector is made up of 120 unit cells for a total capacitance of 120 fF.

Figure 3-5: Layout of the 75 µm × 75 µm lateral pin

In addition to this original lateral pin geometry, three other lateral pin geometries were also laid out on PSG1. While the original geometry was minimum sized for the HGaAs4 design rules, with 2.0 µm between the n-type and p-contact regions, a new geometry was created that has minimum spacing for the HGaAs5 design rules. The unit cell has exactly the same structure as the original device, but it has been shrunk. The size of the full detector after the unit cell has been tiled remains the same, 75 µm × 75 µm. The n-type to p-contact spacing in this new device is 0.9 µm and the unit cell size is now 3.4 µm × 3.4 µm. This means that the p− region is now fully depleted at a reverse bias voltage of only 5 V. The capacitance per unit cell has dropped to 0.5 fF; however, the total capacitance has risen to about 186 fF (50% larger than that of the original geometry). The minimum HGaAs5 geometry can be seen in Figure 3-6. Another useful metric to examine when discussing these devices is fill factor.
Fill factor, for the purposes here, is defined as the ratio of the area exposed to optical light to the total area of the detector. In the case of the two detectors discussed so far, the Metal1 and Gate Metal layers block light from hitting the detector. The original detector with 2.0 µm n-type to p-contact spacing has a fill factor of 61%. This means that 61% of the light hitting the detector will actually interact with the device. The minimum HGaAs5 geometry detector has a fill factor of 52%, and thus collects about 15% less light than the original geometry.

Figure 3-6: (a) Unit cell for the minimum HGaAs5 geometry (b) Full 75 µm × 75 µm detector

In addition to the two detector structures already discussed, two more pin photodetectors with a simpler geometry were also designed. Instead of using the more complex active area ring with a p-contact in the middle, these detectors simply use strips of Metal1-covered active area and gate-metal-covered p-contacts. These strips are then alternated to produce the desired detector size. Two different types of this strip detector were designed. The first uses minimum spacing (0.9 µm) between the n-type material and the p-contact. The second device has a larger spacing of 1.5 µm between the two strips. An example of the geometry for the strip pin photodetector can be seen in Figure 3-7.

Figure 3-7: (a) Unit cell for the strip pin photodetector (b) Full 72 µm × 72 µm strip detector

Looking at some of the key metrics for the strip photodetectors, the minimum-spacing strip photodetector has a capacitance of approximately 151 fF and a fill factor of only 35%. For the strip photodetector with 1.5 µm spacing, the capacitance is about 115 fF and the fill factor is 52%.

3.2.2 PSG1 Amplifiers

As mentioned previously, the PSG1 chip was designed and fabricated in the Vitesse HGaAs5 process.
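The capacitance and fill-factor comparisons above follow from simple geometry. As a minimal sketch, the arithmetic can be reproduced from the quoted values: the 0.05 fF per µm of active-area edge, the 20 µm edge per cell, the 120-cell count, and the two fill factors all come from the text; nothing here is re-derived from the actual layouts.

```python
# Geometric estimates behind the PSG1 detector comparison. All input
# numbers are the values quoted in the text.

CJ_FF_PER_UM = 0.05   # fF per um of active-area edge at 4-6 V reverse bias

def detector_capacitance_ff(edge_per_cell_um, n_cells):
    """Total capacitance from tiling identical unit cells."""
    return CJ_FF_PER_UM * edge_per_cell_um * n_cells

# Original geometry: 20 um of edge per 6.4 um cell, 120 cells in 75 x 75 um
c_orig = detector_capacitance_ff(20.0, 120)
print(f"original geometry: {c_orig:.0f} fF")   # matches the quoted 120 fF

# Fill factors: 61% (original) vs 52% (minimum HGaAs5 geometry)
loss = (0.61 - 0.52) / 0.61
print(f"minimum geometry collects {loss:.0%} less light")
```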
This fact has some significant consequences which bear discussion. Most of the chips designed and fabricated for microprocessors, memories, DSPs, etc. are fabricated in silicon rather than GaAs. Most of today's silicon-based processes use CMOS (Complementary Metal Oxide Semiconductor) technology. CMOS technology uses two flavors of MOSFETs (Metal Oxide Semiconductor Field Effect Transistors) to create a low power, high speed logic family. As a result, CMOS and MOSFETs are what most circuit designers are comfortable with. Because of its material parameters, GaAs does not use MOSFETs or CMOS technology. For logic-type operation, GaAs can use MESFETs (MEtal Semiconductor Field Effect Transistors) and the DCFL (Direct Coupled FET Logic) logic family [27]. The major difference between MOSFETs and MESFETs is the structure of their gates. MOSFETs, as their name implies, have their gate separated from the substrate by an extremely thin insulating oxide layer. As a result, the impedance looking into the gate of the MOSFET is almost infinite. MESFETs, on the other hand, have a metal gate that is directly in contact with the substrate. This results in a Schottky barrier diode formed at the surface of the device. This, in turn, means that the gate is not isolated from the rest of the device and will have leakage current through to the bulk. In addition to this major structural difference between MESFETs and MOSFETs, there are also material parameters such as mobility and bandgap energy that make silicon and GaAs very different to work with. In the course of designing the first generation amplifier, many different amplifier topologies were examined and explored. In the end, a topology was chosen that was based on the work of Joseph Ahadian as outlined in his Ph.D. thesis [26]. This original topology was modified to meet the requirements of the CONNPP. A block diagram of the amplifier topology used for PSG1 can be seen in Figure 3-8.
Figure 3-8: Amplifier topology for PSG1 (TIA1 and TIA2 feeding a differential stage, followed by a balanced-to-unbalanced converter)

The first stage of the topology consists of two transimpedance amplifiers (TIAs). These TIAs convert the current from a photodetector or other current source to a voltage. The outputs of the TIAs are then fed into a differential amplifier, which amplifies the difference between the two input signals while discarding the common-mode signal. The output of the differential amplifier is itself a differential signal. This differential signal is then input into a balanced-to-unbalanced converter, which takes a double-ended input and produces a single-ended output. The transimpedance amplifier (schematic in Figure 3-9, layout in Figure 3-10) utilizes feedback to convert the current at the input to a voltage at the output. The basis for the TIA is a simple common source amplifier made up of the enhancement mode MESFET E1 and the resistor R1. This common source amplifier then has another resistor (R2) added as a feedback element between the input (Iin) and the drain of E1. The diode (D1) has been added to raise the voltage level of the output to correctly bias the differential stage that follows the TIA. This amplifier works by pushing (or pulling) current into the input terminal. The only path for the current to travel is through the transistor E1 to ground. As the current through E1 is forced to change by the input, the drain voltage of E1 (and hence the voltage at the output) must also change. Thus, as the input current modulates minutely, so does the voltage at the output.

Figure 3-9: Transimpedance amplifier schematic (Vdd1 = 3.3 V, R1 = 6.014 kΩ, R2 = 1.999 kΩ, E1 = 45/0.5)

The choice of the feedback resistor (R2) is a tradeoff between the noise performance and the bandwidth of the TIA. If the feedback resistor is large, it will cause a large change in voltage at the drain of the transistor, giving better noise performance [28].
At the same time, it will also increase the RC time constant at the input and hence reduce the bandwidth capabilities of the amplifier. Conversely, making the feedback resistor small will make the TIA more susceptible to noise while increasing the maximum frequency at which it can be operated. The RC time constant (τ_in) at the input is given by [26]

τ_in = R_in · C_in = C_in · R2 / (1 + G)    (3.1)

where C_in is the total capacitance between the input node and ground and G is the gain of the common source amplifier without the feedback resistor. It is clear that as R2 increases, so does τ_in, reducing the bandwidth.

Figure 3-10: Transimpedance amplifier layout

The transimpedance amplifier designed and shown in Figure 3-9 was simulated using Vitesse's proprietary MESFET models. The results show the amplifier having a transimpedance of about 1800 Ω. Therefore, a 10 µA input swing produces an 18 mV swing at the output of the TIA. The output voltage signals from the two transimpedance amplifiers are then input into a differential amplifier. This differential stage is made up of two common source amplifiers with their sources coupled together through a common current source. The schematic of this amplifier stage can be seen in Figure 3-11 and the layout in Figure 3-12. When the input signals on the two sides of the differential amplifier are the same, the outputs of the two sides of the differential amplifier are also the same. The current through the left side of the differential amplifier is I1 and the current through the right side is I2. The sum of the currents, I1 + I2, flows through E6.
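Equation 3.1 makes the bandwidth side of the feedback-resistor tradeoff concrete. The sketch below evaluates it using the 120 fF capacitance of the original-geometry detector as C_in; the open-loop gain G is a hypothetical round number, since the text does not quote one.

```python
import math

# Numeric sketch of Equation 3.1: tau_in = R_in * C_in = C_in * R2 / (1 + G).
# C_in uses the 120 fF original-geometry detector; G is a hypothetical value.

C_IN_F = 120e-15   # input-node capacitance (detector), farads
G = 10.0           # hypothetical open-loop gain of the common-source stage

def input_bandwidth_hz(r2_ohm):
    """3 dB bandwidth set by the TIA input RC time constant."""
    tau_s = C_IN_F * r2_ohm / (1.0 + G)
    return 1.0 / (2.0 * math.pi * tau_s)

for r2_ohm in (2e3, 20e3):   # roughly as designed vs. a 10x larger R2
    print(f"R2 = {r2_ohm / 1e3:.0f} kohm -> "
          f"f_3dB ~ {input_bandwidth_hz(r2_ohm) / 1e9:.2f} GHz")
```

As the equation predicts, a tenfold increase in R2 costs a tenfold reduction in the input-limited bandwidth.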
As E6 108 Vdd2=2V R5=6.014K R6=6.014K R7=6.014K Vout1 Vin1 Vout2 I1 I2 Vin2 E5=45/0.6 E4=4510.6 E6=100/1 E7=10/1 11+42 R8=1.008K R9=1.008K Figure 3-11: Differential amplifier schematic is a statically biased current source, the current through E6 must remain constant. This mean that if the voltage Vni increases and causes I1 to increase, the biasing must change for E4 and hence 12 must decrease. The voltage signals at V"t 1 and V0 t2 will be complementary about a common quiescent voltage. Once again, the proprietary Vitesse SPICE models were used to simulate the performance of the differential amplifier designed. The differential amplifier was simulated using the 18mV swing output voltage from the TIA simulation as an input to one of the sides of the differential amplifier. The other side of the differential amplifier used the output voltage from the TIA that is held at a constant bias. The results show a differential gain ((V+out - V-out)/Vin) of about 11. For the 18mV input swing on the differential input, a differential output swing of about 200mV is seen. The third and final stage of the amplifier is a balanced-to-unbalanced converter. This stage is meant to take the complementary double-ended output from the differ109 R5 R6KIIq -R7 E5 ... ----- R9 - Pu .t.l ...... k>1> Figure 3-12: Differential amplifier layout ential stage and convert it to a single-ended output. The circuit accomplishes this through a clever use of a current mirror structure. The circuit schematic can be seen in Figure 3-13(a). This circuit consists of the output of the differential stage input into two depletion mode FETs (DFETs). The lower half of the circuit is a simple current mirror. The voltage on ni sets the bias level on DF1 which in turn sets the current level through E8. Because E8 and E9 make up a current mirror, whatever current flows through E8 must also flow through E9 (i.e. I = 12). However, it was stated previously that and Vin2 are complementary. 
This means that if Vin1 decreases, Vin2 must increase.

Figure 3-13: Balanced-to-unbalanced circuit schematic (Vdd2 = 2 V, DF1 = DF2 = 1/4, E8 = E9 = 2.2/0.5)

Starting from equilibrium, assume that I1 = I2. Now, suppose Vin1 decreases slightly. This causes I1 to increase slightly. Because of the current mirror structure, when I1 increases, I2 must also increase. Because of the complementary input structure, Vin2 must increase slightly. This increase in Vin2 will make the current through E9 want to decrease; however, this current is set by the current mirror structure and must increase. As a result, to meet both of these conditions (increasing Vin2 and increasing I2), Vout must decrease significantly to accommodate these two contradictory forces. By using this structure, the voltage swing on Vin1 has effectively been negated (by changing it to a current) and then added to the other input, Vin2. While the output will be somewhat less than the sum of the signals, it still produces a single-ended output with a magnitude greater than either Vin1 or Vin2 (about 150 mV instead of the total 200 mV for the full differential signal). The schematic for the complete amplifier circuit with all three stages can be seen in Figure 3-14(a), and the layout in Figure 3-14(b). The total size of the circuit is about 46 µm × 96 µm. Notice that there is a metal-3 layer covering the entire circuit. Because the intent is to use this circuit for optoelectronic applications, it is necessary to optically shield the circuit; the uniform metal layer does this effectively. Without the optical shielding, significant numbers of carriers could be created within the substrate and potentially affect the functionality of the underlying circuits. Another structure included in this layout that cannot be seen in the layouts of the individual stages is the p-contact regions.
Along the left-hand side, top, and right-hand side of the layout can be seen sizable blocks of p-contacts. These p-contacts are connected to ground in order to hold the substrate at a given potential to help keep backgate effects at a minimum.

Figure 3-14: Complete 3-stage amplifier circuit schematic and layout

3.2.3 PSG1 testing

The PSG1 was the first chip designed by the Photonic Systems Group in a number of years. As all of the students involved in previous chips had graduated, the group basically had no experience in this area. As a result, there were some major challenges faced in the design and test of the chip. These challenges and the lessons learned from them will be discussed here. The design for the Photonic Systems Group Chip #1 was finished and sent to Vitesse in the Fall of 2001 for fabrication. The finished chips were received from Vitesse in the early Spring of 2002. The chip was fabricated as part of a Vitesse test chip containing other experimental devices and circuits. Approximately 35 chips were received from Vitesse as bare die without being packaged. The fact that we received bare die instead of packaged parts represented one of the first challenges in testing the PSG1. After appropriate chip packages were obtained, the chips had to be mounted in the packages. Once a chip was mounted in its package, a wire bonder was used to connect the on-chip pads to the lead frame of the package. While the wire bonding was not particularly difficult to do after a couple of days of practice, it did raise some questions about the integrity of the connections. The on-chip bond pad sizes were suggested by some of the designers at Vitesse as typical for this type of chip. Unfortunately, it was not considered that almost all of Vitesse's chips (even test chips) are bonded by automated machines rather than manually.
While an automatic wire bonder would have little difficulty making connections to pads of this size (75µm x 110µm), it was quite challenging with the manual wire bonder. Making the connections was relatively easy, but it was always questionable whether the tail of the wire bonded to the on-chip pad was making contact with other structures on the chip. If this happened, it could lead to a short circuit, damaged devices, or generally flaky behavior of the circuits. There are a couple of different solutions to the wire-bonding problem. The first is to increase the size of the on-chip bond pads, making them large enough to lay down a bond without question of rogue connections. The other is to have the bare die packaged and wire bonded before delivery. Enlarging the bond pads requires more chip real estate, which in turn costs more money to produce. On the other hand, having the chips packaged and bonded will also add to the bottom-line cost (usually about $1000). While this trade-off could be evaluated for each chip given the number of pads and the cost per mm² of chip real estate, for peace of mind it would be better to plan on having the chips packaged and bonded by a third-party vendor. The next issue encountered involved the performance of the amplifier circuits. The PSG1 included four amplifier circuits of the type described in the last section, connected to the four different types of photodetectors. Additionally, there was also
Figure 3-15: Layout of the full PSG1 chip

a single amplifier not connected to any photodetector to allow for electrical-only testing of the circuit. This electrical-only circuit was the first targeted for testing. It was expected that the circuit would produce a variable voltage output linearly proportional to the current difference between the two inputs. When the circuit was biased up as designed and the appropriate current inputs were provided, a static voltage was seen at the output. After adjusting all the available variables (Vdd1, Vdd2, Iin1, and Iin2), the output voltage still did not show any dynamic change as a function of Iin1 and Iin2. As a digression, the situation above highlights a design flaw that resulted from inexperience in designing chips. The discussion of the flaw is included here to prevent it from happening again. The 3-stage amplifier was laid out on PSG1 only as a single, monolithic circuit. As a result, the only nodes that were accessible were the two inputs, the output, and the two bias voltages, and the circuit was therefore extremely difficult to troubleshoot. In hindsight, the individual stages of the 3-stage amplifier should have been laid out separately in addition to the single monolithic version. The circuits might not have worked any better as a result, but it would have been much easier to troubleshoot and learn specifically where the problem was in the amplifier. Once it was determined that the 3-stage amplifier circuit was not functioning correctly, it was necessary to determine why. The limited number of accessible internal circuit nodes significantly hindered debug of the circuit. As the output node did not show any change in voltage with changing inputs, the only thing left to look at was the input nodes.
As discussed earlier, the circuit functions by taking two input currents (I1 and I2). The most telling test was to set the current at both input nodes to 0A and then measure the voltage at these input nodes. It was expected that the voltages measured would be identical. Unfortunately, that was not the case. Figure 3-16 shows the input voltages measured on four different chips with the input currents set to 0A.

          Vin1    Vin2
Chip #1   394mV   378mV
Chip #2   390mV   353mV
Chip #3   383mV   370mV
Chip #4   363mV   329mV

Figure 3-16: Table showing input voltages on four different chips with input currents set to 0A

The fact that there is over 30mV of mismatch between the inputs in some cases suggests that the inputs are not matched. Once it became clear that the inputs were not matched, it was necessary to determine what was causing the problem. The circuit had been modelled during the design phase at all the available corners with up to 3-σ process variation in each direction. While in some cases performance was not optimal, functionality was always preserved. Through conversations with Vitesse, the resistors became the main focus of the debug effort. As was mentioned previously, the HGaAs5 process was still in development during the period in which the chip was designed and fabricated. After speaking with Vitesse about the models used for simulation, it was determined that the models did not take into account process variation of the NRES resistors used in this design. Furthermore, these resistors could vary by up to 15% of their designed value within the same chip. Using this information, the amplifier circuits were re-simulated at various process corners with various resistor mismatches in the first stage. In doing this, a situation was found that approximately matched what was being seen in the lab.
By running a 2-σ process corner with a 10% resistor mismatch in the transimpedance stage, the simulation showed a level of input-voltage mismatch similar to the mismatch measured in the lab. Even without the appropriate models, this situation could probably have been avoided. In the layout for the amplifier, the two transimpedance amplifiers were placed on opposite sides of the differential amplifier for ease of layout. This meant that the resistors in the different TIAs were separated by about 100µm. If these dual TIAs had been laid out as a single block with the resistors sitting right next to each other, the mismatch would probably not have occurred. Unfortunately, it was not known at design time that this was a significant issue, partially due to inexperience on the part of the designer and partially due to incomplete information from the manufacturer. This leads to another tangential issue about choosing which process to work within for designing ICs. While the Photonic Systems Group was very fortunate to work with Vitesse Semiconductor using the HGaAs5 process for the PSG1, working with an experimental process also proved somewhat problematic. With models and design parameters continually in a state of flux during development, working in a process that is still in development can prove daunting even for a circuit designer or engineer directly employed by the company developing the process. Such employees at least have the luxury of a better flow of information about the development between groups within the company. Being external to the company adds another level of complexity; it makes it quite difficult to ensure that the most current design rules and device models are always being used for the design. If the decision is made to use an experimental process again, it is very important to manage these connections and try to be included in the group that has access to all the necessary and up-to-date information.
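The resistor-mismatch failure mode can be illustrated with a toy Monte Carlo model. The resistor values below are taken from the schematic in Figure 3-11 and the ±15% on-chip tolerance from the text; the simple divider-bias model and the function names are illustrative assumptions, not the Vitesse NRES models or the actual amplifier topology.

```python
import random

def bias_voltage(vdd, r_top, r_bot):
    """Input bias set by a simple resistive divider (illustrative model)."""
    return vdd * r_bot / (r_top + r_bot)

def mismatch_trial(vdd=2.0, r_top=6014.0, r_bot=1008.0, tol=0.15):
    """Draw two nominally identical bias networks whose resistors each vary
    uniformly within +/-tol of nominal, and return the offset (in volts)
    between the two resulting bias voltages."""
    def draw(r):
        return r * (1.0 + random.uniform(-tol, tol))
    v1 = bias_voltage(vdd, draw(r_top), draw(r_bot))
    v2 = bias_voltage(vdd, draw(r_top), draw(r_bot))
    return abs(v1 - v2)

random.seed(0)
offsets = sorted(mismatch_trial() for _ in range(10000))
print(f"median offset: {offsets[5000]*1e3:.1f} mV")
print(f"worst-case offset: {offsets[-1]*1e3:.1f} mV")
```

Even this crude model produces input offsets of tens of millivolts, the same order as the mismatch in Figure 3-16, which is consistent with unmodelled resistor variation being the culprit.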
3.3 Photonic Systems Group Chip #2

One of the inherent problems with using GaAs is that it is essentially opaque to light at the wavelengths of interest (i.e., 850nm). This is a result of GaAs being a direct-bandgap semiconductor whose absorption curve rises sharply at the bandgap edge. The fact that GaAs is opaque to light of this wavelength means that, for the CONNPP, both sides of GaAs chips would need to be processed in order to produce an OEIC that could be cascaded. The emitters would need to be grown or bonded on one side of the chip, and the detectors and electronics would need to be grown, bonded, or patterned on the opposite side of the chip. Processing both sides of a chip or wafer would require several non-standard process steps, some of which would need to be developed specifically for this project. As a result, other alternatives to GaAs have been explored. One of the most attractive options for the CONNPP is Peregrine's 0.5µm ultra-thin silicon (UTSi) CMOS process. This process consists of a very thin layer of silicon (120nm) on top of a sapphire substrate. The sapphire substrate acts as an insulator, also making this a silicon-on-insulator (SOI) process. In addition, sapphire is optically transparent at a wavelength of 850nm. The thin silicon layer, because of its thickness, is also essentially transparent to the 850nm light. This means that the emitters, detectors, and electronics can potentially all be grown, bonded, or patterned on the same side of the chip or wafer. The difference between the GaAs and UTSi SOS implementations can be seen in Figure 3-17. Due to the benefits gained by using the Peregrine UTSi process, it was chosen for the Photonic Systems Group Chip #2 (PSG2). It was determined that the PSG2 would include amplifiers, photodetectors (if possible), aligned-pillar-bonding structures, and analog-to-digital converters. The amplifiers and the photodetector structures will be discussed here.
Figure 3-17: (a) GaAs chip with two-sided processing (b) UTSi SOS chip with all devices on a single side

3.3.1 PSG2 Amplifier

With lessons learned from PSG1, the next-generation amplifier was designed with an eye towards simplicity. Once again, several different topologies were researched, examined, and simulated. With potential revisions in the overall architecture, linearity of the amplifier circuits is of paramount importance. The resolution of the architecture as a whole is potentially limited by the linearity of the amplifiers, detectors, and emitters. Whereas there is only limited design control over the linearity of the detectors and emitters, the amplifiers provide much more control over linearity through their design. Therefore, they were designed such that they would not be the weakest link in the system in terms of bit-level resolution. Most of the literature on optical receivers and transimpedance amplifiers is focused on digital communication for high-speed optical networks [29-31]. In these cases, the designers are concerned with eye diagrams and obtaining correct logic levels. The CONNPP, however, seeks to optically transmit information in an analog fashion between the planes of neurons. Furthermore, the speed of the CONNPP is not a key metric (at least at this stage of the project). While the amplifiers on the PSG1 chip were adapted from a digital optical interconnect application, the amplifiers on the PSG2 were designed from scratch. After investigating various transimpedance amplifier topologies, it was determined that an operational amplifier (op-amp) with negative resistive feedback provided the best linearity with the largest voltage swing. A top-level diagram of the amplifier structure can be seen in Figure 3-18. While there are many varieties of op-amp, a simple two-stage CMOS op-amp topology was chosen for its simplicity and potentially small footprint [32].
The first stage of this two-stage CMOS topology consists of a differential PMOS pair with a single-ended output. This single-ended output is then fed into a simple common-source NMOS amplifier with a current supply. A feedback resistor is then connected between the input and the output. A schematic of this amplifier can be seen in Figure 3-19, complete with transistor and resistor sizings.

Figure 3-18: Top-level diagram of op-amp in transimpedance configuration

It might be noticed that the resistors in this design are unusually large for on-chip applications. One of the benefits of the Peregrine 0.5µm UTSi CMOS process is that it includes a resistor structure with very high resistivity (the SN resistor) [33]. This resistor structure exhibits uniformity and variation with temperature similar to other, lower-resistivity structures. In the previous design on PSG1, resistors played a crucial role in biasing the differential structures and other stages. This inclusion of resistors in the differential structures became a weak point of the design. In the op-amp design used in PSG2 (Figure 3-19), the resistors are only used for a single current-supply bias and the resistive feedback element. This eliminates the need for near-perfect matching of resistor structures. The models from Peregrine did include resistor process corners. These corners were simulated, and it was shown that the variation in resistor values affects the operation of the op-amp circuit in only a very minor fashion. The circuit in Figure 3-19 was simulated using the Spectre models obtained from Peregrine Semiconductor in their 0.5µm UTSi CMOS process development kit (PDK). The amplifier was designed to take a 0-10µA input from a photodetector and produce a voltage swing on the output. The full-range (-100µA to 100µA) transfer function for the amplifier can be seen in Figure 3-20.
It can be seen from the full-range characteristic that the amplifier has a fairly linear range all the way from -25µA to about 20µA.

Figure 3-19: Schematic for the two-stage op-amp with negative resistive feedback

The best way to characterize the linearity of the amplifier over a given range is to examine the derivative of the transfer function. If the transfer function were perfectly linear over a particular range, the derivative over that range would be constant. Shown in Figure 3-21 is the derivative of the full-range transfer function. There are a couple of important things to note from Figure 3-21. First, the values of the derivative are very large in magnitude (almost 50kΩ). This transimpedance value is actually the gain of the amplifier: for every 1A of input swing, 50,000V of output swing will be seen. The target input current swing was about 10µA which, by this convention, corresponds to about 0.5V of voltage swing. Next, as expected, the derivative has a relatively flat section in the range right around 0A. If the target range for the input current is 0-10µA, then it would be desirable to have the absolute flattest part of the derivative curve centered around the middle of that range (5µA). In Figure 3-22, the transimpedance curve has been zoomed in so that the area of interest (0-10µA) can be seen clearly.

Figure 3-20: Transfer characteristic for PSG2 amplifier (current input, voltage output)

Notice that the value of the derivative function varies only about 20Ω out of 50,000Ω in the range of interest (0-10µA). This represents a 1-part-in-2500 variation of the linearity over the full range of interest.
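The figures above can be sanity-checked with a few lines of arithmetic. Treating the 20Ω spread in the derivative as the nonlinearity error, and expressing it as an effective bit count, is an interpretive choice made here for illustration, not a calculation from the thesis itself.

```python
import math

GAIN = 50_000.0    # transimpedance from Figure 3-21, V/A
VARIATION = 20.0   # spread of the derivative over 0-10 uA, ohms
I_SWING = 10e-6    # target input current swing, A

v_swing = GAIN * I_SWING                 # expected output voltage swing
linearity = GAIN / VARIATION             # 1 part in 2500
effective_bits = math.log2(linearity)    # resolution supported by linearity

print(f"output swing: {v_swing:.2f} V")                    # 0.50 V
print(f"linearity: 1 part in {linearity:.0f}")             # 2500
print(f"effective resolution: {effective_bits:.1f} bits")  # ~11.3 bits
```

A 1-in-2500 linearity corresponds to roughly 11 bits, comfortably above the 8-bit target discussed next.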
While many of the system's architectural details are yet to be hammered out, it has never been suggested that the system achieve better than 8-bit resolution. This means that the amplifier clearly achieves the 8-bit resolution desired. In addition, the amplifier is currently not the weakest link in terms of bit-level resolution; the optoelectronic devices such as the VCSELs and photodetectors provide much lower bit-level resolution. Improvement of the resolution of these optoelectronic devices should be a major focus of continuing work. While the speed of operation was not a major consideration for the PSG2 amplifier, AC simulation was done to determine the frequency range of operation. The simulations show that the amplifier has a 3dB cutoff point of greater than 100MHz. Once the amplifier circuit was designed schematically and simulated, it was laid out, extracted, and compared to schematic (LVS). The final amplifier structure was about 24µm x 45µm. This is about 1/4 of the area used for the PSG1 amplifier. Furthermore, the Peregrine process has a larger minimum feature size (0.5µm) than the Vitesse HGaAs5 process (0.35µm). This reduction in size is a function of better topology design along with better layout.

Figure 3-21: Derivative of the transfer function shown in Figure 3-20

Figure 3-22: Close-up of the derivative of the transfer function

The layout for the PSG2 amplifier can be seen in Figure 3-23.

Figure 3-23: Layout of PSG2 transimpedance amplifier

3.3.2 PSG2 Photodetectors

In terms of photodetectors, the intention for the CONNPP using the Peregrine UTSi SOS process was to use aligned-pillar-bonding to mechanically bond GaAs pin photodetectors to the silicon-on-sapphire chip.
While this was the eventual goal, it was also realized that it would be useful to produce photodetectors directly in the Peregrine UTSi SOS process for circuit and small-system characterization. As a result, an investigation was conducted to determine the feasibility of producing photodetectors in the standard UTSi SOS process. After researching the topic to understand the specific processing steps used by Peregrine, it became clear that building photodetectors in the standard process was somewhat problematic. The first issue relates to the fact that this is an ultra-thin silicon process, meaning that a very thin layer of silicon is deposited on top of a sapphire substrate. For this particular process, the thickness of the silicon layer is about 120nm. At 850nm, silicon has an absorption coefficient of

α ≈ 800 cm⁻¹    (3.2)

This means that the fraction of light that passes through the silicon layer without being absorbed is

e^(−αd) = e^(−800 × 120×10⁻⁷) ≈ 99.04%    (3.3)

In other words, for every 1mW of optical power incident on the silicon layer, only about 9.6µW of optical power is absorbed in it. This fact by itself does not preclude the design and fabrication of photodetectors. Additionally, the fill factor (percentage of the device area exposed to the light) is likely to be around 50%, which will reduce the amount of light absorbed by another factor of two. For testing purposes, these are not showstopper issues, as the intensity of the light can be increased until sensitivity is seen. The main issue in building these photodetector devices lies in the process flow itself. Ideally, to build either metal-semiconductor-metal (MSM) photodetectors or lateral pin photodetectors, it is necessary to have a relatively low-doped region between the higher-doped contact regions. Additionally, this relatively low-doped region should be exposed to the incident optical energy the user wishes to detect. An example of a lateral pin structure of this type can be seen in Figure 3-2.
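The Beer-Lambert estimate in Equations 3.2 and 3.3 can be reproduced directly; the numbers below come straight from the text (α ≈ 800 cm⁻¹, d = 120nm, 50% fill factor).

```python
import math

# Beer-Lambert absorption in the 120 nm UTSi layer at 850 nm, Eq. (3.2)-(3.3).
alpha = 800.0   # absorption coefficient, 1/cm
d = 120e-7      # silicon thickness: 120 nm expressed in cm

transmitted = math.exp(-alpha * d)   # fraction passing through unabsorbed
absorbed = 1.0 - transmitted         # fraction absorbed in the silicon

# Per 1 mW incident, before and after the ~50% fill factor.
absorbed_uW = absorbed * 1000.0
print(f"transmitted: {transmitted*100:.2f}%")          # ~99.04%
print(f"absorbed per 1 mW: {absorbed_uW:.1f} uW")      # ~9.6 uW
print(f"with 50% fill factor: {absorbed_uW/2:.1f} uW")
```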
Most unfortunately, the Peregrine UTSi SOS process does not allow lateral pin structures to be built. The reason for this can be found by examining the isolation structures used to electrically isolate NMOS devices from PMOS devices. The necessary structure for a lateral pin device consists of p+ and n+ regions separated by a low-doped or intrinsic region. The problem with building this structure is that Peregrine uses a thick field-oxidation layer between the n-well and p-well structures. This field-ox layer penetrates the entire 120nm of the silicon, giving complete electrical isolation between the p and n regions. While this is ideal for transistors, it is problematic for optoelectronic devices. This field-ox layer makes it impossible to create a lateral pin structure because any region between a p region and an n region is completely filled with field-ox. There is one exception to this rule of field-ox isolation between p and n regions: the diode structures in the UTSi SOS process. A drawn diode consists of a p region directly abutted to an n region, where the junction between the two regions is covered by gate material. By definition, the area under the gate structure remains at a lower doping than the defined p or n regions. While it seems possible that this structure could be used as a lateral pin photodiode, that does not appear to be the case. Photodetectors based on this diode structure were designed by Edward Barkley in 2000. Upon testing, the structures did not show any optical response. It is postulated that this lack of optical response is due to the structure of the gate. While the optical energy should be able to penetrate the polysilicon gate, a silicide is often deposited on top of the gate structure to facilitate better electrical contact [34]. This silicide can act as a reflector to incoming light.
While this has not been confirmed by Peregrine, the lack of optical response suggests that there is a silicide layer on top of the gate. In conjunction with Edward Barkley, new photodetector structures were designed in the UTSi SOS process. These devices are structurally much different and use a different principle of operation. In the course of research into the nuts and bolts of the Peregrine UTSi SOS process, it was discovered that there is a drawn layer called SDBLOCK. The function of this layer is to block the source/drain implant in specific areas. The area beneath the SDBLOCK should then have a much lower doping than the source and drain regions abutting it. As a result, this SDBLOCK layer was used to pattern a device that looks much like an NMOS transistor without a gate structure. This device can be seen in Figure 3-24. The operation of this device is quite different from anything discussed thus far. As might be noticed, the regions shown have an npn structure similar to a bipolar junction transistor. There are separate connections to the n+ regions (emitter and collector), but there is no electrical connection to the p region (base). Generally in a BJT, a small current is injected into the base, which causes a larger current to flow from the emitter to the collector. In this case, the current is being injected optically into the base as electron-hole pairs (EHPs). Because of the potential difference between the different types of materials along with the bias, the electrons from the EHPs will be collected by the positively biased terminal. This leaves a buildup of holes in the base region, raising the potential of the base region. This phenomenon is referred to as "self-biasing" by Yamamoto et al. [35].

Figure 3-24: UTSi optoelectronic device

Furthermore, it is expected that, because of its BJT structure, the photodetector should exhibit gain. This was reported in both [35, 36].
In the devices designed for the PSG2, these source and drain regions were interdigitated and connected together to produce a single detector with a larger footprint. Two different structures with different source-to-drain spacings were laid out: one with minimum spacing (1.2µm) and one with double-minimum spacing (2.4µm). The layout for this detector structure can be seen in Figure 3-25.

Figure 3-25: Lateral BJT photodetector layout

3.4 Future Work

As mentioned at the beginning of this chapter, work is continuing on the PSG2 chip. Currently, the amplifiers and photodetectors discussed here have been designed, simulated, laid out, and verified. Some of the other structures slated to be included on the PSG2 chip are Aligned Pillar Bonding sites. These APB sites will allow work to begin on the integration of the optoelectronic devices onto the chip as outlined previously. In addition to the APB structures, work is also ongoing in the area of circuit design. Currently, a successive-approximation analog-to-digital converter (ADC) is being designed, simulated, laid out, and verified by Travis Simpkins of the Photonic Systems Group. As this ADC also inherently contains a digital-to-analog converter (DAC), this circuit will provide the facilities to convert data within the chip and the architecture between digital and analog signals. If the CONNP is eventually going to communicate with a PC, this is an important piece of infrastructure to have regardless of the neural network architecture chosen. As design on the PSG2 chip wraps up, work will begin on the design of the PSG3 chip. This chip will most likely start to implement some of the neural network structure discussed throughout the body of this document, including the architecture discussed in Chapter 4. This chip will hopefully have a small array of neurons/pixels that will allow a small-scale proof-of-concept system to be built.
This chip will provide learning on some of the key alignment and packaging issues faced by the CONNPP.

Chapter 4

Neural Network Models and Architectures

Over the past twenty years, neural networks have been built in software running on general-purpose digital computers, in dedicated analog hardware, and in dedicated digital hardware. The Compact Optoelectronic Neural Network Processor actually represents a hybrid of all three of these implementations. The current suggested CONNPP architecture will be explained along with a proposal for a major architectural change to the original design. Models will be built to demonstrate the performance differences between the current architecture and the proposed architecture.

4.1 Specification For the CONNPP Hardware Implementation

While it is possible to model large neural networks with a general-purpose computer, these implementations do not always produce the fastest results, as the hardware is not optimized for the problem. Specialized analog hardware has also been used to replicate the functionality of a neural network. These analog components generally require precise resistive and capacitive elements to accomplish their task. This poses a potential problem, as semiconductor processing techniques generally have difficulty producing resistive and capacitive elements with high precision. With the success and availability of CMOS technology, many researchers have started to look at specialized digital hardware to implement neural network functionality. The CONNPP architecture draws from all three of these areas to create a processor that utilizes the available technology in a very efficient manner. A block-level diagram of a single CONNPP neuron can be seen in Figure 4-1. These neurons are then tiled in a 2-dimensional array on a VLSI chip.
As described in Chapter 1, the detector element and VCSEL array are fabricated such that optical input is accepted on one side of the chip and optical output is produced on the other side of the chip. Because of this novel fabrication technique, the chips can then be cascaded in the third dimension. In conjunction with a holographic element array, the neurons on one chip can communicate optically with the neurons on the next chip.

Figure 4-1: Block-level diagram of a single neuron in the CONNP

As the system consists of a series of identical elements that have been replicated in 3 dimensions, the workings of the system as a whole can be understood by examining the operation of the single neuron shown in Figure 4-1. The operation of the neuron begins with the detection and conversion of an analog optical signal to a digital electrical signal by the detector unit (Figure 4-2). The optical signal received comes from the nine nearest-neighbor neurons in the previous layer. As the photodetector senses intensity and the nine beams from the previous layer are incoherent, this detector is basically a summing unit within the neuron. The photodetector converts the light incident upon it into an electrical current. The amplifier circuits discussed in Chapter 3 convert this electrical current into a voltage at a level suitable for analog-to-digital conversion. Finally, this analog voltage is converted into a digital signal by the analog-to-digital converter. This digital signal is then passed on to the VLSI Electronics block.

Figure 4-2: The detector unit of a single neuron (photodetector, amplifier, and analog-to-digital converter)

The VLSI Electronics block takes the digital signal from the detector unit and passes it along to the block labeled "I/O to Neighbors."
As mentioned in previous chapters, the processor has both in-plane connections and between-plane connections. The between-plane connections are the optical interconnects that allow the planes of neurons to communicate with each other. The in-plane connections allow neurons within a single 2-dimensional array to communicate. In this original CONNPP architecture, each neuron is able to communicate with its four nearest-neighbor neurons within the plane of the chip. The "I/O to Neighbors" block sends the digital signal representing the optical input to the four nearest-neighbor neurons. Likewise, this block also receives a digital input from each of the four nearest-neighbor neurons representing their optical inputs. These signals are then passed back to the VLSI Electronics block.

Figure 4-3: 2-dimensional array of neurons with 4-way in-plane nearest-neighbor connections

The VLSI Electronics block proceeds to sum the digital signal from the optical input along with the digital signals from the four nearest-neighbor neurons. This digital sum is then used as the input to the activation function of the neuron. The output of the activation function becomes the output of the VLSI Electronics block and the input to the VCSEL Array block. Each VCSEL in the array connects to a different neuron in the next 2-dimensional array of neurons. Each VCSEL is driven with a current proportional to the output of the activation function multiplied by the weight between the two neurons. This means that each VCSEL in the array is potentially modulated with a different current. In this way, the connection-weight information is explicitly encoded in the optical signal incident on the photodetector. This allows the optical signals incident on the photodetector simply to be summed, as the weighting has already been done. It may be noticed that the "I/O to Microprocessor" block has not been involved in the operation of the neuron to this point.
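The neuron update described above can be sketched behaviorally. The sigmoid activation, the unit scaling of the signals, and all function names are assumptions made for illustration; the thesis does not specify numeric formats or a particular activation function.

```python
import math

def activation(x):
    """Assumed sigmoid activation (the CONNPP does not mandate one)."""
    return 1.0 / (1.0 + math.exp(-x))

def neuron_step(optical_inputs, neighbor_values, out_weights):
    """One update of a single CONNPP neuron (behavioral sketch).

    optical_inputs: 9 intensities from the nearest neighbors in the previous
        plane; these are already weighted at the transmitting VCSELs, so the
        photodetector just sums them.
    neighbor_values: digitized values shared by the 4 in-plane neighbors.
    out_weights: one connection weight per VCSEL in this neuron's array.
    Returns the drive levels for the output VCSELs.
    """
    detected = sum(optical_inputs)             # incoherent beams sum at the detector
    total = detected + sum(neighbor_values)    # add the in-plane contributions
    out = activation(total)                    # neuron activation function
    # Weighting happens at the output: each VCSEL gets its own drive level.
    return [out * w for w in out_weights]

drives = neuron_step([0.1] * 9, [0.2] * 4, [0.5, 1.0, 0.3] + [0.0] * 6)
print(drives[:3])
```

Note the design point this makes concrete: because each transmitting VCSEL applies its own weight, the receiving detector needs no multiplier, only a sum.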
This block is utilized mainly in the training and programming of the neural network. The CONNP does not have the facilities or infrastructure on-chip to evaluate and recalculate weights. For this task, the neural network processor relies on the host PC running software that calculates the new weights and sends them back to the neural network processor through the "I/O to Microprocessor" block. One advantage of this methodology is that the training algorithm is not implemented in hardware, so a variety of different training algorithms can be used, giving the neural network processor greater flexibility. The other main use of the "I/O to Microprocessor" block is to program the CONNP. Training the network to a small mean squared error is potentially a very time-consuming task. Within the neural network processor, the state of the processor is essentially captured by the connection weights between neurons. If, after a training session, the weights for all the neurons could be stored on the host PC, they could later be recalled in order to restore the state. Having the facilities to stream weights in and out of the processor provides this functionality. The neural network can be extensively trained for a particular task, store the weights on the PC, and then later request this list of weights from the PC when a similar problem is encountered.

4.2 Proposal For a CONNP With a Global Bus-based Architecture

As discussed in the previous section, there are two different levels of connections within the CONNP: in-plane electrical connections between neurons and between-plane optoelectronic connections between neurons. The specification described allows each neuron to communicate electrically with four neurons within the plane of the chip and optically with nine neurons in the next plane. The optoelectronic connections, as described, clearly allow the processor to achieve 3-dimensional scalability.
This 3-dimensional scalability allows the processor to increase its computational power simply by increasing the number of cascaded layers. The in-plane connections, in contrast, increase the computational power of each layer within the neural network. Thus, if the in-plane connection model could be significantly improved, every layer of the cascaded network would benefit. The between-plane optoelectronic connections within the processor, as specified previously, push the boundaries of current state-of-the-art technology, and it would be desirable for the in-plane electrical connections to do the same. While the optoelectronic connections achieve this through the use of novel processing techniques and holographic interconnects, the in-plane electrical connections must accomplish this through novel architectures and circuit design. We propose using a digital bus-based connection model within the plane of the chip to achieve global in-plane interconnects. Figure 4-4 shows a block-level diagram of a neuron in the proposed bus-based architecture. As can be seen in the figure, the "I/O to Microprocessor" and "Neuron I/O" blocks have been consolidated into a single block. Previously, the "Neuron I/O" block consisted of connections to the four nearest-neighbor neurons within the plane of the chip, which meant that separate infrastructure was potentially needed to communicate with the in-plane neurons and with the microprocessor. In the previous design, a chip-level bus was still required; this bus connected to every neuron on the chip and allowed the neurons to send and receive weight information during training or programming. In the revised architecture, this chip-level bus is also utilized to allow every neuron on the chip to communicate with every other neuron on the chip.
Figure 4-4: Block-level diagram of the proposed neuron structure (VCSELs, detector, VLSI electronics, and combined I/O to microprocessor and neuron I/O on the chip-level bus)

The operation of the global bus-based architecture is actually very similar to that of the original CONNP design. First, the optical inputs are received by the detector block of the neuron and converted to a digital signal. This digital signal is then passed along to the "VLSI Electronics" block, which holds on to the signal and also forwards it to the I/O block. This leads to a discussion of the operation of the chip-level bus. In addition to the I/O block within each neuron, there is also a chip-level I/O block that oversees the chip-level bus and is responsible for controlling all of the traffic on it. Once the optical inputs have been received by the neurons, the bus control logic takes over. As discussed, the optical input to each neuron has been detected and converted to a digital signal. The bus allots each neuron a time slice in which to broadcast the digital data it received as an optical input. When a neuron is not broadcasting its digital data, it is receiving digital data from the other neurons.

Figure 4-5: Bus-based architecture with bus control block and individual neurons

As an example, consider a very small on-chip neural network consisting of four neurons, each assigned an address ranging from 1 to 4. The bus control logic directs the neurons to broadcast their data in order. During the first clock cycle, neuron 1 broadcasts its data and neurons 2-4 receive it. In the second cycle, neuron 2 broadcasts its data while neurons 1, 3, and 4 receive it. This continues until all the neurons have broadcast their data. This scheme allows n neurons to broadcast their data in n cycles, which clearly leads to a tradeoff between connections and time.
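The time-sliced broadcast just described can be sketched as a simple software model. This is an illustrative model of the bus protocol only, not the actual chip-level control logic:

```python
def broadcast_round(values):
    """One full broadcast pass on the shared chip-level bus.
    values[k] is the digital optical input held by neuron k+1.  The
    bus controller grants each neuron one time slice; all other
    neurons listen.  Returns received[k], the list of (sender address,
    data) pairs seen by neuron k+1.  A full pass over n neurons takes
    n time slices."""
    n = len(values)
    received = [[] for _ in range(n)]
    for slot in range(n):            # time slice = address of the talker
        for k in range(n):
            if k != slot:            # listeners latch the bus data
                received[k].append((slot + 1, values[slot]))
    return received

rx = broadcast_round([10, 20, 30, 40])
print(rx[0])  # neuron 1 hears neurons 2-4: [(2, 20), (3, 30), (4, 40)]
```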
In the original implementation of the CONNP, each neuron could communicate with its four nearest neighbors in 4 clock cycles (assuming the I/O block can only talk to one neuron at a time). This means that there is a direct tradeoff between the number of in-plane connections and the number of clock cycles required to complete a single operation. This issue will be examined in more detail in the next section.

Figure 4-6: Simplified block-level diagram of the logic within a neuron (global bus, reset, address counter, weight table, multiplier, accumulator, and activation function lookup table f(x))

Figure 4-6 shows a block-level diagram of the inner workings of the logic within a bus-based neuron. When the broadcast is set to begin, the bus control logic sends a reset signal to the counter (the box labelled "+1"). This counter contains the address of the neuron currently talking on the bus. This address is used to index into the table of weights, and the weight retrieved becomes the first operand for the multiplier. The data received from the neuron currently in control of the bus becomes the second operand. The operands presented to the multiplier are thus a weight and a data value, representing one of the w_i x_i terms in the sum

\sum_{i=1}^{n} w_i x_i    (4.1)

The result from the multiplier (w_i x_i) becomes the input to the adder, which is configured with feedback as an accumulator. This accumulator adds each of the w_i x_i elements to a running sum. After all the neurons have broadcast their data, the result in the accumulator is the expression given in Equation 4.1. This expression is then used as an index into a lookup table of the activation function f(x). The number of entries in this table represents the resolution of the activation function. The resulting value,

f\left( \sum_{i=1}^{n} w_i x_i \right)    (4.2)

is the actual output of the neuron. However, instead of firing the VCSELs in proportion to the output of this function, it would be desirable to also encode the weights for those connections.
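The counter, weight table, multiplier, and accumulator datapath of Figure 4-6 can be sketched in a few lines. The 8-entry activation lookup table and its contents below are hypothetical, chosen only to illustrate the indexing step:

```python
def neuron_update(weights, bus_data, act_table):
    """Datapath sketch: the "+1" counter walks through the broadcast
    addresses, the weight table is indexed by the current address, the
    multiplier forms w_i * x_i, and the adder-with-feedback accumulates
    the running sum of Equation 4.1.  The final sum then indexes a
    lookup table holding samples of the activation function f; the
    table length sets the resolution of f."""
    acc = 0
    for addr in range(len(bus_data)):          # address counter
        acc += weights[addr] * bus_data[addr]  # multiply-accumulate
    idx = max(0, min(len(act_table) - 1, acc))  # clamp into table range
    return act_table[idx]

# Hypothetical 8-entry table sampling a saturating activation function:
table = [0, 1, 2, 4, 6, 7, 7, 7]
out = neuron_update([1, 2, 1], [3, 1, 0], table)  # sum = 5 -> table[5]
print(out)  # 7
```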
This means that this output value should be multiplied by each of the nine between-plane nearest-neighbor weights to produce nine separate outputs. These nine outputs are then used to determine the proper current level at which to drive each of the nine VCSELs in the array.

4.3 Neural Network Models and Simulations

While several different architectural models for the CONNPP have been proposed and described, it is only feasible to actually implement one of these architectures in hardware. As a result, models and simulations allow the different proposed architectures to be evaluated according to a particular set of metrics. The purpose of this section is to develop these models and simulations in order to evaluate the different architectures. The first task in building a model is to determine the best development environment in which to build it. Many different choices were examined and evaluated for this task, among them the Stuttgart Neural Network Simulator (SNNS), MATLAB, several Java neural network packages, and writing the models in C or C++ from scratch. The SNNS package seemed potentially powerful for large-scale neural network simulations, but proved difficult and time-consuming to use from an interface point of view. Building the models and simulations from scratch using C was ruled out because it seemed somewhat foolish to invest a large amount of time in reproducing work that had already been done in commercial products. After examining and evaluating all the available options, MATLAB and its Neural Network Toolbox were chosen as the development environment. The Neural Network Toolbox's ease of use, coupled with the power of the underlying MATLAB environment, made it the logical choice.

4.3.1 Background on the MATLAB Neural Network Toolbox

Figure 4-7 shows the MATLAB Neural Network Toolbox's internal representation of a single neuron element. In the neuron shown, there are R inputs into the neuron.
These R inputs each consist of an input value (p_i) and a connection weight (w_{1,i}).

Figure 4-7: MATLAB model of a neuron

The input values are stored in an R x 1 vector p and the connection weights are stored in a 1 x R single-row matrix W. In the summing unit (labelled Σ), each of the input values is multiplied by its corresponding weight element and the products are all added together. In addition, a constant input of 1 is multiplied by the bias weight (b). This provides a dynamic bias for the neuron, as the bias (b) can be adjusted through training. The sum of all the weighted inputs and the bias is labelled n in the diagram. This value is then fed into the activation function unit of the neuron (labelled f). The activation function unit uses n as the argument of the function f and produces the value a. This means that the output of the neuron can be written as

a = f(W · p + b)    (4.3)

where W · p is the dot product of the single-row matrix W and the vector p. The Neural Network Toolbox provides a set of activation functions that includes linear functions, logsig, tansig, and the Heaviside function, to name a few. Once the single-neuron structure in MATLAB is understood, it can be extended to create entire layers of neurons. A layer of neurons accepts inputs from a number of sources just like a single neuron. Each neuron in the layer processes the information presented to its inputs, and each neuron in the layer then produces its own output. A diagram of this can be seen in Figure 4-8.

Figure 4-8: MATLAB model of a layer of neurons

The data structures for the layer of neurons change somewhat from the single-neuron model. Assuming there are R inputs to the layer, the input values remain in the form of an R x 1 vector. The connection weights change their form significantly, due to the fact that MATLAB assumes that every input has a connection to every neuron in the layer.
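Equation 4.3 for a single neuron can be checked with a few lines of code. This is a plain-Python sketch of the Toolbox's neuron model, with tanh standing in for a generic activation function:

```python
import math

def neuron(W, p, b, f=math.tanh):
    """MATLAB-style neuron of Equation 4.3: a = f(W . p + b).
    W is a 1 x R row of weights, p an R x 1 input vector, and b the
    bias, which is fed by a constant input of 1 and adjusted during
    training.  tanh is an illustrative stand-in for f."""
    n = sum(w * x for w, x in zip(W, p)) + b  # summing unit
    return f(n)                               # activation function unit

# W . p = 0.5 - 2.0 + 0.5 = -1.0, so n = -1.0 + 1.0 = 0 and a = tanh(0):
print(neuron([0.5, -1.0, 2.0], [1.0, 2.0, 0.25], 1.0))  # 0.0
```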
This is a problematic assumption for the models built to represent the CONNPP; the solution to this problem will be addressed as the CONNPP model is developed. Whereas the weights for a single neuron were in a single-row matrix with dimensions 1 x R, for a complete layer of neurons the weights comprise a matrix with dimensions S x R. Each row of the matrix represents the list of weights for a particular neuron in the layer. Each of the neurons has its own summing unit that it uses to add each of the input values multiplied by the corresponding weight. This produces an S x 1 vector n, where n_j represents the output of the summing unit of the jth neuron. The values in this vector are then input into S activation function units to produce an S x 1 vector a representing the output of the layer of neurons. Each value in the vector a corresponds to the output of an individual neuron within the layer. This means that the functionality of the layer of neurons can be written as

a = f(W · p + b)    (4.4)

where all the elements within the equation are matrices. MATLAB simplifies the layer representation shown in Figure 4-8 to reflect the fact that the neuron layer can be represented as matrices and matrix operations. This simplified layer structure can be seen in Figure 4-9.

Figure 4-9: Simplified MATLAB model of a layer of neurons reflecting its matrix properties

All of the neurons within a given layer in MATLAB use the same activation function (specified by the user). These layers can then be cascaded such that the outputs of layer i become the inputs to layer i + 1. Each layer may have a different activation function if desired. Furthermore, the layers need not contain identical numbers of neurons.
4.3.2 Model of the CONNPP Hardware Using MATLAB

To model different proposed hardware implementations, it is necessary to have a good understanding of the operation of the hardware along with the facilities available within MATLAB. From this knowledge, higher-level models can be built up from the low-level structures provided by the Neural Network Toolbox. As discussed previously, the MATLAB Neural Network Toolbox assumes that all layers are globally interconnected. This is far from the case with the CONNPP; in fact, there are many situations where the layers are sparsely connected. For this reason, it is important to develop a method to replicate these various sparsely connected architectures. Recall from the explanation of the Neural Network Toolbox structures that the connection weights between layers are stored as S x R matrices, where R is the number of neurons in the ith layer and S is the number of neurons in the (i + 1)th layer. Given this, a set of matrix masks can be created to zero the elements without any connection in the network.

Figure 4-10: A 6 x 6 layer of neurons with each element numbered 1 through 36

As can be recalled from previous discussions about the CONNPP, the neurons are laid out in 2-dimensional planes. Figure 4-10 shows a layer of 36 neurons arranged in a 6 x 6 array with each neuron numbered. It is known that the optoelectronic connections between planes use a 9-way nearest-neighbor connection scheme. As an example, this means that neuron 8 in layer i would connect to neurons 1, 2, 3, 7, 8, 9, 13, 14, and 15 in layer (i + 1). It can also be recalled that, internal to MATLAB, this 36-neuron layer is stored as a 36 x 1 vector. Because there are 36 inputs from layer i to layer (i + 1), and layer (i + 1) also has 36 neurons, the connection weight matrix between layer i and layer (i + 1) has dimensions 36 x 36.
Each row of this 36 x 36 connection weight matrix represents the weights for a particular neuron in layer (i + 1). In this example, row 8 of the connection weight matrix will be examined. The first step in replicating the 9-way nearest-neighbor connection scheme is to create a second 36 x 36 matrix, which will be the mask for the connection weight matrix. In order to achieve the 9-way nearest-neighbor interconnect scheme between planes, elements 1, 2, 3, 7, 8, 9, 13, 14, and 15 of row 8 of the mask matrix are set to a value of 1, and all other elements in row 8 are set to 0. This process is repeated for all 36 neurons and all 36 rows of the mask matrix. Once the complete mask has been created, an element-wise multiplication is performed between the mask matrix and the connection weight matrix, and the resulting matrix becomes the new connection weight matrix. Any element whose corresponding mask matrix value was 1 remains unchanged; any element whose corresponding mask matrix value was 0 becomes 0. It is important to note that these mask matrices must be created and applied between any pair of layers where global interconnects are not desired. Furthermore, these masks must be applied after every training cycle. When MATLAB trains the neural network, it calculates a complete new set of weights optimized to minimize the error on the next presentation of a pattern. This potentially changes elements in the connection weight matrix from their desired value of 0 where there is no connection. As a result, it is very important to zero these elements after each training cycle so as not to get misleading results. Now that it is understood how to implement different connection models between neural network layers, a model of the CONNPP hardware can be developed.
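The masking procedure can be sketched directly. The sketch below (in Python rather than MATLAB) builds the 9-way nearest-neighbor mask for the 6 x 6 plane of Figure 4-10 and applies it element-wise to a weight matrix, confirming that row 8 selects exactly neurons 1, 2, 3, 7, 8, 9, 13, 14, and 15:

```python
import random

def nn9_mask(rows, cols):
    """9-way nearest-neighbor mask for a rows x cols neuron plane.
    mask[j][i] is 1 when neuron i+1 (layer L) feeds neuron j+1
    (layer L+1), i.e. when the two grid positions differ by at most
    one row and one column."""
    n = rows * cols
    mask = [[0] * n for _ in range(n)]
    for j in range(n):
        rj, cj = divmod(j, cols)
        for i in range(n):
            ri, ci = divmod(i, cols)
            if abs(rj - ri) <= 1 and abs(cj - ci) <= 1:
                mask[j][i] = 1
    return mask

mask = nn9_mask(6, 6)
# Row 8 of the mask (1-indexed) selects neurons 1,2,3,7,8,9,13,14,15:
inputs_to_8 = [i + 1 for i, m in enumerate(mask[7]) if m == 1]
print(inputs_to_8)  # [1, 2, 3, 7, 8, 9, 13, 14, 15]

# After every training cycle, re-apply the mask element-wise so that
# weights on non-existent connections are forced back to zero:
W = [[random.gauss(0, 1) for _ in range(36)] for _ in range(36)]
W = [[m * w for m, w in zip(mrow, wrow)] for mrow, wrow in zip(mask, W)]
```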
One of the key characteristics of the CONNPP model is that in a single hardware layer, inputs are received optoelectronically, converted to digital format, and then distributed digitally among the neurons in the layer. The converted optoelectronic inputs, along with the digital inputs from the other neurons, are then used as the input to the activation function of the neuron. In order to model this single hardware layer, two MATLAB neural network layers are required. A diagram of the MATLAB single-hardware-layer model can be seen in Figure 4-11.

Figure 4-11: MATLAB model of a single CONNPP hardware layer (layer i with a linear activation function feeding layer (i + 1) with a tansig activation function)

As can be seen in the figure, two layers of neurons are used to replicate the functionality of a single CONNPP hardware layer. Layer i uses the purelin function as its activation function, while layer (i + 1) uses the tansig function. The purelin function is a linear function with a slope of 1, so the sum presented to the activation function of a neuron in layer i is also the output passed on to layer (i + 1). W_i is the connection weight matrix between hardware layers (j - 1) and j; it represents the optoelectronic connections between the layers of the processor. As such, this first connection weight matrix of the hardware model is always masked so as to provide 9-way nearest-neighbor connections between the hardware layers. For each neuron in layer i, the nine nearest-neighbor inputs are multiplied by their corresponding weights, added together, and then presented as outputs to the next layer. This is analogous to the detector unit of the hardware implementation, which optoelectronically sums the nine optical inputs and passes the result along to the VLSI Electronics block of the neuron.
The connection weight matrix W_{i+1} represents the connections between the neurons within the plane of the chip. This connection weight matrix is essentially analogous to the Neuron I/O block in the hardware implementation of the CONNPP neuron. As the goal of the simulations conducted here is to evaluate different interconnection models for neurons within the plane of the chip (i.e., 4-way nearest-neighbor vs. global), this connection model is one of the key variables in the simulations. In order to implement different connection models, different mask matrices are simply applied to the connection weight matrix W_{i+1} after each training session. Next, the outputs from layer i are multiplied by their corresponding weights from connection weight matrix W_{i+1}, which distributes the inputs to layer (i + 1) according to the connection model being tested. The inputs to each neuron in layer (i + 1) are then summed. Finally, this sum is used as the input to the tansig activation function of layer (i + 1). This activation function has the form

tansig(n) = 2 / (1 + e^{-2n}) - 1    (4.5)

The graph of the function can be seen in Figure 4-12.

Figure 4-12: The tansig function

The output of this layer is a vector with a number of elements equal to the number of neurons in the layer. This represents the output of the hardware layer; in a hardware implementation, this would be the optoelectronic signals from the VCSEL array. This output acts as the input to the next hardware layer. Next, the simulations conducted and their results will be discussed.

4.3.3 Simulations and Results

In the last section, models were developed to replicate the functionality of the CONNPP hardware using the MATLAB Neural Network Toolbox. This section takes these models one step further by providing implementation details, simulation conditions, and the results of the simulations.
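Equation 4.5 can be verified numerically. As the sketch below confirms, tansig is algebraically identical to the hyperbolic tangent:

```python
import math

def tansig(n):
    """MATLAB's tansig activation of Equation 4.5:
    tansig(n) = 2 / (1 + exp(-2n)) - 1.
    Algebraically this equals tanh(n), mapping any real input into
    the open interval (-1, 1)."""
    return 2.0 / (1.0 + math.exp(-2.0 * n)) - 1.0

# The identity tansig(n) = tanh(n) holds to machine precision:
for n in (-2.0, 0.0, 0.5, 3.0):
    assert abs(tansig(n) - math.tanh(n)) < 1e-12
print(tansig(0.0))  # 0.0
```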
Figure 4-13: Example of the three-hardware-layer system to be simulated in MATLAB

The goal of the simulations was to provide a quantitative understanding of how different in-plane connection models between neurons affect the computational power and training time of the neural network as a whole. Simulation parameters were chosen so that they both represented potential real-world implementations and remained computationally tractable in the time available. As a result, a simulation was constructed to replicate a CONNP with three hardware layers (shown in Figure 4-13). As can be seen in the figure, the input is presented to Hardware Layer 1. Hardware Layer 1 and Hardware Layer 2 are connected via the optoelectronic interconnects, and likewise Hardware Layer 2 and Hardware Layer 3 are connected optoelectronically. The output is taken from Hardware Layer 3. All three layers have in-plane connections. The in-plane connection model is the key parameter of interest in these simulations. Four different in-plane connection models were tested. The first (referred to as NONE) does not allow the neurons to communicate at all within the plane of the chip. The next connection model simulated was the 4-way nearest-neighbor interconnect scheme from the original architecture. As another data point, the 9-way nearest-neighbor connection scheme used for the optoelectronic connections between the planes was also tested for the in-plane connections. Finally, a global in-plane interconnection model was tested. As mentioned previously, the activation function used for the neurons was the tansig function described in the last section, an activation function commonly used for neural networks. The network is a feedforward neural network trained with a backpropagation learning algorithm.
The feedforward aspect of the network means that the inputs are fed into one layer of the network, propagate forward through the various hidden layers, and produce an output at the final layer. This means that while a neuron in layer i can talk to a neuron in layer (i + 1), a neuron in layer (i + 1) cannot talk to a neuron in layer i. The backpropagation property means that, during training, the error between the actual output and the desired output is fed back through the network so that the weights between the hidden layers are adjusted to minimize the error on the next presentation of that pattern. The goal of the first simulation was to examine the effect of the in-plane connection model on training time for a particular neural network. In addition to varying the in-plane connection models, the size of the networks was also varied. Two different-sized neural networks were examined. The first had 36 neurons per layer, corresponding to a CONNP with a 6 x 6 2-dimensional array of neurons in each hardware layer. The second had 144 neurons per layer, representing a CONNP with a 12 x 12 2-dimensional array of neurons in each hardware layer. The aim was to determine how the in-plane connection model affects the training time for these two different-sized networks. In order to determine this, a series of training patterns needed to be chosen. Each training pattern consists of an input to the neural network and a desired output. Six different training patterns were chosen. As the input to the CONNP is a two-dimensional array, dot-matrix characters were chosen as the patterns. The desired output for the network was to classify which letter was being presented.
Therefore, if the letter 'A' was being presented at the input to the neural network, the desired output would have neuron 1 (from Figure 4-10) produce a maximum output, with all the other neurons producing a minimum output. Because two different-sized networks were being tested (6 x 6 and 12 x 12), the training patterns were simply scaled.

Figure 4-14: Training patterns (dot-matrix letters) and their corresponding desired outputs for the in-plane connection model simulation

Figure 4-14 shows the six different training patterns and their corresponding desired outputs. The neural network was trained in batch mode: all six of the training patterns were presented to the neural network before the errors were calculated and the weights changed. This training method, in many cases, produces faster convergence of the weights. After each training cycle, a mean squared error (MSE) is calculated by the Neural Network Toolbox. The first set of simulations used these six training patterns on the different networks being tested to find out how many training cycles it took to produce an MSE of less than 1 x 10^-4. Because the weights are initially random, the number of cycles required to train the neural network to a given level of MSE varies. For this reason, each simulation was run six times so that the average and standard deviation could be calculated. The results of this simulation can be seen in Figure 4-15.

Figure 4-15: Average number of training cycles required to achieve an MSE of 1 x 10^-4 with six training patterns

    In-plane scheme:   Global    9-way     4-way     None
    6 x 6 network:     108.8     167.3     279.16    1219.8
    12 x 12 network:   77.83     1105.6    5602.3    ~15000

As would be expected, as the number of in-plane connections decreases, the number of training cycles required to reach an MSE of 1 x 10^-4 increases significantly. The original simulations were done only on the 6 x 6 network.
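The batch training procedure with per-cycle mask re-application can be illustrated with a toy stand-in: a single linear layer trained by batch gradient descent rather than the Toolbox's backpropagation, with arbitrary patterns, learning rate, and the same kind of MSE stopping threshold and cycle cutoff used in the simulations:

```python
import random

def train_masked(patterns, targets, mask, lr=0.1, tol=1e-4, max_cycles=5000):
    """Batch-mode training of a single linear layer y = W.x, with the
    connectivity mask re-applied after every weight update (as the
    simulations re-apply masks after each MATLAB training cycle).
    Returns the number of cycles needed to reach the MSE tolerance.
    This is a toy stand-in, not the thesis simulation code."""
    n_out, n_in = len(mask), len(mask[0])
    W = [[mask[j][i] * random.uniform(-0.5, 0.5) for i in range(n_in)]
         for j in range(n_out)]
    for cycle in range(1, max_cycles + 1):
        mse = 0.0
        grads = [[0.0] * n_in for _ in range(n_out)]
        for x, t in zip(patterns, targets):   # whole batch before update
            for j in range(n_out):
                y = sum(W[j][i] * x[i] for i in range(n_in))
                err = y - t[j]
                mse += err * err
                for i in range(n_in):
                    grads[j][i] += err * x[i]
        mse /= len(patterns) * n_out
        if mse < tol:
            return cycle
        for j in range(n_out):                # update, then re-mask
            for i in range(n_in):
                W[j][i] = mask[j][i] * (
                    W[j][i] - lr * grads[j][i] / len(patterns))
    return max_cycles

random.seed(1)
mask = [[1, 1, 0], [0, 1, 1]]                 # sparse connectivity
cycles = train_masked([[1, 0, 0], [0, 1, 0], [0, 0, 1]],
                      [[1, 0], [0, 1], [0, 0]], mask)
print(cycles)  # well under the 5000-cycle cutoff
```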
It was then considered that, for a 6 x 6 network, the sparse in-plane interconnect models (4-way and 9-way) are actually reasonably powerful: with 9-way in-plane interconnects in a 6 x 6 network, each neuron can talk to 25% of the neurons in the plane, and even the 4-way in-plane interconnects allow 11% connectivity within the plane. For this reason, the 12 x 12 neural network was simulated. For the 12 x 12 network, a 9-way in-plane connection model corresponds to 6% connectivity and a 4-way in-plane connection model corresponds to 3% connectivity. As can be seen in the results of the simulations, as this connectivity level decreases, the required training time increases. This is reflected both by varying the connection model and by varying the size of the network. Focusing on the two proposed connection models (global and 4-way), it can be seen that for the 12 x 12 network, the neural network using the global in-plane connections trains about 72 times faster than the neural network using the 4-way in-plane connections. At first glance, these results seem very encouraging for the proposed global in-plane connection model. However, it must be considered that the 4-way connection model requires 4 clock cycles to complete the operation, while the global connection model requires 144 clock cycles to distribute the data globally. This reduces the time advantage of the global connection model from 72 times faster to 2 times faster. At this level, it is not clear that the global connection model offers enough benefit to justify potential increases in real estate and complexity. Another important metric to consider for the different in-plane connection models is the number of patterns each of the different networks can store. The training patterns used for this simulation consisted of a complete dot-matrix alphabet similar to the patterns shown in Figure 4-14.
Likewise, the desired output patterns also consisted of a single element with a maximum value and the rest of the elements with a minimum value. For this set of simulations, only the 12 x 12 network was examined. For a given in-plane connection model, each network was trained with an increasing number of training patterns, and the number of training cycles required to reach an MSE of 1 x 10^-4 for that number of patterns was recorded. As an example, when a batch of 3 patterns was presented to the globally in-plane connected network, it required about 80 training cycles to train the neural network to an MSE of 1 x 10^-4; for the 4-way in-plane connected network, it took 1107 cycles to store 3 patterns to the same MSE. The complete results can be seen in Figure 4-16. If a cutoff of 5000 training cycles is used, it is seen that while both the 9-way and 4-way in-plane connections fail at storing 9 patterns, the neural network using global in-plane connections has absolutely no trouble storing all 26 letters. Also of note is that the 9-way and 4-way connection models perform very similarly. In addition, the connection model with no in-plane connections is only able to store 2 patterns before it reaches the 5000-training-cycle threshold.

Figure 4-16: Training cycles required to store different numbers of patterns to an MSE of 1 x 10^-4 for the global, 9-way, 4-way, and no-in-plane-connection models

While it is difficult to quantify "computational power" for neural networks, the fact that the neural network using global in-plane connections is able to store significantly more patterns than the more sparsely connected networks makes it a much more desirable architecture. This is why the number of patterns that a neural network can store is one of the key metrics used for evaluation.
This result, showing that the global in-plane connection model outperforms the 4-way connection model, justifies significant further research into the feasibility of the global in-plane connection model.

4.4 Suggestions For Future Work

As the work on modelling and implementing the various in-plane connection architectures is in its early stages, several areas require a significant amount of further research. First, the models used for simulation need to be reworked to more closely replicate the actual hardware being built. One of the first ways this can be done is by discretizing the weights stored in the network. As described previously, both proposed architectures rely on the fact that weights are stored in digital form within the neurons. MATLAB, on the other hand, assumes analog weights by using high-precision values for the connection weights. By passing the weights through an analog-to-digital converter model after each training cycle within MATLAB, the digital nature of these architectures can be replicated. This should give a good understanding of the consequences of moving from an analog architecture to a digital architecture. Another potential issue that needs to be understood is the storage of the weights in the global in-plane connection model. For global interconnects, the number of connections required increases as n^2, where n is the number of neurons. This means that for a true globally connected in-plane neural network like the one described, each neuron must store n weight values. While the CONNPP does intend to increase the number of neurons per layer as the project progresses, it does not intend to increase the physical size of the neurons. As a result, it is necessary to examine alternate methods of storing all n^2 weights. One way of accomplishing this is by storing only a subset of the weights, those determined to be the "most important" by some predetermined algorithm.
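The proposed weight discretization can be sketched as a uniform quantizer applied to each weight after every training cycle. The bit widths and weight range below are hypothetical, chosen only to illustrate the loss of precision:

```python
def quantize(w, bits, w_max=1.0):
    """Pass a weight through a hypothetical uniform n-bit ADC spanning
    [-w_max, +w_max]: clamp, scale to an integer code, and reconstruct
    the nearest representable value.  Applying this to every weight
    after each training cycle models digital (rather than analog)
    weight storage."""
    levels = (1 << bits) - 1
    w = max(-w_max, min(w_max, w))
    code = round((w + w_max) / (2 * w_max) * levels)
    return code / levels * 2 * w_max - w_max

print(quantize(0.30, 8))  # ~0.302: close to the analog value
print(quantize(0.30, 2))  # 1/3: only 4 levels span [-1, 1]
```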
Storing only such a subset potentially allows most of the functionality to be retained while reducing the physical size requirements. Further research is needed to understand the impact of this architectural change on the performance of the CONNPP.

The amount of chip real estate needed to implement each of the different architectures is also a major consideration that needs more investigation. While global in-plane connections appear to produce a neural network with better performance (by some metrics), it is unclear how much physical space is needed to implement a single neuron. Likewise, this physical space requirement is also unknown for the originally proposed 4-way connection model. As the available chip space is a bounded resource, it is necessary to take it into consideration when evaluating performance. One way to quantify the space requirements is to build hardware-level functional models using a hardware description language such as Verilog or VHDL. By factoring in the size requirements for the various in-plane connection models along with other performance-based metrics, the best model can then be chosen and implemented in physical hardware to produce a performance-optimized Compact Optoelectronic Neural Network Processor.