512 IEEE JOURNAL OF SELECTED TOPICS IN QUANTUM ELECTRONICS, VOL. 9, NO. 2, MARCH/APRIL 2003

An Optical Centralized Shared-Bus Architecture Demonstrator for Microprocessor-to-Memory Interconnects

Xuliang Han, Gicherl Kim, G. Jack Lipovski, Fellow, IEEE, and Ray T. Chen, Senior Member, IEEE

Abstract—An architecture demonstrator of an innovative interconnect scheme called the optical centralized shared-bus is presented in this paper. This architecture retains the advantages of shared-bus topology while at the same time specifying a uniform interface between the electrical and the optical backplane layers in contrast to other proposed architectures. For the first time, a fanout equalized optical backplane bus is demonstrated. In this architecture demonstrator, the data paths required for the microprocessor-to-memory interconnects are provided by the optical centralized shared-bus. The optoelectronic interface modules are optimized to support data rates up to 1.25 Gb/s. The objective of this microprocessor-to-memory interconnects demonstration is to ensure the feasibility of applying this innovative architecture in real systems.

Index Terms—Optical backplane bus, optical interconnect, optoelectronic interface, vertical cavity surface emitting laser (VCSEL).

I. INTRODUCTION

INTERCONNECT is becoming an even more dominant factor in modern computation systems [1]. As the underlying implementation technology, however, electrical interconnection faces numerous challenges, such as power consumption, signal integrity, and electromagnetic interference. The employment of optical interconnects will be one of the major alternatives for upgrading the interconnect performance [2]. Machine-to-machine interconnection has already been significantly improved by utilizing optical means. The major research thrusts in optical interconnects are in the backplane and board levels where the physical limitations of electrical interconnects are imposing a prominent bottleneck.
Manuscript received October 31, 2002; revised February 6, 2003. This work was supported in part by BMDO, in part by DARPA, in part by ONR, in part by AFOSR, and in part by the ATP program of the State of Texas.
The authors are with the Microelectronic Research Center, Department of Electrical and Computer Engineering, University of Texas, Austin, TX 78758 USA (e-mail: raychen@uts.cc.utexas.edu).
Digital Object Identifier 10.1109/JSTQE.2003.813311

Shared-bus architecture is a preferred interconnect scheme because its broadcast nature can be effectively utilized to reduce latency, to lessen the complexity of interconnection network management, and to support cache coherence in a multiprocessor. However, the physical length, the number of fanouts, and the speed of the shared-bus are significantly limited by the underlying electrical interconnects. Optics is widely regarded as a better alternative interconnect technology. Great research efforts have been dedicated to the design of innovative optical shared-bus architectures [3], [4]. These previously reported optical shared-bus architectures, however, are difficult to implement in practice because the large variation among the optical signal fanouts dramatically increases the complexity of the optoelectronic interface modules. The difficulty in equalizing the fanouts is mainly due to the requirement to support the bidirectionality of signal flows on the backplane bus. In our previous report [5], a new method for rebroadcasting signals in an optical backplane bus system was described. Uniform optical signal fanouts become possible by using this method. Building on this method, we present in this paper an innovative architecture called the optical centralized shared-bus, which is, to the best of our knowledge, the first fanout equalized optical backplane bus. As shown in Fig. 1, the electrical bus provides interconnects for the noncritical paths, whereas the optical bus provides them for the critical paths.
To implement optical interconnects, volume holographic gratings are used for coupling optical signals into and out of the optical wave-guiding plate, within which optical signals propagate as substrate-guided waves [6]. The details of this innovative architecture, especially the feature of uniform optical signal fanouts, are described in Section II. In Section III, an optical centralized shared-bus architecture demonstrator for microprocessor-to-memory interconnects is presented to verify the feasibility of this proposed idea. Finally, a summary is given in Section IV.

II. OPTICAL CENTRALIZED SHARED-BUS ARCHITECTURE

Fig. 1 illustrates the architectural concept of the optical centralized shared-bus. The electrical backplane layer provides interconnects for the noncritical paths. The daughter board that is inserted into the central backplane connector plays a pivotal role in this architecture and is referred to as the distributor in this paper. The optoelectronic interface modules, including vertical cavity surface emitting lasers (VCSELs) and photodetectors, are integrated on the backside of the electrical backplane and aligned with the underlying optical backplane layer. Unlike in the other modules, the positions of the VCSEL and the photodetector in the central module are swapped. The optical backplane layer consists of an optical wave-guiding plate with properly designed volume holographic gratings integrated on its top surface. Underlying the distributor is a double-grating hologram, and the others are single-grating holograms. In this way, this architecture provides the bidirectionality of signal flows on the backplane bus.

1077-260X/03$17.00 © 2003 IEEE

HAN et al.: OPTICAL CENTRALIZED SHARED-BUS ARCHITECTURE DEMONSTRATOR 513

Fig. 1. Optical centralized shared-bus architecture.

In this architecture, there are two optical paths for each signal line.
One is for the source daughter board to deliver the signal to the distributor, and the other one for the distributor to broadcast the signal to all the daughter boards on the backplane bus. A complete data transfer from a daughter board to: 1) another daughter board (point-to-point); 2) more than one daughter board (multicast); and 3) all the daughter boards on the backplane bus (broadcast), generally involves two processes, which are single-hop delivery from the source daughter board to the distributor and then broadcast from the distributor. This explains the reason we name this architecture optical centralized shared-bus. First, the VCSEL of the source daughter board projects the light surface normally on its underlying holographic grating. This signal is coupled into the optical wave-guiding plate and propagates within the plate under the total internal reflection (TIR) condition [6]. Then, this signal is surface normally coupled out of the plate by the central double-grating hologram and detected by the receiver of the distributor. Second, the distributor regenerates the same optical signal and projects it surface normally on its underlying double-grating hologram. This signal is diffracted into two beams and coupled into the optical wave-guiding plate, propagating along the two opposite directions within the plate under the TIR condition. During the propagation, a portion of the light is surface normally coupled out of the plate by a daughter board’s underlying holographic grating and detected by its receiver. This daughter board takes appropriate actions on the received data. If the distributor is the data source, the first process will not happen. If the distributor is the only data destination, the second process is not necessary. The protocol governing the bus should be able to tolerate the transmission delay due to the difference in the distance. 
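The two processes described above can be sketched as a small routing model. This is only an illustration under stated assumptions, not the authors' implementation: the board labels (`L1`, `R2`, and `C` for the distributor), the `transfer` function, and the `"*"` broadcast marker are all hypothetical names introduced here.

```python
# Illustrative model of one data transfer on the optical centralized shared-bus.
# DISTRIBUTOR, transfer, and the "*" broadcast marker are assumed names.

DISTRIBUTOR = "C"  # hypothetical label for the board in the central slot


def transfer(source, destinations):
    """Return the list of optical hops needed to reach the destinations."""
    hops = []
    # Process 1: single-hop delivery from the source to the distributor.
    # It does not happen when the distributor itself is the data source.
    if source != DISTRIBUTOR:
        hops.append((source, DISTRIBUTOR))
    # Process 2: broadcast from the distributor to all daughter boards.
    # It is not needed when the distributor is the only destination.
    if any(dest != DISTRIBUTOR for dest in destinations):
        hops.append((DISTRIBUTOR, "*"))
    return hops


print(transfer("L1", ["R2"]))        # point-to-point
print(transfer("L1", ["L2", "R1"]))  # multicast
print(transfer("C", ["L1"]))         # distributor is the source
print(transfer("L1", ["C"]))         # distributor is the only destination
```

In this model, point-to-point, multicast, and broadcast transfers all use the same two hops; they differ only in which daughter boards act on the received data.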
A hierarchical interconnection network [7] can also be constructed, in conformity with a specific global topology, by using the optical centralized shared-bus as the building block and the distributor as the socket.

The most attractive feature of this architecture is that it achieves uniform optical signal fanouts. Assuming that the VCSELs of all the optoelectronic interface modules emit the same optical power, uniform fanouts mean that: 1) the power of signals delivered from any daughter board to the distributor is the same; 2) the power of signals broadcast from the distributor to all the daughter boards on the backplane bus is the same; and 3) the power of signals broadcast from the distributor equals that delivered from any daughter board to the distributor. Thus, this architecture specifies a uniform interface between the electrical and the optical backplane layers in contrast to other proposed architectures, e.g., [3] and [4].

On the optical centralized shared-bus, the fanouts are equalized by specifying the diffraction efficiency of the volume holographic gratings [5]. Because of the symmetric configuration, the two multiplexed gratings inside the central hologram should have the same diffraction efficiency. The analysis in [5] shows that the fanouts are equalized if the following iterative equation is satisfied:

$$\eta_{i+1} = \frac{\eta_i}{1 - \eta_i} \qquad (1)$$

where $\eta_i$ represents the diffraction efficiency of the $i$th single-grating hologram counted from the central double-grating hologram. In the case of uniform fanouts, the fanout coefficient, which is defined as the ratio of the fanout power to the effective VCSEL fanin power, is equal to the reciprocal of the total number of the daughter boards (not including the distributor) on the backplane bus. Thus, the fanout capacity, i.e., the maximum number of daughter boards that one optical centralized shared-bus can accommodate, can be calculated if the bit error rate requirement, marginal power penalty, and the parameters of the optoelectronic interface modules, such as VCSEL emission power and photodetector sensitivity, are specified.

Fig. 2. Microprocessor-to-memory interconnects demonstration.

Fig. 3. Connectivity block diagram of the architectural demonstrator.

III. MICROPROCESSOR-TO-MEMORY INTERCONNECTS DEMONSTRATION

As shown in Fig. 2(a) and (b), this architecture demonstrator consists of a microprocessor board (CPU12, Motorola Inc.), four external memory (8KB SRAM) boards, an electrical backplane bus, and an optical centralized shared-bus. The microprocessor board is inserted into the central backplane connector as the distributor, and the memory boards, L1, L2, R1, and R2, are inserted into their corresponding slots. Fig. 3 is the conceptual connectivity block diagram of this system. For simplicity, only one memory board is illustrated in this diagram. As shown in Fig. 3, the optical centralized shared-bus provides the data paths for the microprocessor-to-memory interconnects. Although the system clock is only 16 MHz, which is limited by the capacity of the microprocessor (CPU12), this architecture demonstrator is sufficient for our proof-of-concept objective.

A. Volume Holographic Gratings

The volume holographic gratings specified by the optical centralized shared-bus architecture were recorded in dry photopolymer films (HRF-600X014-20, DuPont). The photopolymer-based volume hologram is an attractive candidate for making high-efficiency gratings. The advantages of this material over other types of emulsion, such as dichromated gelatin and silver halides, include dry-processing capability, long shelf life, and good photo-speed [8]. The maximum permissible dose of the incident light is beyond 100 JW/cm . The two-beam interference method was used to form the gratings in the dry photopolymer films.
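The target diffraction efficiencies that the recorded gratings must realize follow from the uniform-fanout condition of Section II. The sketch below is a rough illustration under idealized assumptions, not the analysis of [5]: propagation is taken as lossless, the outermost tap on each side is assumed to extract all remaining light (efficiency 1), and the function names are hypothetical. Under these assumptions, equal tapped power at neighbouring boards requires each efficiency to equal the previous one divided by one minus the previous one, which gives efficiencies of 1/n, 1/(n-1), ..., 1 for n boards per side.

```python
# Sketch of uniform-fanout design under idealized, lossless assumptions.
# Function names and the closed form 1/(n - k) are illustrative only.

def equalized_efficiencies(n_per_side):
    """Tap efficiencies counted outward from the central hologram.

    Assumes the outermost tap extracts all remaining light, so that
    eta_next = eta / (1 - eta) holds between neighbouring taps.
    """
    return [1.0 / (n_per_side - k) for k in range(n_per_side)]


def broadcast_fanouts(p_in_mw, eta_center, etas):
    """Power tapped at each board on one side of the distributor (mW)."""
    remaining = p_in_mw * eta_center  # power coupled into one direction
    fanouts = []
    for eta in etas:
        fanouts.append(remaining * eta)  # portion coupled out to this board
        remaining *= 1.0 - eta           # portion that keeps propagating
    return fanouts


etas = equalized_efficiencies(2)
print(etas)
# Example values: 2 mW source and a 47% central grating, two boards per side.
print(broadcast_fanouts(2.0, 0.47, etas))
```

With two boards per side, this idealized model yields a 50% inner tap and equal fanouts on that side; real fanouts would be lower once coupling and propagation losses are included.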
The photopolymer material consists of monomers, polymeric binders, and photoinitiators. The monomers are polymerized when exposed to light of a specific wavelength, and the refractive index of the film is determined by the polymer concentration. During exposure to an interference pattern, more monomers are polymerized in the bright regions than in the dark regions. This nonuniform illumination sets up monomer concentration gradients, driving the monomers to diffuse from the dark regions to the neighboring bright regions. A final uniform illumination is required to polymerize the remaining monomers and stabilize the spatial distribution of the polymer concentration, which conforms to the original illumination pattern. Thus, a grating structure is formed inside the film. To obtain the double-grating holograms, two sequential exposure steps are required to form the two multiplexed gratings inside the films. Our objectives are to obtain single-grating holograms with accurate diffraction efficiency that satisfies the iterative equation (1), and high-efficiency equal-strength double-grating holograms. The quality of these volume holographic gratings directly affects the fanout variation and capacity and is therefore pivotal to the implementation of the optical centralized shared-bus architecture. The recording schedules were developed by using the method described in [9]. Following these recording schedules, we were able to control the accuracy of the diffraction efficiency within 2% and obtained 47%/47% double-grating holograms.

Fig. 4 demonstrates the uniform optical signal fanouts in the architecture demonstrator we built. In conformity with the specification of the optical centralized shared-bus architecture, a 47%/47% double-grating hologram is integrated in the middle of the optical wave-guiding plate. The other two are single-grating holograms with 50% diffraction efficiency. The diffraction angles within the plate are 45°, which satisfies the TIR condition. The two 22.5° bevels at both ends are coated with aluminum, providing nearly 100% reflection efficiency. An 850-nm VCSEL source was used to obtain the result shown in Fig. 4. The input optical power from the VCSEL was 2 mW, and the fanouts were, from left to right, 0.404, 0.406, 0.400, and 0.396 mW, respectively. A two-dimensional (2-D) multibus line configuration [10] can be implemented by replacing the optical source with a 2-D VCSEL array in this architecture demonstrator.

Fig. 4. Demonstration of uniform optical signal fanouts in the architecture demonstrator.

B. Optoelectronic Interface Modules

The optoelectronic interface modules, consisting of transmitters and receivers, implement electrical-to-optical and optical-to-electrical conversions. As discussed earlier, the optical centralized shared-bus architecture specifies a uniform interface between the electrical and the optical backplane layers. A transmitter module consists of an 850-nm VCSEL and a laser driver that accepts differential PECL inputs and provides complementary modulation currents. A receiver module consists of a photodetector/transimpedance amplifier and a postamplifier that accepts a wide range of voltages (10–1200 mV) while providing a constant-level (PECL) output voltage. The PCB layout design was optimized to support data rates up to 1.25 Gb/s.

Eye patterns were measured to characterize the high-speed performance of the optoelectronic interface modules in this architecture demonstrator, as shown in Fig. 5. The pseudorandom bit sequence (PRBS) from a pulse generator (HP8183A) was used as the input to a laser driver in an interface module. The output optical signal was transferred through the optical centralized shared-bus and then detected by a receiver in another interface module. Along with the trigger signal from the pulse generator, the output signal from the receiver was fed into a digital communication analyzer (HP83480A) to display eye patterns. The eye pattern at a data rate of 1.25 Gb/s is shown in Fig. 5. This result verifies that these interface modules are capable of supporting data rates up to 1.25 Gb/s, thus providing sufficient headroom for implementing more powerful computation systems.

Fig. 5. Onboard high-speed performance test and the eye pattern at a data rate of 1.25 Gb/s.

C. Transmitter/Receiver Protocol

To efficiently utilize the large capacity of the optical links in this architecture demonstrator, serial data transfer was performed between the transmitter and the receiver ends. A clock-forwarding method, i.e., using the same clock signal at both the transmitter and the receiver ends, was utilized to implement this serial data transfer. Because it does not involve complicated clock recovery circuits, this method reduces the complexity of the system. In order to synchronize the serial data with the clock signal, a special transmitter/receiver protocol was designed. In the idle state, the data links are held at logic low [11]. For a serial data transfer, the serializer at the transmitter end attaches a logic-high bit in front of the actual data bits, marking the start of the data. Thus, the data pattern presented on the data links is a logic-high bit followed by the actual data payload. When received at the receiver end, this logic-high bit synchronizes the trailing data bits with the clock signal. Therefore, correct deserialization can be performed at the receiver end. In this architecture demonstrator, this protocol was implemented in hardware and is thus transparent to the upper programming level.

D. Optical Connectivity Verification

To verify the optical connectivity required by the microprocessor-to-memory interconnects, we ran this architecture demonstrator under an infinite loop operation.
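The framing carried on the data links during this test follows the transmitter/receiver protocol of Section III-C. A minimal software sketch of that framing is given below. The demonstrator implements the protocol in hardware, so the function names, the 8-bit word width, and the least-significant-bit-first ordering are assumptions made here for illustration; the bit ordering was chosen because it matches the serial pattern reported for the byte 0x4C.

```python
# Software sketch of the start-bit framing; the real protocol runs in hardware.
# serialize/deserialize, the 8-bit width, and LSB-first order are assumptions.

def serialize(byte, width=8):
    """Prefix a logic-high start bit, then emit the data bits LSB first."""
    bits = [1]  # start bit marks the beginning of the data
    bits += [(byte >> i) & 1 for i in range(width)]
    return bits


def deserialize(stream, width=8):
    """Skip idle (logic-low) bits, find the start bit, rebuild the word.

    Assumes a complete frame follows the start bit.
    Returns None if the link stays idle.
    """
    it = iter(stream)
    for bit in it:
        if bit == 1:  # first logic-high bit synchronizes the receiver
            break
    else:
        return None
    value = 0
    for i in range(width):
        value |= next(it) << i
    return value


frame = serialize(0x4C)
print("".join(str(b) for b in frame))
print(hex(deserialize([0, 0] + frame + [0])))
```

Because the link idles at logic low, any number of idle bits may precede the frame without affecting the recovered word.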
Inside this loop, a memory address was issued to select one memory board, and then a data transfer (write and read back) was performed between the microprocessor and the selected memory board. To visualize this data transfer, the data patterns on the optical links were displayed on the screen of an oscilloscope, as shown in Fig. 6(a) and (b). In these tests, the data patterns were measured through the monitor ports shown in Fig. 2, which indicate the modulation current of their corresponding VCSELs. Fig. 6(a) shows the result when data “0x4C” was transferred between the microprocessor and memory board L1. Along with the starting logic-high bit, the serial data pattern on the optical link should be “100 110 010” in the time-increasing order, i.e., the start bit followed by the data bits with the least significant bit first. The correct patterns were observed as shown in Fig. 6(a), where channels 1 and 2 show, respectively, the data pattern measured at monitor port L1 and C. Fig. 6(b) shows the result when data “0x44” was transferred between the microprocessor and memory board R2. The correct patterns were also observed as shown in Fig. 6(b), where channels 1 and 2 show, respectively, the data pattern measured at monitor port R2 and C. Similar tests were conducted on the optical links between the microprocessor and the other memory boards on the backplane bus, and the correct patterns were also observed. These results verify the optical connectivity required for the microprocessor-to-memory interconnects.

Fig. 6. Optical connectivity verification between the microprocessor and memory boards (a) L1 and (b) R2.

IV. CONCLUSION

For the first time, we designed and implemented a fanout equalized optical backplane bus. This innovative architecture, called the optical centralized shared-bus, retains the advantages of shared-bus topology while at the same time specifying a uniform interface between the electrical and the optical backplane layers, in contrast to other proposed architectures. The objective of this microprocessor-to-memory interconnects demonstration is to verify the feasibility of applying this innovative architecture in real systems. The overall performance of this architecture demonstrator is limited by the microprocessor (CPU12) we used. On the other hand, the optoelectronic interface modules in this architecture demonstrator are able to support data rates up to 1.25 Gb/s, thus providing sufficient headroom for implementing more powerful computing systems. With the rapid developments in active optoelectronic devices, much higher data rates, e.g., 10 Gb/s, may be considered in the system design. Furthermore, the optical centralized shared-bus architecture is of general applicability and can be applied in scenarios other than the microprocessor-to-memory interconnects demonstrated herein, e.g., centralized shared-memory multiprocessors.

ACKNOWLEDGMENT

The authors thank BMDO, DARPA, ONR, AFOSR, and the ATP program of the State of Texas for supporting this paper.

REFERENCES

[1] W. J. Dally, “Computer architecture is all about interconnect,” in Proc. 8th Int. Symp. High-Performance Comput. Architecture, Cambridge, MA, Feb. 2002.
[2] M. R. Feldman, S. C. Esener, C. C. Guest, and S. H. Lee, “Comparison between optical and electrical interconnects based on power and speed characteristics,” Appl. Opt., vol. 27, pp. 1742–1751, 1988.
[3] J. Yeh, R. K. Kostuk, and K. Tu, “Hybrid free-space optical bus system for board-to-board interconnections,” Appl. Opt., vol. 35, pp. 6354–6364, 1996.
[4] S. Natarajan, C. Zhao, and R. T. Chen, “Bi-directional optical backplane bus for general purpose multi-processor board-to-board optoelectronic interconnects,” IEEE J. Lightwave Technol., vol. 13, pp. 1031–1040, June 1995.
[5] G. Kim, X. Han, and R. T. Chen, “A method for rebroadcasting signals in an optical backplane bus system,” IEEE J. Lightwave Technol., vol. 19, pp. 959–965, July 2001.
[6] K. Brenner and F. Sauer, “Diffractive-reflective optical interconnects,” Appl. Opt., vol. 27, pp. 4251–4254, 1988.
[7] T. M. Pinkston, “Design considerations for optical interconnects in parallel computers,” in Proc. 1st Int. Workshop Massively Parallel Processing Using Optical Interconnections, Cancun, Mexico, 1994, pp. 306–322.
[8] W. J. Gambogi, A. M. Weber, and T. J. Trout, “Advances and applications of DuPont holographic photopolymers,” in Proc. SPIE, vol. 2043, 1994, pp. 2–13.
[9] X. Han, G. Kim, and R. T. Chen, “Accurate diffraction efficiency control for multiplexed volume holographic gratings,” Opt. Eng., vol. 41, pp. 2799–2802, 2002.
[10] G. Kim, X. Han, and R. T. Chen, “Crosstalk and interconnection distance considerations for board-to-board optical interconnects using 2-D VCSEL and microlens array,” IEEE Photon. Technol. Lett., vol. 12, pp. 743–745, June 2000.
[11] R. T. Chen, “VME optical backplane bus for high performance computer,” J. Optoelectron. Devices Technol., vol. 9, pp. 81–94, 1994.

Xuliang Han received the B.S. degree in electronic engineering from Tsinghua University, Beijing, China, in 1999 and the M.S.E. degree in electrical and computer engineering, in 2001, from the University of Texas, Austin, where he is currently pursuing the Ph.D. degree in electrical and computer engineering. His research interests include massively parallel processing using optical interconnects, optical centralized shared-bus, high-speed modular optoelectronic transceivers, guided-wave optics, and holography.

Gicherl Kim received the B.S. and M.S. degrees in physics from Inha University, Incheon, Korea, in 1989 and 1991, respectively, and the Ph.D. degree in electrical and computer engineering from the University of Texas, Austin, in 2000.
He was a Researcher in the Agency for Defense Development, Korea from 1991 to 1997. He is currently a Research Engineer and Project Manager with Omega Optics, Inc., Austin, TX. His current research topics are mainly devoted to design and fabrication of optical board and backplane, and the advanced architectural design of optical interconnects and networks. Recently, his research works have also covered the design, fabrication, and packaging of modular optical motherboard bus, based on waveguides and substrate-mode interconnects using passive and active photonic devices. His interest in the optical data transmission topologies includes point-to-point, multidrop, multipoint, and switch fabric optical communication from the intercomputer to the intracomputer levels.

G. Jack Lipovski (S’63–M’66–SM’82–F’94) is a Professor in the Department of Electrical and Computer Engineering, University of Texas, Austin, where he has taught since 1976. Previously, he taught electrical engineering and computer science at the University of Florida from 1969 to 1976. In 1988–1989, he held the Grace Hopper Chair of Computer Science, Naval Postgraduate School, Monterey, CA. He designed and built the pioneering database computer, CASSM, and parallel computer, TRAC. He has published more than 70 papers and has authored or coauthored six books and edited three books. He has consulted for several companies, including the Microelectronics and Computer Corporation. He currently studies parallel, database, and artificial intelligence computer architectures and microcomputers.
Prof. Lipovski chaired the IEEE Computer Society Technical Committee on Computer Architecture and the Computer Architecture Committee of the Association for Computing Machinery. He is also a Computer Society Governing Board Member, the Euromicro Director, IEEE MICRO Editor, and the Area Editor of the Journal of Parallel and Distributed Computing.

Ray T.
Chen (M’91–SM’98) is the Temple Foundation Endowed Professor in the Department of Electrical and Computer Engineering, University of Texas, Austin. His research group has been awarded more than 60 research grants and contracts from such sponsors as the Department of Defense, the National Science Foundation, the Department of Energy, NASA, the State of Texas, and private industry. The optical interconnects research group at UT Austin has reported its research in more than 250 published papers. Currently, there are 20 Ph.D. degree students and seven postdoctors working in his group. He has served as a Consultant for various federal agencies and private companies and delivered numerous invited talks to professional societies. His research topics include guided-wave and free-space optical interconnects, polymer-based integrated optics, a polymer waveguide amplifier, graded index polymer waveguide lenses, active optical backplane, traveling-wave electrooptic polymer waveguide modulator, optical control of phased-array antenna, GaAs all-optical crossbar switch, holographic lithography, and holographic optical elements. Prof. Chen has chaired or been a Program Committee Member for more than 40 domestic and international conferences organized by SPIE, OSA, IEE, and PSC. He is a Fellow of SPIE and the Optical Society of America and a Member of PSC.