Modular High-Speed Signaling for Networks in HPC Systems Using COTS Components

by

Peggy B. Chen

Submitted to the Department of Electrical Engineering and Computer Science in Partial Fulfillment of the Requirements for the Degrees of Bachelor of Science in Electrical Engineering and Computer Science and Master of Engineering in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology

May 9, 2000

© Copyright 2000 Peggy B. Chen. All rights reserved.

The author hereby grants M.I.T. permission to reproduce and distribute publicly paper and electronic copies of this thesis and to grant others the right to do so.

Author: Peggy B. Chen, Department of Electrical Engineering and Computer Science

Certified by: Thomas Knight, Thesis Supervisor, MIT Artificial Intelligence Lab

Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Theses

MASSACHUSETTS INSTITUTE OF TECHNOLOGY LIBRARIES: JUL 27 2000

MODULAR HIGH-SPEED SIGNALING FOR NETWORKS IN HPC SYSTEMS USING COTS COMPONENTS

by Peggy B. Chen

Submitted to the Department of Electrical Engineering and Computer Science on May 9, 2000 in Partial Fulfillment of the Requirements for the Degrees of Bachelor of Science in Electrical Engineering and Computer Science and Master of Engineering in Electrical Engineering and Computer Science

ABSTRACT

Increasingly powerful applications summon the need for larger machines to support their large memory and computation requirements. A key component in designing a high performance computing (HPC) system is providing the high bandwidth, low latency interprocessor communication network necessary for efficient and reliable multiprocessor operation. The design process begins with an evaluation of technological trends, advancements and practical limitations. The goal is to develop a low cost architecture that is scalable and can be produced in a short design cycle.
The Flexible Integrated Network Interface (FINI) provides the physical layer and link layer medium that controls movement of data between HPC processor nodes. The FINI is designed using commodity-off-the-shelf (COTS) components and high-speed signaling techniques that enable data to be transferred over transmission lines up to 10 meters at a maximum rate of 5.38 Gbps.

TABLE OF CONTENTS

Chapter 1. Introduction .... 8
1.1 Goals .... 8
1.2 Scope .... 9
1.3 Organization .... 10
Chapter 2. High Performance Computing (HPC) Systems .... 11
2.1 Re-Engineering an HPC System .... 11
2.2 ARIES HPC System .... 12
2.2 METRO: Multipath Enhanced Transit Router Organization .... 14
2.3 Design Requirements .... 15
2.2.1 Scalability .... 16
2.2.2 Programmability .... 16
2.2.3 Reliability .... 16
2.2.4 Efficiency .... 17
Chapter 3. The Flexible Integrated Network Interface .... 19
3.1 Design Motivations .... 19
3.2 Architecture .... 21
3.3 Duplicated Architecture .... 23
3.4 Design Goals .... 24
3.4.1 Technology Trends .... 25
3.4.2 Component Choices .... 25
3.4.3 Design Trends .... 26
3.4.4 FINI Challenges .... 26
Chapter 4. High-Speed Interconnect .... 27
4.1 High-Speed Signaling Problem .... 27
4.2 Transmission Lines .... 28
4.2.1 Definition of High-Speed Signal .... 28
4.2.2 Transmission Line Models .... 30
4.2.3 Impedance Discontinuities .... 32
4.2.4 Skin Effects .... 34
4.2.5 Termination Techniques .... 36
4.3 Differential Signaling .... 36
4.3.1 Differential Signal Basics .... 36
4.3.1 LVDS .... 39
4.4 Transmission Media .... 39
4.4.1 Transmitter/Receiver Pair .... 40
4.4.2 Cable .... 40
4.4.3 Connector .... 42
Chapter 5. Component Specification .... 43
5.1 LVDS Transmitter and Receiver .... 43
5.1.1 DS90C387/DS90CF388 .... 44
5.1.2 Decision Factors .... 44
5.1.3 Functionality of DS90C387/DS90CF388 .... 45
5.2 Processors .... 46
5.2.1 Virtex XCV100-6PQ240C .... 46
5.2.2 Decision Factor .... 47
5.2.3 Programming the XCV100-6PQ240C .... 47
5.2.2 FPGA-1 (Master) .... 49
5.2.3 FPGA-2 (Slave) .... 50
5.2.4 FPGA-3 (Slave) .... 50
5.2.5 FPGA-4 (Slave) .... 50
5.3 Memory .... 51
5.3.1 MT55L256L32P .... 51
5.3.2 Decision Factors .... 51
5.4 Clock .... 52
5.4.3 Functionality of AD9851 .... 52
5.4.2 Decision Factors .... 53
5.4.3 Programming the AD9851 .... 54
5.5 Power Supply .... 56
5.5.1 LT1374 .... 56
5.5.1 Decision Factors .... 56
5.6 Cables and Connectors .... 57
5.6.1 LVDS Cables .... 57
5.6.2 Double-Decker Connector .... 58
5.6 Test Points .... 58
Chapter 6. Board Design .... 60
6.1 Schematic Capture .... 60
6.1.1 Component Library .... 61
6.1.2 Schematic Sheets .... 62
6.1.3 Schematic Verification .... 62
6.1.4 Bill of Materials .... 63
6.2 Board Layout .... 63
6.2.1 Footprint Library .... 64
6.2.2 Component Placement .... 64
6.2.3 PCB Board Stack (Layers) .... 67
6.2.4 Routing .... 69
6.3 Board Fabrication .... 72
6.4 PCB Material and Package Properties .... 72
6.5 Board Assembly .... 73
Chapter 7. Applications .... 75
7.1 Capability Tests .... 76
7.1.1 Latency Test .... 76
7.1.2 Round Trip Test .... 77
7.1.3 Cable Integrity Test .... 78
7.2 Network Design .... 78
7.2.1 Routers .... 79
7.2.1 Multipath, Multistage Networks .... 80
Chapter 8. Conclusion .... 83
8.1 FINI Architecture .... 83
8.1.1 Network Interface Architecture .... 84
8.1.2 Basic Router Architecture .... 84
8.2 High-Speed Interconnect .... 85
8.2.1 Transmitter/Receiver .... 85
8.2.2 Cables and Connectors .... 87
8.3 FPGAs .... 87
8.4 Continual Evaluation .... 88
Appendix A. FINI Board Schematics .... 89
Appendix B. FINI Bill of Materials (BOM) .... 101
Appendix C. FINI Board Layout .... 104
Bibliography .... 108

TABLE OF TABLES

Table 1. Dielectric Constants of Some Common Materials [Suth99] .... 29
Table 2. Skin-Effect Frequencies for Various Conductors .... 35
Table 3. Pre-emphasis DC Voltage Level with RPRE [National99] .... 45
Table 4. Pre-emphasis Needed per Cable Length [National99] .... 46
Table 5. SelectMAP Mode Configuration Code [Virtex99] .... 48
Table 6. AD9851 8-Bit Parallel-Load Data/Control Word Functional Assignment [Analog99] .... 54
Table 7. Resistor Values for Resistor Divider in Power Supply Circuitry .... 56
Table 8. Electrical Characteristics of Cable [Madison99] .... 58
Table 9. Designator Key .... 63
Table 10. Commercial Transceivers .... 86

TABLE OF FIGURES

Figure 1. ARIES HPC System Diagram - Scalable Multirack Solution .... 13
Figure 2. HPC System Processor Node Diagram .... 14
Figure 3. Die Shot of K7 Athlon Processor .... 17
Figure 4. FPS and FINI Connection Diagram .... 21
Figure 5. Flexible Integrated Network Interface Block Diagram .... 22
Figure 6. FINI Block Diagram with Duplicated Architecture .... 24
Figure 7. First- and Second-Order Models of Transmission Lines .... 31
Figure 8. High-Speed Region on Network Interface .... 33
Figure 9. Twisted-Pair Dimensions .... 33
Figure 10. Microstrip Dimensions .... 34
Figure 11. Skin Effect in AWG24 Copper Versus Frequency .... 35
Figure 12. Differential Signals .... 37
Figure 13. Differential Skew .... 37
Figure 14. Effect of Skew on Receiver Input Waveform .... 38
Figure 15. LVDS Point-to-Point Configuration with Termination Resistor .... 39
Figure 16. Twisted-Pair Cable .... 41
Figure 17. Diagram of Parallel Programming FPGAs in SelectMAP Mode [Xilinx00] .... 48
Figure 18. Basic DDS Block Diagram and Signal Flow of AD9851 [Analog99] .... 53
Figure 19. High-Speed Signal Probe Setup .... 59
Figure 20. Top View of FINI Board .... 66
Figure 21. Bottom View of FINI Board .... 67
Figure 22. FINI Board Stack Diagram .... 68
Figure 23. Split Planes on the Internal +3.3V Power Plane .... 69
Figure 24. Diagram of Latency Test .... 76
Figure 25. Diagram of Round Trip Test .... 77
Figure 26. Diagram of Cable Integrity Test .... 78
Figure 27. Diagram of HPC System (Processor Nodes, FINI, and Network) .... 79
Figure 28. Dilation-1, Radix-2 Router .... 80
Figure 29. Dilation-2, Radix-2 Router .... 80
Figure 30. 8 x 8 Multipath, Multistage Network .... 81
Figure 31. Basic Router Building Block Architecture .... 85
Figure 32. FINI Board - All Layers .... 104
Figure 33. FINI Board - Layer 1 (1H) .... 105
Figure 34. FINI Board - Layer 2 (2V) .... 105
Figure 35. FINI Board - Layer 3 (3H) .... 106
Figure 36. FINI Board - Layer 4 (4V) .... 106
Figure 37. FINI Board - Ground Plane .... 107
Figure 38. FINI Board - +3.3V Plane .... 107

CHAPTER 1

INTRODUCTION

Increasingly powerful applications summon the need for larger machines to support their large memory and computation requirements. Modern microprocessors have paved the way for large-scale multiprocessor systems, but the challenge remains to effectively utilize their potential performance to run not only today's range of applications, but future applications as well. One key component in designing such a high performance computing (HPC) system is providing the high-performance and reliable interprocessor communication network necessary to allow efficient multiprocessor operation.
1.1 GOALS

The goals of an interprocessor network for an HPC system are:

• High bandwidth
• Low latency
• High reliability (signal integrity)
• Scalability
• Reasonable cost
• Practical implementation

High bandwidth, low latency and good physical design are the key properties of a reliable high-speed signaling network in HPC systems. Practicality and cost constraints, important in today's implementation-oriented environment, are addressed through the use of commodity-off-the-shelf (COTS) components. The network topology and protocols must be formulated in a scalable manner. For example, a small, simple crossbar seems easy to replicate, but distributing a large function across an array of small crossbars would incur O(n) switching latency and require O(n²) such switches [DeHon93]. Ease of replication does not translate to scalability.

1.2 SCOPE

This work attempts to address only issues directly related to designing a modular high-speed signaling network for HPC systems using COTS components. Attention is paid to industrial design practices and trends, and to how these techniques can be utilized to provide an efficient and reliable interface between processing nodes and the network. The design detailed here may not be the ideal solution, but it is the most practical and realistic solution at the time of design due to component availability and cost constraints. It provides a foundation for implementing and testing the higher-level protocols and API issues involved in network research and design. The design will highlight the difficulties and limitations of implementation. Results detailed here may be used as a stepping-stone for future network interface designs in HPC systems.

1.3 ORGANIZATION

Before discussing the strategy used to design the high-speed signaling network, Chapter 2 provides a background on HPC systems. Chapter 3 then describes the architectural design of the interprocessor communication network called the Flexible Integrated Network Interface (FINI).
Chapter 4 continues with a discussion of design issues specific to high-speed digital design. Chapter 5 discusses the components used to develop the FINI interface card, and Chapter 6 examines issues related to printed circuit board (PCB) design. Chapter 7 highlights some applications for which the FINI board may be used, and Chapter 8 concludes with an analysis of the FINI design and suggestions for future work.

CHAPTER 2

HIGH PERFORMANCE COMPUTING (HPC) SYSTEMS

High performance computing (HPC) systems are large-scale multiprocessor systems intended for computation- and communication-intensive purposes. They should have hardware capable of supporting parallel programming constructs supplemented by a good API. HPC systems should also be able to support thousands of processors, gigabytes of data, a complex interprocessor communication network, and any interaction between software drivers, compilers and the hardware itself. This chapter gives an overview of the ARIES HPC system.

2.1 RE-ENGINEERING AN HPC SYSTEM

The existence of many high performance computing (HPC) platforms calls into question the need for another one. Evaluation of popular HPC systems reveals many advances in computer architecture technology, but also a variety of deficiencies. Commercially available HPC platforms suffer in varying degrees from poor programmability, inadequate performance and limited scalability. The ARIES HPC system being designed at the Artificial Intelligence Laboratory at the Massachusetts Institute of Technology is the result of examining computer architecture from all levels of abstraction. Topics of consideration include operating systems and applications, transistor-level chip designs, systems engineering and hardware support. With no requirements on backward compatibility, each component of the ARIES HPC system is re-engineered from basic fundamentals to maximize the potential of the system. In all cases, current industrial practices are examined.
Improvements are made where possible, while many other elements have been completely redesigned to enhance performance.

2.2 ARIES HPC SYSTEM

With multiple racks of interchangeable devices, the ARIES HPC system provides a scalable physical design for high performance parallel computing, as depicted in Figure 1. Each slot can contain either a processor node, which consists of a RAID array, PIM processor chips and an I/O processor, or a network interface. The swappable devices are connected via a backplane router. The flexibility of the arrangement of HPC system components permits the implementation of a plurality of high performance parallel computing architectures with a single set of hardware components.

Figure 1. ARIES HPC System Diagram - Scalable Multirack Solution

The architecture of the processor node is detailed in Figure 2. Each HPC processor node requires a means to communicate with other processors. The Flexible Integrated Network Interface (FINI) card is the building block of the interprocessor communication network. The FINI card serves as the network interface card for each HPC processor node. In addition, it is also the building block upon which the backplane router is constructed. The FINI is designed on a stand-alone PCB (see Chapters 3-6), providing the flexibility necessary to implement a scalable HPC system.

Figure 2. HPC System Processor Node Diagram

2.2 METRO: MULTIPATH ENHANCED TRANSIT ROUTER ORGANIZATION

As machine size increases, interprocessor communication latency will increase, as will the probability that some component in the system will fail. In order for the potential benefits of large-scale multiprocessing to be realized, a physical router network and accompanying routing protocol that can simultaneously minimize communication latency while maximizing fault tolerance for large-scale multiprocessors is necessary. The Multipath Enhanced Transit Router Organization (METRO) is a physical layer architecture featuring multipath, multistage networks [DeHon93]. Such a network architecture provides increased bandwidth, improved fault tolerance and the potential to decrease latency while maintaining the scalability of the design.

The METRO Routing Protocol (MRP) sits on top of the physical layer and encompasses both the network layer and the data-link layer as described by the ISO OSI Reference Model [DZ83]. The MRP fulfills the role of the data-link layer by controlling the transmission of data packets and the direction of transmission over interconnection lines, and it provides check-sum information to signal when a retransmission is necessary. The MRP also provides dynamic self-routing, thereby performing the functions of the network layer. Altogether, the MRP provides a reliable, byte-stream connection from end to end through the routing network [BCDEKMP94]. The flexibility to choose parameters permits customized implementations of the METRO network to suit target applications and available technologies. In the ARIES HPC system, the network will be built out of FINI cards. Simple routing and the freedom to optimize for the target technology make METRO an ideal network architecture to meet the needs of our communication intensive multiprocessor system.
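The scalability advantage that multipath, multistage networks such as METRO hold over flat crossbar arrays (Section 1.1 cites O(n) switching latency and O(n²) switches for distributed crossbars [DeHon93]) can be sketched with rough switch-count formulas. This is an illustrative sketch only: the radix-2 assumption and the function names are not from the thesis.

```python
import math

def crossbar_switches(n, radix=2):
    # An n x n function distributed over small radix x radix crossbars
    # needs on the order of (n/radix)^2 such switches: O(n^2).
    return (n // radix) ** 2

def multistage_switches(n, radix=2):
    # A radix-r multistage (butterfly-like) network uses n/r switches
    # per stage over about log_r(n) stages: O(n log n).
    stages = round(math.log(n, radix))
    return (n // radix) * stages

# Switch counts diverge quickly as the machine grows.
for n in (64, 256, 1024):
    print(n, crossbar_switches(n), multistage_switches(n))
```

At 1024 ports the illustrative crossbar array needs 262,144 small switches versus 5,120 for the multistage network, which is the sense in which ease of replication does not translate to scalability.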
2.3 DESIGN REQUIREMENTS The basic design goals of the ARIES HPC are: * Scalability " Programmability 15 " Reliability * Efficiency. 2.2.1 Scalability A versatile HPC system should scale from a few processors to thousands of processors. Scalability is supported by the ARIES HPC system via a modular design, as depicted in Figure 1. Implemented using multiple racks of interchangeable devices, the ARIES HPC system can support anywhere from a few to several thousand processors. The network topology used will be based on multi-butterflies [Upfal89] [LM92] and fat-trees [Lei85]. Multipath, multistage networks are scalable and allow construction of arbitrarily large machines using the same basic network architecture. This makes it an ideal choice for HPC systems. The hardware resources required for fat-tree networks grow linearly with the number of processors supported [DeHon93]. 2.2.2 Programmability Programmability refers to the ease with which all users can program the HPC system to perform the required function. A programming language and compiler that allows programmers to express parallel constructs in a natural manner is necessary. Programmability requires an easy-to-use user interface to minimize programmer time spent producing correct code. 2.2.3 Reliability A reliable HPC system is guarded against potential failures because it is designed with a philosophy that there should be no single point of failure. Adaptive error correcting hardware and novel hardware-supported programming models offer secure and reliable computing. The multipath, multistage network topology provides a basis for fault-tolerant operation along with high bandwidth operation. By providing multiple, redundant paths between each pair of 16 .......... ... Data may get processing nodes, alternative paths exist to route around faulty components. 
but corrupted, components may fail and external factors may cause the system to malfunction, ensures the protection in the form of backups, check points, self-detection and correction reliability of the ARIES HPC system. 2.2.4 Efficiency of abstraction. In a The efficiency issue within HPC systems is a consideration at multiple levels to processing parallel computer, efficiency is dependent upon the ratio of communication time can handle such a time. This requires an efficient processor and a communication network that of this work. load. Design of the communication network is described in the remaining chapters Figure 3. Die Shot of K7 Athlon Processor maximizing the number of Processor efficiency on the ARIES HPC system is achieved in part by of transistors running computations per silicon area. Modern day processors contain millions processor. Less than a sequential streams of instructions. Figure 3 shows a die shot of an Athlon The other three-quarters quarter of the chip area is devoted to useful computation shown in red. as caching, instruction are dedicated to keeping the first quarter busy through techniques such 17 reordering, register renaming, branch prediction, speculative execution, block fetching and prefetching. In a parallel architecture context where multiple processes are placed on a single die, overall area efficiency is more important than the raw speed of an individual processor. Simple, efficient and cost effective architectures will yield the greatest computing power. Efficient computing is achieved through the use of less silicon to perform the same amount of work. 18 CHAPTER 3 THE FLEXIBLE INTEGRATED NETWORK INTERFACE The Flexible Integrated Networking Interface (FINI) is the high-speed network interface in the ARIES HPC system that facilitates communication between the processor nodes. A high bandwidth, low latency interprocessor communication network is necessary for efficient and reliable multiprocessor operation. 
This chapter focuses on the motivations behind the design of the FINI and describes its architecture and functionality. The conclusion highlights the goals and challenges of this design.

3.1 DESIGN MOTIVATIONS
Advances in technology have provided engineers with a vast array of building blocks. The combinations that can achieve any one result are nearly endless, as is the case for developing the FINI. The focus in designing this interprocessor communication network is on how to achieve a high-speed network through the evaluation of technological trends, advancements and practical limitations. Results from this work will be of benefit to future designers of all high-speed signaling networks in HPC systems.

Past research on high-speed network design for large-scale multiprocessing implemented the network interface in silicon to produce an integrated circuit (IC) chip [DeHon93]. This was a time consuming, costly (on the order of $10K for 30 chips) and fixed solution. Designing an IC is a worthy investment if the goal is to mass-produce a commercial product, but it is not economically feasible for a low volume scenario such as the research project at hand. The goals associated with designing the FINI are the antithesis of what an IC design entails:
* Short design cycle
* Low cost
* Flexibility.

In a research project, the probability of revisions is extremely high. A low overhead solution coupled with a short design cycle minimizes the total cost and design time in the long run. The FINI leverages existing technology by integrating commodity-off-the-shelf (COTS) components on a stand-alone printed circuit board (PCB) to achieve the desired interconnect solution in a relatively short design cycle. Modifying a PCB is relatively easy compared to redesigning a completely new IC. Recent technological advancements have produced powerful programmable logic devices that can be integrated into a design to provide a more flexible and cost efficient solution.
Incorporating field-programmable gate arrays (FPGAs) in this solution provides the flexibility to implement various functions and perform multiple tests using the same platform.

A high-speed signaling network depends upon the FINI architecture, the signaling techniques used on the PCB and the chosen COTS components. The remainder of this chapter describes the architectural design of the FINI while Chapter 4 examines various technical design issues related to high-speed signaling. Components used on the FINI card are detailed in Chapter 5.

3.2 ARCHITECTURE
The Flexible Integrated Network Interface (FINI) is the physical layer and link layer medium that controls the movement of data between processor nodes. Its primary function is to transmit data from one processor node to another and to receive data in the reverse direction as well.

Figure 4. FPS and FINI Connection Diagram

The Flexible Integrated Network Interface is comprised of FINI cards implemented on PCBs connected together by high-performance cables. Each processor node on the ARIES HPC system requires a dedicated FINI card to handle its data transmission and reception. A partial diagram of the setup is shown in Figure 4. At the time of writing, the ARIES HPC system is not ready to be tested in conjunction with the FINI. For the purposes of programming and testing the FINI in the meantime, a desktop PC and fast, large SRAM buffers will be used in place of the ARIES HPC system. While this alternative solution will not demonstrate the capabilities of the HPC system, it does provide a means of testing the communication network itself. At this phase in the HPC development process, it is more crucial to understand the design of high-speed networks and associated signaling techniques. Focus is placed on maximizing the performance of the FINI card. The FINI card is responsible for tasks such as bus arbitration and data synchronization.
Data management on board the FINI card is controlled by FPGAs. A parallel port serves as the interface between the computer and the FINI card. Data to be transmitted between processor nodes enters the FINI from the parallel port into the Master FPGA (FPGA-1). Data can be stored in the associated memory until it is ready to be passed to the transmitter and sent to another processor node through high-performance cables. A FINI that is receiving data from another processor node sees data enter via the receiver. This incoming data is passed into the Slave FPGA (FPGA-2) and can then be stored in the associated memory until it is needed.

Figure 5. Flexible Integrated Network Interface Block Diagram

Figure 5 presents the basic block diagram of the FINI architecture. The FINI is powered by 2.5V and 3.3V power sources and clocked by a 100MHz clock. The Master FPGA (FPGA-1) is programmed via the parallel port while the Slave FPGA (FPGA-2) is programmed by the Master FPGA. Chapter 5 describes each component used on the FINI card in detail.

3.3 DUPLICATED ARCHITECTURE
This is the first version of the FINI, and one of the primary concerns is ensuring that all components work correctly and the design is valid. All projects come with a limited budget, so minimizing costs when possible is desirable. By choosing small component packages when possible, less board real estate will be consumed. Furthermore, since two FINIs are required to successfully test data transmission and reception, an alternative to producing two FINI boards is to replicate the design on one single PCB. This approach will be taken and the FINI will be replicated on the backside of the PCB (see Figure 6). In the duplicated architecture, some components are not replicated because resources from components on the top side can be shared.
One 3.3V power supply is sufficient to support the duplicated design and is therefore not replicated on the back of the board. The 2.5V power supply is replicated because the FPGAs run off of this supply and they each consume enough power that one will not be sufficient. There is only one clock on the board. For the purposes of programming on-board components, the parallel port is wired to FPGA-1, which serves as the Master FPGA. FPGA-2, FPGA-3, and FPGA-4 are all Slave FPGAs.

Figure 6. FINI Block Diagram with Duplicated Architecture (TDB = Test and Debug Bus)

3.4 DESIGN GOALS
Engineering is about problem solving. There is no single hard-set answer to the high-speed signaling network problem in HPC systems, though. Instead, there are an endless number of ways to implement the FINI given the choices in components, advancements in technology and design trends. The main purpose behind designing the FINI is to understand and evaluate the technology available on the market today to determine if it will meet the needs of an HPC system.

3.4.1 Technology Trends
Everything is getting smaller while becoming more powerful. Moore's Law [Moore79] suggests that the number of devices that can be economically fabricated on a single chip increases exponentially at a rate of about 50% per year, hence quadrupling every 3.5 years. Meanwhile, the delay of a simple gate has decreased by 13% a year, halving every 5 years, and chip sizes are increasing by 6% a year. Such breakthroughs have been made possible by the semiconductor industry scaling down gate lengths from 50 µm in the 1960s to 0.18 µm 40 years later. Advancements in mobile computing and portable devices are driving the miniaturization of ICs and their packaging while simultaneously providing increased capabilities and consuming less power.
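The quoted rates are compound annual growth rates, so the doubling and halving times follow directly from them. A quick arithmetic sketch (the percentages are those cited above; the function names are mine):

```python
import math

def doubling_time(annual_growth):
    """Years to double at a compound annual growth rate (0.50 = 50%/yr)."""
    return math.log(2) / math.log(1 + annual_growth)

def halving_time(annual_decline):
    """Years to halve at a compound annual decline rate (0.13 = 13%/yr)."""
    return math.log(2) / -math.log(1 - annual_decline)

# Devices per chip: +50%/yr -> doubles every ~1.7 years, quadruples every ~3.4
print(round(doubling_time(0.50) * 2, 1))
# Gate delay: -13%/yr -> halves every ~5 years
print(round(halving_time(0.13), 1))
```

The computed quadrupling time (~3.4 years) and delay halving time (~5.0 years) agree with the figures quoted in the text.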
Higher speeds and smaller form factors require tight integration between interconnects and components. With such improvements in technology, the problems associated with systems-level engineering in high-speed digital systems have become more critical.

3.4.2 Component Choices
The FINI card will be designed on a PCB using COTS components, with cables providing the interconnection medium. Much of the FINI's physical performance will therefore depend upon the capabilities of the chosen components. For instance, the supported bandwidth will be largely determined by the capabilities of the chosen transmitter and receiver. Because component availability is often dictated by cost and manufacturing limitations, future versions of the FINI card should take into consideration alternative choices. Chapter 8 provides some suggestions, but with more COTS components appearing every day, a thorough industry search is beneficial.

3.4.3 Design Trends
Rapid advancement in VLSI technology and the constant development of new applications keep the digital systems engineering field in a constant state of flux. Due to time constraints and lack of familiarity with new technologies, design methodologies used in industry are often recycled for reuse on future projects. This can often lead to a false sense of security. As technology advances, ad hoc approaches that worked in the past become suboptimal and often fail. Due to compatibility problems with existing parts, processes or infrastructure, the result is either a product that fails to operate, fails to meet specifications, or is not competitive by industrial standards. Avoiding this pitfall requires an understanding of both the digital and analog aspects of the design.

3.4.4 FINI Challenges
The design challenge associated with the development of the FINI is to avoid falling into the previously described traps. Techniques used for high-speed signaling over long-haul networks are unique and not well documented.
Careful selection of components and good implementation of design techniques are vital for the creation of a reliable system. System-level engineering constrains what an architect can do and is the major determinant of the cost and performance of the resulting system [DP98]. Each decision along the way will contribute to the performance and reliability of the system. The next two chapters will discuss the theory behind the design techniques used and cover the various components used to implement a high-speed signaling network.

CHAPTER 4 HIGH-SPEED INTERCONNECT

The previous chapter introduced the FINI architecture while highlighting some of the challenges involved. In this chapter, we address the design issues involved with the physical interface linking two processing nodes together.

4.1 HIGH-SPEED SIGNALING PROBLEM
High-speed cables serve as the interconnect between FINI cards, each associated with an HPC processing node. The more processing nodes there are in the ARIES HPC system, the farther apart they may be located from each other. Hence, the FINI cards need to be able to drive signals traveling over long interconnect lines, up to 10 meters in length, external to the FINI board. Consequently, the interconnect medium will behave like a transmission line and our signaling problem becomes a transmission line design problem. A signaling technology that is immune to noise and that can maintain integrity across such distances is required. This section reviews topics in transmission line theory and examines signaling technologies that will be applicable to the design of the interconnect network.

The high-speed region on the network interface occurs both on the FINI board and off the board. The transmission line phenomenon occurs in both regions, but the primary concern of this work is how the high-speed cables behave as transmission lines. The signaling technology chosen will be applicable in both regions.
General references are available describing the design of high-speed signals on PCBs ([DP98] [JG93]).

4.2 TRANSMISSION LINES
Under certain conditions, an interconnection ceases to act as a simple pair of wires and instead behaves as a transmission line, which has different characteristics. The term "wire" refers to any pair of conductors used to move electrical energy from one place to another, such as traces with one or more planes (ground or power), coax cables, twisted pairs, ribbon cables, telephone lines and power lines. The transmission lines connecting FINI boards occur in the form of twisted-pair cables. This section describes the conditions under which wires behave as transmission lines, explains how they are modeled, what the potential side effects are, and how to avoid transmission line effects.

4.2.1 Definition of High-Speed Signal
The point at which an interconnection ceases to act as a pair of wires and behaves as a transmission line instead depends upon the length of the interconnection and the highest frequency component of the signal. A signal is considered high-speed if the line is long enough that the signal can change logic levels before it can travel to the end of the conductor. When its edge rate (rise or fall time) is fast enough, the signal can change from one logic level to the other in less time than it takes for the signal to travel the length of the conductor. A conservative rule of thumb is that a long interconnection is one whose length is longer than 1/10 of the signal's wavelength [Suth99]. For example, a 1-meter long cable can be treated as a lumped capacitor at signal frequencies up to 19.8 MHz. Useful formulas for such a calculation are listed below.
wavelength = velocity × time = velocity / frequency, or λ = v / f

velocity = c / √εr, where c is the speed of light and εr is the relative dielectric constant

v = (300 × 10^6 m/sec) / √2.3 = 198 × 10^6 m/sec

f = v / λ = (198 × 10^6 m/sec) / (10 m) = 19.8 MHz

In the calculations, the wavelength is 10 meters because the 1-meter cable is 1/10th of that. The relative dielectric constant for cables is 2.3. Table 1 gives the dielectric constants of some common materials.

Material | Dielectric Constant, εr
Air | 1
Cable (RG-58) | 2.3
PCB (FR-4) | 4
Glass | 6
Ceramic | 10
Barium titanate | 1200

Table 1. Dielectric Constants of Some Common Materials [Suth99]

Interconnects that are classified as short-length look like a pair of wires and a capacitor. When the length of the interconnection approaches 1/4 the wavelength, resonance occurs and it no longer behaves as a lumped capacitor. For this reason, interconnections should not be near 1/4 the wavelength [Suth99].

A more convenient way of determining if a signal is considered to be high-speed is using its rise time. An interconnection is considered to be at a high frequency when the signal's rise time (tr) is less than twice the flight time (tf), or propagation time (tpd). The flight time is the amount of time it takes the signal to reach the end of the interconnect.

tf = distance / velocity = (1 m) / (198 × 10^6 m/sec) = 5.05 nsec

If a signal takes 5.05 nsec to travel one-way down an interconnect, it will take 10.1 nsec to travel round-trip. This 1-meter cable is considered a transmission line if the rise time is less than 10.1 nsec.

4.2.2 Transmission Line Models
Two models of transmission lines are shown in Figure 7. Transmission lines have four electrical parameters as shown in the second-order model: resistance along the line, inductance along the line, conductance shunting the line, and capacitance shunting the line.
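The wavelength and rise-time criteria from Section 4.2.1 can be captured in a few lines of code. This is a sketch; the function names are mine, and the constants are those from the text above:

```python
import math

C = 300e6  # speed of light, m/sec

def velocity(er):
    """Propagation velocity in a medium with relative dielectric constant er."""
    return C / math.sqrt(er)

def lumped_frequency_limit(length_m, er):
    """Highest frequency (Hz) at which a line still looks like a lumped
    capacitor, using the length < wavelength/10 rule of thumb."""
    return velocity(er) / (10.0 * length_m)

def is_transmission_line(length_m, er, rise_time_s):
    """Rise-time criterion: the line acts as a transmission line when the
    signal's rise time is less than twice the one-way flight time."""
    flight_time = length_m / velocity(er)
    return rise_time_s < 2.0 * flight_time

# 1-meter cable with er = 2.3: lumped up to ~19.8 MHz ...
print(lumped_frequency_limit(1.0, 2.3) / 1e6)
# ... and a transmission line for edges faster than ~10.1 nsec
print(is_transmission_line(1.0, 2.3, 5e-9))
```

Both results reproduce the worked example: a 19.8 MHz lumped-capacitor limit and transmission-line behavior for a 5 nsec edge.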
Although this model is more accurate, the more common representation is the first-order model because most transmission line cables and PCB conductors have L and C values that dwarf the R values [Suth99].

Figure 7. First- and Second-Order Models of Transmission Lines

In many cases, resistance is small enough that it can be ignored. Inductance and capacitance are defined by the geometric shape of the transmission line; they are independent of the actual size and are determined by the ratio of the cross-sectional dimensions. Shunt conductance is almost always negligible because low loss dielectrics are used.

If the line is short or the rise time of the signal is long, transmission line effects are not significant. Then the inductance of the line is negligible, and the line can be approximated as a lumped capacitor. The delay is determined by the impedance of the driver and the capacitance of the line and can be decreased by reducing the source resistance of the driver [Bakoglu90]. The inductance becomes important if the line is long and as a result has a large inductance, or if the rise times get faster. Then the transmission line effects surface, and both the distributed inductance and capacitance must be taken into account.

4.2.3 Impedance Discontinuities
Impedance is the hindrance (opposition) to the flow of energy in a transmission line. The impedance of a line matters when the time that it takes a signal to change levels is short enough that the transition is completed before the signal has time to reach the far end of the line. The characteristic impedance of the line is given by the equation:

Z0 = √(L / C)

When impedance discontinuities occur, energy from the signal flowing down the line will be reflected back. This reflection could interfere with the proper operation of circuits connected on the line.
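The relation Z0 = √(L/C) can be checked against a familiar cable. The per-meter values below are nominal ballpark figures for a coax like RG-58 (my assumption, not data from this thesis):

```python
import math

# Nominal per-meter values for an RG-58-like coax (assumed, ballpark figures)
L_per_m = 250e-9   # series inductance, H/m
C_per_m = 100e-12  # shunt capacitance, F/m

z0 = math.sqrt(L_per_m / C_per_m)          # characteristic impedance, ohms
delay_per_m = math.sqrt(L_per_m * C_per_m)  # propagation delay, sec/m

print(z0)                 # ≈ 50 ohms
print(delay_per_m * 1e9)  # ≈ 5 nsec/m, i.e. v ≈ 200e6 m/sec
```

The 5 nsec/m delay corresponds to a velocity of about 200 × 10^6 m/sec, consistent with the εr = 2.3 calculation in Section 4.2.1.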
Sources of impedance discontinuities include:
* Trace width changes on a layer,
* Connectors,
* Unmatched loads,
* Open line ends,
* Stubs,
* Changes in dielectric constant,
* Large power plane discontinuities.

In order to ensure that high-speed signals arrive at their destinations with a minimum amount of distortion, it is necessary to ensure that the impedances of all the components in the circuit are held within a range consistent with the noise margins of the system. The prime area of concern is outlined in Figure 8. Transmission line effects such as reflections and crosstalk occur in high-speed designs when impedance discontinuities occur.

Figure 8. High-Speed Region on Network Interface

Transmission line impedance is determined by the geometry of the conductors and the electric permittivity of the material separating them. Impedance can be determined by the ratio of trace width to height above ground for printed circuit board traces. For twisted-pair lines, impedance is dependent upon the ratio of wire diameter to wire separation. Impedance is always inversely proportional to the square root of electric permittivity. The pertinent impedance equations for twisted-pairs and microstrips are provided here.

For twisted-pair cables,

impedance (Ω) = (120 / √εr) × ln(2s / d)

Figure 9. Twisted-Pair Dimensions

where d is the diameter of the conductor, s is the separation between wire centers and εr is the relative permittivity of the substrate as shown in Figure 9.

For microstrips,

impedance (Ω) = (87 / √(εr + 1.41)) × ln(5.98h / (0.8w + t))

Figure 10. Microstrip Dimensions

where h is the height above ground (in.), w is the trace width (in.), t is the line thickness (in.) and εr is the relative permittivity of the substrate as shown in Figure 10.
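The two geometry formulas above translate directly into code. The example dimensions below are mine, chosen only to exercise the formulas:

```python
import math

def twisted_pair_impedance(d, s, er):
    """Twisted-pair characteristic impedance (ohms): (120/sqrt(er)) * ln(2s/d).
    d = conductor diameter, s = separation between wire centers (same units)."""
    return (120.0 / math.sqrt(er)) * math.log(2.0 * s / d)

def microstrip_impedance(h, w, t, er):
    """Microstrip characteristic impedance (ohms):
    (87/sqrt(er + 1.41)) * ln(5.98h / (0.8w + t)).
    h = height above ground, w = trace width, t = trace thickness (inches)."""
    return (87.0 / math.sqrt(er + 1.41)) * math.log(5.98 * h / (0.8 * w + t))

# Cable-like twisted pair (er = 2.3), wires spaced at twice their diameter
print(round(twisted_pair_impedance(0.02, 0.04, 2.3), 1))
# FR-4 microstrip (er = 4): 10 mil trace, 10 mil above ground, 1.4 mil thick
print(round(microstrip_impedance(0.010, 0.010, 0.0014, 4.0), 1))
```

Note how both results scale inversely with the square root of permittivity, as stated above.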
4.2.4 Skin Effects
At low frequencies, resistance stays constant, but at high frequencies, it grows proportionally to the square root of frequency. When resistance starts increasing, more attenuation (loss) is present in the wire while linear phase is maintained [JG93]. The series resistance of AWG 24 wire is shown in Figure 11. At 65 kHz, resistance begins rising as a function of frequency. This increase in resistance is called the skin effect.

Figure 11. Skin Effect in AWG24 Copper Versus Frequency

The skin effect frequency is dependent upon the conductor material. Table 2 lists a few conductors and the frequency at which the skin effect begins to take hold [JG93].

Conductor | Skin-Effect Frequency
RG-58/U | 21 kHz
AWG24 | 65 kHz
AWG30 | 260 kHz
0.010 width, 2 oz PCB trace | 3.5 MHz
0.005 width, 2 oz PCB trace | 3.5 MHz
0.010 width, 1 oz PCB trace | 14.0 MHz
0.005 width, 1 oz PCB trace | 14.0 MHz

Table 2. Skin-Effect Frequencies for Various Conductors

4.2.5 Termination Techniques
A multitude of problems arise when interconnects that behave as transmission lines are not properly terminated. At the far end of the cable, a fraction of the attenuated signal amplitude emerges. Additionally, a reflected signal travels back along the cable towards the source. Reflections cause temporary ringing in the form of voltage oscillations. Larger voltages and currents radiate larger electric and magnetic fields and transfer more crosstalk energy into neighboring wires. Terminations are necessary to eliminate reflections and the distortions they would cause in the actual signal. Common termination schemes include parallel termination and series termination. More sophisticated termination techniques have been developed which incorporate voltage controlled output impedance drivers [DeHon93]. Such techniques allow the output impedance to be varied such that it can be automatically matched to the attached transmission line impedance.
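Two of the effects above are easy to quantify: the fraction of a wave reflected at a termination (the standard reflection coefficient Γ = (Zl − Z0)/(Zl + Z0)) and the square-root-of-frequency growth of skin-effect resistance. A sketch using these standard formulas (function names are mine):

```python
import math

def reflection_coefficient(z_load, z0):
    """Fraction of the incident wave reflected at a termination:
    gamma = (Zl - Z0) / (Zl + Z0)."""
    return (z_load - z0) / (z_load + z0)

def skin_effect_resistance(r_dc, freq, skin_freq):
    """Series resistance model: constant below the skin-effect frequency,
    growing as sqrt(f) above it."""
    if freq <= skin_freq:
        return r_dc
    return r_dc * math.sqrt(freq / skin_freq)

# A matched parallel termination reflects nothing...
print(reflection_coefficient(100.0, 100.0))   # 0.0
# ...while an open line end reflects essentially the full wave back
print(reflection_coefficient(1e12, 100.0))
# AWG24 (skin effect from 65 kHz): at 6.5 MHz, resistance is 10x its DC value
print(skin_effect_resistance(1.0, 6.5e6, 65e3))
```

This is why matching the termination to the line's characteristic impedance eliminates the reflections described above.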
4.3 DIFFERENTIAL SIGNALING
When driving signals over a long cable, maintaining signal integrity becomes a challenge because noise and attenuation effects will most likely corrupt the signal. Differential signals provide a balanced solution that is capable of dealing with signal degradation over long lines and is reasonably immune to common mode noise.

4.3.1 Differential Signal Basics
Differential signals are symmetric such that as the device switches at output A, the complementary output /A also switches at the same time with the opposite polarity (see Figure 12). The difference in amplitude between the two signals serves as the signal information. If the two signals are kept under identical conditions, then noise induced on the signals will affect both signals and the signal information will be unchanged. At high data rates, the use of differential signals provides increased immunity from noise.

Figure 12. Differential Signals

The maximum immunity from noise and EMI occurs when both signals switch simultaneously. When there is a time shift between the true and complementary signals, a differential skew occurs as shown in Figure 13.

Figure 13. Differential Skew

As the signal travels down the transmission line, noise generated by one side of the differential pair can be effectively cancelled by the other side if the lines are closely coupled, as with a twisted pair or shielded pair. This cancellation depends on the simultaneous switching of the signals at the driver as well as all along the transmission line where the signal propagates. Therefore, the transmission media should be designed to minimize skew between differential lines. Skew also occurs if the two wires in the twisted-pair are not of equal length. When one side receives the signal early, the total differential signal will look like a sloped staircase in the input transition region as shown in Figure 14.
This can lead to timing uncertainty or jitter at the receiver output. The overall skew will be a function of the skew characteristics of the transmission lines and the length of the interconnect. Shorter, matched-length twisted-pairs and simultaneous switching will minimize skew.

Figure 14. Effect of Skew on Receiver Input Waveform

At the receiving end, the differential receiver compares the two signals to determine their logic polarity. It ignores any common mode noise that may occur on both sides of the differential pair. To optimize this noise canceling effect, the coupled noise must stay in phase. When the noise does not arrive at the receiver simultaneously, noise rejection will not occur and some noise remains on the signal.

4.3.2 LVDS
Low voltage differential signaling (LVDS) is a data communication standard (ANSI/TIA/EIA-644, IEEE 1596.3) using very low voltage swings over two differential PCB traces or a balanced cable. It is a means of achieving a high performance solution that consumes minimum power, generates little noise, is relatively immune to noise and is inexpensive. LVDS has the flexibility of being implemented in CMOS, GaAs or other applicable technologies while running at 5V, 3.3V or even sub-3V supplies.

The LVDS transmission medium must be terminated to its characteristic differential impedance to complete the current loop and terminate high-speed signals. The signal will reflect from the end of the cable and interfere with succeeding signals if the medium is not properly terminated. When two independent 50 Ω traces are coupled into a 100 Ω differential transmission line, a 100 Ω terminating resistor is placed across the differential signal lines as close as possible to the receiver input as shown in Figure 15. This helps to prevent reflections and reduce unwanted electromagnetic emissions.

Figure 15.
LVDS Point-to-Point Configuration with Termination Resistor

4.4 TRANSMISSION MEDIA
To transmit signals from one FINI board to another, the driver or transmitter must send out LVDS signals which travel through a connector and out onto the cable. At the other end, these signals enter a connector on the board before reaching the receiver. Design of the transmission media requires three sets of components: a transmitter/receiver pair, connectors and cables. This section describes the selection criteria used to determine the most appropriate media to attain the desired results.

4.4.1 Transmitter/Receiver Pair
Transmitters and receivers typically come in pairs with complementary functions. The requirements of these chips are:
* TTL-to-LVDS conversion (and vice versa)
* 3.3V power supply

Signals on the FINI card are TTL and need to be converted to LVDS before being transmitted across the cable. To minimize power conversion, the maximum power supply allotted is 3.3V. In addition, the transmitter/receiver pair's specifications should be optimized on these features:
* Large number of LVDS output signals
* Capability to minimize skew effects.

The more LVDS output signals provided, the greater the bandwidth of the FINI card. The number of output signals will determine the cable and connector size necessary. To ensure signal integrity, the receiver should have enhanced capabilities for dealing with skew and noise effects. Additional serializing techniques allow easy management of shorter on-board signals.

4.4.2 Cable
The requirements of the cable used for transmitting signals are:
* Symmetrical or balanced transmission lines
* Availability in variable lengths

Other factors to consider are:
* Minimal skew within pairs and between pairs of wires
* Ability to match board and cable impedance.

The high-performance cables connecting the FINI cards can be anywhere from 2 to 10 meters in length.
When signaling over a long external cable, the return impedance is too high to be ignored. In this case, shielded twisted-pair cables are used as the transmission medium to provide a balanced configuration where the return impedance equals the signal impedance. In a balanced cable, the signal flows out along one wire and back along the other. Proper matching of differential signals is achieved using balanced signals.

Figure 16. Twisted-Pair Cable

When choosing a cable assembly, it must have a sufficient number of twisted-pairs to support the bandwidth of the transmitter/receiver chips. The cable specifications will indicate the amount of skew between each differential pair. Cables available in a variety of lengths are necessary for the ARIES HPC system.

4.4.3 Connector
Connectors serve as the interface between the electrical and mechanical parts of a design. Their reliability often determines the overall reliability of the system. The first requirement of the connector is that it matches the cable and has the required number of pins. The more important considerations are those concerning signal integrity. The connector represents the least shielded and highest impedance discontinuity portion of the signal path in the FINI high-speed interconnect region. Differential signals running through a connector should be placed on adjacent pins. This will allow their returning signal current paths to overlap and cancel. The associated traces on the PCB should be kept close together as well.

CHAPTER 5 COMPONENT SPECIFICATION

With an endless supply of COTS components on the market, how does an engineer decide which one to use in a design? Even a component as seemingly simple as a capacitor comes in a multitude of variations. The physical performance of the FINI depends greatly on the chosen components. The critical components used on the FINI are described in the following sections.
5.1 LVDS TRANSMITTER AND RECEIVER
The main function of the FINI is to ensure the reliable and accurate transmission and reception of data. To successfully transmit high-speed digital data over very long distances, a performance solution that consumes minimal power and is immune to noise is necessary.

5.1.1 DS90C387/DS90CF388
Data transmission and reception on the FINI card are accomplished using a high-speed point-to-point cabled data link chipset from National Semiconductor. The DS90C387/DS90CF388 transmitter/receiver pair was originally designed for data transmission between a host and a flat panel display, but its capabilities suit the FINI application well. It converts 48 bits of CMOS/TTL data into 8 LVDS (Low Voltage Differential Signaling) data streams at a maximum dual pixel rate of 112 MHz, or vice versa. The LVDS bit rate is 672 Mbps per differential pair, providing a total throughput of 5.38 Gbps.

5.1.2 Decision Factors
A chipset supporting LVDS technology was chosen because it offers better performance than single-ended solutions by supporting higher data rates and requiring lower power while being less susceptible to common mode noise at lower costs. The DS90C387/DS90CF388 offers greater bandwidth than standard LVDS line drivers and receivers [National97]. With 8 LVDS channels, it offers at least twice the bandwidth available from general drivers and receivers.

Longer cables increase the potential of differential skew and subsequent cable loading effects. The DS90C387 is equipped with a pre-emphasis feature that adds extra current during LVDS logic transitions to reduce cable loading effects. This helps to reduce the amount of jitter in the signal seen by the receiver and minimize problems associated with high frequency signal attenuation. Cable lengths will vary from 2 meters to 10 meters. At such great lengths, the pair-to-pair skew is potentially quite significant.
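The throughput figures quoted in Section 5.1.1 are internally consistent, which a quick arithmetic check confirms (all numbers are from the text, not measured):

```python
# Quick arithmetic check of the DS90C387/DS90CF388 throughput figures.
parallel_bits = 48       # CMOS/TTL input width
pixel_rate_hz = 112e6    # maximum dual pixel rate
lvds_pairs = 8           # LVDS data streams
per_pair_bps = 672e6     # LVDS bit rate per differential pair

total_from_parallel = parallel_bits * pixel_rate_hz
total_from_serial = lvds_pairs * per_pair_bps

# Both views agree: 48 x 112 MHz = 8 x 672 Mbps = 5.376 Gbps (~5.38 Gbps)
print(total_from_parallel / 1e9)   # 5.376
print(total_from_serial / 1e9)     # 5.376
```

Equivalently, each differential pair carries 6 bits of the 48-bit word per pixel clock (672 / 112 = 6).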
The DS90CF388 has a built-in deskew capability that can deskew long cable pair-to-pair skews of up to +/-1 LVDS data bit time. This increases the tolerance allotted in the cables and decreases the potential bit error rate. Available for less than $10 a chip, this National Semiconductor chipset provides ample capabilities at a low cost.

5.1.3 Functionality of DS90C387/DS90CF388
The transmitter will receive input from the FPGA and send out LVDS signals onto the high performance cables connecting the two FINI boards. At the other end, the receiver will convert the LVDS signals back into CMOS/TTL data that is then passed on to another FPGA.

For the FINI card, the transmitters and receivers are configured for dual pixel applications by setting the DUAL pin on the DS90C387 to Vcc. The DC Balance feature sends an extra bit along with the pixel and control information to minimize the short- and long-term DC bias on the signal.

Cable loading effects are reduced by using the pre-emphasis feature built into the DS90C387. Additional current is added during LVDS logic transitions by applying a DC voltage at the PRE pin (pin 14). The required voltage is dependent upon both the frequency and the cable length. A higher input voltage increases the magnitude of dynamic current during data transitions. A pull-up resistor from the PRE pin to Vcc sets the DC level. See Tables 3 and 4 to set the appropriate voltage level. On board the FINI, a potentiometer is used to provide variable resistance via the same design.

R_PRE | Resulting PRE Voltage | Effects
1 MΩ or NC | 0.75V | Standard LVDS
50 kΩ | 1.0V |
9 kΩ | 1.5V |
3 kΩ | 2.0V |
1 kΩ | 2.6V | 50% pre-emphasis
100 Ω | Vcc | 100% pre-emphasis

Table 3. Pre-emphasis DC Voltage Level with R_PRE [National99]

Frequency | PRE Voltage | Typical Cable Length
112 MHz | 1.0V | 2 meters
112 MHz | 1.5V | 5 meters
80 MHz | 1.0V | 2 meters
80 MHz | 1.2V | 7 meters
65 MHz | 1.5V | 10 meters
56 MHz | 1.0V | 10 meters

Table 4.
Pre-emphasis Needed per Cable Length [National99]

5.2 PROCESSORS

The FINI architecture is designed to serve as the interconnect solution for HPC systems as well as a platform for evaluating on-board components. With this dual intention, the FINI card requires enough flexibility to implement test code and various networking protocols. This can be achieved by using a programmable device as the core processor on board.

5.2.1 Virtex XCV100-6PQ240C

The duplicated architecture of the FINI card employs four Xilinx Virtex FPGAs, the XCV100-6PQ240C, as the core processors on board. These 2.5V field programmable gate arrays consist of 108,904 gates, 2,700 logic cells and 180 I/O pins [Virtex99]. Initially, the FPGAs will facilitate the testing process of determining the signal quality on board. In the long run, implementing the METRO protocol on the FPGAs will permit iterations of the networking protocol to achieve maximum throughput and minimum expected latency. Each XCV100-6PQ240C is associated with either a transmitter or a receiver on the FINI card and controls the data flowing in and out of the respective transmitter or receiver chip. Traffic through the FPGAs can originate from either the associated memory chip or the processor node in the FPS context. For the purposes of validating the FINI design, however, random data will be generated by a computer since the FPS is not available at the time of writing.

5.2.2 Decision Factors

The Virtex family of products from Xilinx Corp. offers full-featured high-end FPGAs designed and manufactured using the latest technology. FPGAs no longer serve only as glue logic, but are often integral system components. With four full digital delay-locked loops, these FPGAs are capable of system clock generation, synchronization and distribution. With more embedded SRAM than previous generations of FPGAs, these chips offer a high-speed, flexible design solution that reduces the overall design cycle time required of an engineer.
The capabilities of a Virtex FPGA make it an ideal choice for the core processors on the FINI card. Of the various Virtex FPGAs, the XCV100-6PQ240C was chosen due to cost and availability constraints. Another consideration was the number of I/O pins. The main components interfacing with the FPGA are the transmitter/receiver and an external SRAM. At component selection time, 180 pins were determined to be sufficient. This will be further evaluated at the conclusion of the design.

5.2.3 Programming the XCV100-6PQ240C

Of the many ways the Xilinx XCV100-6PQ240C can be programmed, SelectMAP mode is the fastest and will be used to program the four FPGAs on the FINI card. In SelectMAP mode, an 8-bit bi-directional data bus (D0-D7) is used to write onto or read from the FPGA. To configure the FPGA using SelectMAP mode, the configuration mode pins (M2, M1, and M0) need to be set according to the table below.

Configuration Mode   M2   M1   M0   CCLK Direction   Data Width   Serial Dout   Pre-configuration Pull-ups
SelectMAP Mode       1    1    0    In               8            No            No

Table 5. SelectMAP Mode Configuration Code [Virtex99]

Once the FPGA is powered up, the configuration memory needs to be cleared before configuration may take place. /PROG is held at a logic Low for a minimum of 300 ns (no maximum) to reset the configuration logic and hold the FPGA in the clear-configuration-memory state. While the configuration memory is being cleared, /INIT is held Low to indicate that the configuration memory is in the process of being cleared. /INIT is not released until the configuration memory has been cleared. With the configuration memory cleared, the configuration mode is selected and then configuration commences.
Figure 17. Diagram of Parallel Programming FPGAs in SelectMAP Mode [Xilinx00] (Source: XAPP138 v2.0; four FPGAs share D[0:7], CCLK, /WRITE, BUSY, /PROG, DONE and /INIT, with individual chip selects)

Virtex FPGAs cannot be programmed in a serial daisy-chain format, but SelectMAP mode provides an alternative by allowing multiple FPGAs to be connected in a parallel-chain fashion (see Figure 17). The data pins (D0-D7), CCLK, /WRITE, BUSY, /PROG, DONE and /INIT are shared in common between all FPGAs while the /CS input is kept separate so that each FPGA may be accessed individually. Ideally, all four FPGAs would be arranged in this fashion and programmed by the computer via the parallel port, but due to insufficient pins on the parallel port, only FPGA-1 will be programmed in this manner. The other three FPGAs will be connected in parallel and programmed by FPGA-1 (hence the Master FPGA) in the manner just described, using three distinct chip select pins.

Configuration data is written to the data bus D[0:7], while readback data is read from the data bus. The bus direction is controlled by /WRITE. When /WRITE is asserted Low, data is being written to the data bus. When /WRITE is asserted High, data is being read from the bus. Data flow is controlled by the BUSY signal. When BUSY is Low, the FPGA reads the data bus on the next rising CCLK edge where both /CS and /WRITE are asserted Low. Data on the bus is ignored if BUSY is asserted High; the data must be reloaded on the next rising CCLK edge when BUSY is Low for it to be read. If /CS is not asserted, BUSY is tri-stated. All reads and writes on the data bus for configuration and readback are synchronized to CCLK, which can be a signal generated externally or driven by a free-running oscillator.
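The BUSY handshake just described can be modeled in a few lines. The sketch below is an illustrative simulation only, not vendor code; the function name, the byte values, and the busy schedule are assumptions made for the example:

```python
def selectmap_write(config_bytes, busy_schedule):
    """Simulate the SelectMAP byte-load handshake described above.

    On each rising CCLK edge with /CS and /WRITE asserted Low, the FPGA
    accepts the byte on D[0:7] only if BUSY is Low; if BUSY is High, the
    same byte must be held (reloaded) on a later edge.  busy_schedule is
    a function mapping a clock-edge index to the BUSY level.
    """
    accepted = []
    edge = 0
    for byte in config_bytes:
        # Hold the byte on the bus until an edge where BUSY is Low.
        while busy_schedule(edge):
            edge += 1          # byte ignored on this edge; reload on the next
        accepted.append(byte)  # read on this rising edge (/CS, /WRITE Low)
        edge += 1
    return accepted

# Example: BUSY goes High on edges 2 and 3; all bytes still arrive in order.
bitstream = [0xFF, 0xFF, 0xAA, 0x99]   # dummy data, not a real Virtex header
print(selectmap_write(bitstream, lambda e: e in (2, 3)))  # [255, 255, 170, 153]
```

The point of the model is that BUSY only stalls the stream; it never reorders or drops bytes, which is why a slow master such as the parallel port can drive SelectMAP safely.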
5.2.4 FPGA-1 (Master)

The Master FPGA is responsible for programming FPGA-2, FPGA-3 and FPGA-4. The procedure for programming the slave FPGAs is virtually the same as that for programming the master FPGA; in this case, circuitry within FPGA-1 serves the function of the computer, and I/O pins replace the parallel port. Unique chip select pins (pins 230, 236 and 234) allow it to select which FPGA it is programming. With 5 dedicated control pins (XMTRFB, XMT1_DCBAL, XMT1_DE, XMT1_VSYNC, XMT1_HSYNC) and 48 data pins connected to Transmitter-1, FPGA-1 is responsible for controlling the data flowing through Transmitter-1. In addition, FPGA-1 also controls when data is written into and read from SRAM-1.

5.2.5 FPGA-2 (Slave)

FPGA-2 is programmed by FPGA-1 (Master) because there are insufficient pins on the parallel port for it to be programmed by the computer directly. The majority of the I/O pins on FPGA-2 are dedicated to controlling data flow in and out of Receiver-1 and SRAM-2. A 20 MHz clock connected at pin 210 drives the internal circuitry and is used by FPGA-2 to program the direct digital synthesizer (DDS) in the system clock circuitry (see Section 5.4). An 8-bit data bus and 3 control pins (DDSWORDLOAD, DDSFQUD, DDSRESET) are connected to the DDS to set its frequency tuning word, phase offset and output frequency. FPGA-2 also controls the potentiometer used to set the pre-emphasis resistance associated with Transmitter-1 (see Section 5.1.3).

5.2.6 FPGA-3 (Slave)

FPGA-3 is programmed by FPGA-1 (Master) and is responsible for data traffic flowing through Transmitter-2. It also controls data passage in and out of SRAM-3. The pin-out information is similar to that of FPGA-1. In addition, FPGA-3 also controls the potentiometer used to set the pre-emphasis resistance associated with Transmitter-2.

5.2.7 FPGA-4 (Slave)

FPGA-4 is programmed by FPGA-1 (Master) and is responsible for data traffic flowing through Receiver-2.
It also controls data passage in and out of SRAM-4. The pin-out information is similar to that of FPGA-2.

5.3 MEMORY

The FPGAs are responsible for the flow of traffic to and from the transmitter and receiver, as well as for programming all associated components on board to enable the FINI card. Because the FPGAs are responsible for multiple tasks, data coming into a transmitter or receiver may not be ready to be sent out on the next clock cycle. Each FPGA therefore has an associated SRAM to store data for future use.

5.3.1 MT55L256L32P

Associated with each FPGA on the FINI card is a Micron MT55L256L32P, an 8 Mb pipelined Zero Bus Turnaround (ZBT) SRAM with 18-bit word addressing featuring a high-speed, low-power CMOS design. ZBT SRAMs allow for back-to-back read and write cycles, minimizing overall latency, and operate from a 3.3V power supply. Memory access is controlled by the FPGA with which the SRAM is associated.

5.3.2 Decision Factors

With little or no time to spare, higher-speed, greater-bandwidth designs may encounter severe bus-contention issues using conventional synchronous SRAMs. Maximum bus utilization can be achieved using Zero Bus Turnaround (ZBT) technology, which performs alternating reads and writes continuously to achieve an even read/write ratio (R/W = 1). When transitioning from READ to WRITE, no idle time is wasted, thereby improving the bandwidth of the system. Pipelined ZBT SRAMs represent a significant improvement in memory technology over conventional synchronous SRAMs for high-speed designs. Ideally, a 256K x 36 ZBT SRAM (the MT55L256L36P) would have been used for the FINI design, but due to limited availability, the 256K x 32 configuration is used instead. At the time of design, the final spec sheet for the MT55L256L32P had not yet been released.

5.4 CLOCK

The FINI system is designed to run on a 100 MHz clock. The system clock is generated from a 16.6 MHz oscillator and a direct digital synthesizer (DDS) with a 6x multiplier from Analog Devices, the AD9851.
5.4.1 Functionality of the AD9851

The AD9851 is a 3.3V integrated device that employs DDS technology coupled with an internal high-speed, high-performance D/A converter and comparator to form a digitally programmable frequency synthesizer and clock generator [Analog99]. It generates a digitized analog output sine wave that is internally converted to a square wave, producing the desired agile and accurate clock. The DDS circuitry is composed of a digital frequency divider function whose incremental resolution is determined by the frequency of the reference clock divided by 2^32, the number of states of the 32-bit tuning word. The phase accumulator is a variable-modulus counter that increments the number stored in it each time it receives a clock pulse. It "wraps around" when the counter reaches full scale, enabling the accumulator to deliver a phase-continuous output.

Figure 18. Basic DDS Block Diagram and Signal Flow of AD9851 [Analog99] (reference clock and frequency tuning word in; phase accumulator, amplitude/sine conversion algorithm, D/A converter and comparator producing the clock out)

5.4.2 Decision Factors

A 100 MHz oscillator is not used for this system because oscillators at high frequencies are more prone to noise, resulting in a jittery clock. By coupling a lower-frequency oscillator with a DDS, a highly accurate and harmonically pure digital representation of a signal can be generated [Qualcomm99]. The choice of a DDS was based on ease of implementation. Section 5.4.3 describes how to program the AD9851. The AD9851's 6x multiplier feature eliminates the need to locate a high-frequency oscillator. Epson Electronics manufactures programmable oscillators (SG-8002) available in an SMT package at any frequency between 1 MHz and 125 MHz. Under normal circumstances, this part could have been easily ordered from Digikey, but due to a worldwide shortage of ceramics caused by the surge in the cellular phone market, these parts were not available.
The closest substitute available at the time of design was a 16.384 MHz oscillator. The generated system clock will not be exactly 100 MHz, but it will be close enough until the other parts are available.

5.4.3 Programming the AD9851

The AD9851 can be programmed via parallel or serial format by loading a 40-bit input register with contents as follows:

- a 32-bit frequency tuning word,
- a 5-bit phase modulation word,
- a 6x REFCLK multiplier enable, and
- a power-down function.

For programming purposes, the register contents are separated into five words (W0-W4) of 8 bits each. The contents of each word are shown in Table 6 below.

Word   Data[7]          Data[6]    Data[5]    Data[4]    Data[3]          Data[2]      Data[1]   Data[0]
W0     Phase-b4 (MSB)   Phase-b3   Phase-b2   Phase-b1   Phase-b0 (LSB)   Power-Down   Logic 0   6x REFCLK Multiplier Enable
W1     Freq-b31 (MSB)   Freq-b30   Freq-b29   Freq-b28   Freq-b27         Freq-b26     Freq-b25  Freq-b24
W2     Freq-b23         Freq-b22   Freq-b21   Freq-b20   Freq-b19         Freq-b18     Freq-b17  Freq-b16
W3     Freq-b15         Freq-b14   Freq-b13   Freq-b12   Freq-b11         Freq-b10     Freq-b9   Freq-b8
W4     Freq-b7          Freq-b6    Freq-b5    Freq-b4    Freq-b3          Freq-b2      Freq-b1   Freq-b0 (LSB)

Table 6. AD9851 8-Bit Parallel-Load Data/Control Word Functional Assignment [Analog99]

The 32-bit frequency control word provides output-tuning resolution of about 0.04 Hz with a 180 MHz clock, while the 5-bit phase modulation word enables phase shifting of the output in increments of 11.25°. The 6x REFCLK multiplier eliminates the need for a high-speed reference oscillator with minimal impact on SFDR and phase noise characteristics. The parallel loading format is used on the FINI board by connecting the 8-bit data bus (D0-D7) on the AD9851 to I/O pins on FPGA-2 (Slave). D7 is the MSB (most significant bit) and D0 is the LSB (least significant bit). Before programming commences, RESET (pin 22) on the AD9851 must be asserted high. This active-high pin clears the DDS accumulator and phase offset register to achieve 0 Hz and 0° output phase.
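The word layout in Table 6 can be checked numerically. The sketch below (function name and structure are my own, not from the thesis) packs the 40-bit control word and reproduces both the ~0.04 Hz resolution quoted above and the W0-W4 values given in the next section for a 20 MHz output from the 99.6 MHz system clock:

```python
def ad9851_words(f_out_hz, sysclk_hz, phase_step=0, mult6x=True, power_down=False):
    """Pack the AD9851's 40-bit control word (W0..W4) per Table 6.

    The 32-bit tuning word is round(f_out * 2**32 / sysclk).  W0 carries
    the 5-bit phase word in its upper bits, then the power-down bit, a
    logic 0, and the 6x REFCLK multiplier enable in its LSB.
    """
    tw = round(f_out_hz * 2**32 / sysclk_hz)
    w0 = ((phase_step & 0x1F) << 3) | (int(power_down) << 2) | int(mult6x)
    return [w0] + [(tw >> shift) & 0xFF for shift in (24, 16, 8, 0)]

# Tuning resolution at a 180 MHz system clock (about 0.04 Hz, as stated above):
print(180e6 / 2**32)        # ~0.0419 Hz

# 20 MHz output from the 99.6 MHz system clock used on the FINI:
print([f"{w:08b}" for w in ad9851_words(20e6, 99.6e6)])
# ['00000001', '00110011', '01100111', '11010110', '11100000']
```

The five binary strings match the W0-W4 listing in Section 5.4.3, which is a useful sanity check on both the table reconstruction and the tuning-word arithmetic.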
The default setting is parallel-mode programming with the 6x multiplier disengaged. The AD9851 is programmed by parallel-loading the register contents on the rising edge of W_CLK using five consecutive 8-bit words. Once the 40-bit register is loaded and known to contain only valid data, FQ_UD is asserted high to asynchronously transfer the contents of the 40-bit input register to the DDS core, which then acts upon them. To achieve a 100 MHz system clock, a 16.6 MHz reference clock is connected at pin 9 with the 6x multiplier engaged. This provides an actual 99.6 MHz system clock at the output of the AD9851 (pin 14). For an output frequency of 20 MHz, the frequency tuning word can be determined by solving the following equation given the system clock:

f_OUT = (ΔPhase x SYSCLK) / 2^32

where ΔPhase is the 32-bit tuning word. With the phase set to 0.00 degrees, the 6x REFCLK multiplier engaged, power-up mode selected, and an output of 20 MHz (for a 99.6 MHz system clock), the 40-bit control word looks as follows:

W0 = 00000001
W1 = 00110011
W2 = 01100111
W3 = 11010110
W4 = 11100000

5.5 POWER SUPPLY

Components on the FINI require either 2.5V or 3.3V, so two power sources are needed. Both power supplies are generated in a similar fashion using a regulator.

5.5.1 LT1374

Both power sources are designed using Linear Technology's LT1374, a 4.5A, 500 kHz step-down switching regulator. An external resistor divider connected at the feedback pin (FB, pin 7) sets the output voltage. Resistor values are shown in the table below.

Output Voltage   R2        R1
2.5 V            4.99 kΩ   165 Ω
3.3 V            4.99 kΩ   1.82 kΩ

Table 7. Resistor Values for the Resistor Divider in the Power Supply Circuitry

Specifications for the surrounding circuitry required by the LT1374 are located on the data sheet. To achieve high switch efficiency, the switch driver needs a voltage higher than the input voltage. This higher voltage is generated with an external 0.27 uF capacitor and a 1N914 diode connected at the BOOST pin, thereby allowing the switch to saturate.
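The divider values in Table 7 can be cross-checked. This sketch assumes the LT1374's nominal 2.42 V feedback reference (from its data sheet) and the usual divider form with R1 from the output to FB and R2 from FB to ground; neither assumption comes from the thesis itself:

```python
V_FB = 2.42  # LT1374 nominal feedback reference voltage (data sheet value)

def output_voltage(r1_ohms, r2_ohms):
    """Regulator output with R1 (output to FB) and R2 (FB to ground)."""
    return V_FB * (1 + r1_ohms / r2_ohms)

print(round(output_voltage(165, 4990), 2))    # 2.5  -> the 2.5 V supply
print(round(output_voltage(1820, 4990), 2))   # 3.3  -> the 3.3 V supply
```

Both rows of Table 7 land within a few millivolts of their targets under these assumptions.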
Many possibilities exist for the inductor and output capacitor, but attention should be paid to their respective series resistances. Low-ESR capacitors from AVX and Kemet are used on board the FINI. A 1N5821 Schottky diode is used for the catch diode.

5.5.2 Decision Factors

A stable and accurate power supply is required to support the FINI board. The regulator and associated components for the power circuitry are all surface-mount components, which benefits a space-saving design. Previous experience in designing power supplies with the LT1374 has demonstrated its reliability.

5.6 CABLES AND CONNECTORS

Connectors at the edge of the FINI PCB provide the interface between the data lines on the FINI and the wires in the cable connecting each FINI together. Communication between the transmitter and receiver occurs via 8 LVDS data channels and a pair of clock signals, for a total of 9 differential pairs. Therefore, the external connector on the FINI must support a minimum of 18 LVDS channels, or 54 individual signals, since each LVDS signal is composed of a twisted pair and a ground wire.

5.6.1 LVDS Cables

A minimum of 54 wires is required. Since cables are only commercially available in certain pin counts, a 64-pin, 22-pair, 30 AWG composite shielded cable is used for interconnect. These cables contain 20 shielded twisted pairs with grounding and 2 shielded twisted pairs without grounding. This leaves 2 extra pairs with ground references and 2 extra pairs without grounding for alternative purposes, such as testing. All critical signals are routed on the pins with ground references.
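The wire-count arithmetic above can be tallied explicitly (all counts are taken from the text; only the variable names are mine):

```python
# Wire budget for the FINI's external connector, from the counts above.
diff_pairs_per_link = 9        # 8 LVDS data channels + 1 clock pair
links = 2                      # transmit and receive directions
channels = diff_pairs_per_link * links          # 18 LVDS channels
signals = channels * 3                          # twisted pair + ground wire each

# The cable provides 20 grounded pairs (3 wires each) and 2 ungrounded
# pairs (2 wires each):
cable_pins = 20 * 3 + 2 * 2                     # total cable pin count
spare_grounded_pairs = 20 - channels            # spare pairs with ground

print(signals, cable_pins, spare_grounded_pairs)   # 54 64 2
```

The tally confirms the 54-signal minimum, the 64-pin cable, and the 2 spare grounded pairs mentioned above; it also shows the cable's 22 pairs (20 grounded plus 2 ungrounded) account for exactly 64 pins.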
Cable specifications are listed in the table below:

Differential Impedance      100 ± 5 Ω @ TDR
Mutual Capacitance          14 pF/ft nominal
Time Delay                  1.43 ns/ft nominal
Skew Between Pairs          0.035 ns/ft maximum
Skew Within a Pair          0.010 ns/ft maximum
Attenuation                 0.13 dB/ft nominal @ 100 MHz
Far-End Crosstalk           1.0% maximum in 10 meters @ 300 ps Tr
Conductor DC Resistance     0.10 Ω/ft nominal @ 20°C

Table 8. Electrical Characteristics of Cable [Madison99]

5.6.2 Double-Decker Connector

A 64-pin cable requires a matching 64-pin connector. Since the basic functionality of the FINI is duplicated, two connectors for the transceiver interface are required. In the interest of conserving board real estate, a double-decker LVDS connector from Foxconn is used.

5.7 TEST POINTS

As this is the first version of the FINI, debugging will be an integral part of the design process. Additional headers are added on the FINI to facilitate debugging. Six 20-pin HP headers are placed on board with critical signals routed to them. Relative placement of the headers to their signal source is not as crucial as ensuring the integrity of the signal itself. High-speed signals such as the LVDS signals, however, cannot be tested in the same manner. To ensure that a clean signal is tested, the signal is measured from an SMA connector connected to a 50-ohm trace fed through a 4950-ohm resistor.

Figure 19. High-Speed Signal Probe Setup (4950-ohm resistor into a 50-ohm coaxial cable with SMA interface; 50-ohm terminator at the scope input)

CHAPTER 6
BOARD DESIGN

Design of a printed circuit board begins with schematic capture, followed by PCB layout and PCB routing. For the purposes of designing the FINI, Protel 99 Design Explorer SE was used. This program has capabilities including schematic capture, embedded applications, signal verification and board layout.
6.1 SCHEMATIC CAPTURE

With the high-level architecture set and all the critical components selected, the board design process begins with schematic capture, where a diagram of the design concept is drawn in a computer-aided environment. Capturing the design in a diagram creates an integral link between the conceptual design of a circuit and its physical expression. Capturing the "logic" of a circuit allows the integration of simulation and physical layout in the design process. This proves extremely useful when it is time to lay out the PCB.

6.1.1 Component Library

A board design is composed of components and wires, while schematics are composed of symbols and lines. A component library is a collection of symbols, each representing a unique component. Each pictorial representation contains the pin-out information for the component, including the name and number of each pin. Each pin can also be designated as an input, output, power, ground or bi-directional signal. This designation is useful when the design rule checker is run to verify the schematics. Additional information can be associated with each component to designate the part number, footprint, manufacturer, manufacturer part number, etc. These are entered in text fields associated with each part and can be reported in the Bill of Materials (see Section 6.1.4).

Complicated components can be sectioned off into multiple 'parts.' For instance, the schematic symbol for the Xilinx FPGA is a compilation of 11 parts, each looking like its own schematic symbol. The 11 parts correspond to the 8 banks of the FPGA, two power sections and a set of ground pins. Another potential use of the 'part' feature is for multipart components, such as one gate in a 7400. Components are broken down into multiple parts for schematic clarity. A schematic symbol for every component used on the FINI card must be available. Protel 99 SE ships with several component libraries from common manufacturers.
For common parts such as resistors, capacitors and inductors, the included libraries are used. For all other components on the FINI, a new schematic symbol is created by hand. The option of using component symbols from the included libraries exists, but these libraries are not always 100% error free. Although using preexisting libraries would save time, an error would be a costly, irreversible mistake in the long run.

6.1.2 Schematic Sheets

Once schematic symbols of all components are drawn and stored in the component library, the schematics are drawn using Protel's Schematic Editor, which supports single-sheet, multiple-sheet or fully hierarchical designs. The FINI is designed on multiple sheets in a flat hierarchy. Components are placed onto schematic sheets directly from the Schematic Library Editor. Changes to components are performed at the library level so that a global update can be performed to update all components on the schematic sheets. After components are placed on the schematic sheets, they are "wired" together as though the circuit were being physically hooked up. Each wire segment is called a net and is identified with net labels on a sheet. Labeling nets is not mandatory, but it is good practice in board design as it helps the debugging process later on. The completed FINI design is shown on 11 schematic sheets in Appendix A.

6.1.3 Schematic Verification

To identify components on the PCB, a unique designator is assigned to each component. This proves helpful in identifying errors in the schematics as well as on the board layout later on. The process of assigning and reassigning designators is called annotation. A designator is composed of a letter followed by a number, such as R1, R2, C3, C4, or U5. The letter represents the type of component, as explained in Table 9 below, while the number makes the designator unique.

C   Capacitor
D   Diode
F   Fuse
J   Header
L   Inductor
R   Resistor
U   Integrated circuit
X   Crystal

Table 9.
Designator Key

After the schematic is completed and wired, it must be verified for accuracy. Running the Electrical Rules Check (ERC), a feature of Protel 99 SE along with most other similar programs, identifies basic electrical errors such as floating input pins on parts and shorts between two differently named nets. Specifics of the ERC can be set in the ERC Dialog Box. The ERC can also generate a text report listing all ERC report information. The ERC reports errors using designators and either net labels specified by the designer or internally generated net labels. Locating an error is much more efficient using designer-specified net labels, as previously mentioned.

6.1.4 Bill of Materials

A Bill of Materials (BOM) is generated to assist in ordering components. A BOM is a spreadsheet generated by Protel 99 SE that contains information about all parts used on the PCB, including but not limited to the quantity of each component, manufacturer, manufacturer part number, footprint and description. See Appendix B.

6.2 BOARD LAYOUT

PCB layout is the process of arranging components and wires on a board in their desired locations. Where the components are placed and how the signals are wired play a direct role in the signal integrity of the board. This section explores the issues associated with board layout, routing and signal integrity.

6.2.1 Footprint Library

A footprint library is a collection of component footprints. Each footprint depicts the size of the solder pad needed on the PCB for each pin of the component. Each solder pad is identified with a number (or name) corresponding to the pin number (or name) of the component. The size of a solder pad should be larger than the pin, providing enough room to hand-solder the component onto the board. As a general rule, the pads should be about 150% of the size of the pin itself. Component dimensions can be found in the mechanical drawings that accompany the data sheets.
6.2.2 Component Placement

Protel 99 SE has the capability of synchronizing a design by updating the target document (PCB) based on the latest design information in the reference document (schematic). All component and connectivity information is extracted from the schematic, matched with the respective footprints and placed on the PCB workspace of choice; initially this should be an empty workspace. Connection lines are automatically added between connected component pairs. The result of synchronizing the design is what appears to be a heap of components and wires all piled on top of each other on the PCB workspace. Much of the challenge in designing PCBs is deciding where to place the components and how to clean up this heap. The physical placement of each part and component on the PCB determines the routing of wires on the board, which has a direct influence on signal integrity. Minimizing wire lengths and keeping wires straight decreases noise. Therefore, components with many common nets should be placed close together when possible.

With over 250 individual components on the FINI, it would take a long time to manually place each component. One of the most time-consuming processes is placing the decoupling capacitors with their respective components. To speed up this process, Protel is equipped with an AutoPlace feature. It can separate the heap of components and wires, but it is not aware of many other signal integrity issues. After attempting to use the AutoPlace feature a few times, I found it wise to take advantage of only some of Protel's classification and sorting features and to manually place the remaining components. One convenient way of locating a group of components on the PCB is by highlighting it in the schematic first and telling Protel to select those same components on the PCB. Once the components are selected on the PCB, they can be dragged out of the pile.
By doing this for each group of components on the FINI card, I was able to separate the heap and group all components in a logical fashion, lumping related components together. Components can be assigned to a class, a name associated with a group of components. For example, FPGA-1 and all its associated circuitry and components belong to the class titled 'FPGA-1.' One or more classes of components can be assigned to a room. A room is a virtual grouping of one or more classes of components and allows the designer to move the entire room of components at once by dragging a box on the screen in the PCB Layout window. Once all the components are separated into classes and assigned to rooms, manual placement is much more manageable. Each class of components is laid out as its own entity. When all classes have been laid out, the board is pieced together by arranging the rooms in the desired locations.

Figure 20. Top View of FINI Board

Figures 20 and 21 present top and bottom views of the FINI card with all major components depicted to scale. Data enters the FINI card from the parallel port located at the edge of the board. The FPGAs are centered in the middle. To their right are the transmitter and receiver, and on the far right is the double-decker header serving as the output of the networking interface. To enable on-board testing, the design is for the most part duplicated on the reverse side of the board. The only items not duplicated are the 3.3V power supply, the clock and the parallel port. The double-decker header contains two separate 68-pin LVDS connectors and is shared by the top and bottom configurations.

Figure 21. Bottom View of FINI Board

Before proceeding to routing, the routing density of the board should be checked. A density map displays a colored map of the board where green represents "cool" or less dense regions and red represents "hot" or very dense regions.
Regions with a lot of red may need to be analyzed in hopes of relieving the hot zones, because there is probably insufficient room for all the signals. If there is not enough room on the board to spread out the components, more routing layers may be required.

6.2.3 PCB Board Stack (Layers)

Before routing can begin, the number of layers in the PCB must be determined. Most high-performance digital system PCBs have between four and eight layers, including a top layer, multiple mid layers and a bottom layer. Components can be placed on the top and/or bottom layers. Of the mid layers, one is usually a dedicated ground plane and another is a dedicated power plane. The remainder, if any, are for routing signals. Dedicated power planes minimize the resistive loss between the power supplies and the components while providing low-inductance paths between the power supplies and the power leads on the ICs. More signal layers can be added if there is insufficient room to route the signals, but this will increase the cost of the PCB.

Figure 22. FINI Board Stack Diagram (from top to bottom: Layer 1 horizontal, Layer 2 vertical, GND plane, +V3.3 plane, Layer 3 horizontal, Layer 4 vertical, separated by core and prepreg)

The FINI card is a six-layer board with components on both the top layer and the bottom layer. There are four signal layers (1H, 2V, 3H, 4V), a ground plane (GND) and a power plane (+V3.3). Signals on layers 1 and 3 (1H, 3H) run predominantly horizontally, while signals on layers 2 and 4 (2V, 4V) run mostly vertically. This alternating scheme is used to maximize routable board space. Keeping signals on adjacent layers orthogonal minimizes crosstalk because the signals are isolated and do not interfere with each other. The ground and power planes serve as additional shields between parallel layers. Components on the FINI card run on either a +2.5V or a +3.3V power supply. The majority of the power nets are +3.3V, so the internal power plane is initially assigned to +3.3V.
To accommodate the additional +2.5V nets, part of the +3.3V power plane is sectioned, or "split," off and assigned to +2.5V. The split plane for +2.5V is created by placing special boundary tracks that encompass all pins on this net, as seen in Figure 23.

Figure 23. Split Planes on the Internal +V33 Power Plane

6.2.4 Routing

Once all the components are set in place, the PCB is routed by translating the logical connections depicted in the schematic into physical connections on the board. The board is routed according to a set of design rules, guidelines set by the board designer. The design rules used for the FINI card are as follows:

- Trace width constraint: 5 mil minimum, 6 mil preferred
- Clearance constraint: 5 mil space
- Vias: 10 mil hole, 24 mil width
- Clearance to edge of board: 25 mil
- Silkscreened lines: 8 mil

Manually routing every single net by hand would take months to complete. Protel 99 SE has various features that can help speed up the routing process. It can automatically route the entire board or just a portion of it. Sections of the FINI card with critical signals are routed by hand first and locked in place. Protel then automatically routes the remainder of the board. The Auto Router does a mediocre job at routing and usually completes around 90-95% of the nets at best. This leaves some nets to be hand-routed at the end. In addition, many routes need to be cleaned up after the Auto Router has finished its task. Excessively long traces are identified and rerouted by hand as well.

Power Supply

The FINI board contains one +3.3V and two +2.5V power supplies. These critical signals are routed using a wider 12 mil trace to carry more current. In the power supply circuitry, the external bypass capacitor and catch diode are placed close to the VIN pin (pin 5) to minimize inductance. At switch-off, any inductance present creates a voltage spike that adds to the VCE voltage across the internal NPN.
The leads into the catch diode, switch pin, and input bypass capacitor are kept as short as possible to minimize magnetic field radiation. The lengths and areas of all traces connected to the switch pin and BOOST pin are minimized to decrease electric field radiation. A ground plane is used to connect the GND pin of the LT1374 and the "ground" end of the load; this prevents interplane coupling and ensures the two are at the same voltage by keeping other currents from flowing between them. The path between the input bypass and the GND pin is also minimized to reduce EMI. Feedback resistors and compensation components are placed far away from the switch node. The high current ground path of the catch diode and input capacitor is kept very short and separate from the analog ground line.

High-speed Differential Signals

The transmitter and receiver are clocked at 100 MHz. Each set of differential signals must travel from the transmitter to the connector, across the cable, which may run from 1 to 10 meters in length, and finally from the connector at the other end to the receiver. The validity of these signals depends upon how well matched the lengths of each pair of differential signals are. The skew from transmission to reception can be divided into three sections: the skew on the traces between the transmitter and the connector, the skew in the cabling, and the skew on the traces between the connector and the receiver. The traces between the transmitter and connector, as well as between the connector and receiver on the opposite end, must be carefully matched in length. Protel is equipped with the capability to match lengths, but its algorithm accomplishes this by increasing the lengths of all the nets until they are equal. After several attempts at using features within Protel, I realized that its capabilities did not meet our needs.
The FINI design requires matched-length signals that are as short and straight as possible; Protel fails to support the latter criterion. Critical signals such as these are therefore routed by hand to within +/- 5 mils. The skew in the cable is specified to be a maximum of 0.010 ns/ft.

Clock Signals

A clean clock signal is crucial for all components on the FINI card to function properly. Special care is taken when placing the DDS and clock circuitry components and their associated signals. All components are placed on the same layer (1H) and arranged linearly such that the signal flows in as straight a line as possible from source to destination. When possible, turns and angles are avoided to prevent reflections. A ground plane is poured around the clock circuitry on layers 1 and 2 (1H, 2V), isolating it and preventing noise from elsewhere on the board from interfering with the signal. The 100 MHz system clock, SYSCLK, is generated by the DDS and clock circuitry. It is routed by hand from the output of the DDS to each component on the FINI in the order that data travels through those components. For the data transmitting circuitry, the clock signal should arrive at the FPGA and SRAM before the transmitter. For the data receiving circuitry, SYSCLK should reach the receiver before the FPGA and SRAM. Routing SYSCLK becomes more of a challenge when the design is duplicated on the board. The general routing requirements remain mandatory: the SYSCLK net must be kept as straight and short as possible with no branches.

6.3 BOARD FABRICATION

With the completion of schematic capture and board layout, a Bill of Materials (BOM) and netlist are generated. These items, along with the board layout, are sent to a third-party vendor to fabricate the PCB. Once the boards return from the vendor, they are assembled with the appropriate components.
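The figures above (traces matched to within +/- 5 mils, cable skew at most 0.010 ns/ft) allow a quick worst-case skew budget for one differential pair. The ~180 ps/inch FR-4 propagation delay below is an assumed typical value, not a number from the FINI design.

```python
# Back-of-the-envelope skew budget for one differential pair, using the
# figures quoted above: cable skew spec of 0.010 ns/ft max, and traces
# matched to within +/- 5 mil at each end of the link.
FT_PER_M = 3.2808
PS_PER_INCH = 180.0          # assumed typical FR-4 propagation delay

def skew_budget(cable_m, trace_mismatch_mil=5.0):
    cable_ps = cable_m * FT_PER_M * 0.010 * 1000        # 0.010 ns/ft -> ps
    trace_ps = 2 * (trace_mismatch_mil / 1000.0) * PS_PER_INCH  # both ends
    return cable_ps + trace_ps

total = skew_budget(10.0)    # worst case: the full 10 m cable
# The cable dominates (~328 ps); the matched traces add under 2 ps.
assert total < 340
```

This is why the cable skew specification, rather than the hand-matched traces, sets the limit on usable cable length.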
6.4 PCB MATERIAL AND PACKAGE PROPERTIES

Printed circuit boards are constructed of alternating layers of conductors (copper) and insulators. Most PCBs are coated on both sides with solder mask to protect circuits during solder operations. In all but ceramic boards, the conductor metal is copper foil. The characteristics of the conductors, especially the thickness of the trace, are important: trace thickness directly influences its impedance, as discussed in Chapter 4. The FINI is constructed using 1 oz. copper; 1 oz of copper is the amount of copper weight spread over 1 square foot, yielding a thickness of 1.4 mils. Copper is electro-deposited onto the board through a chemical building process in which individual copper particles are electroplated to form the desired sheet thickness. Layers 1 and 6 are sheets of copper foil. The layer pairs are exposed using the films created from CAD, developed, and etched. After all the layers are etched, they are stacked with prepreg between the layers. The stack is then put into a high temperature press that heats the material, letting the prepreg flow. After the stack is cooled, holes are drilled and plated. The board is complete after solder mask is applied and the silkscreen is put on.

When possible, surface mount (SMT) components are used instead of through-hole components. They are smaller, take up less real estate, and reduce the distance between components; thus the travel time associated with routing over a distance is decreased. The parasitic loading due to the package leads is also significantly reduced. Packaging costs for SMT packages are less than those of through-hole packages, and lead inductance is lower, reducing ground bounce.

6.5 BOARD ASSEMBLY

After the bare FINI card is visually inspected for defects, the components are soldered onto the board. A systematic approach to stuffing the board will help in the verification and debugging process.
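The "1 oz of copper is 1.4 mils thick" rule quoted above can be sanity-checked from the density of copper; this is just the arithmetic behind the rule of thumb.

```python
# Sanity check of the "1 oz copper ~ 1.4 mil" rule of thumb from first
# principles: 1 oz of copper (28.35 g) spread evenly over 1 square foot.
OZ_G = 28.35                  # grams per ounce
CU_DENSITY = 8.96             # copper density, g/cm^3
SQFT_CM2 = 30.48 ** 2         # 1 ft = 30.48 cm, so 1 ft^2 in cm^2

thickness_cm = OZ_G / (CU_DENSITY * SQFT_CM2)
thickness_mil = thickness_cm / 2.54 * 1000    # 1 mil = 1/1000 inch
assert 1.3 < thickness_mil < 1.4              # ~1.34 mil, rounded to 1.4
```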
First, the components enabling the system power, SYSDCIN, are soldered on, followed by the three power supply circuits, two +2.5V and one +3.3V. Each power supply circuit is tested using a voltmeter to ensure that the correct voltage is output. Next, the clock circuitry is assembled and tested, followed by the parallel port and FPGA-1. Once these are verified to be working, the rest of the components are placed on the board.

CHAPTER 7 APPLICATIONS

The advantages of implementing a design with an FPGA as the core processor become apparent after all the physical design issues have been resolved. The re-programmability of the FPGA makes it possible to implement various tests on the FINI to determine the capabilities of the components and of the design itself. More importantly, the FINI can serve as the basic building block for the interconnect network in HPC systems. Because the ARIES HPC processing nodes are not ready at the time of writing, the FINI card cannot be tested at its full capacity. This section describes some tests that may be implemented to test the capability of the FINI card. It also gives a preview of how FINI cards can be used to implement the interconnect network of an HPC system.

7.1 CAPABILITY TESTS

The tests that can be implemented on the FINI are nearly endless; their nature depends on the goal in mind. This section presents some possible tests that can easily be implemented in Verilog. These tests will require a PC to program the FINI card via its parallel port, and two FINI cards connected by high-performance cables.

7.1.1 Latency Test

Latency is defined as the time it takes for data to travel from the source node to the destination node for a given cable length. With both boards synchronized to the same clock, a random sequence is generated inside FPGA-1.
Because the amount of data to be sent will vary, latency is calculated from when the last data word leaves the source node (FPGA-1) to when it arrives at the destination node (FPGA-4), based upon a time stamp attached to each data word. To test the signal integrity of the interconnect network, a known sequence of data can be transferred in place of the random sequence. This makes it possible to verify that the sequence at the destination node is indeed the same as the one sent from the source node.

Figure 24. Diagram of Latency Test

7.1.2 Round Trip Test

Data traveling between processor nodes within an HPC system may be routed through multiple network nodes before arriving at its intended destination. The number of network nodes it must be routed through depends upon the size of the HPC system and the implemented network topology. To gain an understanding of the delay inherent in routing data through one FINI card, the round trip test is performed. Connecting two FINI boards in a transmit-receive loop allows testing of the round trip delay. On the first FINI board, Transmitter-1 sends data to Receiver-2. Inside the second FINI board, data from Receiver-2 is routed to Transmitter-2, which then sends data out to Receiver-1. As in the latency test, all clocks are synchronized and a time stamp is appended to the last data word. Upon arrival at the destination, a comparison of the arrival time with the time stamp reveals the time it takes one data word to make the round trip loop. For varying data lengths, the total round trip time required for an entire sequence of data can be determined by placing the time stamp at the beginning of the sequence. Comparing the time the first data word left the source node to when the last data word arrives at the destination node reveals the total time it takes to route data through a network node.
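The time-stamp bookkeeping behind both tests can be sketched in a few lines. This toy Python model (all names illustrative; the real tests would be Verilog running on the FPGAs) counts in ticks of the shared 100 MHz clock.

```python
# Toy model of the time-stamp-based latency measurement described above.
# Clock ticks stand in for the shared 100 MHz clock (10 ns per tick).
CLOCK_NS = 10  # 100 MHz system clock period

def send(words, depart_tick):
    """Attach a departure time stamp to each data word at the source."""
    return [(w, depart_tick + i) for i, w in enumerate(words)]

def measure_latency(stamped, arrival_tick_of_last):
    """Latency: last word's departure stamp vs. its arrival tick."""
    _, depart = stamped[-1]
    return (arrival_tick_of_last - depart) * CLOCK_NS

stamped = send([0xA5, 0x5A, 0xFF], depart_tick=100)
# If the last word (stamped at tick 102) arrives at tick 109, the
# measured latency is 7 ticks = 70 ns:
assert measure_latency(stamped, arrival_tick_of_last=109) == 70
```

For the round trip variant, the same comparison is made after the data has passed through the second board and returned; stamping the first word instead of the last gives the total sequence time.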
Figure 25. Diagram of Round Trip Test

7.1.3 Cable Integrity Test

The DS90C387/DS90CF388 transmitter/receiver chips are specified to support cable lengths up to 10 meters. To verify this length and to test the limits of the chip technology, data will be transmitted from Transmitter-1 to Receiver-2 over varying cable lengths at various bit rates (variable clocks). The bit error for each trial will be recorded; the resulting data should show bit error versus cable length.

Figure 26. Diagram of Cable Integrity Test

7.2 NETWORK DESIGN

As described in Chapter 3, the FINI is the high-speed network interface in the ARIES HPC system that facilitates communication between the HPC processor node and the network itself (see Figure 27). This section describes how the FINI architecture can be duplicated to implement the physical network on which the METRO Routing Protocol will run. The actual network design will depend upon the number of HPC processor nodes and the application at hand.

Figure 27. Diagram of HPC System (Processor Nodes, FINI, and Network)

7.2.1 Routers

The network connecting all the HPC processor nodes is composed of a grid of routers. Each router is characterized by the following parameters:

- i, the number of input ports
- o, the number of output ports
- r, the router radix
- d, the router dilation

Generally, square routers are used, in which i = o = r x d. A 2-input, 2-output router with dilation 1 and radix 2, as shown in Figure 28, can be assembled by connecting two FINI cards together.

Figure 28. Dilation-1, Radix-2 Router

Larger routers can be assembled by grouping more FINI cards together. A dilation-2, radix-2 router is pictured in Figure 29.
This is essentially the basic FINI architecture duplicated 8 times, achieved using 4 FINI cards. The second generation of FINI boards should have the enhanced capabilities required of a router.

Figure 29. Dilation-2, Radix-2 Router

7.2.2 Multipath, Multistage Networks

Utilizing routers as the building blocks, a multipath, multistage network can be built to accommodate varying needs: more bandwidth, increased fault tolerance, larger (or smaller) machines, and decreased latency. Figure 30 gives one example of an 8 x 8 multipath, multistage network configuration. Implementing such a network in a 16 node HPC system would require 64 FINI cards: 16 for the FINI interfaces, 16 for the dilation-1 routers, and 32 for the 8 dilation-2 routers.

Figure 30. 8 x 8 Multipath, Multistage Network

Assuming that the interconnect speed is limited by the capability of COTS components, additional bandwidth can be achieved architecturally. Increasing the number of connections to and from the network will increase bandwidth as well as network fault tolerance. Routers with greater dilation provide more potential for path expansion and hence better fault tolerance. Flat, multistage networks may provide better performance for smaller networks, while fat-trees or hybrid fat-trees are better choices for very large machines [DeHon93]. The physical components and system requirements will determine where the crossover lies. Latency can be improved, but only at a price. A larger router radix decreases the number of stages necessary, thereby providing lower latency. At the same time, however, router radix is limited by the number of pins on the routing component. To increase the radix, a larger component with greater pin count is necessary, which has the undesirable side effect of increased cost.
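The card-count and stage-count arithmetic above can be checked directly. The per-router card costs follow from the router figures (2 cards for a dilation-1, radix-2 router; 4 for a dilation-2, radix-2 router), and the stage count assumes a simple radix-r multistage network over N nodes.

```python
# Card count for the 8 x 8 multipath, multistage example in a 16-node
# system: one interface card per node, 2 cards per dilation-1 radix-2
# router, 4 cards per dilation-2 radix-2 router.
nodes = 16
interface_cards = nodes
d1_cards = 8 * 2                 # 8 dilation-1 routers, 2 cards each
d2_cards = 8 * 4                 # 8 dilation-2 routers, 4 cards each
assert interface_cards + d1_cards + d2_cards == 64

# A larger radix reduces the number of stages (roughly log_r N), which
# is the latency-versus-pin-count trade-off discussed above.
def stages(n_nodes, radix):
    s, reach = 0, 1
    while reach < n_nodes:       # each stage multiplies reach by radix
        reach *= radix
        s += 1
    return s

assert stages(16, 2) == 4        # radix-2: four stages for 16 nodes
assert stages(16, 4) == 2        # radix-4: half the stages, lower latency
```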
Implementation of a robust, low-latency network for large-scale computing depends not only upon the components used at the physical layer, but also on the wiring scheme. The network design should complement the limitations of the physical design. In most cases, architectures that tolerate network faults do not necessarily add network latency; however, minimizing latency may increase the cost of the overall system. Resolving these trade-offs will depend upon the nature of the application. Using FINI cards as the building blocks, various network designs may be experimented with to determine the most efficient interconnect network.

CHAPTER 8 CONCLUSION

Much of the reliability and scalability of an HPC system depends upon the high-speed interconnect network. The FINI card not only serves as the building block for the interconnect network of the ARIES HPC system; its design also serves as an evaluation of component capabilities and a validation of high-speed signaling techniques. This section analyzes the FINI design in the context of the ARIES HPC system. The current design will be evaluated and topics for future work will be suggested.

8.1 FINI ARCHITECTURE

The primary function of the FINI is to serve as the network interface for an HPC processor node. Additionally, its modular architecture lends itself to being the basic building block for the network routers. Although the FINI architecture as depicted in Figure 5 meets the requirements of both the interface and the routers, subtle differences between the requirements of the two suggest that tailoring the FINI architecture to specific needs will reduce costs.

8.1.1 Network Interface Architecture

The original FINI architecture featured only one transmitter/receiver chipset. This minimal design is sufficient to test the on-board components and the signaling technology used to facilitate communication between two processor nodes.
To improve fault tolerance and support multipath communication, the FINI card as designed replicates the original architecture to contain two transmitter/receiver pairs. The architectural design of this interface is sound, and improvements should be focused on maximizing component capabilities.

8.1.2 Basic Router Architecture

The METRO Router Protocol running on the routers is essentially a state machine that passes data from source to destination based on information coded in the message header. Because it does not need to store information for extended periods of time, the SRAM modules in the original FINI architecture may be removed from the router components. This frees up many I/O pins on the FPGA, but not quite enough to use only one XCV100-6PQ240C to control the transmitter and receiver. Future versions of the router component may consider using one larger FPGA as the controller for each transmitter/receiver pair. The basic router component can be simplified as depicted in Figure 31. The actual router board should duplicate this design on the reverse side of the PCB to produce a dilation-1, radix-2 router.

Figure 31. Basic Router Building Block Architecture

8.2 HIGH-SPEED INTERCONNECT

An efficient signaling technology should maximize the bandwidth per pin, minimize power dissipation, and have good noise immunity to provide maximum system performance. Differential signaling is a widely used technology for high-speed applications and suits the needs of our application well. Components supporting differential signaling are also widely available from an array of manufacturers.

8.2.1 Transmitter/Receiver

The network's bandwidth is determined in part by system architecture but is ultimately limited by semiconductor technology. The pin count on the transmitter/receiver determines the available bandwidth.
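The header-directed forwarding that makes the SRAM unnecessary can be illustrated with a toy state-machine step. This is a conceptual sketch only, not the actual METRO protocol; the header encoding and function names are invented for the example.

```python
# Toy illustration of the header-directed forwarding a METRO-style
# router performs: the message header selects an output port, and the
# payload streams through without being stored for extended periods.
# Hypothetical encoding: header value mod radix chooses the port.
def route(message, radix=2):
    """Split a message into (output_port, payload) using its header."""
    header, *payload = message
    port = header % radix          # header bits select one of r ports
    return port, payload

port, payload = route([3, 0xDE, 0xAD], radix=2)
assert port == 1                   # header 3 mod radix 2 -> port 1
assert payload == [0xDE, 0xAD]     # payload passes through unchanged
```

Because the routing decision consumes only the header, each word can be forwarded as it arrives, which is why the router variant of the FINI needs no SRAM buffering.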
Although more than one transmitter and receiver may be placed in parallel to increase the bandwidth, the gains often do not justify the increase in cost and board real estate (which further increases production costs). Table 10 below compares several commercially available chips that may be considered for use in future implementations of the network. This is not a comprehensive list of available choices, as component manufacturers are constantly introducing new products. An investigation of other transceivers on the market should be completed before a decision is made.

Manufacturer             Part                  Serializer Technology   Data Speed   Enhanced Capabilities
National Semiconductor   DS90C387/DS90CF388    LVDS                    5.38 Gbps    Deskews +/- 1 LVDS bit time; 300 ps skew tolerance
Texas Instruments        TLK2500               GMII                    2.5 Gbps
Motorola                 WarpLink Quad         GMII                    1.25 Gbps    40 bit times of link-to-link media delay skew; frequency offset tolerance +/- 250 ppm

Table 10. Commercial Transceivers

Although the receiver technology may vary, the more critical elements to consider are data rates, bandwidth, and skew tolerances. Future models of HPC systems will be faster and require more bandwidth than today's applications; designing for tomorrow's applications will ensure the HPC system is not a shelved product. Data traveling over long cables is prone to skew, which can ultimately lead to signal distortion at the receiver end. The transmitter's ability to deskew and tolerate skew will increase the reliability of the data transferred; the advantage is a decrease in system costs through support for cheaper cables.

8.2.2 Cables and Connectors

The cables and connectors used to link the network together will depend upon the chosen transceiver's technology. The important considerations in choosing a cable are its impedance, time delay, and skew, both between pairs and within a pair. Some cable companies to consider are AMP, Foxconn and Gore.
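For context on the 5.38 Gbps figure quoted for the DS90C387: it is consistent with a 48-bit parallel interface running at a 112 MHz maximum clock, assuming those data-sheet figures (the FINI clocks the part at 100 MHz).

```python
# Where the DS90C387's 5.38 Gbps figure plausibly comes from, assuming
# a 48-bit-wide parallel interface at an assumed 112 MHz maximum clock.
bits_wide = 48
max_clock_hz = 112e6
peak_gbps = bits_wide * max_clock_hz / 1e9
assert round(peak_gbps, 2) == 5.38      # 48 * 112 MHz = 5.376 Gbps

# At the FINI's 100 MHz system clock the link moves 4.8 Gbps:
fini_gbps = bits_wide * 100e6 / 1e9
assert fini_gbps == 4.8
```

This kind of width-times-clock accounting is also why pin count on the transceiver bounds the available bandwidth, as noted above.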
AMP sells a wide range of cables, Foxconn supplies mostly large OEM customers, and Gore carries very high quality but more costly cable assemblies. Choosing a connector for a high-speed application is much more complicated than deciding on cable. Ground structures internal to the connector provide low-impedance signal return paths for low crosstalk. They also increase the parasitic capacitance to ground of each pin, balancing it with the pin's series inductance to minimize signal distortion. AMP, Augat, Foxconn, Gore, Teradyne, and 3M all offer connectors designed for high-speed applications. The connector of choice must be compatible with the cable type.

8.3 FPGAs

The core of the FINI architecture is the FPGA. It is responsible for controlling the functionality of the peripheral chips on the FINI card. In addition, the METRO Router Protocol will run on the FPGA. The chosen FPGA must have enough I/O pins to interface with the other chips on board. To ensure that it has enough logic cells and gates for the application, it is suggested that the Verilog code be developed prior to the board design. This ensures that the smallest FPGA that can house the compiled design is chosen.

8.4 CONTINUAL EVALUATION

The first version of the FINI examined many existing technologies, including high-speed signaling techniques and various COTS components. The design process revealed both capabilities and deficiencies of the components used. As semiconductor technology continues to improve, higher density packages will become available, chips with improved capabilities will be commercialized, and new design techniques will evolve. The design cycle observed in the design of the FINI card will continue. With each design experience, knowledge gained may be passed on to future engineers. A keen eye for industrial advancements and the courage to explore new design techniques will ensure long-term digital design success.
APPENDIX A FINI BOARD SCHEMATICS

Schematics of the FINI card were designed using Protel Design Explorer 99 SE. A full set of schematics can be found in this section.

[Schematic sheets, not reproducible in text form: the +2.5V and +3.3V power supplies (LT1374 regulators), the DDS and clock circuitry (AD9851), the transmitter (DS90C387) and receiver (DS90C388) sections, the ZBT SRAM banks, and the parallel port, connectors, headers, and SMA test points.]
APPENDIX B FINI BILL OF MATERIALS (BOM)

The Bill of Materials (BOM) contains a list of all components used on the FINI card. Component details including manufacturer, manufacturer P/N, footprint and description are imported from Protel Design Explorer 99 SE into Excel.

FLEXIBLE INTEGRATED NETWORK INTERFACE BILL OF MATERIALS. Revised: Wednesday, March 8, 2000, 9:08 AM. By: Peggy Chen.

[BOM table, not reproducible in text form. Major line items include: decoupling and filter capacitors (Kemet, Panasonic, AVX), Schottky rectifier diodes (Vishay), SMA test points, headers and connectors (Amphenol, Foxconn, AMP, Switchcraft), inductors (Coilcraft, Panasonic), resistors, LT1374 switching regulators, DS90C387 transmitters and DS90C388 receivers, X9317 digital potentiometers, Micron 8Mb ZBT SRAMs, the AD9851 DDS, Xilinx XCV100-6PQ240C Virtex FPGAs, and CTS clock oscillators.]
potentiometer 40 4 U17 8MBZBTSRAM Micron MTS5L256L36PT10 TQFP100 8MB ZBTSRAM, 256Kx32, 3.3V 1 1 U18 DDS-AD9851 Devices AD9851 SOP28 Direct Digital Synthesizer, AD9851 42 4 U4, U5, U6, U7 XCV100-GPQ240C Xilinx XCV100-GPQ240C QUAD240 Virtex FPGA, XCV100-GPQ240C 3 2 U8, U9 Receiver National DS90C388 QUAD100 Receiver, DS90C388 1 Xl 16.666 MHZ CTS CB3LV-3C-16.3840-T OSCILLATOR_4PIN 16.384 MHz Clock Oscillator, SMT 2 $9.500 1 X2 20.000 MHZ CTS CB3LV-3C-20.0000-T 20MHZ-CLOCK 20 MHz Clock Oscillator, SMT 2 $9.500 0 45 3 in stock APPENDIX FINI BOARD LAYOUT Appendix C provides 7 snapshots of the FINI board layout. Figure 32. FINI Board - All Layers 104 C ... ....... Figure 33. FINI Board - Layer 1 (1H) Figure 34. FINI Board - Layer 2 (2V) 105 Figure 35. FINI Board - Layer 3 (3H) Figure 36. FINI Board - Layer 4 (4V) 106 Figure 37. FINI Board - Ground Plane Figure 38. FINI Board - +3.3V Plane 107 BIBLIOGRAPHY [Analog99] Analog Devices Staff. "AD9851: CMOS 180 MHz, DDS/DAS Synthesizer," Analog Devices Data Sheet. Norwood, MA, 1999. [Bakoglu90] H.B. Bakoglu, Circuits, Interconnections,and Packagingfor VLSI, Addison- Wesley Publishing Company, Menlo Park, CA, 1990. [BCDEKMP94] Matthew Becker, Frederic Chong, Andre DeHon, Eran Egozy, Thomas F. Knight, Jr., Henry Minsky, Samuel Pretz, "METRO: A Router Architecture for High-Performance, Short-Haul Routing Networks," InternationalSymposium on Computer Architecture,May 1994 [BFTV92] Dave Blake, Christine Foster, Craig Theorin, and Herb VanDeussen, "Considerationsfor DifferentialSignal Interconnects," W.L. Gore & Associates, Inc., Electronics Products Division. [BG92] Dimitri Bertsekas, Robert Gallagher, DataNetworks, Prentice Hall, 1992. [Buchanan96] James E. Buchana. Signal and Power Integrity in DigitalSystems. McGraw- Hill, Inc., 1996. 108 [DeHon90] Andre DeHon, "Fat-Tree Routing for Transit," A.I. Technical Report No. 1224, MIT Artificial Intelligence Laboratory, 545 Technology Sq., Cambridge, MA 02139, September 1990. 
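The Ordered and Price columns of the bill of materials lend themselves to a simple cost roll-up: ordered quantity times unit price, summed over the rows. A minimal sketch in Python, using a few rows transcribed from the table as an illustrative sample (not the complete parts list):

```python
# Cost roll-up over a handful of rows sampled from the FINI bill of
# materials. Each tuple is (manufacturer part number, ordered quantity,
# unit price in dollars); values are taken from the table above.
bom_rows = [
    ("ECU-V1H102JCX", 20, 0.148),       # 1000pF ceramic capacitors
    ("ECU-V1H152KBN", 10, 0.068),       # 1500pF ceramic capacitors
    ("ERJ-6GEY0R00V", 50, 0.041),       # 0-ohm jumper resistors
    ("CB3LV-3C-20.0000-T", 2, 9.500),   # 20 MHz clock oscillator
]

def extended_cost(rows):
    """Sum ordered-quantity x unit-price across BOM rows."""
    return sum(qty * unit_price for _, qty, unit_price in rows)

total = extended_cost(bom_rows)
print(f"Extended cost of sampled rows: ${total:.2f}")
```

Extending the list to every priced row of the table gives the component cost of one FINI board order.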