RASSP: Reinventing Electronic Design
Methodology, Architecture, Infrastructure
DARPA, Tri-Service

Communication and I/O Protocols
RASSP Education & Facilitation Program
Module 25
Version 3.00

Copyright 1995-1999 SCRA. All rights reserved.

This information is copyrighted by the SCRA, through its Advanced Technology Institute (ATI), and may only be used for non-commercial educational purposes. Any other use of this information without the express written permission of the ATI is prohibited. Certain parts of this work belong to other copyright holders and are used with their permission. All information contained may be duplicated for non-commercial educational use only, provided this copyright notice and the copyright acknowledgements herein are included. No warranty of any kind is provided or implied, nor is any liability accepted regardless of use.

The United States Government holds “Unlimited Rights” in all data contained herein under Contract F33615-94-C-1457. Such data may be liberally reproduced and disseminated by the Government, in whole or in part, without restriction except as follows: certain parts of this work belong to other copyright holders and are used with their permission; the information contained herein may be duplicated only for non-commercial educational use. Any vehicle in which part or all of this data is incorporated shall carry this notice.

Rapid Prototyping Design Process
[Figure: RASSP design flow. Box labels: System Def.; Function Design; HW & SW Part.; HW & SW Codesign; HW Design; HW Fab; SW Design; SW Code; Integ. & Test; Virtual Prototype; RASSP Design Libraries and Database. Annotations: primarily software; primarily hardware. Topic of this module: Communications & I/O.]

Module Goals
- Introduce communication and I/O protocols
- Describe trends and the state of the art
- Describe the impact of RASSP
- Understand the selected communication protocols used by RASSP systems

Module Outline
- Introduction
- State of the Art
- Impact of RASSP
- Advanced Applications
  - SCI
    - SCI overview
    - Communication protocols
    - Cache coherence
    - Physical layer
    - An Example: 2-D FFT
    - Conclusion
  - RACEway
  - FutureBus+
  - PI-Bus
  - Conclusion
- Summary

Module Outline: Introduction

Motivation for Protocol Design
- Current digital signal processors have been optimized for DSP applications
- The data rate is usually very high for real-time signal processing
- The computational requirements are extremely demanding

Requirements of High Bandwidth Data Networks
- High scalability
  - The bandwidth scales well
- Efficient cache coherence protocols
  - The inter-processor communications are reduced
- High throughput rate
  - It implies high bandwidth
- No starvation
  - The bandwidth can be shared fairly

Communication Elements
- Shared multi-drop buses
- Switch networks
- Point-to-point links

Communication Buses
- Control buses: FutureBus+, PI-bus, VME
- Data buses: SCI, HIC, RACEway
- Test and maintenance buses: Serial Bus, TM-Bus, MTM-Bus, JTAG
- Input/output buses: FC, SCSI

Control Buses [Lockheed95]

NAME | STATUS | PERFORMANCE | INTENDED APPLICATION
FB+, IEEE 896.x, ISO/IEC 10857:1994, “FutureBus+” | Released 1994 | 3200 MBytes/sec - 256 parallel; 100 MBytes/sec - 32 parallel | Required by NGCR as backplane control bus; migration path for VMEbus
PI-bus, JIAWG J89-N1A | Being revised | 50 MBytes/sec - 32 parallel | Required by JIAWG as backplane control bus
VME64, IEEE P1014 Rev D | Recent revision | 80 MBytes/sec - 64 parallel | Upgrade for VME bus
VME, IEEE 1014-1987, “VersaModule Europa” | Released 1987; widely used | 40 MBytes/sec - 32 parallel | Commercial backplane control bus for high-performance systems

Data Buses [Lockheed95]

NAME | STATUS | PERFORMANCE | INTENDED APPLICATION
SCI, IEEE 1596-1992, “Scalable Coherent Interface” | Released 1992 | 1000 MBytes/sec - 16 parallel; 250 MBytes/sec - serial | Heterogeneous parallel processor
HIC, IEEE P1355, “Heterogeneous interconnect” | Under development | 250 MBytes/sec - serial | Low-cost parallel processor
RACEway, VITA | Proposed by Mercury | 160 MBytes/sec - 32 parallel | Real-time multicomputer interconnect

Test and Maintenance Buses [Lockheed95]

NAME | STATUS | PERFORMANCE | INTENDED APPLICATION
Serial Bus, IEEE P1394, “High Performance Serial Bus”, “FireWire” | Under development | 6 MBytes/sec - backplane; 50 MBytes/sec - cable | Required by NGCR as T & M bus; used with FutureBus+
TM-bus, JIAWG J89-N1B | Being revised | 0.8 MBytes/sec - serial | Required by JIAWG as T & M bus; used with PI-bus
MTM Bus, IEEE P1149.5, “Module Test and Maintenance Bus” | Under development | 1.2 MBytes/sec - serial | Interconnect JTAG modules; based on TM-bus
JTAG, IEEE 1149.1-1990 | Released 1990; widely used | 3 MBytes/sec - serial | Interconnect JTAG modules (in hierarchical structures)

Input/Output Buses [Lockheed95]

NAME | STATUS | PERFORMANCE | INTENDED APPLICATION
FC, ANSI X3T9.3, “Fiber Channel” | Under development | 100 MBytes/sec - serial | Proposed by NGCR for data channel (sensor input and video output)
SCSI, “Small Computer System Interface” | Released; widely used | 10 MBytes/sec - 8 parallel | Interconnect workstation peripherals
1553B, MIL-STD-1553B | Released; military use | 0.1 MBytes/sec - serial | Interconnect separate boxes in a military system

Module Outline: State of the Art

Busing/Networking Technology Trend
[Figure: bus width (number of data wires, 1 to 256) plotted against throughput (bits/sec, 10 M to 100 G). Buses and networks shown: V-Bus, PCI, RACEway, ISA, SCSI, 1553, JTAG, TM, Ethernet, 100Base Ethernet, FDDI, HPSB, HIC, FC, ATM. [Lockheed95]]

Examples of DSP Architectures
[Figure: Pave Pillar Architecture. Block labels: Sensors, SDDN, FE, ECB, ESU, VDDN, Displays, Data Network, Control Bus, Support Elements, System Interface, User Console. [Lockheed95]]
  ECB = Element Control Bus
  ESU = Element Supervisor Unit
  FE = Functional Element (Memory, Processor, I/O)
  SDDN = Sensor Data Distribution Network
  VDDN = Video Data Distribution Network

Application’s I/O Profiles
- The data type: the nature of the data to be transferred
- The overhead: a function of how much control data the interlink requires in the data stream
- The data distribution: the proximity of the data to its processing point

Performance Metrics
- Function metrics: real-time constraints
- Processor metrics: computational throughput and latency
- Interconnect metrics: communication speed
- Scalability metrics: future upgradeability

Scalable Networks
A scalable network is not sensitive to the details of the underlying technology and is broadly applicable to a very wide range of application sizes, speeds, costs, etc.

Scalability
- Programmability
- Reconfigurability
- Upgrade cost
- Scalability limits

Module Outline: Impact of RASSP

Architecture Selection Process
[Figure: architecture selection flow. Box labels: Problem; Algorithm; Select Architectures; Candidate Architectures; Building Modules; Map Algorithm onto Selected Architectures; Simulate Architecture Solutions; Bus Protocol Performance Models / Behavior Models; Reuse VHDL Library; Compare Performance Metrics; Problem Solution. [Lockheed95]]

Rules of Scalability
- Use building blocks
- Separate configuration control from application source code
- Provide for deterministic I/O
- Provide self-routing data packets
- Use transparent buffering of slave transfers to memory from external I/O
- Provide I/O that fits a broadcast capability and shared-memory regions
- No centralized arbitration

Why Broadcast Capability?
- The proportion of messages that need to be received by a large number of nodes at the same time is significant
- There is a lot of process synchronization and a resulting exchange of control flags

Module Outline: Advanced Applications

Section Outline
- Advanced Applications
  - SCI
    - SCI overview
    - Communication protocols
    - Cache coherence
    - Physical layer
    - An Example: 2-D FFT
    - Conclusion
  - RACEway
  - FutureBus+
  - PI-Bus
  - Conclusion

The Objective of SCI
Define an interconnect system that
- Scales well as the number of attached processors increases
- Provides a coherent memory system
- Defines a simple interface between nodes

Levels in SCI Protocols
- Logical level describes
  - Initialization and address space
  - Communication
  - Error recovery facilities
- Coherence level provides
  - Optional cache coherence protocols and mechanisms
- Physical level specifies
  - Electrical, mechanical, and thermal characteristics of connectors and cards

Basic SCI Nodes
- Use unidirectional input/output links
- Handshake on a packet basis rather than a bus cycle basis
  - The speed-of-light barrier to system growth is removed
  - No cycle-by-cycle handshakes
- Are always chained into ring networks
  - The sender gets its status information back as it flows around the ring

Basic SCI Node Model
[Figure: SCI node model. Block labels: Elastic Buffer, Stripper, Bypass FIFO, SavedIdle, MUX, (optional) inserter, encoder, Transmit Queues (requests, responses), (optional) Active Buffer, Receive Queues (requests, responses), requester, responder, Linc. © IEEE 1993 [IEEE93]]

Components of SCI Nodes
- Elastic Buffer: synchronizes input data
- Stripper: strips incoming packets
- Bypass FIFO: delays pass-through packets
- SavedIdle Buffer: saves the needed information from incoming idle symbols
- Inserter: inserts initialization packets
- Encoder: re-inserts the flag symbols
- Dual queues: keep responses independent of requests

Basic Packet Types (© IEEE 1993 [IEEE93])

Group | Description
Request-send | request subaction content
Request-echo | request subaction local acknowledge
Response-send | response subaction content
Response-echo | response subaction local acknowledge

Send Packet Formats (© IEEE 1993 [IEEE93])
- Request-send packet format: targetId, command, sourceId, control, addressOffset{00..15}, addressOffset{16..31}, addressOffset{32..47}, ext (0 or 16 bytes), data (0, 16, 64, or 256 bytes), CRC
- Response-send packet format: targetId, command, sourceId, control, status, forwId, backId, ext (0 or 16 bytes), data (0, 16, 64, or 256 bytes), CRC
- Figure annotation: coherent transaction status

Echo Packets and Special Packets (© IEEE 1993 [IEEE93])
- Echo packet: targetId, command, sourceId, control
- Initialization packet: targetId, distanceId, stableId, uniqueId0, uniqueId1, uniqueId2, uniqueId3, CRC
- Sync packet: follows a CRC or idle symbol; FFFF followed by seven 0000 symbols
- Abort packet: follows any symbol; six FFFB symbols followed by two 0000 symbols

Idle Symbols
- Format: ipr(2), ac, cc, hg, lg, old, lt, followed by the complemented fields ipr*(2), ac*, cc*, hg*, lg*, old*, lt*
- The first idle symbol between packets is called the non-consumable idle symbol
- Idle symbols are generated when
  - There is no packet to transmit
  - Send or echo packets are stripped from the ring
  - A send packet is produced

Section Outline: SCI, Communication protocols

Logical Protocols (1)
- Transactions are initiated by a requester and completed by a responder
- Send packets are transmitted from source nodes to destination nodes
- Echo packets are generated by destination nodes and notify source nodes whether the send packets were accepted

Subactions
[Figure: a transaction between a requester and a responder. Request subaction: request send (source to target), then request echo (target to source). Response subaction: response send (source to target), then response echo (target to source). © IEEE 1993 [IEEE93]]

Logical Protocols (2)
Sending a packet
- If the bypass FIFO is empty and the node is not transmitting a packet
  - Transmit the packet and save its copy either at the head of the transmit queues or in an optional active buffer
- Otherwise, place the packet in the transmit queues

Place Packets in Transmit Queues
[Figure: the basic SCI node model repeated for this step. © IEEE 1993 [IEEE93]]

Logical Protocols (3)
- Upon arrival, the downstream node
  - Parses the send packet and either passes it along the ring or strips it
- If the bypass FIFO is not empty and the node is sending a packet
  - Route the passing packet into the bypass FIFO
- If a passing packet and a transmitting packet are ready to send at the same time, the transmit queues are given priority

Route the Passing Packet into the Bypass FIFO
[Figure: the basic SCI node model repeated for this step. © IEEE 1993 [IEEE93]]

Logical Protocols (4)
Stripping a send packet
- If the receive queues are not full
  - Place the packet into the receive queues
- Otherwise, discard the send packet
- Insert idle symbols and replace the last four symbols of the received send packet with an echo packet

Place a Packet into Receive Queues, Insert Idle Symbols and an Echo Packet
[Figure: the basic SCI node model repeated for this step. © IEEE 1993 [IEEE93]]
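The stripping rule in Logical Protocols (4) can be sketched in a few lines of Python. This is a toy model under stated assumptions, not the IEEE 1596 state machine: symbols are plain strings, the echo is represented as four identical placeholder symbols carrying a done/busy status, and the names `strip_send_packet`, `IDLE`, and `ECHO_LEN` are hypothetical, introduced only for illustration.

```python
IDLE = "idle"      # placeholder for one idle symbol (hypothetical encoding)
ECHO_LEN = 4       # the echo replaces the last four symbols of the send packet


def strip_send_packet(send_symbols, receive_queue, capacity):
    """Strip a send packet addressed to this node.

    Returns the symbols emitted in place of the packet: idle symbols
    followed by a four-symbol echo, so the symbol count is unchanged.
    """
    n = len(send_symbols)
    if n < ECHO_LEN:
        raise ValueError("a send packet must be at least four symbols long")

    if len(receive_queue) < capacity:
        receive_queue.append(list(send_symbols))  # accepted into the receive queue
        status = "done"
    else:
        status = "busy"                           # discarded; the queue was full

    echo = [f"echo:{status}"] * ECHO_LEN
    return [IDLE] * (n - ECHO_LEN) + echo


queue = []
out = strip_send_packet([f"s{i}" for i in range(12)], queue, capacity=1)
print(out.count(IDLE), out[-1])   # 8 echo:done
out = strip_send_packet([f"s{i}" for i in range(12)], queue, capacity=1)
print(out[-1], len(queue))        # echo:busy 1
```

The second call shows the full-queue case: the packet is discarded, and the busy echo is what would prompt the sender to retransmit the copy it kept.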
Stripping A Packet
[Figure: packet being stripped. Labels: echo[4], send[N], Idle[N-4], echo[4], Idle[4]. © IEEE 1993 [IEEE93]]

Logical Protocols (5)
Receiving an echo packet, either
- Pass it along the ring, or
- Discard or retransmit the copy of the send packet held in the transmit queues or the active buffer

Bypass FIFO
- Holds portions of packets that arrive during the transmission
- When the transmit queue has finished transmitting a packet, the bypass FIFO output resumes
- Consumable idle symbols are not inserted into the bypass FIFO, but their idle.lg, idle.hg, and idle.old bits are merged into the savedIdle buffer

Starvation and Deadlock
- Forward progress is guaranteed
- Nothing prevents packets from being forwarded before they have arrived completely
- The queue reservation protocol ensures fair use of part of the bandwidth
  - Avoids starvation
- Maintaining dual queues keeps responses independent of requests
  - Avoids deadlock

Starvation
[Figure: requester1 and requester2 contend for one responder. Steps shown: (1) send, (2) echo (done), (3) send, (4) echo (busy), (5), (6). © IEEE 1993 [IEEE93]]

Starvation (Cont.)
[Figure, continued. Steps shown: (7) send, (8) echo (done), (9) send, (10) echo (busy). © IEEE 1993 [IEEE93]]
If this cycle repeats, the less fortunate requester2 could be starved forever by the activity of requester1.

Queue Reservation
[Figure: requester2, responder, and requester1 on a ring; the responder state changes from SERVE_NA to SERVE_A. Steps shown: (1) send, command.phase=DOTRY; (2) echo, command.bsy=0, command.phase=DONE; (3) send, command.phase=DOTRY; (4) echo, command.bsy=1, command.phase=BUSY_A; (5), (6). © IEEE 1993 [IEEE93]]

Queue Reservation (Cont.)
[Figure, continued; responder state SERVE_A. Steps shown: (7) send, command.phase=DOTRY; (8) echo, command.bsy=1, command.phase=BUSY_B; (9) send, command.phase=RETRY_A; (10) echo, command.bsy=0, command.phase=DONE. © IEEE 1993 [IEEE93]]
The queue reservation protocol ensures fair use of part of the bandwidth.

Deadlock
[Figure: two processors, each attached through a Linc, exchange requests req1 to req4 and responses res_A and res_B. Without dual queues, res_A and res_B are blocked; the same traffic is also shown with dual queues.]

Deadlock (Cont.)
[Figure: proc_A, proc_B, proc_1, and proc_2 connected through four Lincs. Packets shown: req_A1, req_A2, req_B1, req_B2, res_A1, res_B2.]

SCI Bridge
[Figure: a bridge built from two Lincs joins RING A and RING B; each Linc has separate queues for requests and responses.]

Remote Transaction Components
[Figure: a requester and a responder communicate through an agent, each with separate req and res queues. © IEEE 1993 [IEEE93]]
  (a) Local request subaction: 1. request send, 2. request echo
  (b) Remote request subaction: 3. request send, 4. request echo
  (c) Remote response subaction: 5. response send, 6. response echo
  (d) Local response subaction: 7. response send, 8. response echo

Deadlocks in Multiple Rings
- If routing is not carefully thought out, circular wait and deadlock might occur
- The deadlock problem is classical for store-and-forward or virtual cut-through networks

Routing Mechanisms
- Store-and-forward routing
- Virtual cut-through routing
- Wormhole routing
Virtual cut-through routing can be used in SCI networks.

Routing Mechanisms (Cont.)
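To make the difference between store-and-forward and virtual cut-through routing concrete, the sketch below uses a common first-order textbook latency model for an unloaded network. The model is an assumption and does not come from the slides: it ignores contention, per-switch routing time, and wire delay. The function names and the example packet, header, and link figures are hypothetical.

```python
def store_and_forward_latency(packet_bytes, link_bytes_per_us, switches):
    """Each of the (switches + 1) links carries the whole packet in turn."""
    return (switches + 1) * packet_bytes / link_bytes_per_us


def cut_through_latency(packet_bytes, header_bytes, link_bytes_per_us, switches):
    """Each switch waits only for the header; the body streams behind it."""
    return (packet_bytes + switches * header_bytes) / link_bytes_per_us


# Hypothetical example: 80-byte packet, 16-byte header, 1000 bytes per
# microsecond per link, three intermediate switches.
saf = store_and_forward_latency(80, 1000, switches=3)
vct = cut_through_latency(80, 16, 1000, switches=3)
print(f"{saf:.3f} {vct:.3f}")   # 0.320 0.128
```

Under this model the store-and-forward latency grows with the full packet length at every hop, while the cut-through latency grows only with the header length.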
Virtual cut-through routing can be used in SCI networks for the following reasons:
- A bad packet might have a bad address, and thus get routed to the wrong place, but that should not cause any harm as long as it is checked before it is used
- A packet starts out the other side of a switch before it has been entirely received or checked
- SCI reserves sufficient space to store entire packets

Section Outline: SCI, Cache coherence

Cache Coherence
- SCI does not use broadcast and eavesdropping mechanisms
- SCI uses a sharing-list structure
  - Distributed directory-based cache coherence protocol
  - Scales well
- SCI supports both weak and strong sequential consistency

Standard Optimizations
- Fresh copy optimization
- DMA transfer optimization
  - The DMA controller can often fetch its data without joining the sharing list
- Pairwise sharing optimization
  - Data are transferred directly from one cache to the other
  - The directory pointers need not be changed

Sharing-list Structure
[Figure: processors CPU_A to CPU_D, each with an execution unit and a cache, plus central memory. Cache entries hold forwId, backId, cState, memId, addressOffset, and data[64]; memory entries hold mState, forwId, and data[64]. The tag overheads are annotated as about 4% extra and about 7% extra. © IEEE 1993 [IEEE93]]

Sharing-list Additions
[Figure: a new cache joins a sharing list in two steps, (1) read-cached and (2) prepend. Before: new Cache A (invalid), old Cache B (head), memory (cached). After: new Cache A (head), old Cache B (middle), memory (cached). © IEEE 1990 [James90]]
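The two-step addition (read-cached, then prepend) amounts to inserting at the head of a doubly linked list whose head pointer lives in the memory directory. The following Python sketch illustrates that idea only; it is not the SCI coherence protocol. The class and function names, the `"uncached"` state string, and the treatment of a lone old head as `"tail"` are assumptions introduced here, while `forw_id` and `back_id` mirror the forwId and backId fields named on the slides.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class MemoryLine:
    state: str = "uncached"          # hypothetical name for "no sharers"
    forw_id: Optional[str] = None    # points at the head of the sharing list


@dataclass
class CacheLine:
    state: str = "invalid"
    forw_id: Optional[str] = None    # next entry, toward the tail
    back_id: Optional[str] = None    # previous entry, toward the head


def join_sharing_list(memory: MemoryLine, caches: Dict[str, CacheLine], new_id: str) -> None:
    new = caches[new_id]

    # (1) Read-cached: memory hands back the old head and now points at the new node.
    old_head = memory.forw_id
    memory.forw_id = new_id
    memory.state = "cached"

    # (2) Prepend: the new node links itself in front of the old head.
    new.state = "head"
    new.back_id = None
    new.forw_id = old_head
    if old_head is not None:
        old = caches[old_head]
        old.back_id = new_id
        old.state = "middle" if old.forw_id is not None else "tail"


memory = MemoryLine(state="cached", forw_id="B")
caches = {
    "A": CacheLine(),
    "B": CacheLine(state="head", forw_id="C"),
    "C": CacheLine(state="tail", back_id="B"),
}
join_sharing_list(memory, caches, "A")
print(memory.forw_id, caches["A"].state, caches["B"].state, caches["B"].back_id)
# A head middle A
```

Only the memory entry and the old head are touched, which is the property that lets a distributed list of this kind grow without broadcasting to every sharer.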
Sharing-list Removals
[Figure: writer Cache A (head) purges the rest of the list: (1) purge second entry, reader Cache B (middle); (2) purge next entry, reader Cache C (tail). The head's remaining link is labeled "to memory". © IEEE 1990 [James90]]

Coherence Tags
- Memory tags
  - A lock bit
  - A 2-bit memory state field
  - A 16-bit forwId field
- 64-byte cache line
  - Small tag overhead
  - Reasonable efficiency
  - Uniformity
- Cache tags
  - 7-bit cache state
  - Two 16-bit pointer fields, forwId and backId
  - A 16-bit memId
  - 48 bits of addressOffset

The Options of the Cache-coherence Protocols
- Minimal set
  - Maintains cache coherence in a trivial but correct way
- Typical set
  - Has provisions for read sharing, robust recovery from errors, efficient read-only data accesses, efficient DMA transfers, and local data caching
- Full set
  - Implements all of the defined options. In addition to the provisions of the typical set, the full set supports clean cache-line states, cleansing and washing of dirty cache-line states, pairwise sharing, and QOLB

Constraints of Optional Subsets
- The options can be implemented without significantly increasing the size of tags in the memory directory or caches
- The options work with the basic SCI transaction-set definitions
- The options should not affect the correctness of the basic SCI cache-coherence specification

Section Outline: SCI, Physical layer

Physical Layer of SCI
- SCI links transmit symbols that contain
  - 16 bits of data
  - A 1-bit packet delimiter (flag signal)
  - A clock
- Notation of link types
  - Type <number of signals>-<kind of signals>-<bit rate per signal in Mbit/s>
- All signals are unidirectional differential signals

Physical-layer Alternatives
[Figure: two SCI nodes, each with application logic, queues and control, and a transmitter (T) and receiver (R). One uses Type 18-DE-500 links (18 pairs); the other uses Type 1-FO-1250 links. © IEEE 1993 [IEEE93]]

Section Outline: SCI, An Example: 2-D FFT

2-D FFT
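As background for the example, a 2-D transform can be computed as a pass over the rows followed by a pass over the columns, which is why the timing diagram later in this section names a Row FFT phase and a Column FFT phase. The Python sketch below shows that row-column decomposition on a single machine, assuming a naive O(n^2) DFT in place of a real FFT. It does not model the distribution of data across SCI nodes, and all function names are hypothetical.

```python
import cmath


def dft(values):
    """Naive O(n^2) discrete Fourier transform of a 1-D sequence."""
    n = len(values)
    return [
        sum(values[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        for k in range(n)
    ]


def transpose(matrix):
    return [list(col) for col in zip(*matrix)]


def dft2_row_column(matrix):
    """2-D DFT as a row pass followed by a column pass."""
    rows_done = [dft(row) for row in matrix]                 # row phase
    cols_done = [dft(col) for col in transpose(rows_done)]   # column phase
    return transpose(cols_done)


def dft2_direct(matrix):
    """Direct evaluation of the 2-D DFT definition, used only as a check."""
    m, n = len(matrix), len(matrix[0])
    return [
        [
            sum(
                matrix[r][c] * cmath.exp(-2j * cmath.pi * (u * r / m + v * c / n))
                for r in range(m)
                for c in range(n)
            )
            for v in range(n)
        ]
        for u in range(m)
    ]


data = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
fast = dft2_row_column(data)
slow = dft2_direct(data)
max_error = max(abs(fast[u][v] - slow[u][v]) for u in range(3) for v in range(3))
print(max_error < 1e-9)   # True
```

In a multi-node version the transposes become data movement between nodes, which is the communication cost the rest of this example examines.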
[Figure: a 9 x 9 data matrix, with elements labeled by row and column digits from 00 to 88, distributed over three nodes R0, R1, and R2 (three rows each) and divided into three column blocks C0, C1, and C2.]
  (a) Natural Order
  (b) Column-dependent Vertical Roll
      Within each node:
      - roll rows up zero places in C0
      - roll rows up one place in C1
      - roll rows up two places in C2

2-D FFT (Cont.)
[Figure, continued: the same matrix after the next two data-movement steps.]
  (c) Lockstep Shift
      Between nodes:
      i. right shift 0 nodes
      ii. right shift 1 node
      iii. right shift 2 nodes
  (d) Column-dependent Row Interchange
      Within each node:
      - interchange rows 2 and 3 in C0
      - interchange rows 1 and 2 in C1
      - interchange rows 1 and 3 in C2

2-D FFT (Cont.)
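The column-dependent vertical roll of panel (b) is easy to reproduce. The Python sketch below builds the 9 x 9 natural-order matrix of two-digit labels and rolls the rows of each node upward by the column-block index; the function names are hypothetical, and equal-sized nodes and column blocks are assumed.

```python
def natural_order(n=9):
    """n x n matrix of labels 'rc': row digit followed by column digit."""
    return [[f"{r}{c}" for c in range(n)] for r in range(n)]


def vertical_roll(matrix, nodes=3):
    """Within each node, roll the rows of column block k up by k places."""
    n = len(matrix)
    rows_per_node = n // nodes
    cols_per_block = n // nodes
    out = [row[:] for row in matrix]
    for node in range(nodes):
        base = node * rows_per_node              # first row held by this node
        for block in range(nodes):
            first_col = block * cols_per_block
            for c in range(first_col, first_col + cols_per_block):
                for i in range(rows_per_node):
                    src = base + (i + block) % rows_per_node
                    out[base + i][c] = matrix[src][c]
    return out


rolled = vertical_roll(natural_order())
print(" ".join(rolled[0]))   # 00 01 02 13 14 15 26 27 28
print(" ".join(rolled[3]))   # 30 31 32 43 44 45 56 57 58
```

The printed rows match the first rows of nodes R0 and R1 in panel (b).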
[Figure: Lockstep Timing Diagram. Stripper output and node output for nodes 1, 2, and 3 between the start time and the end time, showing idle symbols, send packets, and echo packets.]
  p(i, j): packet from node i to node j
  e_i: echo packet for node i

2-D FFT (Cont.)
[Figure: 2-D FFT Timing Diagram. Phases along the time axis: Lockstep, Row FFT, Lockstep, Lockstep, Column FFT, Lockstep.]
  Lockstep Time = P * L + (P-1) * 4 + (2P-1)
  N = the number of nodes per dimension
  P = the number of transmitted packets per node, [1+2+...+(N-1)]
  L = packet length (unit: block = 2 bytes)
  Inter-communication Time = 4 * (Lockstep Time)

Section Outline: SCI, Conclusion

SCI - A Suitable Protocol for High Bandwidth Data Networks
- SCI networks provide a high throughput rate of 1 GByte per second per node
- The bandwidth of SCI networks is scalable
- A point-to-point SCI network provides a high degree of parallelism

Why Is SCI Suitable for RASSP?
- A point-to-point SCI network is useful for building a local communication scheme for highly parallel real-time signal processing
- SCI can meet the requirements of parallelism and communication overhead

Why Is SCI Suitable for RASSP? (Cont.)
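Returning to the lockstep timing expression given with the 2-D FFT timing diagram, the formulas can be evaluated directly. The Python sketch below transcribes them as written; the function names and the example packet length of 40 blocks are hypothetical, and results are in the same unit as L.

```python
def packets_per_node(nodes_per_dimension):
    """P = 1 + 2 + ... + (N - 1)."""
    n = nodes_per_dimension
    if n < 2:
        raise ValueError("at least two nodes per dimension are needed")
    return n * (n - 1) // 2


def lockstep_time(nodes_per_dimension, packet_length):
    """Lockstep Time = P*L + (P-1)*4 + (2P-1)."""
    p = packets_per_node(nodes_per_dimension)
    return p * packet_length + (p - 1) * 4 + (2 * p - 1)


def inter_communication_time(nodes_per_dimension, packet_length):
    """Inter-communication Time = 4 * (Lockstep Time)."""
    return 4 * lockstep_time(nodes_per_dimension, packet_length)


# Three nodes per dimension, as in the 9 x 9 example; L = 40 is illustrative.
print(packets_per_node(3))               # 3
print(lockstep_time(3, 40))              # 133
print(inter_communication_time(3, 40))   # 532
```

Because P grows quadratically with N, the packet term P * L dominates the lockstep time as the number of nodes per dimension increases.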
- Mesh architectures often provide very large speedups after an image is loaded
- SCI networks can improve the performance of image processing tasks because the mesh is one of the network topologies that SCI favors
- SCI is a good candidate for a large RASSP application because it supports systems of up to 64K nodes and its interface is simple and inexpensive

RACEway Overview
- Has a 32-bit data path
- Operates at a 40 MHz clock rate
- Is designed for embedded applications where footprint and power are a concern
- The peak throughput is 160 MBytes per second per data path
- The basic element is a crossbar chip with six input/output channels

RACEway Crossbar
- Each port is either a transaction initiator, recipient, or intermediary
- Data packet priorities allow for decentralized arbitration and preemption
- Data packet flow through the crossbar network is based on routing data contained in the packet header
- The data flow is bi-directional, but can only go in one direction at a time

Crossbar ASIC
[Figure: Crossbar ASIC with six ports, A through F. Each port has a 32-bit data bus (for example, Data bus A0-31) and Request output, Request input, Reply I/O, Strobe I/O, and Read conditional signals.]
Reprinted with permission from Electronic Design, June 13, 1994. Copyright 1994, Penton Media, Inc. [Isenstein94A]

RACEway Fat-tree Topology
[Figure: RACEway fat-tree topology with Level 1, Level 2, and Level 3. Bisection BW=160*8 MBytes/sec; P=44. [Isenstein94B]]

RACEway Fat-tree Topology (Cont.)
- In a P (= 4^i) processor machine
  - The diameter is 2 log4(P) hops
  - Network bandwidth across any bisection is 160 P MBytes/s in one direction
- Each RACE crossbar port is capable of 160 MBytes/s, providing an aggregate 480 MBytes/s for a single chip

Software Architecture
- Application Programming Interface (API)
- Tools interface
- Hardware device driver interface
- Kernel service interface

Objectives of FutureBus
- System architecture independent
- Maximum interoperability of boards and systems
- Fault-tolerant system support
- Use IEC mechanical standards and connectors
- System configuration support
- Cost effective
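The RACEway figures quoted on the RACEway Overview and Fat-tree Topology slides can be cross-checked with a few lines of arithmetic. The sketch below is illustrative only: the function names are hypothetical, and it assumes that the 480 MBytes/s per-chip aggregate comes from pairing the six crossbar ports into three concurrent transfers at the 160 MBytes/s per-port peak.

```python
def peak_throughput_mbytes(width_bits=32, clock_mhz=40):
    """Peak MBytes/s of one data path: bytes per clock times clock rate in MHz."""
    return (width_bits // 8) * clock_mhz


def fat_tree_diameter(processors):
    """Hops across a fat tree of P = 4**i processors, using 2 * log4(P)."""
    if processors < 1:
        raise ValueError("processor count must be positive")
    levels = 0
    remaining = processors
    while remaining > 1:
        if remaining % 4 != 0:
            raise ValueError("processor count must be a power of 4")
        remaining //= 4
        levels += 1
    return 2 * levels


def crossbar_aggregate_mbytes(ports=6, port_mbytes=160):
    """Aggregate MBytes/s of one crossbar, assuming ports pair into concurrent transfers."""
    return (ports // 2) * port_mbytes


if __name__ == "__main__":
    print(peak_throughput_mbytes())     # 32 bits at 40 MHz
    print(fat_tree_diameter(64))        # 64 = 4**3 processors
    print(crossbar_aggregate_mbytes())  # six ports, three concurrent transfers
```

With the default arguments the three functions reproduce the 160 MBytes/s data path, a 6-hop diameter for 64 processors, and the 480 MBytes/s per-chip aggregate.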
FutureBus
- An asynchronous bus using a 32-bit multiplexed address and data path
- A fair distributed bus arbiter for allocating bus access
- A two-cycle handshake for transferring multiple data items
- Consists of the address/data bus, the command bus, the status bus, and the arbitration bus

FutureBus Interface
[Figure: FutureBus interface block diagram. Blocks: Receive machine, Send machine, Arbiter, Data pipeline latch, and FutureBus drivers/receivers, connected to the Control Bus and Data Bus on the board side and to the FutureBus on the other. Board-side signals: Go, Done, Request, Acknowledge, Grant, Busy, Parity (5), Data (34). Bus signals AS*, DS*, AK*, AI*, DK*, DI*, AP*, AQ*, AR*, AC*, AB*, with lowercase board-side counterparts as*, ds*, ak*, ai*, dk*, di*, ap*, aq*, ar*, ac*, ab*.]
© IEEE 1991 [Jones91]

Arbitration
- Fully distributed
- No central control
- Asynchronous
- Fully handshake
- Fairness mode normal
- Priority mode access
- Up to 32 logical units
- Parity error control

Compete for Mastership
- Each competing board applies its unique arbitration number an*[7 ... 0] to the bus signals AB*[7 ... 0] by asserting ab*[n] if and only if an*[n] is asserted and, for all m > n where an*[m] is released, AB*[m] is also released
- After a certain time delay, the signals on AB*[7 ... 0] will carry the largest arbitration number of any competing board

Parallel Bus Protocol
- Totally asynchronous and technology-independent handshake
- Totally transparent communication with memory using individual byte buses
- Mechanism for Broadcast on all sequences
- Multiple levels of locking for mutual exclusion
- Built-in hooks to extend the protocol

Data Transmission Protocol
- The block length can be zero to 4 GBytes
- The protocol is asynchronous and uses a two-edged handshake
- A data transfer rate of up to 117 MBytes/sec can be achieved
- Broadcast data transfers over the full 4 GByte address space are supported

Three-Wire Synchronization Protocol
[Figure: timing diagram of the AP*, AQ*, AR*, AC0*, AC1*, and AB* bus signals across arbitration states 0 1 2 3 4 5 0.]
1) The arbiter asserts AR* and releases AP* and AQ*
2) When an arbiter is ready to move to the next state, it asserts ap* and releases ar*
3) Only when all arbiters are ready to move will all boards release ar*, thus releasing the AR* bus signal. All arbiters register the release of AR* and then move to the next state
4) When an arbiter is ready to move on again, it asserts aq* and releases ap*
5) As before, only when all arbiters are ready to move will all boards release ap*, thus releasing the AP* bus signal
6) Subsequent state transitions repeat a similar process
© IEEE 1991 [Jones91]

Data Transfer Protocol
[Figure: timing diagram of the AS*, AK*, AI*, DS*, DK*, DI*, and AD* signals. AD* carries the address followed by four data words; events are numbered 2 through 10.]
© IEEE 1991 [Jones91]

Data Transfer Protocol (Cont.)
1 The sender gains control of the bus
2 The master places the address of the recipient on the address/data signals
3 When AS* is asserted, each and every board decodes the address
4 When AI* releases, the master can safely remove the address from the bus
5 Next, the master places the first word to be transmitted on the bus
6 When DS* asserts, the recipient reads the data from the bus
7 When DI* releases, the transfer of the first data word completes. The master then places the next data word
8 When DS* releases, the recipient grabs the next data
9 Data transfer now continues in this way until the master has no more to send
A The recipient, and all other slaves, acknowledge the release of AS* by releasing ak* and asserting ai*
B When AK* releases, the transaction completes

Electrical Specifications
- Single +5V power rail
- Total open-collector system in a transmission line environment
- Driver specification to solve the bus driving problem
- Trapezoidal driver to reduce crosstalk
- Receiver filters to solve the wired-OR glitch problem

Major Features
- A bus driver can switch all receivers in one bus propagation
- FutureBus does not require settling time
- The protocol provides a fast, two-edged block-transfer mode
- Arbitration may take place concurrently with data transfer
- A truly technology-independent handshake

PI-Bus Overview
- The PI-Bus is a multidrop parallel (16-bit or 32-bit) bus which uses its DMA capability to transfer data between the modules
- The PI-Bus has two levels of logical addressing capabilities
- Logical Module Identifiers (LIDs) allow a message to be sent to a "logical" module
- The logical addressing capability provides efficient fault recovery

Logical Module
Addressing
[Figure: four modules attached to the PI-Bus, holding LIDs (12, 16, 77, 129), (10, 110, 210, 143), (12, 18, 221, 230), and (15, 16, 12, 27). A message is sent to LID 12.]
© IEEE 1987 [Hinkey87]

Labeled Addressing Capability
[Figure: labeled addressing. Elements shown: module memory holding 500, 200, 100 at 10B3DH; the Label Translation Table (LTT), with an LTT entry (x + 2000) made up of the fields Busy, Bus_A, Bus_B, Int, Address High, and Address Low (bit patterns 0000 0111 0000 0001 and 0000 1011 0011 1101); the BIU (x); and a PI-Bus message <1000> [500] [200] [100]. Legend: [ ]: Data; < >: Label.]
© IEEE 1987 [Hinkey87]

Guidelines for Developing the Software
- Each process, or group of processes that will always be on the same processor, is assigned an LID
- Each data object in a process that can be referenced by another process has a label assigned to it

Features of PI-Bus
- Provides for addressing modules on the bus in a logical manner
- The memory locations in each of the processors on the bus can be referenced logically
- The application software developer need not be at all concerned about a processor failing

Pros and Cons of Shared Bus Architectures
- Bus systems have a broadcast capability
- Buses have reached practical and inherent limits, such as
  - The speed of light
  - The capacitance of transceivers and connectors
  - The one-talker-at-a-time bottleneck
- Shared buses are limited in the degree of scalability by the fixed interconnect bandwidth

Loading Problems of Shared Buses
- The fundamental physical limit: the speed of light
  - Solution: unidirectional links
- Crosstalk between adjacent signals
  - Solution: differential signaling

Pros and Cons of RACEway
- High degree of scalability
  - The I/O cost per processor remains the same
- High composite bandwidth
  - Each crossbar can transfer three messages simultaneously
- Topology independence
  - A RACE system can be configured in many different kinds of networks
- Easy and compact interfacing
  - RACEway uses CMOS signaling levels at 40 MHz, with no requirement for exotic design or manufacturing techniques to overcome wire density or crosstalk problems

Pros and Cons of SCI
- Pros
  - SCI networks scale well
  - SCI solves loading problems
  - SCI makes signaling speed independent of the size of the system
  - Independent transfers can take place concurrently
- Cons
  - Lacks a broadcast capability
  - A long sharing list generates heavy traffic loads

Comparison of Scalability

                                 Shared Bus System   SCI   RACEway
Building blocks                  No                  Yes   Yes
Separate configuration control   No                  No    Yes
Deterministic I/O ports          No                  Yes   Yes
Self-routing data packets        No                  Yes   Yes
Transparent buffering            No                  Yes   Yes
Broadcast capability             Yes                 No    Yes
Centralized arbitration          Yes                 No    No

Applicability in High Bandwidth Data Network Design

               Scalability   High throughput   Efficient cache protocols   Starvation
Shared Buses   bad           bad               excellent                   sometimes occurs
RACEway        excellent     good              good                        No
SCI            good          excellent         good                        No

Summary
- High bandwidth data networks are required for RASSP applications
- High throughput, high scalability, and forward progress are required in high bandwidth data network design
- SCI and RACEway are suitable for high performance RASSP systems

References
[Analog94] Analog Devices, ADSP2106x SHARC: Preliminary User's Manual, March 1994.
[Hinkey87] Michael Hinkey and Drew Clarke, "A Fault Recovery Mechanism Using Logical Bus Addressing," IEEE, 1987; © IEEE 1987
[IEEE] All referenced IEEE material is used with permission.
[IEEE93] "IEEE Standard for Scalable Coherent Interface (SCI)," IEEE Computer Society, Aug 2, 1993; © IEEE 1993
[Isenstein94A] Barry Isenstein, "Scaling I/O Bandwidth With Multiprocessors," Electronic Design, June 13, 1994; also http://www.mc.com.
[Isenstein94B] Barry S. Isenstein and Bradley C. Kuszmaul, "Overview of the RACE Hardware and Software Architecture," 1st RASSP Annual Conference, 1994; also http://www.mc.com.
[James90] David V. James, et al., "Distributed-Directory Scheme: Scalable Coherent Interface," Computer, June 1990; © IEEE 1990
[Jones91] Simon L. Peyton Jones and Mark S. Hardie, "A FutureBus Interface from Off-the-Shelf Parts," IEEE Micro, Feb 1991; © IEEE 1991
[Lockheed95] Shirley F., et al., RASSP Architecture Guide, Lockheed-Sanders, Revision B, Jan 5, 1995. This work was performed by Sanders, a Lockheed Martin Company, as a part of the Sanders RASSP program under contract N00014-93-C-2172 to the Naval Research Laboratory, 4555 Overlook Avenue, SW, Washington, DC 20375-5326. The Sponsoring Agency is: Defense Advanced Research Projects Agency, Electronic System Technology Office, 3701 North Fairfax Drive, Arlington, VA 22203-1714. The Sanders RASSP team consists of Sanders, Motorola, Hughes, and ISX.
[Richards97] Richards, M., Gadient, A., Frank, G., eds., Rapid Prototyping of Application Specific Signal Processors, Kluwer Academic Publishers, Norwell, MA, 1997.
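Supplementary example: the wired-OR contention described on the Compete for Mastership slide can be illustrated with a small simulation. This is a minimal sketch under stated assumptions, not the FutureBus logic itself: the function name `arbitrate`, the synchronous settling rounds, and the representation of an asserted line as a 1 bit are all illustrative choices. It assumes the rule that a board drives bit n only while its own bit n is asserted and every higher bus line that the board is not itself asserting is released.

```python
def arbitrate(numbers, width=8):
    """Return the value that settles on the wired-OR AB* lines.

    numbers: arbitration numbers of the competing boards (hypothetical input).
    width:   number of arbitration lines, 8 for AB*[7 ... 0].
    """
    bus = 0  # all lines released before the competition starts
    for _ in range(2 * width + 2):  # enough rounds for the lines to settle
        driven = 0
        for an in numbers:
            for n in range(width):
                if not (an >> n) & 1:
                    continue  # the board never drives a bit it does not own
                # The board backs off bit n if a higher line is asserted
                # by someone else while its own bit there is released.
                lost_higher = any(
                    not (an >> m) & 1 and (bus >> m) & 1
                    for m in range(n + 1, width)
                )
                if not lost_higher:
                    driven |= 1 << n  # wired-OR: any driver asserts the line
        if driven == bus:
            break  # stable: no board changes what it drives
        bus = driven
    return bus


if __name__ == "__main__":
    print(arbitrate([5, 6, 3]))
```

Once the lines stop changing, the value left on the bus equals the largest arbitration number among the competing boards, which is the property the slide states.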