Lecture 15: Busses and Networking (1)
Prof. Jan Rabaey
Computer Science 252, Spring 2000
Based on slides from Dave Patterson, John Kubiatowicz, Bill Dally, and Sonics, Inc.
JR.S00 1

A Communication-Centric World
• Computation is getting distributed …
  – Internet, WAN, LAN, BodyLAN, Home Networks, Microprocessor Peripherals, Processor-Memory Interface, System-on-a-Chip
• Efficient Networking and Communication is Crucial
• The System-on-a-Chip implies the Network-on-a-Chip
• In Next Set of Lectures:
  – Busses and Networks
  – But more importantly, the impact of integration

What is a bus?
A Bus Is:
• shared communication link
• single set of wires used to connect multiple subsystems
[Figure: Processor (Control, Datapath), Memory, Input, Output]
• A Bus is also a fundamental tool for composing large, complex systems
  – systematic means of abstraction

Busses

Advantages of Buses
[Figure: Processor, I/O Device, I/O Device, I/O Device, Memory]
• Versatility:
  – New devices can be added easily
  – Peripherals can be moved between computer systems that use the same bus standard
• Low Cost:
  – A single set of wires is shared in multiple ways

Disadvantage of Buses
[Figure: Processor, I/O Device, I/O Device, I/O Device, Memory]
• It creates a communication bottleneck
  – The bandwidth of that bus can limit the maximum I/O throughput
• The maximum bus speed is largely limited by:
  – The length of the bus
  – The number of devices on the bus
  – The need to support a range of devices with:
    » Widely varying latencies
    » Widely varying data transfer rates

General Organization of a Bus
[Figure: Control Lines, Data Lines]
• Control lines:
  – Signal requests and acknowledgments
  – Indicate what type of information is on the data lines
• Data lines carry information between the source and the destination:
  – Data and Addresses
  – Complex commands

Master versus Slave
[Figure: Bus Master and Bus Slave; master issues command; data can go either way]
• A bus transaction includes two parts:
  – Issuing the command (and address): request
  – Transferring the data: action
• Master is the one who starts the bus transaction by:
  – issuing the command (and address)
• Slave is the one who responds to the address by:
  – Sending data to the master if the master asks for data
  – Receiving data from the master if the master wants to send data

Types of Busses
• Processor-Memory Bus (design specific)
  – Short and high speed
  – Only need to match the memory system
    » Maximize memory-to-processor bandwidth
  – Connects directly to the processor
  – Optimized for cache block transfers
• I/O Bus (industry standard)
  – Usually is lengthy and slower
  – Need to match a wide range of I/O devices
  – Connects to the processor-memory bus or backplane bus
• Backplane Bus (standard or proprietary)
  – Backplane: an interconnection structure within the chassis
  – Allow processors, memory, and I/O devices to coexist
  – Cost advantage: one bus for all components

Example: Pentium System Organization
[Figure: Processor/Memory Bus, PCI Bus, I/O Busses]

A Computer System with One Bus: Backplane Bus
[Figure: Processor, Memory, I/O Devices, Backplane Bus]
• A single bus (the backplane bus) is used for:
  – Processor to memory communication
  – Communication between I/O devices and memory
• Advantages: Simple and low cost
• Disadvantages: slow and the bus can become a major bottleneck
• Example: IBM PC-AT

A Two-Bus System
[Figure: Processor, Memory, Processor Memory Bus, three Bus Adaptors, three I/O Busses]
• I/O buses tap into the processor-memory bus via bus adaptors:
  – Processor-memory bus: mainly for processor-memory traffic
  – I/O buses: provide expansion slots for I/O devices
• Apple Macintosh-II
  – NuBus: Processor, memory, and a few selected I/O devices
  – SCSI Bus: the rest of the I/O devices

A Three-Bus System
[Figure: Processor, Memory, Processor Memory Bus, Bus Adaptor, Backplane Bus, two Bus Adaptors, two I/O Busses]
• A small number of backplane buses tap into the processor-memory bus
  – Processor-memory bus is only used for processor-memory traffic
  – I/O buses are connected to the backplane bus
• Advantage: loading on the processor bus is greatly reduced

North/South Bridge architectures: separate busses
[Figure: Processor, “backside cache”, Processor Memory Bus, Memory, Bus Adaptors, Backplane Bus, I/O Busses]
• Separate sets of pins for different functions
  – Memory bus
  – Caches
  – Graphics bus (for fast frame buffer)
  – I/O busses are connected to the backplane bus
• Advantage:
  – Busses can run at different speeds
  – Much less overall loading!

What defines a bus?
• Transaction Protocol
• Timing and Signaling Specification
• Bunch of Wires
• Electrical Specification
• Physical / Mechanical Characteristics – the connectors

Synchronous and Asynchronous Bus
• Synchronous Bus:
  – Includes a clock in the control lines
  – A fixed protocol for communication that is relative to the clock
  – Advantage: involves very little logic and can run very fast
  – Disadvantages:
    » Every device on the bus must run at the same clock rate
    » To avoid clock skew, they cannot be long if they are fast
• Asynchronous Bus:
  – It is not clocked
  – It can accommodate a wide range of devices
  – It can be lengthened without worrying about clock skew
  – It requires a handshaking protocol

Busses so far
[Figure: Master and Slave; Control Lines, Address Lines, Data Lines]
• Bus Master: has ability to control the bus, initiates transaction
• Bus Slave: module activated by the transaction
• Bus Communication Protocol: specification of sequence of events and timing requirements in transferring information.
• Asynchronous Bus Transfers: control lines (req, ack) serve to orchestrate sequencing.
• Synchronous Bus Transfers: sequence relative to common clock.
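One way to picture how the req and ack control lines orchestrate an asynchronous transfer is a four-phase handshake, the same sequence the later write-transaction slide labels t0 to t4. The following is only a toy Python sketch under stated assumptions: the `AsyncBus` class, the `four_phase_write` function, and the dictionary standing in for the slave's storage are all hypothetical names, and the two sides run in a single thread rather than as independent devices.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class AsyncBus:
    """Shared wires of a toy asynchronous bus (illustrative names only)."""
    address: Optional[int] = None
    data: Optional[int] = None
    req: bool = False
    ack: bool = False
    trace: List[str] = field(default_factory=list)


def four_phase_write(bus: AsyncBus, slave_memory: Dict[int, int],
                     address: int, value: int) -> None:
    """Walk one write through the four req/ack phases, in order."""
    # The bus must be idle before a new transfer starts.
    assert not bus.req and not bus.ack

    # Master drives address and data, then raises req.
    bus.address, bus.data = address, value
    bus.req = True
    bus.trace.append("master: req=1")

    # Slave sees req, latches the data, and raises ack.
    slave_memory[bus.address] = bus.data
    bus.ack = True
    bus.trace.append("slave: ack=1")

    # Master sees ack and releases req.
    bus.req = False
    bus.trace.append("master: req=0")

    # Slave sees req released and releases ack; the bus is idle again.
    bus.ack = False
    bus.trace.append("slave: ack=0")


bus = AsyncBus()
memory: Dict[int, int] = {}
four_phase_write(bus, memory, address=0x10, value=0xAB)
```

No clock appears anywhere in the sketch: each step is triggered only by the other side's previous signal change, which is why such a bus tolerates devices of very different speeds.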

Bus Transaction
• Arbitration: Who gets the bus
• Request: What do we want to do
• Action: What happens in response

Arbitration: Obtaining Access to the Bus
[Figure: Bus Master and Bus Slave; Control: master initiates requests; data can go either way]
• One of the most important issues in bus design:
  – How is the bus reserved by a device that wishes to use it?
• Chaos is avoided by a master-slave arrangement:
  – Only the bus master can control access to the bus: It initiates and controls all bus requests
  – A slave responds to read and write requests
• The simplest system:
  – Processor is the only bus master
  – All bus requests must be controlled by the processor
  – Major drawback: the processor is involved in every transaction

Multiple Potential Bus Masters: the Need for Arbitration
• Bus arbitration scheme:
  – A bus master wanting to use the bus asserts the bus request
  – A bus master cannot use the bus until its request is granted
  – A bus master must signal to the arbiter the end of the bus utilization
• Bus arbitration schemes usually try to balance two factors:
  – Bus priority: the highest priority device should be serviced first
  – Fairness: Even the lowest priority device should never be completely locked out from the bus
• Bus arbitration schemes can be divided into four broad classes:
  – Daisy chain arbitration
  – Centralized, parallel arbitration
  – Distributed arbitration by self-selection: each device wanting the bus places a code indicating its identity on the bus.
  – Distributed arbitration by collision detection: Each device just “goes for it”. Problems found after the fact.
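The tension between priority and fairness can be made concrete with a small sketch. The Python below is an illustration under assumptions, not any real arbiter's logic: `fixed_priority_grant` models a strict priority order (device 0 highest), and `RoundRobinArbiter` is a rotating-priority policy swapped in here only to show one way fairness can be obtained. All names are hypothetical.

```python
from typing import Callable, List, Optional, Sequence


def fixed_priority_grant(requests: Sequence[bool]) -> Optional[int]:
    """Grant the lowest-numbered requesting device (device 0 = highest priority)."""
    for device, wants_bus in enumerate(requests):
        if wants_bus:
            return device
    return None  # nobody is requesting the bus


class RoundRobinArbiter:
    """Rotating priority: the device just served becomes the lowest priority."""

    def __init__(self, num_devices: int) -> None:
        self.num_devices = num_devices
        self.last_granted = num_devices - 1  # so device 0 is examined first

    def grant(self, requests: Sequence[bool]) -> Optional[int]:
        # Scan the devices starting just after the one served last time.
        for offset in range(1, self.num_devices + 1):
            device = (self.last_granted + offset) % self.num_devices
            if requests[device]:
                self.last_granted = device
                return device
        return None


def simulate(grant: Callable[[Sequence[bool]], Optional[int]],
             requests: Sequence[bool], cycles: int) -> List[Optional[int]]:
    """Apply the same request pattern for several arbitration cycles."""
    return [grant(requests) for _ in range(cycles)]


all_requesting = [True, True, True]
priority_grants = simulate(fixed_priority_grant, all_requesting, 4)
round_robin = RoundRobinArbiter(3)
fair_grants = simulate(round_robin.grant, all_requesting, 4)
```

With every device requesting continuously, the fixed-priority policy keeps granting device 0 and locks the others out, while the rotating policy serves each device in turn at the cost of ignoring priority.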

The Daisy Chain Bus Arbitration Scheme
[Figure: Bus Arbiter; Grant chained through Device 1 (Highest Priority), Device 2, ..., Device N (Lowest Priority); Release and Request lines, wired-OR]
• Advantage: simple
• Disadvantages:
  – Cannot assure fairness: A low-priority device may be locked out indefinitely
  – The use of the daisy chain grant signal also limits the bus speed

Centralized Parallel Arbitration
[Figure: Device 1, Device 2, ..., Device N with Req and Grant lines to the Bus Arbiter]
• Used in essentially all processor-memory busses and in high-speed I/O busses

Simplest bus paradigm
• All agents operate synchronously
• All can source / sink data at same rate
• => simple protocol
  – just manage the source and target

Simple Synchronous Protocol
[Timing diagram: BReq; BG; R/W Address: Cmd+Addr; Data: Data1, Data2]
• Even memory busses are more complex than this
  – memory (slave) may take time to respond
  – it may need to control data rate

Typical Synchronous Protocol
[Timing diagram: BReq; BG; R/W Address: Cmd+Addr; Wait; Data: Data1, Data1, Data2]
• Slave indicates when it is prepared for data xfer
• Actual transfer goes at bus rate

Increasing the Bus Bandwidth
• Separate versus multiplexed address and data lines:
  – Address and data can be transmitted in one bus cycle if separate address and data lines are available
  – Cost: (a) more bus lines, (b) increased complexity
• Data bus width:
  – By increasing the width of the data bus, transfers of multiple words require fewer bus cycles
  – Example: SPARCstation 20’s memory bus is 128 bits wide
  – Cost: more bus lines
• Block transfers:
  – Allow the bus to transfer multiple words in back-to-back bus cycles
  – Only one address needs to be sent at the beginning
  – The bus is not released until the last word is transferred
  – Cost: (a) increased complexity, (b) increased response time for other requests

Increasing Transaction Rate on Multimaster Bus
• Overlapped arbitration
  – perform arbitration for next transaction during current transaction
• Bus parking
  – master holds onto bus and performs multiple transactions as long as no other master makes request
• Overlapped address / data phases
  – requires one of the above techniques
• Split-phase (or packet switched) bus
  – completely separate address and data phases
  – arbitrate separately for each
  – address phase yields a tag which is matched with data phase
• “All of the above” in most modern memory buses

1993 CPU-Memory Bus Survey
Bus                 MBus      Summit      Challenge    XDBus
Originator          Sun       HP          SGI          Sun
Clock Rate (MHz)    40        60          48           66
Address lines       36        48          40           muxed
Data lines          64        128         256          144 (parity)
Data Sizes (bits)   256       512         1024         512
Clocks/transfer               4           5            4?
Peak (MB/s)         320(80)   960         1200         1056
Master              Multi     Multi       Multi        Multi
Arbitration         Central   Central     Central      Central
Slots                         16          9            10
Busses/system       1         1           1            2
Length                        13 inches   12? inches   17 inches

Asynchronous Handshake (4-phase)
Write Transaction
[Timing diagram: Address (Master Asserts Address, Next Address); Data (Master Asserts Data); Read; Req; Ack; time marks t0 to t5]
• t0: Master has obtained control and asserts address, direction, data
• Waits a specified amount of time for slaves to decode target
• t1: Master asserts request line
• t2: Slave asserts ack, indicating data received
• t3: Master releases req
• t4: Slave releases ack

Read Transaction
[Timing diagram: Address (Master Asserts Address, Next Address); Data (Slave Data); Read; Req; Ack; time marks t0 to t5]
• t0: Master has obtained control and asserts address, direction
• Waits a specified amount of time for slaves to decode target
• t1: Master asserts request line
• t2: Slave asserts ack, indicating ready to transmit data
• t3: Master releases req, data received
• t4: Slave releases ack

1993 Backplane/IO Bus Survey
Bus                   SBus      TurboChannel   MicroChannel    PCI
Originator            Sun       DEC            IBM             Intel
Clock Rate (MHz)      16-25     12.5-25        async           33
Addressing            Virtual   Physical       Physical        Physical
Data Sizes (bits)     8,16,32   8,16,24,32     8,16,24,32,64   8,16,24,32,64
Master                Multi     Single         Multi           Multi
Arbitration           Central   Central        Central         Central
32 bit read (MB/s)    33        25             20              33
Peak (MB/s)           89        84             75              111 (222)
Max Power (W)         16        26             13              25

High Speed I/O Bus
• Examples
  – graphics
  – fast networks
• Limited number of devices
• Data transfer bursts at full rate
• DMA transfers important
  – small controller spools stream of bytes to or from memory
• Either side may need to squelch transfer
  – buffers fill up

PCI Read/Write Transactions
• All signals sampled on rising edge
• Centralized Parallel Arbitration
  – overlapped with previous transaction
• All transfers are (unlimited) bursts
• Address phase starts by asserting FRAME#
• Next cycle “initiator” asserts cmd and address
• Data transfers happen when
  – IRDY# asserted by master when ready to transfer data
  – TRDY# asserted by target when ready to transfer data
  – transfer when both asserted on rising edge
• FRAME# deasserted when master intends to complete only one more data transfer

PCI Read Transaction
  – Turn-around cycle on any signal driven by more than one agent

PCI Write Transaction

The System-on-a-Chip Nightmare
[Figure: The “Board-on-a-Chip” Approach. Labels: System Bus, DMA, CPU, DSP, MPEG, Mem Ctrl., Bridge, Peripheral Bus, I, O, O, C, Custom Interfaces, Control Wires]

Sonics SOC Integration Architecture
[Figure labels: DMA, CPU, DSP, MPEG, C, MEM, I, O, SiliconBackplane Agent™, Open Core Protocol™, SiliconBackplane™ (patented), MultiChip Backplane™]

Open Core Protocol Goals
• Bus Independent
• Scalable
• Configurable
• Synthesis/Timing Analysis Friendly
• Encompass entire core/system interface needs (data, control, and test flows)

Data, Control, and Test Flows
• Data Flow
  – Signals and protocols associated with moving data
  – Includes address, data, handshaking, etc.
  – Similar to services provided by traditional computer buses
• Control Flow
  – Signals and protocols associated with non-data communication
  – Sideband - not synchronized to data flow (out of band)
  – Examples include interrupts, high-level flow control, etc.
• Test Flow
  – Signals and protocols related to debug and manufacturing test

OCP Overview
• Point-to-point, uni-directional, synchronous
  – easy physical implementation
• Master/Slave, request/response
  – well-defined, simple roles
• Extensions
  – added functionality to support cores with more complex interface requirements
• Configurability
  – pay only for the features needed for a given core

Master vs. Slave
[Figure labels: IP Core (Master), IP Core (Slave), IP Core (Master and Slave), Open Core Protocol, Initiator, Target, Request, Response, On-Chip Bus]

Basic OCP
[Figure: Master and Slave sharing Clk]
Signals: MCmd [3], MAddr [N], MData [N], SCmdAccept, SResp [3], SData [N]
• Read:
  – Command, Address: MCmd, MAddr
  – Command Accept: SCmdAccept
  – Response, Data: SResp, SData
• Write (posted):
  – Command, Address, Data: MCmd, MAddr, MData
  – Command Accept: SCmdAccept

Protocol Phases
• Request Phase (begins Transfer)
  – Master presents request (command, address, etc.) to Slave
• Response Phase (ends Transfer)
  – Slave presents response (success/fail, read data) to Master
  – Only available for read transfers (posted write model)
• Datahandshake Phase (Optional)
  – Allows pipelining request ahead of write data
  – Only available for write transfers
• Phase ordering
  – Request -> Datahandshake -> Response

OCP Extensions
• Simple Extensions
  – Byte Enables
  – Bursts
  – Flow Control
  – Data Handshake
• Complex Extensions
  – Threads and Connections
• Sideband Signals

The Backplane: Why Not Use a Computer Bus?
[Figure: IP Cores with Transmit FIFO and Receive FIFO on a Computer Bus; Arbiter; Address and Data over Time]
• Expensive to decouple
• Not designed for real-time

Communication Buses Decouple and Guarantee Real Time
[Figure: IP Cores with Transmit FIFO and Receive FIFO on a Communications Bus; TDMA; Data over Time]
• Connections are expensive
• Poor read latency

SiliconBackplane™ Employs Best of Both
From Computing
• Address-based selection
• Write and read transfers
• Pipelining
From Communications
• Efficient BW decoupling
• Guaranteed BW & latency
• Side-band signaling
[Figure labels: DMA, CPU, DSP, MPEG, C, MEM, I, O]

Guaranteed Bandwidth Arbitration
• Independent arbitration for every cycle includes two phases:
  – Distributed TDMA
  – Round robin
• Provides fine control over system bandwidth
[Figure labels: Current Slot, Arbitration, Command]

Guaranteed Latency
• Fixed latency between command/address and data/response phases
• Matches pipelined CPU model ensuring high performance access to on-chip resources
• Pipelined data routed through SiliconBackplane™
• Latency re-programmable in software
• Variable-latency blocks do not tie up the SiliconBackplane

Integrated Signaling Mechanism
• Dedicated SiliconBackplane™ wires (Flags) support:
  – Bus-style out-of-band signaling (interrupts)
  – Point-to-point communications (flow control)
  – Dynamic point-to-point (retry mechanism)
• Same design flow, timing, flexibility as address/data portion of SonicsIA™

MultiChip Backplane™ Extends SonicsIA™ Between Chips
[Figure labels: SiliconBackplane, CPU-Based ASSP, MultiChip Backplane, FPGA, ASSP]
Seamless integration of protocols

Validation / Test
• SiliconBackplane™ highly visible for test
  – All subsystems communicate through SiliconBackplane
[Figure labels: MultiChip Backplane™, Test Vectors]
• Test Interfaces:
  – MultiChip Backplane: 100’s MB/sec.
  – ServiceAgent: Scan-based
• Each subsystem can be tested/validated stand-alone

Summary
• Busses are an important technique for building large-scale systems
  – Their speed is critically dependent on factors such as length, number of devices, etc.
  – Critically limited by capacitance
  – Tricks: esoteric drive technology such as GTL
• Important terminology:
  – Master: The device that can initiate new transactions
  – Slaves: Devices that respond to the master
• Two types of bus timing:
  – Synchronous: bus includes clock
  – Asynchronous: no clock, just REQ/ACK strobing
• System-on-a-Chip approach invites new solutions
  – Well-defined and clear communication protocols
  – Physical layer hidden to designer
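
As a closing worked example, the peak figures in the 1993 CPU-Memory Bus Survey can be approximated from three of its other columns. The sketch below assumes, without confirmation from the slides, that "Data Sizes (bits)" is the size of one block transfer and that "Clocks/transfer" is the number of bus clocks that block occupies. The function name is hypothetical.

```python
def peak_bandwidth_mb_per_s(clock_mhz: float, transfer_bits: int,
                            clocks_per_transfer: int) -> float:
    """Peak MB/s = bytes moved per transfer * transfers per microsecond.

    Assumption: one transfer of `transfer_bits` bits takes exactly
    `clocks_per_transfer` bus clocks at `clock_mhz` MHz.
    """
    bytes_per_transfer = transfer_bits / 8
    return clock_mhz * bytes_per_transfer / clocks_per_transfer


# Rows taken from the survey table (XDBus clocks/transfer is listed as "4?").
summit = peak_bandwidth_mb_per_s(clock_mhz=60, transfer_bits=512, clocks_per_transfer=4)
xdbus = peak_bandwidth_mb_per_s(clock_mhz=66, transfer_bits=512, clocks_per_transfer=4)
challenge = peak_bandwidth_mb_per_s(clock_mhz=48, transfer_bits=1024, clocks_per_transfer=5)
```

Under these assumptions the Summit and XDBus rows reproduce the listed 960 and 1056 MB/s, while the Challenge row comes out near 1229 MB/s against a listed 1200, so the table may be rounding or using a slightly different definition. The MBus row is left out because the survey gives no clocks/transfer value for it.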