Lecture 15:
Busses and Networking (1)
Prof. Jan Rabaey
Computer Science 252, Spring 2000
Based on slides from Dave Patterson, John Kubiatowicz,
Bill Dally, and Sonics, Inc.
JR.S00 1
A Communication-Centric World
• Computation is getting distributed …
– Internet, WAN, LAN, BodyLAN, Home Networks,
Microprocessor Peripherals, Processor-Memory
Interface, System-on-a-Chip
• Efficient Networking and Communication is
Crucial
• The System-on-a-Chip implies the Network-on-a-Chip
• In Next Set of Lectures:
– Busses and Networks
– But more importantly, the impact of integration
What is a bus?
A Bus Is:
• shared communication link
• single set of wires used to connect multiple
subsystems
[Figure: Processor (Control, Datapath), Memory, Input, and Output]
• A Bus is also a fundamental tool for composing
large, complex systems
– systematic means of abstraction
Busses
Advantages of Buses
[Figure: Processor, Memory, and three I/O Devices sharing one bus]
• Versatility:
– New devices can be added easily
– Peripherals can be moved between computer
systems that use the same bus standard
• Low Cost:
– A single set of wires is shared in multiple ways
Disadvantage of Buses
[Figure: Processor, Memory, and three I/O Devices sharing one bus]
• It creates a communication bottleneck
– The bandwidth of that bus can limit the maximum I/O throughput
• The maximum bus speed is largely limited by:
– The length of the bus
– The number of devices on the bus
– The need to support a range of devices with:
» Widely varying latencies
» Widely varying data transfer rates
General Organization of a Bus
[Figure: a bus made up of Control Lines and Data Lines]
• Control lines:
– Signal requests and acknowledgments
– Indicate what type of information is on the data lines
• Data lines carry information between the source and
the destination:
– Data and Addresses
– Complex commands
Master versus Slave
[Figure: Bus Master and Bus Slave. The master issues the command; data can go either way]
• A bus transaction includes two parts:
– Issuing the command (and address): the request
– Transferring the data: the action
• Master is the one who starts the bus transaction by:
– issuing the command (and address)
• Slave is the one who responds to the address by:
– Sending data to the master if the master asks for data
– Receiving data from the master if the master wants to send data
Types of Busses
• Processor-Memory Bus (design specific)
– Short and high speed
– Only need to match the memory system
» Maximize memory-to-processor bandwidth
– Connects directly to the processor
– Optimized for cache block transfers
• I/O Bus (industry standard)
– Usually longer and slower
– Needs to match a wide range of I/O devices
– Connects to the processor-memory bus or backplane bus
• Backplane Bus (standard or proprietary)
– Backplane: an interconnection structure within the chassis
– Allow processors, memory, and I/O devices to coexist
– Cost advantage: one bus for all components
Example: Pentium System Organization
[Figure: Pentium system with a Processor/Memory Bus, a PCI Bus, and I/O Busses]
A Computer System with One Bus: Backplane Bus
[Figure: Processor, Memory, and I/O Devices on a single Backplane Bus]
• A single bus (the backplane bus) is used for:
– Processor to memory communication
– Communication between I/O devices and memory
• Advantages: Simple and low cost
• Disadvantages: slow and the bus can become a
major bottleneck
• Example: IBM PC - AT
A Two-Bus System
[Figure: Processor and Memory on the Processor Memory Bus; three Bus Adaptors each connect an I/O Bus]
• I/O buses tap into the processor-memory bus via bus
adaptors:
– Processor-memory bus: mainly for processor-memory traffic
– I/O buses: provide expansion slots for I/O devices
• Apple Macintosh-II
– NuBus: Processor, memory, and a few selected I/O devices
– SCSI Bus: the rest of the I/O devices
A Three-Bus System
[Figure: Processor and Memory on the Processor Memory Bus; a Bus Adaptor connects the Backplane Bus, and two further Bus Adaptors connect I/O Buses to the Backplane Bus]
• A small number of backplane buses tap into the
processor-memory bus
– Processor-memory bus is only used for processor-memory traffic
– I/O buses are connected to the backplane bus
• Advantage: loading on the processor bus is greatly
reduced
North/South Bridge architectures:
separate busses
[Figure: Processor with “backside cache” and Memory on the Processor Memory Bus; a Bus Adaptor connects the Backplane Bus, and a further Bus Adaptor connects the I/O Buses]
• Separate sets of pins for different functions
– Memory bus
– Caches
– Graphics bus (for fast frame buffer)
– I/O busses are connected to the backplane bus
• Advantage:
– Busses can run at different speeds
– Much less overall loading!
What defines a bus?
Transaction Protocol
Timing and Signaling Specification
Bunch of Wires
Electrical Specification
Physical / Mechanical Characteristics
– the connectors
Synchronous and Asynchronous Bus
• Synchronous Bus:
– Includes a clock in the control lines
– A fixed protocol for communication that is relative to the clock
– Advantage: involves very little logic and can run very fast
– Disadvantages:
» Every device on the bus must run at the same clock rate
» To avoid clock skew, a fast synchronous bus cannot be long
• Asynchronous Bus:
– It is not clocked
– It can accommodate a wide range of devices
– It can be lengthened without worrying about clock skew
– It requires a handshaking protocol
Busses so far
[Figure: Master, Slave, and further devices connected by Control Lines, Address Lines, and Data Lines]
Bus Master: has ability to control the bus, initiates transaction
Bus Slave: module activated by the transaction
Bus Communication Protocol: specification of sequence of events
and timing requirements in transferring information.
Asynchronous Bus Transfers: control lines (req, ack) serve to
orchestrate sequencing.
Synchronous Bus Transfers: sequence relative to common clock.
Bus Transaction
• Arbitration: Who gets the bus
• Request: What do we want to do
• Action: What happens in response
Arbitration:
Obtaining Access to the Bus
[Figure: Bus Master and Bus Slave. Control: master initiates requests; data can go either way]
• One of the most important issues in bus design:
– How is the bus reserved by a device that wishes to use it?
• Chaos is avoided by a master-slave arrangement:
– Only the bus master can control access to the bus:
It initiates and controls all bus requests
– A slave responds to read and write requests
• The simplest system:
– Processor is the only bus master
– All bus requests must be controlled by the processor
– Major drawback: the processor is involved in every transaction
Multiple Potential Bus Masters:
the Need for Arbitration
• Bus arbitration scheme:
– A bus master wanting to use the bus asserts the bus request
– A bus master cannot use the bus until its request is granted
– A bus master must signal to the arbiter the end of the bus utilization
• Bus arbitration schemes usually try to balance two
factors:
– Bus priority: the highest priority device should be serviced first
– Fairness: Even the lowest priority device should never
be completely locked out from the bus
• Bus arbitration schemes can be divided into four broad
classes:
– Daisy chain arbitration
– Centralized, parallel arbitration
– Distributed arbitration by self-selection: each device wanting the bus places a code
indicating its identity on the bus.
– Distributed arbitration by collision detection:
Each device just “goes for it”. Problems found after the fact.
The Daisy Chain Bus
Arbitration Scheme
[Figure: the Bus Arbiter's Grant line is chained from Device 1 (highest priority) through Device 2 to Device N (lowest priority); the Release and Request lines are wired-OR]
• Advantage: simple
• Disadvantages:
– Cannot assure fairness:
A low-priority device may be locked out indefinitely
– The use of the daisy chain grant signal also limits the bus speed
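The lockout problem can be seen in a minimal sketch, assuming the grant always enters the chain at Device 1 and is absorbed by the first device that is requesting. The function name and the list encoding of the request lines are illustrative and not part of the lecture.

```python
def daisy_chain_grant(requests):
    """Return the index of the device that receives the grant, or None.

    requests[i] is True when device i asserts the shared request line.
    Index 0 is the device closest to the arbiter (highest priority).
    """
    for position, wants_bus in enumerate(requests):
        if wants_bus:
            # This device absorbs the grant and does not pass it on.
            return position
    return None  # nobody is requesting, so no grant is issued


# Device 0 requests in every round, so device 2 never sees the grant.
winners = [daisy_chain_grant([True, False, True]) for _ in range(5)]
print(winners)  # [0, 0, 0, 0, 0]
```

The loop over positions also hints at the speed limit: the grant has to ripple through every device ahead of the winner.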
Centralized Parallel Arbitration
[Figure: Device 1, Device 2, ... Device N each have their own Req and Grant lines to the Bus Arbiter]
• Used in essentially all processor-memory busses
and in high-speed I/O busses
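Because the arbiter sees every request line at once, it can apply any policy. The sketch below assumes a rotating-priority (round-robin) policy to show how fairness can be enforced; the slide does not name a policy, and the class is hypothetical.

```python
class RoundRobinArbiter:
    """Central arbiter with one request line per device (assumed policy)."""

    def __init__(self, num_devices):
        self.num_devices = num_devices
        # Start so that device 0 is the first one examined.
        self.last_granted = num_devices - 1

    def grant(self, requests):
        """requests: list of bools, one per device. Returns index or None."""
        for offset in range(1, self.num_devices + 1):
            candidate = (self.last_granted + offset) % self.num_devices
            if requests[candidate]:
                self.last_granted = candidate
                return candidate
        return None


arbiter = RoundRobinArbiter(3)
grants = [arbiter.grant([True, False, True]) for _ in range(4)]
print(grants)  # [0, 2, 0, 2]
```

Unlike the daisy chain, device 2 gets the bus every other round even though device 0 requests continuously.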
Simplest bus paradigm
• All agents operate synchronously
• All can source / sink data at same rate
• => simple protocol
– just manage the source and target
Simple Synchronous Protocol
[Timing diagram: BReq, BG, R/W + Address (Cmd+Addr), Data (Data1, Data2)]
• Even memory busses are more complex than this
– memory (slave) may take time to respond
– it may need to control data rate
Typical Synchronous Protocol
[Timing diagram: BReq, BG, R/W + Address (Cmd+Addr), Wait, Data (Data1, Data1, Data2)]
• Slave indicates when it is prepared for data xfer
• Actual transfer goes at bus rate
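The cost of wait states can be counted with a small sketch. It assumes one cycle for Cmd+Addr, one cycle per data word, and that the slave holds Wait for a given number of cycles before each word; arbitration cycles are ignored. The function name is illustrative.

```python
def transfer_cycles(wait_states_per_word):
    """Count bus cycles for one transaction after the bus is granted.

    wait_states_per_word: for each data word, the number of cycles the
    slave asserts Wait before that word is transferred.
    """
    cycles = 1  # Cmd+Addr cycle
    for waits in wait_states_per_word:
        cycles += waits  # slave not ready, no data moves
        cycles += 1      # the word itself moves at the full bus rate
    return cycles


print(transfer_cycles([0, 0]))  # simple protocol, two words: 3
print(transfer_cycles([1, 0]))  # one wait cycle before Data1: 4
```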
Increasing the Bus Bandwidth
• Separate versus multiplexed address and data lines:
– Address and data can be transmitted in one bus cycle
if separate address and data lines are available
– Cost: (a) more bus lines, (b) increased complexity
• Data bus width:
– By increasing the width of the data bus, transfers of multiple words require fewer
bus cycles
– Example: SPARCstation 20’s memory bus is 128 bits wide
– Cost: more bus lines
• Block transfers:
– Allow the bus to transfer multiple words in back-to-back bus cycles
– Only one address needs to be sent at the beginning
– The bus is not released until the last word is transferred
– Cost: (a) increased complexity
(b) longer response time for other pending requests
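A rough cycle count shows how bus width and block transfers combine. The sketch assumes a multiplexed bus where each individual transfer costs one address cycle plus one data cycle, while a block transfer sends one address followed by back-to-back data cycles; arbitration and wait states are ignored, and the function names are made up for illustration.

```python
import math


def cycles_single_word_transfers(total_bits, bus_width_bits):
    """Every bus-width chunk is its own transaction: address + data."""
    beats = math.ceil(total_bits / bus_width_bits)
    return beats * 2


def cycles_block_transfer(total_bits, bus_width_bits):
    """One address cycle, then the data in back-to-back cycles."""
    beats = math.ceil(total_bits / bus_width_bits)
    return 1 + beats


# Moving a 256-bit cache block:
print(cycles_single_word_transfers(256, 32))  # 16
print(cycles_block_transfer(256, 32))         # 9
print(cycles_block_transfer(256, 128))        # 3
```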
Increasing Transaction Rate on
Multimaster Bus
• Overlapped arbitration
– perform arbitration for next transaction during current transaction
• Bus parking
– master holds onto bus and performs multiple transactions as long as no
other master makes request
• Overlapped address / data phases
– requires one of the above techniques
• Split-phase (or packet switched) bus
– completely separate address and data phases
– arbitrate separately for each
– address phase yields a tag which is matched with data phase
• “All of the above” in most modern memory buses
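Tag matching on a split-phase bus can be sketched as follows. The class, its method names, and the use of a simple counter for tags are assumptions for illustration; real buses define their own tag width and allocation rules.

```python
import itertools


class SplitPhaseBus:
    """Toy model: the address phase hands out a tag, the data phase returns it."""

    def __init__(self):
        self._tags = itertools.count()
        self.pending = {}  # tag -> address still waiting for its data phase

    def address_phase(self, address):
        tag = next(self._tags)
        self.pending[tag] = address
        return tag

    def data_phase(self, tag, data):
        # The tag tells the requester which outstanding address this data answers.
        address = self.pending.pop(tag)
        return address, data


bus = SplitPhaseBus()
tag_a = bus.address_phase(0x100)
tag_b = bus.address_phase(0x200)
# The bus is free between phases, so replies may arrive out of order.
second = bus.data_phase(tag_b, "fast reply")
first = bus.data_phase(tag_a, "slow reply")
print(first, second)
```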
1993 CPU-Memory Bus Survey
Bus                 MBus      Summit     Challenge    XDBus
Originator          Sun       HP         SGI          Sun
Clock Rate (MHz)    40        60         48           66
Address lines       36        48         40           muxed
Data lines          64        128        256          144 (parity)
Data Sizes (bits)   256       512        1024         512
Clocks/transfer               4          5            4?
Peak (MB/s)         320(80)   960        1200         1056
Master              Multi     Multi      Multi        Multi
Arbitration         Central   Central    Central      Central
Slots                         16         9            10
Busses/system       1         1          1            2
Length                        13 inches  12? inches   17 inches
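The Peak (MB/s) figures appear to follow from the Clock Rate, Data Sizes, and Clocks/transfer rows. The formula below is inferred from the survey's numbers rather than stated in the lecture, and the XDBus clock count is marked uncertain ("4?") in the survey itself.

```python
def peak_mb_per_s(clock_mhz, data_size_bits, clocks_per_transfer):
    """Peak bandwidth if one full-size transfer completes every N clocks."""
    bytes_per_transfer = data_size_bits / 8
    return clock_mhz * bytes_per_transfer / clocks_per_transfer


print(peak_mb_per_s(60, 512, 4))   # Summit: 960.0
print(peak_mb_per_s(66, 512, 4))   # XDBus: 1056.0
print(peak_mb_per_s(48, 1024, 5))  # Challenge: 1228.8 (survey lists 1200)
```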
Asynchronous Handshake (4-phase)
Write Transaction
[Timing diagram: Address (Master Asserts Address, Next Address), Data (Master Asserts Data), Read, Req, and Ack over t0 to t5]
• t0: Master has obtained control and asserts address, direction, data
– Waits a specified amount of time for slaves to decode target
• t1: Master asserts request line
• t2: Slave asserts ack, indicating data received
• t3: Master releases req
• t4: Slave releases ack
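The write handshake can be traced step by step. This is a sequential sketch that records the req and ack levels at each labelled time; the dictionary standing in for the slave's storage and the function name are assumptions, and real hardware would run master and slave concurrently.

```python
def four_phase_write(address, data, memory):
    """Walk through t0..t4 of a write and return the (time, req, ack) trace."""
    trace = []
    req, ack = 0, 0
    # t0: master drives address, direction, and data, then waits for decode
    bus = {"address": address, "read": False, "data": data}
    trace.append(("t0", req, ack))
    # t1: master asserts req
    req = 1
    trace.append(("t1", req, ack))
    # t2: slave latches the data and asserts ack
    memory[bus["address"]] = bus["data"]
    ack = 1
    trace.append(("t2", req, ack))
    # t3: master has seen ack and releases req
    req = 0
    trace.append(("t3", req, ack))
    # t4: slave has seen req drop and releases ack
    ack = 0
    trace.append(("t4", req, ack))
    return trace


memory = {}
trace = four_phase_write(0x40, 0xBEEF, memory)
print(trace)
```

Each of the four signal edges (req up, ack up, req down, ack down) waits for the previous one, which is why no shared clock is needed.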
Read Transaction
[Timing diagram: Address (Master Asserts Address, Next Address), Data (Slave Data), Read, Req, and Ack over t0 to t5]
• t0: Master has obtained control and asserts address and direction
– Waits a specified amount of time for slaves to decode target
• t1: Master asserts request line
• t2: Slave asserts ack, indicating ready to transmit data
• t3: Master releases req, data received
• t4: Slave releases ack
1993 Backplane/IO Bus Survey
Bus                  SBus      TurboChannel  MicroChannel   PCI
Originator           Sun       DEC           IBM            Intel
Clock Rate (MHz)     16-25     12.5-25       async          33
Addressing           Virtual   Physical      Physical       Physical
Data Sizes (bits)    8,16,32   8,16,24,32    8,16,24,32,64  8,16,24,32,64
Master               Multi     Single        Multi          Multi
Arbitration          Central   Central       Central        Central
32 bit read (MB/s)   33        25            20             33
Peak (MB/s)          89        84            75             111 (222)
Max Power (W)        16        26            13             25
High Speed I/O Bus
• Examples
– graphics
– fast networks
• Limited number of devices
• Data transfer bursts at full rate
• DMA transfers important
– small controller spools stream of bytes to or from memory
• Either side may need to squelch transfer
– buffers fill up
PCI Read/Write Transactions
• All signals sampled on rising edge
• Centralized Parallel Arbitration
– overlapped with previous transaction
• All transfers are (unlimited) bursts
• Address phase starts by asserting FRAME#
• Next cycle “initiator” asserts cmd and address
• Data transfers happen when
– IRDY# asserted by master when ready to transfer data
– TRDY# asserted by target when ready to transfer data
– transfer when both asserted on rising edge
• FRAME# deasserted when master intends to
complete only one more data transfer
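The IRDY#/TRDY# rule can be sketched as a per-clock check. The signals are active-low on the real bus; here `True` simply means "asserted at this rising edge". The function and the example waveforms are illustrative, and FRAME#, the address phase, and turn-around cycles are left out.

```python
def pci_burst(irdy, trdy, words):
    """Return (cycle, word) pairs for the data phases of one burst.

    irdy[i], trdy[i]: True when IRDY#/TRDY# is asserted at rising edge i.
    A word moves only on edges where both sides are ready.
    """
    transferred = []
    next_word = 0
    for cycle, (initiator_ready, target_ready) in enumerate(zip(irdy, trdy)):
        if next_word == len(words):
            break  # burst complete
        if initiator_ready and target_ready:
            transferred.append((cycle, words[next_word]))
            next_word += 1
    return transferred


irdy = [True, True, False, True, True]   # master stalls in cycle 2
trdy = [False, True, True, True, True]   # target stalls in cycle 0
result = pci_burst(irdy, trdy, ["D0", "D1", "D2"])
print(result)  # [(1, 'D0'), (3, 'D1'), (4, 'D2')]
```

Either side can hold off a data phase by deasserting its ready signal, without ending the burst.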
PCI Read Transaction
– Turn-around cycle on any signal driven by more than one agent
PCI Write Transaction
The System-on-a-Chip Nightmare
[Figure: The “Board-on-a-Chip” Approach. Blocks: DMA, CPU, DSP, Mem Ctrl., Bridge, MPEG, I, O, O, C. Connections: System Bus, Peripheral Bus, Custom Interfaces, Control Wires]
Sonics SOC Integration Architecture
[Figure: cores DMA, CPU, DSP, MPEG, C, MEM, I, O; labels: Open Core Protocol™, SiliconBackplane Agent™, SiliconBackplane™ (patented), MultiChip Backplane™]
Open Core Protocol Goals
• Bus Independent
• Scalable
• Configurable
• Synthesis/Timing Analysis Friendly
• Encompass entire core/system interface needs (data,
control, and test flows)
Data, Control, and Test Flows
• Data Flow
– Signals and protocols associated with moving data
– Includes address, data, handshaking, etc.
– Similar to services provided by traditional computer buses
• Control Flow
– Signals and protocols associated with non-data communication
– Sideband - not synchronized to data flow (out of band)
– Examples include interrupts, high-level flow control, etc.
• Test Flow
– Signals and protocols related to debug and manufacturing test
OCP Overview
• Point-to-point, uni-directional, synchronous
– easy physical implementation
• Master/Slave, request/response
– well-defined, simple roles
• Extensions
– added functionality to support cores with more complex interface
requirements
• Configurability
– pay only for the features needed for a given core
Master vs. Slave
[Figure: IP cores acting as Initiator (Master) and Target (Slave) exchange Request and Response over Open Core Protocol Master/Slave interface pairs attached to the On-Chip Bus]
Basic OCP
[Figure: Master and Slave sharing Clk]
Master to Slave: MCmd [3], MAddr [N], MData [N]
Slave to Master: SCmdAccept, SResp [3], SData [N]
Read:
– Command, Address (MCmd, MAddr)
– Command Accept (SCmdAccept)
– Response, Data (SResp, SData)
Write (posted):
– Command, Address, Data (MCmd, MAddr, MData)
– Command Accept (SCmdAccept)
Protocol Phases
• Request Phase (begins Transfer)
– Master presents request (command, address, etc.) to Slave
• Response Phase (ends Transfer)
– Slave presents response (success/fail, read data) to Master
– Only available for read transfers (posted write model)
• Datahandshake Phase (Optional)
– Allows pipelining request ahead of write data
– Only available for write transfers
• Phase ordering
– Request -> Datahandshake -> Response
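The phase rules can be summarized in a short sketch that follows the model described on this slide: a response phase only for reads (posted writes) and an optional datahandshake phase only for writes. The function and its parameter names are hypothetical, not OCP signal names.

```python
def transfer_phases(command, use_datahandshake=False):
    """Return the ordered list of phases for one transfer."""
    phases = ["Request"]  # every transfer begins with a request
    if command == "write":
        if use_datahandshake:
            # Write data follows in its own phase, after the request.
            phases.append("Datahandshake")
        # Posted write: the master does not wait for a response.
    elif command == "read":
        phases.append("Response")  # carries status and read data
    else:
        raise ValueError(f"unknown command: {command}")
    return phases


print(transfer_phases("read"))         # ['Request', 'Response']
print(transfer_phases("write"))        # ['Request']
print(transfer_phases("write", True))  # ['Request', 'Datahandshake']
```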
OCP Extensions
• Simple Extensions
– Byte Enables
– Bursts
– Flow Control
– Data Handshake
• Complex Extensions
– Threads and Connections
• Sideband Signals
The Backplane: Why Not Use a Computer Bus?
[Figure: IP Cores on a Computer Bus with an Arbiter and Address/Data lines; a Transmit FIFO and a Receive FIFO sit between cores and the bus; bus activity is shown over Time]
• Expensive to decouple
• Not designed for real-time
Communication Buses Decouple
and Guarantee Real Time
[Figure: IP Cores on a Communications Bus; TDMA slots carry Data between a Transmit FIFO and a Receive FIFO over Time]
• Connections are expensive
• Poor read latency
SiliconBackplane™
Employs Best of Both
From Computing
• Address-based selection
• Write and read transfers
• Pipelining
From Communications
• Efficient BW decoupling
• Guaranteed BW & latency
• Side-band signaling
[Figure: DMA, CPU, DSP, MPEG, C, MEM, I, and O cores]
Guaranteed Bandwidth Arbitration
• Independent arbitration for every cycle includes two phases:
– Distributed TDMA
– Round robin
• Provides fine control over system bandwidth
[Figure labels: Current Slot, Arbitration, Command]
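One way the two phases could combine is sketched below. It assumes that the owner of the current TDMA slot gets first claim and that an unowned or unused slot falls back to round robin among the remaining requesters; the slide names the two phases but does not spell out this rule, and the slot table is invented for the example.

```python
def arbitrate(slot_owners, requests_per_cycle, num_agents):
    """Pick a winner for every cycle.

    slot_owners: repeating TDMA table; each entry is an agent index or None.
    requests_per_cycle: one set of requesting agent indices per cycle.
    """
    winners = []
    last_round_robin = num_agents - 1  # so agent 0 is examined first
    for cycle, requests in enumerate(requests_per_cycle):
        owner = slot_owners[cycle % len(slot_owners)]
        if owner is not None and owner in requests:
            winners.append(owner)  # phase 1: TDMA owner takes its slot
            continue
        winner = None
        for offset in range(1, num_agents + 1):  # phase 2: round robin
            candidate = (last_round_robin + offset) % num_agents
            if candidate in requests:
                winner = candidate
                last_round_robin = candidate
                break
        winners.append(winner)
    return winners


slot_owners = [0, None, 0, None]  # agent 0 is guaranteed every other slot
requests = [{0, 1, 2}, {1, 2}, {1, 2}, {0, 1, 2}]
winners = arbitrate(slot_owners, requests, num_agents=3)
print(winners)  # [0, 1, 2, 0]
```

In cycle 2 agent 0 owns the slot but is idle, so the slot is not wasted; the TDMA table sets the guaranteed share and round robin distributes the rest.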
Guaranteed Latency
• Fixed latency between command/address and
data/response phases
• Matches pipelined CPU model ensuring high
performance access to on-chip resources
• Pipelined data routed through SiliconBackplane™
• Latency re-programmable in software
• Variable-latency blocks do not tie up the
SiliconBackplane
Integrated Signaling Mechanism
• Dedicated SiliconBackplane™ wires
(Flags) support:
– Bus-style out-of-band signaling (interrupts)
– Point-to-point communications (flow control)
– Dynamic point-to-point (retry mechanism)
• Same design flow, timing, flexibility
as address/data portion of SonicsIA™
MultiChip Backplane™ Extends
SonicsIA™ Between Chips
[Figure: a CPU-Based ASSP with a SiliconBackplane, connected over the MultiChip Backplane to an FPGA and an ASSP]
Seamless integration of protocols
Validation / Test
• SiliconBackplane™ highly visible for test
– All subsystems communicate through SiliconBackplane
• Test Interfaces:
– MultiChip Backplane: 100’s MB/sec.
– ServiceAgent: Scan-based
• Each subsystem can be tested/validated stand-alone
[Figure: Test Vectors applied through the MultiChip Backplane™]
Summary
• Busses are an important technique for building
large-scale systems
– Their speed is critically dependent on factors such as
length, number of devices, etc.
– Critically limited by capacitance
– Tricks: esoteric drive technology such as GTL
• Important terminology:
– Master: The device that can initiate new transactions
– Slaves: Devices that respond to the master
• Two types of bus timing:
– Synchronous: bus includes clock
– Asynchronous: no clock, just REQ/ACK strobing
• System-on-a-Chip approach invites new
solutions
– Well-defined and clear communication protocols
– Physical layer hidden to designer