Presented by: Quinn Gaumer CPS 221





16,384 Processing
Nodes (32 MHz)
30 m x 30 m
Teraflop
1992


With 16,384 processors the interconnect
plays a large role
3 Types of Networks
◦ Data
◦ Control
◦ Diagnostic







Easily Attainable High Performance
Scaling
Data Parallel Programming
High Reliability and Availability
Space/Time Shared
Fast Time to Market
Modular
Include
▪ Control Processor
▪ Processing Nodes
▪ Slices of Data and Control Networks
◦ Privileged vs. Non-Privileged


Program Isolation
Time Sharing



Provide Simple View of Network to Processors
Sharing and Fault Tolerance
Decouple Network/Processor by Providing
Contract
◦ Software -> ISA -> Hardware

“The data network promises to eventually
accept and deliver all messages injected into
the network by the processors as long as the
processors promise to eventually eject all
messages from the network when they are
delivered to the processors. ”

Collection of Memory Mapped FIFOs
◦ Outgoing/Incoming

Restricted Operations
◦ Implemented with protected pages

Physical/Relative(Virtual) Address
◦ Programs use only relative addresses

Network Independent of User
◦ Delivery guaranteed by the network, not the processing
node
◦ Requires network diagnostics

Fat Tree Structure
◦ The closer to the root, the thicker the links
◦ Ensures no bottlenecks at root

User Partitions and I/O are
Sub-trees
◦ Guarantees network
independence
◦ Messages in partition stay within
partition

Many Optimal Node to Node
Paths
◦ Choose randomly among open
links
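
The routing idea above can be sketched in a few lines. This is a minimal illustration, not the CM-5 router itself: the function names, the `arity` of 4, and `parents_per_level` are assumptions made for the example, and link busy/open state is not modelled (the random choice is taken over all parent links). A message climbs only as far as the smallest sub-tree containing both nodes, which is why traffic inside a partition stays inside it.

```python
import random

def lca_height(src, dst, arity=4):
    """Levels a message must climb before src and dst share a sub-tree."""
    height = 0
    while src != dst:
        src //= arity
        dst //= arity
        height += 1
    return height

def route(src, dst, arity=4, parents_per_level=2, rng=random):
    """Return (up_choices, down_digits) for one message.

    up_choices: the randomly chosen parent link at each level going up
                (hypothetical: every parent link is treated as open).
    down_digits: the child index taken at each level coming down, which
                 is fully determined by the destination address.
    """
    height = lca_height(src, dst, arity)
    up_choices = [rng.randrange(parents_per_level) for _ in range(height)]
    down_digits = [(dst // arity**level) % arity
                   for level in reversed(range(height))]
    return up_choices, down_digits
```

For example, `route(0, 5)` climbs two levels with two random link choices and then descends along the fixed digits `[1, 1]`.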



Data can be only 1-5 Words
Wormhole Routing
CRC Checking done at every Link
◦ Additional !CRC sent when error first found

Primary Errors allow Diagnostic Network to
Determine location
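
A rough sketch of how per-link checking separates a primary error from the secondary errors it causes downstream. Assumptions: `zlib.crc32` stands in for the real CRC (the slides do not give the polynomial), the `marked` flag stands in for the additional !CRC, and all function names are hypothetical.

```python
import zlib

def make_message(words):
    """Build a message of 1-5 data words with a CRC over the payload."""
    if not 1 <= len(words) <= 5:
        raise ValueError("data must be 1-5 words")
    payload = b"".join(w.to_bytes(4, "big") for w in words)
    return {"payload": payload, "crc": zlib.crc32(payload), "marked": False}

def check_at_link(msg):
    """Classify the message at one link: 'ok', 'primary' or 'secondary'."""
    if msg["marked"]:
        return "secondary"      # an upstream link already flagged the error
    if zlib.crc32(msg["payload"]) != msg["crc"]:
        msg["marked"] = True    # stand-in for sending the extra !CRC
        return "primary"
    return "ok"

def traverse(msg, corrupt_after_link, num_links):
    """Pass msg over num_links links, flipping one bit after the given link."""
    reports = []
    for link in range(num_links):
        reports.append(check_at_link(msg))
        if link == corrupt_after_link:
            flipped = bytes([msg["payload"][0] ^ 1])
            msg["payload"] = flipped + msg["payload"][1:]
    return reports
```

Only the first link after the corruption reports a primary error, so the fault is localized to the hop just before it.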



Message Counters at every Link
Kirchhoff’s Law to Determine Missing
Messages
What to do with a Bad Chip or Link?
◦ Route Messages Away from Failure
◦ Map Out Nearby Processors
◦ Which is better?
▪ Both.
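
The counter check can be illustrated as a conservation test: once the network is quiet, what went into a switch must equal what came out. This is a simplified sketch with per-node totals; the data layout and the name `missing_messages` are assumptions for the example.

```python
def missing_messages(counters):
    """Return nodes whose in/out counts disagree, with the shortfall.

    counters maps a node name to (messages_in, messages_out).
    A positive value means that many messages entered but never left.
    """
    return {node: received - sent
            for node, (received, sent) in counters.items()
            if received != sent}
```

For example, `missing_messages({"a": (10, 10), "b": (7, 5)})` returns `{"b": 2}`.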

Solution: Virtual Channels
◦ One channel each for request and response
◦ 4 channels per chip (Incoming and Outgoing)

Deadlock still possible!
◦ User sends but never attempts to receive messages
◦ Higher level languages to implement
communication protocol

Objectives
◦ Clear all messages for new user
◦ Allow all messages in transit to
eventually finish

“All Fall Down” Method
◦ Evenly misroute all messages in
transit to nodes
◦ Message saved at node
◦ Resent when swapped in
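
A toy model of the "All Fall Down" swap. Assumption: messages are spread round-robin to approximate "evenly"; in hardware each message would drop to a node below its current position. Function names are hypothetical.

```python
from collections import defaultdict

def all_fall_down(in_transit, nodes):
    """Misroute every in-transit message to some node and save it there."""
    saved = defaultdict(list)
    for i, msg in enumerate(in_transit):
        saved[nodes[i % len(nodes)]].append(msg)   # round-robin: an assumption
    return dict(saved)

def swap_in(saved):
    """Re-inject all saved messages when the user is swapped back in."""
    return [msg for msgs in saved.values() for msg in msgs]
```

After `all_fall_down`, the network holds nothing for the old user, and `swap_in` returns every saved message for resending.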

Control Processor broadcasts program
◦ Not instructions(SIMD)


Each Processor runs program on data set
Inter-Processor Communication
◦ Hardware Barriers allow for processes to
communicate without shared semaphores

Program is smaller than its instruction stream
◦ Easier to deliver

Local fetch allows commodity processors
◦ Fast new RISC processors, less R & D.


Control system useful for other problems
Execution of generic MIMD code
◦ Message passing

Broadcasting
◦ User/Supervisor
◦ Interrupt
◦ Utility

Combining
◦ Reduction
◦ Forward/Backward Scan
◦ Router Done

Global Operations
◦ Synchronous/Asynchronous OR
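
The combining operations can be described by what each node ends up holding. The sketch below computes those results sequentially; it does not model the tree hardware. Inclusive scans are an assumption here (the slides do not say whether a node's own value is included), and addition is only a default operator.

```python
from functools import reduce
from itertools import accumulate
import operator

def reduction(values, op=operator.add):
    """Every node receives the combination of all values."""
    return reduce(op, values)

def forward_scan(values, op=operator.add):
    """Node i receives the combination of values 0..i (inclusive)."""
    return list(accumulate(values, op))

def backward_scan(values, op=operator.add):
    """Node i receives the combination of values i..n-1 (inclusive)."""
    return list(accumulate(reversed(values), op))[::-1]
```

With `[1, 2, 3, 4]` and addition, the reduction is `10`, the forward scan is `[1, 3, 6, 10]`, and the backward scan is `[10, 9, 7, 4]`.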


Binary Tree
Four Types of Packets
◦ Single Source: Broadcasting
◦ Multiple Source: Combining
◦ Idle: Filler
◦ Abstain: Allow control node to skip waiting
Collisions on Network
◦ Multiple/Multiple: Buffering based on arrival time
◦ Multiple/Single: Single Source Packets Prioritized
◦ Single/Single: Error
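
The three collision rules can be written as a small arbitration function. This is one reading of the slide: "buffering based on arrival time" is interpreted as first-come-first-served, and the tuple layout and the name `arbitrate` are hypothetical.

```python
def arbitrate(first, second):
    """Order two colliding packets; each is (kind, arrival_time).

    kind is 'single' (single source) or 'multiple' (multiple source).
    """
    kinds = {first[0], second[0]}
    if not kinds <= {"single", "multiple"}:
        raise ValueError("unknown packet kind")
    if kinds == {"single"}:
        raise RuntimeError("two single-source packets collided")
    if kinds == {"single", "multiple"}:
        # Single-source packets are prioritized over multiple-source ones.
        return sorted([first, second], key=lambda p: p[0] != "single")
    # Multiple/multiple: earlier arrival goes first (assumed policy).
    return sorted([first, second], key=lambda p: p[1])
```

For example, `arbitrate(("multiple", 5), ("single", 9))` puts the single-source packet first even though it arrived later.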

Control Processor for each Partition
◦ Executes scalar code while processing nodes
execute parallel code

Connect any Control Processor to any
Partition
◦ Problems can occur in control networks too
◦ Diagnostics may show part of control network must
be mapped out

Binary Network
◦ Pods(physical subsystem)
are leaves

JTAG
◦ Designed for Multichip…but
serial


Do JTAG for each Pod
Combine Responses with
OR/AND
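
One way the OR/AND combination could be used, sketched under assumptions: each pod's JTAG response is modelled as an integer bit vector, and the function names are hypothetical. If the bitwise AND and OR of all responses are equal, every pod returned the same bits, so one parallel test covers many pods at once.

```python
from functools import reduce

def combine(responses):
    """Return (AND, OR) of all pod responses, as the tree would combine them."""
    all_and = reduce(lambda a, b: a & b, responses)
    any_or = reduce(lambda a, b: a | b, responses)
    return all_and, any_or

def pods_agree(responses):
    """True when every pod returned identical bits (AND equals OR)."""
    all_and, any_or = combine(responses)
    return all_and == any_or
```

When `pods_agree` is false, the bits where AND and OR differ show which scan positions need a closer, per-pod look.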