Presented by: Quinn Gaumer CPS 221

16,384 Processing Nodes (32 MHz)
30 m x 30 m Footprint
With 16,384 processors, the interconnect
plays a large role in overall performance
3 Types of Networks
◦ Data
◦ Control
◦ Diagnostic
Easily Attainable High Performance
Data Parallel Programming
High Reliability and Availability
Space/Time Shared
Fast Time to Market
▪ Control Processor
▪ Processing Nodes
▪ Slices of Data and Control Networks
◦ Privileged vs. Non-Privileged
Program Isolation
Time Sharing
Provide Simple View of Network to Processors
Sharing and Fault Tolerance
Decouple Network/Processor by Providing
◦ Software -> ISA -> Hardware
“The data network promises to eventually
accept and deliver all messages injected into
the network by the processors as long as the
processors promise to eventually eject all
messages from the network when they are
delivered to the processors.”
Collection of Memory-Mapped FIFOs
◦ Outgoing/Incoming
Restricted Operations
◦ Implemented with protected pages
Physical/Relative (Virtual) Addresses
◦ Programs use only relative addresses
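A minimal sketch of relative-to-physical translation, assuming (hypothetically) that each partition occupies a contiguous, aligned range of physical node numbers, so translation reduces to a bounds check plus a base offset. The `Partition` class and its fields are illustrative names, not part of any CM-5 interface.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    base: int  # hypothetical: physical number of the partition's first node
    size: int  # hypothetical: number of nodes in the partition

    def to_physical(self, relative: int) -> int:
        """Translate a user-visible relative address to a physical node number."""
        if not 0 <= relative < self.size:
            # A program may only name nodes inside its own partition.
            raise ValueError(f"relative address {relative} is outside the partition")
        return self.base + relative


p = Partition(base=64, size=32)
print(p.to_physical(5))  # 69
```

Because the bounds check happens during translation, a user program cannot address a node outside its partition even by accident, which matches the program-isolation goal.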
Network Independent of User
◦ Delivery guaranteed by the network, not by the processing nodes
◦ Requires network diagnostics
Fat Tree Structure
◦ The closer to the root, the thicker the tree
◦ Ensures no bandwidth bottleneck at the root
User Partitions and I/O are
◦ Guarantees network
◦ Messages in a partition stay within that partition
Many Optimal Node-to-Node Paths
◦ Choose randomly among open links
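The random path choice can be sketched under simplifying assumptions: leaves are numbered in binary, the tree is treated as binary (the real switch arity is not modeled), and a hypothetical `up_links(level)` callback reports which parent links are currently open. On the way up, any open parent is an equally good choice, so one is picked at random; on the way down, the destination's address bits fix the path.

```python
import random


def route(src: int, dst: int, up_links, rng=random):
    """Return the hops a message takes from leaf src to leaf dst.

    up_links(level) -> list of parent-link ids open at that level (hypothetical).
    Each hop is a tuple (direction, level, link_or_child).
    """
    if src == dst:
        return []
    # Height of the least common ancestor: the highest differing address bit.
    height = (src ^ dst).bit_length()
    hops = []
    for level in range(height):
        choices = up_links(level)
        if not choices:
            raise RuntimeError(f"no open parent link at level {level}")
        # Every parent at this level reaches the LCA, so pick any open one.
        hops.append(("up", level, rng.choice(choices)))
    for level in reversed(range(height)):
        # Going down, the child is selected by the destination's address bit.
        hops.append(("down", level, (dst >> level) & 1))
    return hops


print(route(0, 5, lambda level: [0, 1], rng=random.Random(0)))
```

Randomizing only the upward half spreads traffic across the thicker upper levels, while the downward half needs no choice at all because each destination has exactly one path down from its ancestor.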
Message data can be only 1-5 words
Wormhole Routing
CRC Checking done at every Link
◦ An additional !CRC (inverted CRC) is sent when an error is first found
Primary Errors Allow the Diagnostic Network to
Determine the Error Location
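One way the primary/secondary distinction could work is sketched below, assuming the !CRC marker is modeled by forwarding the bitwise complement of the recomputed CRC. `zlib.crc32` stands in for the link-level CRC; `check_link` and `send` are hypothetical helpers, not the actual hardware logic.

```python
import zlib

FLIP = 0xFFFFFFFF  # XOR mask that inverts a 32-bit CRC


def check_link(payload: bytes, crc: int, link: int, log: list) -> int:
    """Check a message on one link and return the CRC to forward."""
    good = zlib.crc32(payload)
    if crc == good:
        return crc  # no error on this link
    if crc == good ^ FLIP:
        log.append(("secondary", link))  # already reported further upstream
        return crc
    log.append(("primary", link))  # first detection: this link is the suspect
    return good ^ FLIP  # forward !CRC so later links report a secondary error


def send(payload: bytes, n_links: int, corrupt_at: int) -> list:
    """Push a message across n_links, flipping one bit on link corrupt_at."""
    log = []
    crc = zlib.crc32(payload)
    for link in range(n_links):
        if link == corrupt_at:
            payload = bytes([payload[0] ^ 0x01]) + payload[1:]
        crc = check_link(payload, crc, link, log)
    return log


print(send(b"hello", 4, corrupt_at=1))
# [('primary', 1), ('secondary', 2), ('secondary', 3)]
```

Only the link that reports a primary error is a suspect, so the diagnostic network can search near that one link instead of along the whole path.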
Message Counters at Every Link
Kirchhoff’s Law to Determine Missing Messages
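The counter idea can be sketched as a flow-conservation check: once the network is quiet, every router should have forwarded exactly as many messages as it received, so a nonzero balance localizes the loss. The link and router names below are hypothetical, and `find_imbalances` is an illustrative helper.

```python
from collections import defaultdict


def find_imbalances(link_counts: dict, routers: set) -> dict:
    """Return {router: messages_in - messages_out} for unbalanced routers.

    link_counts maps a directed link (src, dst) to its message counter.
    Processing nodes are excluded because they legitimately inject
    and eject messages.
    """
    balance = defaultdict(int)
    for (src, dst), n in link_counts.items():
        balance[src] -= n  # messages that left src over this link
        balance[dst] += n  # messages that arrived at dst over this link
    return {r: b for r, b in balance.items() if r in routers and b != 0}


counts = {("n0", "r1"): 10, ("r1", "r2"): 10, ("r2", "n1"): 7}
print(find_imbalances(counts, {"r1", "r2"}))  # {'r2': 3}
```

A positive balance means messages entered a router and never left it, which points the diagnostics at that router rather than at the whole path.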
What to do with a Bad Chip or Link?
◦ Route Messages Away from Failure
◦ Map Out Nearby Processors
◦ Which is better?
▪ Both.
Solution: Virtual Channels
◦ One channel for requests, one for responses
◦ 4 channels per chip (Incoming and Outgoing)
Deadlock still possible!
◦ User sends but never attempts to receive messages
◦ Use higher-level languages to implement the
communication protocol
◦ Clear all messages for new user
◦ Allow all messages in transit to
eventually finish
“All Fall Down” Method
◦ Evenly misroute all messages in
transit down to the processing nodes
◦ Messages saved at the nodes
◦ Resent when the partition is swapped in
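The "All Fall Down" drain can be sketched as follows, assuming each router's buffered messages are spread round-robin over the processing nodes in the subtree below it. The round-robin policy and the helper names `all_fall_down` and `swap_in` are hypothetical; messages are `(destination, payload)` tuples.

```python
from itertools import cycle


def all_fall_down(in_flight: dict, subtree_nodes: dict) -> dict:
    """Drain in-flight messages by misrouting them downward.

    in_flight: {router: [message, ...]} buffered in each router.
    subtree_nodes: {router: [node, ...]} processing nodes below each router.
    Returns {node: [message, ...]}, what each node saves to memory.
    """
    saved = {}
    for router, msgs in in_flight.items():
        targets = cycle(subtree_nodes[router])  # spread the load evenly
        for msg in msgs:
            saved.setdefault(next(targets), []).append(msg)
    return saved


def swap_in(saved: dict, inject) -> None:
    """When the partition is swapped back in, resend each saved message."""
    for node, msgs in saved.items():
        for msg in msgs:
            inject(node, msg)  # msg still carries its real destination


saved = all_fall_down({"r0": [(5, "a"), (6, "b"), (7, "c")]}, {"r0": [0, 1]})
print(saved)  # {0: [(5, 'a'), (7, 'c')], 1: [(6, 'b')]}
```

Sending messages down rather than toward their destinations means every message reaches some node in a bounded number of hops, so the network can be emptied quickly at a context switch.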
Control Processor broadcasts program
◦ Not individual instructions (as in SIMD)
Each Processor runs the program on its own data set
Inter-Processor Communication
◦ Hardware Barriers allow processes to
synchronize without shared semaphores
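The broadcast-program model and the barrier can be imitated with threads: each thread plays a processing node running the same function on its own slice of the data, and `threading.Barrier` stands in for the hardware barrier. This is an analogy under those assumptions, not CM-5 code; `node_program` and `N_NODES` are illustrative names.

```python
import threading

N_NODES = 4
data = list(range(16))
partial = [0] * N_NODES
barrier = threading.Barrier(N_NODES)
result = []


def node_program(rank: int) -> None:
    # Every node runs the same program on its own slice of the data.
    partial[rank] = sum(data[rank::N_NODES])
    # No node continues until all nodes have written their partial sums.
    barrier.wait()
    if rank == 0:
        result.append(sum(partial))


threads = [threading.Thread(target=node_program, args=(r,)) for r in range(N_NODES)]
for t in threads:
    t.start()
for t in threads:
    t.join()
print(result[0])  # 120
```

The barrier is the only coordination: after it, node 0 may safely read every partial sum without any lock or semaphore on the shared array.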
Program smaller than the full instruction stream
◦ Easier to deliver
Local instruction fetch allows commodity processors
◦ Fast new RISC processors, less R&D
Control system useful for other problems
Execution of generic MIMD code
◦ Message passing
◦ User/Supervisor
◦ Interrupt
◦ Utility
◦ Reduction
◦ Forward/Backward Scan
◦ Router Done
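The combining operations can be sketched functionally: a reduction combines values pairwise up a binary tree, a forward scan gives each node the combination of all values up to its own, and a backward scan does the same from the other end. The router-done check is modeled here, as an assumption, as a reduction of per-node sent-minus-received counts that must reach zero.

```python
import operator
from itertools import accumulate


def tree_reduce(values, op):
    """Combine values pairwise, level by level, as a binary tree would."""
    level = list(values)
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:  # an unpaired value is passed up unchanged
            nxt.append(level[-1])
        level = nxt
    return level[0]


def forward_scan(values, op):
    """Inclusive prefix: node i receives v0 op v1 op ... op vi."""
    return list(accumulate(values, op))


def backward_scan(values, op):
    """Inclusive suffix: node i receives vi op ... op v(n-1), order preserved."""
    return list(accumulate(reversed(values), lambda acc, v: op(v, acc)))[::-1]


def router_done(sent, received):
    """True once the total sent equals the total received across all nodes."""
    return tree_reduce([s - r for s, r in zip(sent, received)], operator.add) == 0


print(tree_reduce([1, 2, 3, 4, 5], operator.add))     # 15
print(forward_scan([1, 2, 3, 4], operator.add))      # [1, 3, 6, 10]
print(backward_scan(["a", "b", "c"], operator.add))  # ['abc', 'bc', 'c']
print(router_done([3, 2], [1, 4]))                   # True
```

`backward_scan` applies `op(v, acc)` rather than `op(acc, v)` so that non-commutative operators such as string concatenation keep their left-to-right order.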
Global Operations
◦ Synchronous/Asynchronous OR
Binary Tree
Four Types of Packets
Single Source : Broadcasting
Multiple Source: Combining
Idle: Filler
Abstain: Allows a control network node to skip waiting for that input
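How an abstain packet lets a switch stop waiting can be sketched with a toy combining tree: a switch combines the values from its two children, but if one child abstains it simply forwards the other child's packet. The `Packet` encoding is hypothetical, and idle (filler) packets are not modeled.

```python
import operator
from dataclasses import dataclass
from typing import Callable


@dataclass
class Packet:
    kind: str  # hypothetical encoding: "combine" or "abstain"
    value: int = 0


def combine(left: Packet, right: Packet, op: Callable[[int, int], int]) -> Packet:
    """What a switch sends upward given the packets from its two children."""
    if left.kind == "abstain":
        return right  # no need to wait for a contribution from the left
    if right.kind == "abstain":
        return left
    return Packet("combine", op(left.value, right.value))


def tree_combine(leaves, op):
    """Combine leaf packets up a binary tree; returns the root's packet."""
    level = list(leaves)
    while len(level) > 1:
        if len(level) % 2:
            level.append(Packet("abstain"))  # a missing child simply abstains
        level = [combine(level[i], level[i + 1], op) for i in range(0, len(level), 2)]
    return level[0]


leaves = [Packet("combine", 3), Packet("abstain"), Packet("combine", 4), Packet("combine", 5)]
print(tree_combine(leaves, operator.add))  # Packet(kind='combine', value=12)
```

If every leaf abstains, the abstain packet propagates to the root, so the result correctly reports that no node contributed.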
Collisions on Network
◦ Multiple/Multiple: Buffering based on arrival time
◦ Multiple/Single: Single Source Packets Prioritized
◦ Single/Single: Error
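The three collision rules can be written as a small decision function. The string encoding of packet kinds and the returned descriptions are hypothetical; the rules themselves are the ones listed above.

```python
def resolve(a: str, b: str) -> str:
    """Decide what happens when packets of kinds a and b collide.

    Kinds (hypothetical encoding): "single" for single-source packets
    such as broadcasts, "multiple" for multiple-source combining packets.
    """
    kinds = {a, b}
    if kinds == {"multiple"}:
        return "buffer by arrival time"
    if kinds == {"single", "multiple"}:
        return "single-source first"
    if kinds == {"single"}:
        return "error"  # two single-source packets must never collide
    raise ValueError(f"unknown packet kinds: {a!r}, {b!r}")


print(resolve("multiple", "single"))  # single-source first
```

Using a set makes the rules symmetric, so the order in which the two colliding packets are passed in does not matter.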
Control Processor for each Partition
◦ Executes scalar code while processing nodes
execute parallel code
Connect any Control Processor to any
◦ Problems can occur in control networks too
◦ Diagnostics may show part of control network must
be mapped out
Binary Network
◦ Pods (physical subsystems) are leaves
◦ Designed for Multichip…but
Do JTAG for each Pod
Combine Responses with