Presented by: Quinn Gaumer
CPS 221

16,384 Processing Nodes (32 MHz)
30 m x 30 m
Teraflop
1992

With 16,384 processors, the interconnect plays a large role
3 Types of Networks
◦ Data
◦ Control
◦ Diagnostic

Easily Attainable High Performance
Scaling
Data Parallel Programming
High Reliability and Availability
Space/Time Shared
Fast Time to Market

Modular design, including:
Control Processor
Processing Nodes
Slices of Data and Control Networks
◦ Privileged vs. Non-Privileged
Program Isolation
Time Sharing

Provide Simple View of Network to Processors
Sharing and Fault Tolerance
Decouple Network/Processor by Providing a Contract
◦ Analogous to Software -> ISA -> Hardware

“The data network promises to eventually accept and deliver all messages injected into the network by the processors as long as the processors promise to eventually eject all messages from the network when they are delivered to the processors.”

Collection of Memory-Mapped FIFOs
◦ Outgoing/Incoming
Restricted Operations
◦ Implemented with protected pages
Physical/Relative (Virtual) Addresses
◦ Programs use only relative addresses

Network Independent of User
◦ Delivery guaranteed by the network, not the processing node
◦ Requires network diagnostics

Fat-Tree Structure
◦ The closer to the root, the thicker the tree
◦ Ensures no bottlenecks at the root
User Partitions and I/O are Sub-trees
◦ Guarantees network independence
◦ Messages in a partition stay within the partition
Many Optimal Node-to-Node Paths
◦ Choose randomly among open links

Message Data Limited to 1-5 Words
Wormhole Routing

CRC Checking done at every Link
◦ Additional !CRC sent when error first found
Primary Errors allow Diagnostic Network to Determine Location
Message Counters at every Link
Kirchhoff's Law to Determine Missing Messages

What to do with a Bad Chip or Link?
◦ Route Messages Away from Failure
◦ Map Out Nearby Processors
◦ Which is better? Both.

Solution: Virtual Channels
◦ One channel each for requests and responses
◦ 4 channels per chip (Incoming and Outgoing)
Deadlock still possible!
◦ User sends but never attempts to receive messages
◦ Higher-level languages to implement the communication protocol

Objectives
◦ Clear all messages for new user
◦ Allow all messages in transit to eventually finish

“All Fall Down” Method
◦ Evenly misroute all messages in transit to nodes
◦ Messages saved at node
◦ Resent when swapped back in

Control Processor broadcasts program
◦ Not instructions (SIMD)
Each Processor runs the program on its own data set
Inter-Processor Communication
◦ Hardware barriers let processes synchronize without shared semaphores
Program smaller than instruction stream
◦ Easier to deliver
Local fetch allows commodity processors
◦ Fast new RISC processors, less R&D
Control system useful for other problems
Execution of generic MIMD code
◦ Message passing

Broadcasting
◦ User/Supervisor
◦ Interrupt
◦ Utility
Combining
◦ Reduction
◦ Forward/Backward Scan
◦ Router Done
Global Operations
◦ Synchronous/Asynchronous OR

Binary Tree
Four Types of Packets
◦ Single Source: Broadcasting
◦ Multiple Source: Combining
◦ Idle: Filler
◦ Abstain: Allow control node to skip waiting
Collisions on Network
◦ Multiple/Multiple: Buffering based on arrival time
◦ Multiple/Single: Single-source packets prioritized
◦ Single/Single: Error

Control Processor for each Partition
◦ Executes scalar code while processing nodes execute parallel code
Connect any Control Processor to any Partition
◦ Problems can occur in control networks too
◦ Diagnostics may show part of control network must be mapped out

Binary Network
◦ Pods (physical subsystems) are leaves
JTAG
◦ Designed for multichip boards, but serial
Do JTAG for each Pod
Combine Responses with OR/AND
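The data-network contract and the 1-5 word message limit can be illustrated with a toy model. This is a minimal sketch, not the CM-5 interface: the single shared queue, the `capacity` parameter, and the `ToyDataNetwork`, `try_send`, and `eject` names are all hypothetical. It shows why a user who sends but never receives can stall the network.

```python
from collections import deque

MAX_WORDS = 5  # from the slides: a message carries only 1-5 words of data


class ToyDataNetwork:
    """Toy model of the send/eject contract. The single FIFO and the
    capacity are simplifying assumptions, not real CM-5 parameters."""

    def __init__(self, capacity: int):
        self.capacity = capacity
        self.in_flight = deque()

    def try_send(self, words: list) -> bool:
        if not 1 <= len(words) <= MAX_WORDS:
            raise ValueError("a message must carry 1-5 words")
        if len(self.in_flight) >= self.capacity:
            return False  # network is full; the sender has to retry later
        self.in_flight.append(list(words))
        return True

    def eject(self):
        """Receiver side: remove the oldest delivered message, or None."""
        return self.in_flight.popleft() if self.in_flight else None
```

With capacity 2, a third send fails until some receiver ejects a message. If no one ever calls `eject`, the sender retries forever, which is the deadlock case from the slides.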
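Relative addressing within a partition, as described on the memory-mapped FIFO slide, reduces to an offset plus a bounds check. A minimal sketch, assuming a partition owns a contiguous block of physical leaves; the `Partition` class and its fields are hypothetical, and the slides do not say where the translation is performed.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Partition:
    """Hypothetical descriptor: the partition owns physical leaves [base, base + size)."""
    base: int
    size: int

    def to_physical(self, rel: int) -> int:
        # Programs only ever see relative addresses 0..size-1
        if not 0 <= rel < self.size:
            raise ValueError(f"relative address {rel} is outside this partition")
        return self.base + rel
```

The bounds check is what keeps one user's messages from addressing nodes in another partition.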
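The fat-tree slides (random choice among open up-links, partitions as sub-trees) can be sketched as follows. Assumptions: a 4-ary tree, leaves numbered from 0, a fixed number of parent links per router, and a hypothetical `is_open(level, link)` predicate. In the real network the links get thicker toward the root, which this toy ignores.

```python
import random

ARITY = 4  # assumption: each router has 4 children (4-ary fat tree)


def turn_height(src: int, dst: int, arity: int = ARITY) -> int:
    """Number of levels a message climbs before it can head down to dst."""
    h = 0
    while src != dst:
        src //= arity
        dst //= arity
        h += 1
    return h


def route(src, dst, parents_per_router, is_open, rng=random):
    """Sketch of a fat-tree route.

    Going up, any open parent link leads to a common ancestor, so one is
    picked at random. Going down, the child port at each level is fixed by
    dst's base-ARITY digits. parents_per_router and is_open are hypothetical.
    """
    h = turn_height(src, dst)
    up = []
    for level in range(h):
        open_links = [k for k in range(parents_per_router) if is_open(level, k)]
        if not open_links:
            raise RuntimeError(f"no open up-link at level {level}")
        up.append(rng.choice(open_links))
    down = [(dst // ARITY**level) % ARITY for level in reversed(range(h))]
    return up, down
```

Nodes inside an aligned sub-tree of 4^k leaves never climb more than k levels to reach each other, so their traffic stays inside that sub-tree, which matches the partition slide.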
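One reading of the CRC slide: every link checks the CRC, and the first router to see a bad CRC reports a primary error and forwards the complemented CRC (!CRC), so routers further along recognize the error as already reported (secondary). This sketch assumes that interpretation and uses Python's `zlib.crc32` as a stand-in for the hardware CRC; `check_hop` and `simulate` are hypothetical names.

```python
import zlib

MASK = 0xFFFFFFFF  # keep CRC values to 32 bits


def check_hop(data: bytes, recv_crc: int):
    """Return (status, crc_to_forward) for one link.

    status is 'ok', 'primary' (error first detected on this link) or
    'secondary' (error already flagged upstream via the inverted CRC).
    """
    good = zlib.crc32(data) & MASK
    if recv_crc == good:
        return "ok", recv_crc
    if recv_crc == (~good & MASK):
        return "secondary", recv_crc
    # First router to see the error: report it and forward !CRC
    return "primary", ~good & MASK


def simulate(payload: bytes, hops: int, corrupt_at: int):
    """Send payload over `hops` links, flipping one bit on link corrupt_at."""
    crc = zlib.crc32(payload) & MASK
    data = payload
    statuses = []
    for hop in range(hops):
        if hop == corrupt_at:
            data = bytes([data[0] ^ 0x01]) + data[1:]
        status, crc = check_hop(data, crc)
        statuses.append(status)
    return statuses
```

Only the link where the bit flipped reports a primary error, which is what would let the diagnostic network locate the fault.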
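The Kirchhoff's-law check on message counters can be sketched as a flow-conservation test: once the network drains, every router must have sent exactly as many messages as it received. The `(sender, receiver)` counter layout and the `find_leaky_routers` name are assumptions for illustration.

```python
from collections import defaultdict


def find_leaky_routers(link_counts, endpoints):
    """Kirchhoff-style check on per-link message counters.

    link_counts maps (sender, receiver) -> messages counted on that link.
    Processing nodes in `endpoints` inject and eject messages, so they are
    skipped. A positive balance means a router lost messages; a negative
    balance means it emitted more than it received.
    """
    balance = defaultdict(int)
    for (sender, receiver), n in link_counts.items():
        balance[sender] -= n
        balance[receiver] += n
    return {node: b for node, b in balance.items()
            if node not in endpoints and b != 0}
```

In the example below, router R2 receives 10 messages but forwards only 9, so the missing message is localized to R2.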
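The “All Fall Down” context switch can be sketched in software, assuming round-robin as the meaning of “evenly” and a hypothetical `send(node, msg)` callback for resending. The real mechanism misroutes messages in hardware; this only shows the bookkeeping.

```python
def all_fall_down(in_flight, nodes):
    """Spread in-transit messages evenly (round-robin) over the nodes.

    Returns {node: [saved messages]}, standing in for each node's save area.
    """
    saved = {n: [] for n in nodes}
    for i, msg in enumerate(in_flight):
        saved[nodes[i % len(nodes)]].append(msg)
    return saved


def swap_in(saved, send):
    """When the user is swapped back in, resend every saved message."""
    for node, msgs in saved.items():
        for msg in msgs:
            send(node, msg)
        msgs.clear()
```

Spreading messages evenly keeps any one node's save area from overflowing, and the network is empty for the next user once every message has fallen down.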
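The combining operations (reduction, forward/backward scan, global OR) can be sketched as pairwise combining up a binary tree. This is a software illustration under assumptions: the scans are exclusive (element i sees only the elements before it in scan order), and `op` is any associative operation.

```python
from operator import add, or_


def tree_reduce(values, op=add):
    """Combine a non-empty list pairwise, level by level, like a binary tree."""
    level = list(values)
    while len(level) > 1:
        nxt = [op(level[i], level[i + 1]) for i in range(0, len(level) - 1, 2)]
        if len(level) % 2:
            nxt.append(level[-1])  # odd element passes straight up
        level = nxt
    return level[0]


def tree_scan(values, op=add, identity=0):
    """Exclusive forward scan: result[i] combines values[0..i-1]."""
    if len(values) == 1:
        return [identity]
    mid = len(values) // 2
    left, right = values[:mid], values[mid:]
    left_total = tree_reduce(left, op)
    # The right subtree's prefixes all start from the left subtree's total
    return tree_scan(left, op, identity) + [
        op(left_total, x) for x in tree_scan(right, op, identity)
    ]


def backward_scan(values, op=add, identity=0):
    """Exclusive backward scan: result[i] combines values after i."""
    return tree_scan(values[::-1], op, identity)[::-1]
```

A global OR is the same reduction with `op=or_` over one bit per node.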
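The collision rules on the control network can be written as a small decision function. A sketch only: packets are modeled as hypothetical `(Kind, arrival_time)` pairs, idle filler is assumed to simply yield to the other packet, and abstain packets are not modeled.

```python
from enum import Enum


class Kind(Enum):
    SINGLE = "single-source"      # broadcasting
    MULTIPLE = "multiple-source"  # combining
    IDLE = "idle"                 # filler


def resolve(a, b):
    """Order two packets that meet at a control-network node.

    a and b are (Kind, arrival_time) pairs. Returns them in the order they
    go forward; a packet listed second is the one that waits in a buffer.
    """
    if a[0] is Kind.IDLE:
        return [b]
    if b[0] is Kind.IDLE:
        return [a]
    kinds = {a[0], b[0]}
    if kinds == {Kind.SINGLE}:
        raise RuntimeError("single/single collision is an error")
    if kinds == {Kind.SINGLE, Kind.MULTIPLE}:
        # Single-source packet is prioritized; the combining packet waits
        return sorted([a, b], key=lambda p: p[0] is not Kind.SINGLE)
    # Both combining: buffer by arrival time
    return sorted([a, b], key=lambda p: p[1])
```

Treating single/single as an error reflects that only one broadcaster should be active at a time.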
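Combining per-pod JTAG results with OR/AND over the binary diagnostic network suggests a bisection search for faulty pods. This is an illustrative sketch, assuming each pod reports one pass/fail bit and that any subtree can be asked for the OR of its leaves; the function names are hypothetical.

```python
def find_failed_pods(failed, lo=0, hi=None):
    """Locate failing pods by bisecting the binary diagnostic tree.

    failed[i] is True if pod i's own JTAG scan reported a fault. Each call
    models asking one subtree for the OR of its leaves' fault bits.
    """
    if hi is None:
        hi = len(failed)
    if not any(failed[lo:hi]):  # OR-combine over this subtree
        return []
    if hi - lo == 1:
        return [lo]
    mid = (lo + hi) // 2
    return find_failed_pods(failed, lo, mid) + find_failed_pods(failed, mid, hi)


def all_pods_passed(failed):
    """AND-combine: True only if no pod reported a fault."""
    return all(not f for f in failed)
```

Running JTAG per pod keeps each serial scan short, and the tree combine turns many pod results into one answer at the root.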