Advanced Computer Architecture CSE 8383 April 24, 2008 Session 12 Computer Science and Engineering Contents Message Passing Systems (Chapters 5 & 7) Communication Patterns Network Computing Client/Server System Clusters Grid Interconnection Networks Computer Science and Engineering Message Passing Mechanisms Message Format Message arbitrary number of fixed length packets Packet basic unit containing destination address. Sequence number is needed A packet can further be divided into flits (flow control digits) Routing and sequence occupy header flit Computer Science and Engineering Message, Packets, Flits Message Packet Destination Data flit Sequence Computer Science and Engineering Store and Forward Routing Packets are the basic units of information flow Each node uses a packet buffer A packet is transferred from S to D through a sequence of intermediate nodes Channel and buffer must be available Computer Science and Engineering Wormhole Routing Flits are the basic units of information flow Each node uses a flit buffer Flits are transferred from S to D through a sequence of intermediate routers in order (Pipeline) Can be visualized as a railroad train Flits from different packets cannot be mixed up Computer Science and Engineering Latency Analysis L packet length (in bits) W Channel bandwidth (bits/sec) D Distance (number of hops) F flit length (in bits) Computer Science and Engineering Store and Forward Latency L W L W D L W TSF Computer Science and Engineering WH Latency L W D TW Computer Science and Engineering Latency Analysis L packet length (in bits) W Channel bandwidth (bits/sec) D Distance (number of hops) F flit length (in bits) TSF = D * L/W TWH = L/W + D* F/W L/W if L>>F (independent of D) Computer Science and Engineering Communication Patterns Point to Point 1 - 1 Multicast 1 - n Broadcast 1 - all Conference n - n Computer Science and Engineering Routing potential problems Deadlock: When 2 messages, each is holding the resources required by the other in order to move, both messages will be blocked (cyclic dependency for resources) Straightforward solution (but inefficient) is rerouting Another solution is avoidance of occurrence of deadlock using a strict monotonic order of network resources Channel dependency graph (CDG) is a technique for developing a deadlock-free routing algorithm. Computer Science and Engineering A 4-node network and its CDGs c1 1 0 c4 c6 3 c5 c7 c1 c2 c3 c4 c5 c6 c7 c8 c8 c2 2 c3 (a) A 4-node network c1 c5 (b) Channel dependency graph (CDG) c2 c6 c3 c7 c4 c8 (c) CDG for a deadlock-free version of the network Computer Science and Engineering Livelock: A message goes around the network and never reaches its destination It results from using adaptive routing algorithms with dynamic injection, where nodes inject their messages in the network at arbitrary times Policies to avoid livelock are based on assigning a priority to a message injected to the network: Messages are routed according to their priorities Once a message is injected, only a finite number of messages will be injected with higher or equal priority. Computer Science and Engineering Starvation: A node suffers from starvation if it has a message to inject into the network but is never allowed to do so. The simplest policy to avoid starvation is to allow each node to have an injection queue that competes with the queues of the incoming links to the same node. The main disadvantage is that a node with a high message injection rate can slow down all the other nodes in the network. Computer Science and Engineering Routing Efficiency Two Parameters Channel Traffic (number of channels used to deliver the message involved) Communication Latency (distance) Computer Science and Engineering Multicast on a mesh (5 unicasts) Traffic ? Latency ? Computer Science and Engineering Multicast on a mesh (multicast pattern 1) Traffic ? Latency ? Computer Science and Engineering Multicast on a mesh (multicast pattern 2) Traffic ? Latency ? Computer Science and Engineering Broadcast (tree structure) 3 2 3 4 2 1 2 3 1 2 1 Computer Science and Engineering Message Passing in PVM (Revisit) Sending Task User application 1 3 Daemon User application Library 4 2 Receiving Task 5 Library 8 6 7 Daemon Computer Science and Engineering Standard PVM asynchronous communication A sending task issues a send command (point 1) The message is transferred to the daemon (point 2) Control is returned to the user application (points 3 & 4) The daemon will transmit the message on the physical wire sometime after returning control to the user application (point 3) Computer Science and Engineering Standard PVM asynchronous communication (cont.) The receiving task issues a receive command (point 5) at some other time In the case of a blocking receive, the receiving task blocks on the daemon waiting for a message (point 6). After the message arrives, control is returned to the user application (points 7 & 8) In the case of a non-blocking receive, control is returned to the user application immediately (points 7 & 8) Computer Science and Engineering Send (3 steps) 1. 2. 3. A send buffer must be initialized The message is packed into the buffer The completed message is sent to its destination(s) Computer Science and Engineering Receive (2 steps) 1. 2. The message is received The received items are unpacked Computer Science and Engineering Message Buffers Buffer Creation (before packing) Bufid = pvm_initsend(encoding_option) Bufid = pvm_mkbuf(encoding_option) Encoding option 0 1 2 Meaning XDR No encoding Leave data in place Computer Science and Engineering Message Buffers (cont.) Data Packing pvm_pk*() pvm_pkstr() – one argument pvm_pkstr(“This is my data”); Others – three arguments 1. 2. 3. Pointer to the first item Number of items to be packed Stride pvm_pkint(my_array, n, 1); Packing functions can be called multiple times to pack data into a single message Computer Science and Engineering Sending a message Point to point (one receiver) info = pvm_send(tid, tag) broadcast (multiple receivers) info = pvm_mcast(tids, n, tag) info = pvm_bcast(group_name, tag) Pack and Send (one step) info = pvm_psend(tid, tag, my_array, length, data type) Computer Science and Engineering Receiving a message Blocking bufid = pvm_recv(tid, tag) -1 wild card in either tid or tag Nonblocking bufid = pvm_nrecv(tid, tag) bufid = 0 (no message was received) Timeout bufid = pvm_trecv(tid, tag, timeout) bufid = 0 (no message was received) Computer Science and Engineering Different Receive in PVM Time Funciton is called Pvm_recv() wait Pvm_nrecv() Continue execution Time is expired Message arrival Pvm_trecv() wait Resume execution Resume execution Blocking Non-blocking Timeout Computer Science and Engineering Data unpacking pvm_upk*() pvm_upkstr() – one argument pvm_upkstr(string); Others – three arguments 1. 2. 3. Pointer to the first item Number of items to be unpacked Stride pvm_upkint(my_array, n, 1); Computer Science and Engineering Networks Computing Four categories WAN MAN LAN SAN Internet TCP/IP Computer Science and Engineering Other Network technologies Fast Ethernet and Gigabit Ethernet The Fiber Distributed Data Interface (FDDI) High-Performance Parallel Interface (HIPPI) Asynchronous Transfer Mode (ATM) Scalable Coherent Interface (SCI) Computer Science and Engineering 10Gbps SCI HiPPI 1000Mbps 1000 Base T ATM FDDI 100Mbps 100 Base T 10Mbps 10 Base T SAN LAN MAN WAN A representation of network technologies Computer Science and Engineering Client/Server Systems Client Client Server Server Threads Interconnection Network Computer Science and Engineering Sockets Sockets are used to provide the capability of making connections from one application running on one machine to another running on a different machine. Once a socket is created, it can be used to wait for an incoming connection (passive socket) or can be used to initiate connection (active socket). Client Serve r A Socket Connection Computer Science and Engineering A Client Server Framework for Parallel Applications Server 1 Server 2 Server 3 Slaves (Workers) Server n Interconnection Network Client Master (Supervisor) Computer Science and Engineering Computer Clusters Advances in commodity processors and network technology Network of PCs and workstations connected via LAN or WAN forms a Parallel System Compete favorably (cost/performance) Computer Science and Engineering Cluster Architecture Programming Environment Middleware OS M C P OS I/O M C OS I/O P Interconnection Network M C I/O P Home cluster Computer Science and Engineering Grids Geographically distributed platforms. Internet Dependable, consistent, pervasive, and inexpensive access to high end computing. Computer Science and Engineering Interconnection Networks Ethernet A packet-switched LAN technology. All hosts connected to an Ethernet receive every transmission, making it possible to broadcast a packet to all hosts at the same time. Ethernet uses a distributed access control scheme called Carrier Sense Multiple Access with Collision Detect (CSMA/CD). Each computer connected to an Ethernet network is assigned a unique 48-bit address known as its Ethernet address, also called the media access control address, (MAC). Computer Science and Engineering Switches A n1 x n2 switch consists of: n1 input ports n2 output ports Links connecting each input to every output Control logic to select a specific connection Internal buffers The connections between input ports and output ports may be: One-to-one (point-to-point) One-to-many (multicast or broadcast) Many-to-one: may cause conflicts at the output ports and needs arbitration. Computer Science and Engineering When only one-to-one connections are allowed, the switch is called crossbar. An n x n crossbar switch can establish n! connections. If we allow both one-to-one as well as one-to-many in an n x n switch, the number of connections that can be established is nn. (We discussed this before, remember?) Computer Science and Engineering Routing can be achieved using 2 mechanisms: Source-path: the entire path to the destination is stored in the packet header at the source location. Table-based: the switch must have a complete routing table that determines the corresponding port for each destination. 5 0 6 Dest-id Port 0 Port 4 Port 0 Port 4 Port 1 Port 5 Port 1 Port 5 Port 2 Port 6 Port 2 Port 6 Port 3 Port 7 id 6 Port 3 Port 7 Routing table Source-path Routing versus Table-based Routing Computer Science and Engineering Myrinet Clos network Myrinet is a high-performance, packet communication and switching technology. Myrinet switches are multiple-port components that route a packet entering on an input channel of a port to the output channel of the port selected by the packet. Computer Science and Engineering Myrinet Clos network Network Spine Clos “Spreader” Network Connects Spine (upper 8 switches) to Leaves (16 lower switches) 128 Hosts 128-host Clos Network using 16-port Myrinet Switch Computer Science and Engineering Myrinet Clos network Network Spine 2 links each 64 Hosts 64-host Clos Network using 16-port Myrinet Switch (Each line represents 2 links) Computer Science and Engineering Myrinet Clos network Network Spine 4 links each 32 hosts 32-host Clos Network using 16-port Myrinet Switch (Each line represents 4 links) Computer Science and Engineering The Quadrics network (QsNet) Consists of 2 hardware building blocks A programmable network interface called Elan: connects the Quadrics network to a processing node containing one or more CPUs Elan provides substantial local processing power to implement high-level message passing protocols (ex: MPI). High-bandwidth, low-latency communication switch called Elite: QsNet connects Elite switches in a quaternary fat-tree topology. Computer Science and Engineering The Quadrics network (QsNet) Processing Nodes Computer Science and Engineering