By Nitin Bahadur, Gokul Nadathur
Department of Computer Sciences
University of Wisconsin-Madison
Spring 2000

Talk Outline
• Motivation and Goals
• General Architecture of the middleware
• Components of the middleware
• Providing reliability - handling of node failures
• Applications developed using the middleware
• Performance
• Conclusions and possible extensions

Multicast / Reduction Trees Spring 2000

Motivation and Goals
• A middleware for an application with Master - Worker paradigm
• Scalable framework for communication and computing client response (“Reduction”)
• Unicast does not scale - so use multicast
• Introducing reduction operations dynamically in clients
• A general framework for communication among clients

The Big Picture...
[Diagram: a Master (App + ARTL) connected to Client nodes (App + ARTL each)]
Master:
• Sends queries
• Reduces results
• Hands back results to application
Intermediate client:
• Executes responses to queries
• Forwards queries downstream
• Reduces incoming results
• Sends reduced results to master
Leaf client:
• Executes responses to queries
• Sends back results towards master

ART - Library Architecture
[Diagram: Application, Application API, Framework for processing messages (Event Handler, Reduction functions), ARTL Communication Layer, Network. Labels: Application specific callbacks, ARTL specific message, Outgoing message, Incoming Packet]
ARTL messages:
1. Query from master
2. Response from downstream nodes

Communication Subsystem
• Connection Setup
  – Connect nodes as a Binomial tree
• Send and receive ARTL and application messages
• Detect node failure and act accordingly
• Integrate restarted node in current tree structure

Why use Binomial Tree
[Diagram: one Master App and three Client Apps, connected in two ways]
• Binomial Tree: Query Propagation time = 2
• Unicast Mechanism: Query Propagation time = 3

Reduction
[Diagram: tree of nodes 1 to 8 with responses; reduction at nodes 5 and 3]
Example Reduction operations: Min(), Max()

Tree connection setup
[Diagram: tree of nodes 1 to 8]

Tree Setup - Phase I
[Diagram: nodes 1 to 8, TCP connection setup]

Tree Setup - Phase II
[Diagram: nodes 1 to 8, TCP connection setup]

Tree Setup - Phase III
[Diagram: nodes 1 to 8, TCP connection setup]

Inter node communication
[Diagram: message layout, ARTL Header followed by Data]
• Unicast and multicast data transmission
• ARTL receives application messages for which no receive has been posted
  – these are sent to a callback function registered by application
• ARTL receives data on behalf of application when application explicitly posts a receive

ART - Library Architecture
[Diagram repeated, with the message label shown as "ARTL Encapsulated message"]

Reduction Functions
• Implemented as shared objects
• Sent to client during Setup phase
• Each reduction function is associated with a particular response it reduces

Event Handler
[Diagram: Network, Thread Pool, Event Handler, Application]
• Table containing Query id and Callback information for currently registered queries
• Responses for the shaded entry from downstream nodes
• Run Queue of reduction/response operations
• Reduced response sent upstream
• Response Callback

Multithreaded Architecture
• No prior knowledge about behavior of reduction function
• Exploit concurrency - multiple processors per node
• Static pool of threads - creation and destruction of threads is costly (Firefly RPC)

Crash Reconfiguration
[Diagram: tree of nodes 1 to 8]

Crash Reconfiguration
[Diagram: nodes shown are 1, 5, 7, 8, 3, 6, 4]
Crash Reconfiguration at depth 1

Crash Reconfiguration
[Diagram: nodes shown are 1, 5, 7, 8, 3, 4, 6]
Crash Reconfiguration at depth 2

Crash Reconfiguration
[Diagram: nodes shown are 1, 5, 7, 8, 3, 6, 2, 4]
Crash Reconfiguration at depth 1

Crash Reconfiguration
[Diagram: nodes shown are 1, 3, 7, 8, 6, 2, 4]
Crash Reconfiguration at depth 1

Crash Detection
• Break in TCP connection with parent/child
  – a signal is received at the other end of the connection
• Use of periodic refresh messages to inform parent that child is up and running
  – useful in WAN environments

Crash Handling
• Parent of the failed node informs master
• All nodes are informed of a node failure
• Master recomputes tree
  – If a leaf node is down, then no problem
  – If an intermediate node is down, some reconfiguration is required

Node Restart
• Restarted node contacts master to tell it
about restart
• Master sends it current state of network and the shared object(s)
• All nodes are informed of a node restart
• Master recomputes tree and informs the new node’s parent about its new child
• Parent and child establish connections

SysMon - A System monitor
• Monitors the load average from /proc
• Displays Min, Max and average loads
• Per-node load is also displayed
• ARTL Reduction operations: Min, Max and Average
• Node failures are detected and SysMon pops up an alert

File Transfer Application
• Transfers a file from master to all clients
• File can be executed at clients (if required)
  – execution can be instantaneous on receiving file
  – execution can be delayed until all nodes have received the file

File Transfer Performance
[Chart: File Transfer Time for 40 MB file. Y axis: time in seconds (0 to 180). X axis: total number of nodes (2, 4, 8, 16). Series: Unicast File Transfer time, Multicast File Transfer time, Expected multicast file transfer time]

Total Startup Time vs Number of Nodes
• Client processes started using ssh on different machines
[Chart: Total Startup Time. Y axis: time in seconds (0 to 20). X axis: number of nodes (2, 4, 8, 16, 32)]

Conclusions and Extensions
• A middleware for dynamic operations
• Support for crash detection, recovery and dynamic processes
• Demonstrated near optimal speedup using real applications
• Making response function dynamic - active services
• Differential scheduling in thread scheduler for QoS
• Making dynamic code secure
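
Sketch: binomial tree and reduction
The binomial tree and the Min()/Max() reduction described in the talk can be illustrated with a small Python sketch. This is not ARTL code: the function names (`binomial_parent`, `build_children`, `propagation_rounds`, `reduce_up`) are hypothetical, and the parent rule used here (clear the lowest set bit of the zero-based node rank) is one common way to build a binomial tree, assumed for illustration since the slides do not state the rule ARTL uses. The sketch also assumes a node sends to one child per round, largest subtree first.

```python
from collections import deque


def binomial_parent(node):
    """Parent of `node` (ids start at 1, master is node 1); None for the master.

    Assumed rule: clear the lowest set bit of the zero-based rank.
    """
    rank = node - 1
    if rank == 0:
        return None
    return (rank & (rank - 1)) + 1


def build_children(n):
    """Map every node id in 1..n to the sorted list of its children."""
    children = {node: [] for node in range(1, n + 1)}
    for node in range(2, n + 1):
        children[binomial_parent(node)].append(node)
    return children


def propagation_rounds(n):
    """Rounds until all n nodes hold the query, one send per node per round."""
    children = build_children(n)
    received = {1: 0}  # the master holds the query at round 0
    queue = deque([1])
    while queue:
        node = queue.popleft()
        # Highest child id first: that child has the largest subtree.
        for i, child in enumerate(sorted(children[node], reverse=True)):
            received[child] = received[node] + i + 1
            queue.append(child)
    return max(received.values())


def reduce_up(node, children, local, op):
    """Combine a node's own value with the reduced results of its subtrees."""
    values = [local[node]]
    for child in children[node]:
        values.append(reduce_up(child, children, local, op))
    return op(values)


if __name__ == "__main__":
    tree = build_children(8)
    print(tree)
    print(propagation_rounds(4), "rounds for 4 nodes")
    loads = {node: 0.1 * node for node in tree}  # made-up per-node values
    print(reduce_up(1, tree, loads, min), reduce_up(1, tree, loads, max))
```

Under these assumptions, four nodes need 2 rounds where sequential unicast from the master needs 3, which agrees with the propagation times on the "Why use Binomial Tree" slide. In `reduce_up`, each intermediate node forwards a single reduced value upstream instead of every downstream response.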