CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University Frequently asked questions from the previous class survey CS 555: DISTRIBUTED SYSTEMS ¨ ¨ [MESSAGING SYSTEMS] ¨ ¨ Daisy chain MapReduce jobs? Multiple mappers or reducers in a row Mapper has produced intermediate results and then fails before data is fetched by reducer. What happens? If we run the same job multiple times are the combiner groups the same every time? Problem domains where MapReduce is not useful? ¤ ¨ Shrideep Pallickara Computer Science Colorado State University ¨ ¨ ¤ ¨ ¨ ¨ October 6, 2015 CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University L13.1 ¨ ¤ IP ¨ Multicast ¤ Indirect communications ¤ Group-based communications CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University L13.3 Interfaces to the transport layer ¨ October 6, 2015 Instructor: SHRIDEEP PALLICKARA Messaging Systems October 6, 2015 Instructor: SHRIDEEP PALLICKARA org.apache.hadoop.mapred.SkipBadRecords Setting number of mappers: conf.setNumMapTasks(int num) Can we define a combiner that is different from the reducer? If mapper produces (k1, v1) can combiner produce (k2, v2)? CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University Sockets Message Passing Interface (MPI) October 6, 2015 Instructor: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University Primitive Meaning ¤ Introduced Socket Create a new communication endpoint Bind Attach a local address to a socket Listen Announce willingness to accept connections Accept Block caller until a request arrives Connect Actively attempt to establish a connection Send Send some data over the connection Receive Receive some data over the connection Close Release the connection in the 1970s in Berkley Unix X/Open Transport Interface (XTI) by AT&T ¨ Sockets and XTI are very similar October 6, 2015 Instructor: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University SLIDES CREATED BY: SHRIDEEP PALLICKARA L13.4 Socket primitives for TCP/IP Sockets interface ¨ L13.2 Several distributed systems built on top of the service offered by transport layer Topics covered in this lecture ¨ Alternatives to Hadoop? If source code contains while(true) will Hadoop skip it? How do static variables work in Mapper/Reducer Can we configure skip bad records in MapReduce? L13.5 October 6, 2015 Instructor: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University L13.6 L13.1 CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University The communication pattern using sockets Server socket Message Passing Interface (MPI) ¨ bind listen accept read connect close ¨ Synchronization point socket write Communications write read High performance computers need highly efficient communications Primitives must be: ¤ At the right level of abstraction must have minimal overhead ¤ Implementation close Client October 6, 2015 Instructor: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University L13.7 Sockets were deemed inefficient … ¨ Wrong level of abstraction ¤ Only ¨ ¨ for network communications general-purpose stacks ¤ Not suitable for proprietary protocols in HPC ¨ ¤ Uses CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University Most HPC systems were shipped with proprietary communication libraries ¤ High-level Performance issues ¤ Designed Need for hardware and platform independence led to a standard and efficient communication primitives Of course they were all mutually incompatible ¤ Portability L13.9 became an issue CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University October 6, 2015 Instructor: SHRIDEEP PALLICKARA Message Passing Interface (MPI) ¨ Each group is assigned an identifier ¨ Designed for parallel applications ¨ Each process within a group has an identifier ¨ Tailored for transient communications ¨ (groupId, processId) pair ¨ Assumes process crashes/partitions are fatal ¤ Uniquely ¤ Instead automatic recovery October 6, 2015 Instructor: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University SLIDES CREATED BY: SHRIDEEP PALLICKARA L13.10 MPI assumes communications takes place within a group of processes ¨ ¤ No L13.8 Result? simple {send, receive} primitives October 6, 2015 Instructor: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University October 6, 2015 Instructor: SHRIDEEP PALLICKARA L13.11 October 6, 2015 Instructor: SHRIDEEP PALLICKARA indentifies endpoints of transport-level addresses CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University L13.12 L13.2 CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University Some of the message passing primitives in MPI MPI offers a very large number of communication primitives Primitive Meaning MPI_bsend Append outgoing message to local send buffer MPI_send Send message and wait until it is copied to local or remote buffer MPI_ssend Send message and wait until receipt starts ¨ MPI_isend ¤ More ¤ Pick ¨ MPI_sendrecv Send message and wait for reply Pass reference to outgoing message Wait until receipt starts MPI_recv Receive a message; block if there is none MPI_irecv Check if there is an incoming message; But do not block CS555: Distributed Systems [Fall 2015] October 6, 2015 Instructor: SHRIDEEP PALLICKARA Dept. Of Computer Science, Colorado State University possibilities to improve performance and choose most suitable one Cynical view ¤ Committee Pass reference to outgoing message, and continue MPI_issend Official reason ¤ Threw L13.13 could not make up its collective mind in everything! CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University October 6, 2015 Instructor: SHRIDEEP PALLICKARA L13.14 Multicast communications ¨ Uses datagram packets ¨ Sender sends packet to multicast address ¤ 224.0.0.0 ¨ IPv6 ¤ Multicast MULTICAST COMMUNICATIONS CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University October 6, 2015 L13.15 Multicast communications: Routing data ¨ Routers make sure packet delivered to all hosts in multicast group ¤ Choose ¨ points where streams are duplicated to 239.255.255.255 Class D addresses have the prefix ff00::/8 Bits 8 4 4 112 Field Prefix Flags Scape Group ID October 6, 2015 Instructor: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University L13.16 Multicast issues ¨ Packet size restrictions in Multicast ¨ Practical considerations ¤ Turned off at several institutions to curb free-riding Pay attention to TTL for the datagrams ¤ Maximum October 6, 2015 Instructor: SHRIDEEP PALLICKARA number of routers a datagram is allowed to cross CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University SLIDES CREATED BY: SHRIDEEP PALLICKARA L13.17 October 6, 2015 Instructor: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University L13.18 L13.3 CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University Multicast is not particularly suitable in some situations ¤ Consumption patterns change dynamically n ¤ Groups cannot be pre-allocated Representing consumption profiles as groups n n Enormous number of groups – potentially 2N for N consumers Eliminating impossible groups would still require millions of groups APPLICATION LEVEL MULTICASTING October 6, 2015 Instructor: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University L13.19 Application-level multicasting ¨ Sending data to multiple receivers ¨ ¨ ¨ up communication paths CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University Organize nodes into a mesh network node has multiple neighbors paths exist between pairs of nodes ¤ More robust to failures ¤ Multiple are reluctant to support this October 6, 2015 Instructor: SHRIDEEP PALLICKARA path between every pair of nodes ¤ Every No convergence on protocols ¤ ISPs Nodes organize into a tree ¤ Unique Belonged in the domain of network protocols ¤ Set L13.20 Construction of the overlay network ¤ Multicasting ¨ CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University October 6, 2015 L13.21 October 6, 2015 Instructor: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University L13.22 Use of rendezvous nodes ¨ ¨ New node contacts rendezvous node to get a partial list of nodes New node uses this list to determine which nodes to connect to ¤ Metrics n Load, used in decision making delays, bandwidths, etc INDIRECT COMMUNICATIONS October 6, 2015 Instructor: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University SLIDES CREATED BY: SHRIDEEP PALLICKARA L13.23 October 6, 2015 CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University L13.24 L13.4 CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University Direct coupling Indirect communications ¨ ¨ Communications between entities in a distributed system through an intermediary Senders and receivers have no direct coupling ¨ E.g. TCP communications between 2 endpoints ¨ Leads to a rigidity in terms of dealing with change ¨ Example ¤ A client-server system to replace a server with another one that offers equivalent functionality ¤ Difficult October 6, 2015 Instructor: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University L13.25 Key properties of using an intermediary for communications ¨ Space uncoupling ¨ Time uncoupling October 6, 2015 Instructor: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University L13.26 Space uncoupling ¨ Sender does not know (or need to know) the identity of receivers ¤ And ¨ vice versa Developer has a degree of freedom in dealing with change ¤ Participants (senders or receivers) can be replaced, updated, replicated or migrated October 6, 2015 Instructor: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University L13.27 Sender and receiver(s) can have independent lifetimes ¨ Do not need to exist at the same time to communicate ¨ Useful in volatile environments CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University SLIDES CREATED BY: SHRIDEEP PALLICKARA ¨ mobile environments Event disseminations ¤ Situations senders and receivers join and leave constantly October 6, 2015 Instructor: SHRIDEEP PALLICKARA L13.28 In distributed systems where change is anticipated ¤ E.g. ¨ ¤ Where CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University Indirect communications: Primary uses Time uncoupling ¨ October 6, 2015 Instructor: SHRIDEEP PALLICKARA L13.29 where receivers are unknown and may be liable to change October 6, 2015 Instructor: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University L13.30 L13.5 CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University Indirect communications: Disadvantages ¨ ¨ Indirection may imply decoupling in space and time …. but not always the case (1/2) Performance overhead due to the added level of indirection ¨ Space-uncoupled, time-coupled ¨ IP multicast Systems developed using this concept are more difficult to manage ¤ Space ¤ Precisely ¤ Time because of the lack of coupling n All October 6, 2015 Instructor: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University L13.31 Indirection may imply decoupling in space and time …. but not always the case (2/2) ¨ uncoupled n Messages receivers must exist at the time of the message-send to October 6, 2015 Instructor: SHRIDEEP PALLICKARA Time-uncoupled Space coupling Communication directed towards a given receiver(s) Receiver(s) must exist at that moment in time E.g.: Remote invocation Communication directed towards a given receiver(s) Sender and Receiver(s) can have independent lifetimes E.g.: E-mail Space uncoupling Sender does not need to know the identity of the receiver(s) Receiver(s) must exist at that moment in time E.g.: IP Multicast Sender does not need to know the identity of the receiver(s) Sender and Receiver(s) can have independent lifetimes E.g.: publish/subscribe Example: ¤ E-mail October 6, 2015 Instructor: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University L13.33 L13.32 Time-coupled ¤ Sender ¨ CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University Space and time coupling in distributed systems Space-coupled but time-uncoupled knows the identity of specific receiver or a set of receivers ¤ Receiver(s) may not exist at the time of sending directed to a group, not a specific receiver coupled October 6, 2015 Instructor: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University L13.34 Time-coupling and asynchronous communications ¨ Asynchronous communications ¤ Sender ¨ sends message and continues without blocking Time uncoupling ¤ Extra dimension where sender and receiver can have independent existence GROUP COMMUNICATIONS October 6, 2015 Instructor: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University SLIDES CREATED BY: SHRIDEEP PALLICKARA L13.35 October 6, 2015 CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University L13.36 L13.6 CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University Group communications Group communications ¨ ¨ ¨ Service where message is sent to a group ¨ May be implemented over ¤ IP Message is then delivered to all members of the group Multicast network ¤ Overlay Sender is not aware of the identities of the receivers ¨ Adds extra value in terms of ¤ Managing group membership, detecting failures and providing reliability, and ordering guarantees ¨ Group communication is to IP Multicast ¤ What October 6, 2015 Instructor: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University L13.37 The programming model ¨ October 6, 2015 Instructor: SHRIDEEP PALLICKARA Central concept is a group ¨ is associated with group membership ¤ Processes may join or leave the group aGroup.send(aMessage) ¤ It to all members with guarantees relating to reliability and ordering does not issue multiple send operations to individual processes ¨ Communications: ¤ All L13.38 Process issues only one operation to send the message to the group ¤ E.g. Process sends message to this group ¤ Propagated ¨ CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University Key feature of group communications ¤ This ¨ TCP is to point-to-point service in IP This has much more than convenience implications for the programmer members in a group: broadcast with a single member: unicast ¤ Communication October 6, 2015 Instructor: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University L13.39 Advantage of a single group-send over multiple individual-send operations ¨ Enables efficient utilization of bandwidth steps to send the message no more than once over a communication link n For e.g., send it over a distribution tree n Use network hardware support for multicast when available Minimize total time taken to deliver message to all destinations ¤ As compared to transmitting separately and serially October 6, 2015 Instructor: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University SLIDES CREATED BY: SHRIDEEP PALLICKARA L13.41 CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University L13.40 This is also important in terms of delivery guarantees ¨ ¤ Take ¨ October 6, 2015 Instructor: SHRIDEEP PALLICKARA If a process performs multiple, independent send operations? ¤ No way to provide guarantees that affect the group as a whole ¤ If sender fails halfway through sending? n Some members receive while others don’t ¤ Relative ordering of two messages delivered to any two group members is also undefined October 6, 2015 Instructor: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University L13.42 L13.7 CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University As opposed to multiple, independent individual messages … ¨ Group communications can provide a range of guarantees for: Process groups ¨ Closed ¤ If ¤ Reliability ¨ ¤ Ordering only members of the group may multicast to it Open ¤ If processes outside the group may send messages to it ¨ Overlapping ¨ Non-overlapping groups: memberships do not overlap ¤ Entities CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University October 6, 2015 Instructor: SHRIDEEP PALLICKARA L13.43 within a group may be members of multiple groups CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University October 6, 2015 Instructor: SHRIDEEP PALLICKARA L13.44 Reliability in one-to-one communications? ¨ Integrity ¨ Validity ¤ Message ¤ Any ¨ October 6, 2015 CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University L13.45 Extends semantics to cover delivery to multiple receivers ¨ Adds an an additional property to the reliable delivery requirements ¨ ¨ L13.46 order from the perspective of a sending process ¨ Causal ordering ¤ Based message is delivered to one process, it is delivered to all processes in the group SLIDES CREATED BY: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University FIFO ordering ¤ Preserve ¤ If CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University one copy of the message is delivered October 6, 2015 Instructor: SHRIDEEP PALLICKARA Agreement October 6, 2015 Instructor: SHRIDEEP PALLICKARA Duplicate elimination Relative ordering of messages delivered to multiple destinations Group communications ¨ outgoing message is eventually delivered ¤ Exactly PROCESS GROUPS IMPLEMENTATION ISSUES received is the same as the one that was sent L13.47 ¨ on the happens before relationship Total ordering ¤ Same order will be preserved at all processes October 6, 2015 Instructor: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University L13.48 L13.8 CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University Group membership management [1/4] Group membership management [2/4] ¨ Provide interface for group membership changes ¨ Failure detection ¨ Interfaces to: ¨ Monitor group members ¤ Create and destroy process groups ¤ Add or withdraw process from a group ¤ Not ¤ But ¨ just for crash failures at the nodes also for unreachability due to communication failures Relies on failures detectors to reach decisions about group membership n Detector CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University October 6, 2015 Instructor: SHRIDEEP PALLICKARA L13.49 Group membership management [3/4] October 6, 2015 Instructor: SHRIDEEP PALLICKARA Notify members about membership changes ¨ ¨ When a process ¨ ¤ Is n Due ¨ to failure detection withdrawal n Deliberate October 6, 2015 Instructor: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University L13.51 CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University L13.50 Group membership management [4/4] ¨ added ¤ Excluded marks processes as Suspected or Unsuspected Perform group address expansion When process multicasts, it supplies group identifier rather than a list of processes Membership service expands identifier into the current membership for delivery October 6, 2015 Instructor: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University L13.52 IP Multicast ¨ Allows processes to join or leave groups dynamically ¨ Performs address expansion ¤ Senders only specify the IP multicast address as the destination of the message ¨ WHY IP MULTICAST IS A WEAK CASE OF GROUP MEMBERSHIP ¤ Provide members with information abut membership is not coordinated with membership changes ¤ Achieving these properties is referred to as viewsynchronous group communication ¤ Delivery SERVICE October 6, 2015 CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University SLIDES CREATED BY: SHRIDEEP PALLICKARA But it does not: L13.53 October 6, 2015 Instructor: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University L13.54 L13.9 CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University The contents of this slide set are based on the following references ¨ ¨ Distributed Systems: Concepts and Design. George Coulouris, Jean Dollimore, Tim Kindberg, Gordon Blair. 5th Edition. Addison Wesley. ISBN: 978-0132143011. [Chapter 6] Distributed Systems: Principles and Paradigms. Andrew S. Tanenbaum and Maarten Van Steen. 2nd Edition. Prentice Hall. ISBN: 0132392275/978-0132392273. [Chapter 4] October 6, 2015 Instructor: SHRIDEEP PALLICKARA CS555: Distributed Systems [Fall 2015] Dept. Of Computer Science, Colorado State University SLIDES CREATED BY: SHRIDEEP PALLICKARA L13.55 L13.10