Reliable IP Multicast: Status and Selected Topics
A CSE620 Presentation
YE PU & YAN SUN

Overview
• Introduction
• Reliable Multicast Protocols
• Case Studies
• Multicast Congestion Control
• Routing for Multicast
• The MBone and the Internet2
• Summary

5/28/2016 5:01:12 PM
Reliable Multicast

Introduction
Why Multicast?
• In many emerging applications, one sender transmits to a group of receivers simultaneously
[Figure: unicasting vs. multicasting]
Why Reliable?
– Audio/video applications tolerate some loss and do not require full reliability
– Many other exciting applications do, e.g., remote whiteboard (WB), collaborative VR, and data dissemination

Reliable Multicast: Basic Questions
• What is the "right" definition of reliable multicast?
  – Is there a baseline (e.g., reliable delivery of all data)?
  – Should ordering/causality be part of the networking semantics of reliable multicast?
  – Where should the line be drawn between network-level and application-level functionality?
• Design Approaches
  – How important is scalability (a large number of participants)?
  – Are there fundamental differences from one setting to another (one-to-many vs. many-to-many) that require different approaches?
  – Are separate designs, each optimized for a different scenario, the way to go?
  – Can one protocol (or protocol framework) fit all requirements? Will n protocols (or frameworks) fit k (n < k) scenarios?
• Framework
  – Is there value (and if so, what is it) in developing a common framework (à la RTP) in which various reliable multicast protocols can be built?
  – What should that framework look like?
  – In terms of the IETF, is there any part (and which one) that should be standardized?
-- ACM SIGCOMM Multicast Workshop, Stanford, August 27, 1996

Reliability Mechanism: Who's Responsible?
[Figure: sender-initiated scheme (ACK implosion) vs. receiver-initiated scheme (NACK trigger, NACK implosion)]
Sender Initiated
• The sender is responsible for packet loss detection
• Based on positive acknowledgements (ACKs)
• ACK implosion in large-scale multicast, hence poor scalability
Receiver Initiated
• Receivers are responsible for packet loss detection
• Based on negative acknowledgements (NACKs)
• Alleviates ACK implosion and performs better
• Potential NACK implosion

Loss Recovery: What did ja say?
Loss Recovery: detection and retransmission of lost packets
Global Recovery
• Repairs are multicast to the entire group
• Efficient when loss is concentrated at the backbone gateway, so that most receivers miss the same packets
Local Recovery
• Tries to recover from packet loss without going all the way back to the source
• The response is multicast within a scope just large enough to reach every affected receiver
Forward Error Correction (FEC)
• Retransmits error-correcting (parity) codes instead of the original packet data
• A single packet can simultaneously repair different packet losses at different receivers

Feedback Control
Feedback Control: a mechanism that restricts the amount of feedback generated by the multicast group
Structure Based
• Relies on a designated receiver (DR) to process and filter feedback traffic
Timer Based
• Delays each retransmission request by a random interval, uniformly distributed between the current time and the one-way trip time to the source

Multicast Protocols by 1997
[Table: comparison of reliable multicast protocols as of 1997. The cell values were scrambled in extraction and cannot be reliably matched to protocols. Protocols compared: URGC, RTP, SRM, LBRM, RAMP, TRM, Muse, MDP, AFDP, TMTP, RMTP, MFTP, STORM, RBP, MTP, MTP-2, RMP, XTP. Attributes compared: data propagation (B-cast, M-cast, U-cast), reliability mechanism (ACK, NACK, ACK/NACK), repair request, retransmission, flow control (rate, window), locus of control (central, distributed), ordering (total, sequence number), feedback control (timer-based, structure-based, probabilistic polling), group management (explicit, implicit), and target application (general solution, interactive multimedia, news article propagation, data/file distribution, data dissemination).]

Case Study: SRM
SRM: Scalable Reliable Multicast
• Originally designed for wb, a shared whiteboard application
• Currently operational over the MBone
• Receiver-reliable and NACK-based
• Any receiver can multicast a NACK or a repair packet

SRM Loss Recovery
Principle
• Data-driven recovery (sequence gap detection)
• Control-driven recovery (session message sequence state)
• The source assigns each packet a unique sequence number
• A NACK is generated when missing data is detected

SRM: Session Messages
• Each member multicasts periodic session messages that report the sequence number state for active sources
• Session messages let receivers detect the loss of the last packet in a burst, which no later data packet would reveal
• Members also use session messages to determine the current participants of the session
• Average session message bandwidth: 5% of the data bandwidth

SRM NACK Suppression
• A NACK is multicast to the entire group
• A receiver that needs the same data can then suppress its own NACK
• When several receivers detect the same loss simultaneously, each waits a random delay, and the receiver with the smallest delay wins

SRM: Loss Recovery Algorithm
Loss Detection
1. Set the backoff parameter b = 1.
2. Upon missing data D from host S, choose a random delay t uniformly from 2^b·[C1·d(S), (C1+C2)·d(S)], where d(S) is the estimated one-way delay to S.
3. Schedule a request packet, REQ_D, for transmission in t seconds.
4. If REQ_D is received from some other host before t seconds elapse, set b = b + 1 and restart the request timer.
5. Otherwise, if data D or the repair reply REP_D is received before t seconds elapse, cancel REQ_D.
6. Otherwise, send REQ_D after t seconds.
Retransmission
1. Upon receipt of REQ_D from host A, if D is locally available, choose a random delay t uniformly from [D1·d(A), (D1+D2)·d(A)], where d(A) is the estimated one-way delay to A.
2. Schedule the repair packet REP_D for transmission in t seconds.
3. If REP_D is received from another host before t seconds elapse, cancel the repair timer.
4. Otherwise, send REP_D after t seconds.

Case Study: RMTP
RMTP: Reliable Multicast Transport Protocol
• Designed for single-sender file dissemination
• Deployed in AT&T's billing network
• Based on a hierarchical structure
• Special Designated Receivers (DRs) are responsible for sending ACKs to the sender

RMTP: Network Topology
• Receivers are grouped into local regions
• The source multicasts packets to the receivers
• Receivers periodically unicast ACKs to their AP/DR (ACK processor / designated receiver)
• A DR provides local repair if the data is available
• Each DR unicasts its own ACK to its parent, the next DR in the hierarchy, consolidating feedback traffic
• The source determines retransmissions based on the status sent by the DRs

RMTP: ACK Processing & Retransmission
[Figure: a sender's send window in the send sequence space (swin_lb, send_next, avail_win; packets sent but not yet acknowledged) and a receiver's receive window]

RMTP: Formation of Local Regions
• RMTP assumes some information about the approximate location of receivers is available
• Some receivers and servers are chosen as DRs
• Each DR periodically sends a special packet, SEND_ACK_TOME, with the TTL field set to a predetermined value (say, 64)
• Each receiver chooses the DR whose SEND_ACK_TOME arrives with the largest remaining TTL value, i.e., the nearest DR, since every router hop decrements the TTL

Case Study: PGM
PGM: Pragmatic General Multicast
• Uses router support to provide scaling
• Provides no notion of group membership
• NACK-based, with NACK suppression

PGM: Packet Types
• ODATA: original content data
• NACK: selective negative acknowledgement
• NCF: NACK confirmation
• RDATA: retransmission data (repair)
• SPM: source path message
• TSI: each PGM packet carries a Transport Session Identifier (TSI) that identifies the session and the source of the data

PGM: NACK/NCF Dialogue
• After a random delay, a NACK is unicast upstream, router by router, toward the source
• A PGM-aware router keeps forwarding the NACK until it sees an NCF or RDATA
• Only one NACK is forwarded upstream for each packet loss
• The source multicasts NCFs to the whole group to make NACK delivery reliable

PGM: Source Path Messages
• SPMs are multicast downstream, interleaved with ODATA
• PGM-aware routers use SPMs to determine the unicast path for forwarding NACKs
• Receivers use SPMs to identify the last PGM-aware router, to which they forward NACKs

PGM: Retransmission
Sender
• Retransmits immediately after receiving a NACK
Router
• Maintains retransmission state for every interface that received a NACK
• Forwards each retransmission only on the interfaces on which a NACK for it was received

PGM-aware Router Features
• Routers intercept SPMs and use them to establish source path state for the corresponding source and group
• Routers forward only the first copy of any NACK they receive to the upstream PGM-aware router, to constrain NACK forwarding
• Routers discard exact duplicates of any NACK for which they already have repair state
• Routers use NACKs to maintain repair state consisting of a list of the interfaces on which a given NACK was received, and return the RDATA only on these interfaces
• Routers can optionally redirect NACKs to a designated local retransmitter (DLR) rather than the source

Congestion Control
Why Congestion Control?
• A flow needs to share the available bandwidth fairly with the other best-effort flows on a shared link
TCP Congestion Control
• Multiplicative decrease when congestion is indicated
• Linear increase when there is no congestion
• Encourages fair sharing of bandwidth
• No safeguard against aggressive flows (control relies on end-to-end feedback)
Multicast without CC
• Non-TCP-compatible flows can lock out competing TCP flows
• Can cause congestion collapse on many links at once
• Needs an end-to-end, feedback-based, TCP-compatible congestion control mechanism

Control Metrics
• Fairness: how the algorithm shares bandwidth with other connections, and how it discriminates against connections of different path lengths. This is the closest thing to the "performance" of a connection
• Safety: how wide a range of operating conditions the algorithm supports without driving the network into an unstable operating range
• Responsiveness: how fast the algorithm adapts to changes in network load
• Variability (or accuracy): how consistent the algorithm's performance is in a given environment, i.e., what is the variance in throughput?
• Scalability: how these metrics scale for large groups
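The TCP congestion control slide describes additive increase with multiplicative decrease (AIMD). The following Python is a minimal sketch of why this "encourages fair sharing of bandwidth". It assumes an idealized model: one bottleneck link, every flow sees the congestion signal in the same round whenever total demand exceeds capacity, and the values of `alpha` and `beta` are hypothetical. Additive increase leaves the gap between two flows unchanged, while every multiplicative decrease halves it, so the rates converge.

```python
def aimd_step(rate: float, congested: bool, alpha: float = 1.0, beta: float = 0.5) -> float:
    """Apply one AIMD update to a sending rate.

    alpha: additive increase per round without congestion (hypothetical value).
    beta:  multiplicative decrease factor on a congestion signal (hypothetical value).
    """
    return rate * beta if congested else rate + alpha


def simulate_shared_link(rates: list[float], capacity: float, rounds: int,
                         alpha: float = 1.0, beta: float = 0.5) -> list[float]:
    """Run AIMD flows over one shared bottleneck link.

    Idealized assumption: all flows see congestion in the same round,
    namely whenever the total offered rate exceeds the link capacity.
    """
    current = list(rates)
    for _ in range(rounds):
        congested = sum(current) > capacity
        current = [aimd_step(r, congested, alpha, beta) for r in current]
    return current


if __name__ == "__main__":
    # Two flows start far apart on a link of capacity 100.
    final = simulate_shared_link([1.0, 60.0], capacity=100.0, rounds=500)
    # Each decrease halves the gap between the flows; increases leave it unchanged.
    print([round(r, 2) for r in final])
```

Real TCP reacts to per-flow losses and RTTs rather than a synchronized signal. A multicast scheme must also decide which receivers' feedback drives the decrease. This model only illustrates the fairness argument behind requiring TCP-compatible behavior.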
Control Approaches
• Window-based: TCP-style sliding window algorithm with "slow start"
• Rate-adaptive: adjusts the transmission rate upon receipt of NACKs
• Forward Error Correction (FEC): rarely used because of encoding/decoding overhead

MTCP: Hierarchical Congestion Control
[Figure: a source at the root of a tree of multicast routers (MR 1 to MR 10); sender's agents (SAs) at internal nodes pass ACKs and congestion summaries toward the source, and group members at the leaves send ACKs]
• Internal tree nodes act as sender's agents (SAs)
• Receivers send feedback to their SAs
• SAs send a summary of the congestion level of their children to their parents

MTCP: Hierarchical CC (cont'd)
Window-Based Control
• The sender controls its rate based on the summary it receives
Congestion Window Adjustment (when CWND goes down)
• Retransmission timeout (RTO)
• Fast retransmission (in conjunction with selective acknowledgment)
  – Three NACKs for the same packet reduce the window (note that not every loss causes CWND to be halved)
• Based on the TCP Vegas scheme (i.e., a long RTT causes CWND to go down)

Forward-Error Correction Coding (FEC)
• "Simultaneous repair" uses (n, k) block codes
• The packet stream is grouped into platoons of n packets each

FEC/ARQ
[Figure: a sender multicasting to two receivers]
• On a detected loss, the receiver NACKs the platoon rather than the individual packet
• If each receiver indicates the number m of packets it lost from that platoon, the responder need only send m of the n - k parity packets, where m is the largest count reported
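The FEC/ARQ idea is that one repair packet fixes different losses at different receivers. The simplest block code shows this: a single XOR parity packet per platoon, i.e., a (k+1, k) code. This is a swapped-in illustration, not the general (n, k) codes the slides refer to. Codes such as Reed-Solomon can repair up to n - k losses per platoon. The function names and the 4-packet platoon are hypothetical. `parity_packets_to_send` encodes the "send m parity packets" rule, assuming an MDS erasure code.

```python
from functools import reduce


def xor_bytes(a: bytes, b: bytes) -> bytes:
    """Byte-wise XOR of two equal-length packets."""
    return bytes(x ^ y for x, y in zip(a, b))


def make_parity(platoon: list[bytes]) -> bytes:
    """Build one parity packet for a platoon: the XOR of all its data packets."""
    return reduce(xor_bytes, platoon)


def repair_platoon(received: dict[int, bytes], parity: bytes, k: int) -> dict[int, bytes]:
    """Rebuild a platoon of k packets when at most one packet is missing.

    received maps packet index (0..k-1) to payload for the packets that arrived.
    """
    missing = [i for i in range(k) if i not in received]
    if len(missing) > 1:
        raise ValueError("a single XOR parity packet repairs at most one loss")
    repaired = dict(received)
    if missing:
        # parity XOR (all received packets) leaves exactly the missing packet.
        repaired[missing[0]] = reduce(xor_bytes, received.values(), parity)
    return repaired


def parity_packets_to_send(losses_per_receiver: list[int]) -> int:
    """Assuming an MDS (n, k) erasure code, m parity packets repair any receiver
    that lost at most m packets, so send as many as the worst-hit receiver reported."""
    return max(losses_per_receiver, default=0)


if __name__ == "__main__":
    platoon = [b"AAAA", b"BBBB", b"CCCC", b"DDDD"]  # hypothetical k = 4 data packets
    parity = make_parity(platoon)
    # Receiver 1 lost packet 0 and receiver 2 lost packet 2; one parity packet repairs both.
    r1 = repair_platoon({i: p for i, p in enumerate(platoon) if i != 0}, parity, k=4)
    r2 = repair_platoon({i: p for i, p in enumerate(platoon) if i != 2}, parity, k=4)
    print(r1[0], r2[2])
```

With ARQ alone, the two receivers in the example would need two distinct retransmissions. With FEC/ARQ a single multicast parity packet serves both.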
Proactive FEC/ARQ
[Figure: a sender multicasting to two receivers]
Proactive: send some repairs before loss occurs
Proactive factor: r
• The sender sends round(r·k) packets per platoon
• Receivers NACK to get additional repairs

Multicast Routing
• Requires a significant amount of state and complexity in routers (at least per-group state information, and often even per-source information), so deployment and use have been very slow by Internet standards
• Dense Mode: the sender floods traffic, which triggers prune messages (DVMRP, PIM-DM)
• Sparse Mode: group members explicitly send join messages (CBT, PIM-SM); MOSPF instead floods group membership information in link-state advertisements
• Sparse-mode advantages
  – Less routing state to keep (only routers on the multicast path keep state)
  – Explicit join: multicast traffic only flows across links leading to identified receivers
• Sparse-mode disadvantages
  – Single point of failure at the rendezvous point (RP)
  – Hot spot of multicast traffic at the RP, and non-optimal paths on the multicast tree

Multicast Routing in the Early MBone
[Figure: multicast applications (sender or receiver) attached to multicast routers MR3 and MR4, connected through routers R1 and R2]
1. MR3 and MR4 run the multicast routing daemon (mrouted) and support IGMP. mrouted encapsulates outgoing multicast datagrams in unicast datagrams and decapsulates multicast datagrams from the unicast datagrams it receives.
2. R1 and R2 are non-multicast-enabled routers.
They forward the unicast-encapsulated multicast packets just like any other unicast datagram.
[Figure: MBone on non-multicast-capable Internet]

DVMRP
[Figure: a source on a local subnet, multicast routers MR 1 to MR 8, and group members, with hop counts 1 to 4 from the source]
• The first protocol developed to support multicast routing
• The tree is constructed on demand using "broadcast and prune"
• Reverse Path Forwarding (RPF) ensures there are no loops in the tree and that only shortest paths are included
• RPF uses a unicast routing table (DVMRP computes its own RIP-like distance-vector routes)
• Does not scale to support multicast groups that are sparsely distributed over a large network
1. The message reaches router 1.
2. The message reaches routers 2, 3, and 4.
3. Routers 3 and 4 exchange messages. Each one just drops the message, because it did not arrive over the interface that gives the shortest path back to the source.
4. The message reaches router 7. Router 7 realizes it is a leaf router with no group members on its subnet, so it sends a prune message back to router 6, the upstream router. Router 6, in turn, sends a prune message to router 4. Router 3 also sends a prune message to router 1.

MOSPF
[Figure: multicast routers MR 1 to MR 9 with attached group members]
• Intended for use within a single routing domain
• Dependent on the use of OSPF
• The tree is only calculated when a router receives the first datagram in a stream
• All routers calculate exactly the same tree
• Does not scale well because of periodic flooding of group membership reports
1. MR 1 computes the tree: it knows the members of the group via IGMP and hence knows that the path to MR 4 is via MR 2, the path to MR 8 is via MR 5, etc.
2. MR 2 computes the tree and determines that the path to MR 4 is direct and the path to MR 8 is via MR 5; MR 3 computes the tree and determines that the path to MR 9 is direct.
3. MR 5 computes the tree and determines that the path to MR 8 is direct.
Note that the multicast transmission triggers this process (i.e., it is a data-driven process). Each router, when it receives a message, calculates exactly the same distribution tree as its predecessors and uses it to forward the message.

Core Based Tree (CBT)
• A single tree shared by all members of the group
• Multicast traffic for the entire group is sent and received over the same tree, regardless of the source
• Significant savings in the amount of multicast state information stored in individual routers
• Concentration of traffic around the core
• Load balancing might be achieved by using more than one core

PIM-SM
[Figure: PIM-SM example network of multicast routers (MR)]
• Initial group-shared tree construction is similar to CBT
• Supports both group-shared trees and shortest-path trees
• Relies on unicast routing tables to adapt to network topology changes
• Independent of the particular unicast routing protocol
1. The sender at Source 2 registers with the rendezvous point (RP) multicast router.
2. A receiver joins at the RP; there is now a bigger shared tree.
3. The receiver is receiving lots of data from Source 2, so it sends an explicit join to Source 2 to construct a shortest-path route.

Interdomain Multicast Routing
Near-term Solution: PIM-SM/MBGP/MSDP
• Multiprotocol Border Gateway Protocol (MBGP): provides for multicast the route aggregation and abstraction, as well as hop-by-hop policy routing, that the Border Gateway Protocol (BGP) provides for unicast
• Multicast Source Discovery Protocol (MSDP): works by having representatives in each domain announce the existence of active sources to other domains. MSDP runs in the same router as a domain's RP (or one of its RPs)
Long-term Solution: BGMP/MAAA
• Border Gateway Multicast Protocol (BGMP): first proposed as a long-term solution for Internet-wide, inter-domain multicast
• Multicast Address Allocation Architecture (MAAA): consists of the Multicast Address-Set Claim (MASC) protocol (at the domain level), the Address Allocation Protocol (AAP) (within a domain), and the Multicast Address Dynamic Client Allocation Protocol (MADCAP) (for requesting addresses from a Multicast Address Allocation Server (MAAS))
Alternative Solution: Root Addressed Multicast Architecture (RAMA)

The MBone
• A virtual network layered on top of the physical Internet to support routing of IP multicast packets
• Initially a test bed for multicast
• Extensively exploits tunnels
• Routing mainly with DVMRP
"MBONE is truly the start of mass-communication that may supplant television. Used well, it could become an important component of mass communication."
-- John December

The MBone
[Figure: the MBone]

The Internet2
Internet2 is a collaboration among more than 100 U.S. universities to develop networking and advanced applications for learning and research.
Goal: the design and implementation of a deployment strategy to provide a consistent and ubiquitous multicast service within the Internet2 community.
Internet2 multicast-peering sites: Abilene, vBNS, NREN, DREN, ESnet, CANARIE, TEN-155/34 (DANTE), NORDUnet, SURFnet, APAN
Abilene is an advanced backbone network that connects regional network aggregation points, called gigaPoPs, to support the work of Internet2 universities as they develop advanced Internet applications.
vBNS maintains a native IP multicast service via a PIM sparse-dense-mode configuration among all vBNS Cisco routers. MBGP routing is used internally in combination with an MBGP default route representing MBone sources. vBNS is operated by MCI WorldCom.

The Internet2 (cont'd)
For Internet2, the plan has always been to try to do multicast "the right way" insofar as possible given the currently available set of protocols.
As a result, the multicast deployment plan follows guidelines set forth by the Internet2 Multicast Working Group.
Guidelines
• All multicast deployed in Internet2 is to be native and sparse mode
• No tunnels are allowed
• All routers must support inter-domain multicast routing using MBGP/MSDP

Multicast on the Abilene Network
[Figure: multicast on the Abilene network]

Multicast on vBNS
[Figure: multicast on vBNS]

Summary
• IP multicast is emerging as an increasingly important topic for the future Internet
• Achieving reliability: ACKs vs. NACKs, local recovery, FEC, ...
• Reliable multicast protocols: SRM, RMTP, and PGM
• Multicast congestion control
• Routing in multicast: DVMRP, MOSPF, CBT, PIM-SM
• Interdomain multicast and multicast deployment on the MBone and the Internet2

References
Almeroth, K. C., The Evolution of Multicast: From the MBone to Inter-Domain Multicast to Internet2 Deployment, IPMI White Paper (www.stardust.com), 1999
Ballardie, A., RFC-2201: Core Based Trees (CBT) Multicast Routing Architecture, September 1997
Costello, A. M., and McCanne, S., Search Party: Using Randomcast for Reliable Multicast with Local Recovery, University of California at Berkeley Technical Report UCB/CSD-98-1011, 1998
Estrin, D., Farinacci, D., Helmy, A., Thaler, D., Deering, S., Handley, M., Jacobson, V., Liu, C., Sharma, P., and Wei, L., RFC-2362: Protocol Independent Multicast-Sparse Mode (PIM-SM): Protocol Specification
Floyd, S., Jacobson, V., Liu, C., McCanne, S., and Zhang, L., A Reliable Multicast Framework for Light-Weight Sessions and Application Level Framing, IEEE/ACM Transactions on Networking, Vol. 5, No. 6, 1997
IPMI, Reliable IP Multicast: PGM Overview, IPMI White Paper (www.stardust.com), 1998
http://www.tascnets.com/mist/doc/mcpCompare.html
http://netweb.usc.edu/multicast/
http://www.stardust.com/
http://www.starburstcom.com/
Obraczka, K., Multicast Transport Mechanisms: A Survey and Taxonomy, IEEE Communications Magazine, January 1998
Mankin, A., Romanow, A., Bradner, S., and Paxson, V., RFC-2357: IETF Criteria for Evaluating Reliable Multicast Transport and Application Protocols, June 1998
McCanne, S., Scalable Multimedia Communication Using IP Multicast and Lightweight Sessions, IEEE Internet Computing, Vol. 3, No. 2, 1999
Moy, J., RFC-1584: Multicast Extensions to OSPF, March 1994
Paul, S., Sabnani, K. K., Lin, J. C., and Bhattacharyya, S., Reliable Multicast Transport Protocol (RMTP), IEEE Journal on Selected Areas in Communications, Vol. 15, No. 3, 1997
Rekhter, Y., and Li, T., RFC-1771: A Border Gateway Protocol 4 (BGP-4), March 1995
Waitzman, D., and Deering, S., RFC-1075: Distance Vector Multicast Routing Protocol, November 1988