IP Multicast
Tarik Čičić
University of Oslo
December 2001

Overview
• One-to-many communication, why and how
• Algorithmic approach
• (IP) multicast protocols:
  – host-router
  – intra-domain (router-router)
  – inter-domain
• Scalability and manageability issues

“Casts”
• Unicast: one-to-one
• Broadcast: one-to-all
• Multicast: one-to-group
  – the group is a subset of “all”
  – most general notion ☺
• (Anycast is a “special” unicast)

One-to-many communication
Sender needs to send the same information to all three receivers
[Figure: Sender, Router, Receivers 1, 2 and 3, unicast case: 3 transmissions over the same link!]

One-to-many communication
Sender needs to send the same information to all three receivers
[Figure: Sender, Router, Receivers 1, 2 and 3, multicast case: only one transmission]

Link utilization
• Three receivers:
  – unicast case: (3+3)/4 = 1.5 msg./link
  – multicast case: (1+3)/4 = 1 msg./link
• Thousand receivers:
  – unicast case: (1000+1000)/1001 ≈ 2 msg./link
  – multicast case: (1+1000)/1001 = 1 msg./link

Graph-algorithmic background
• Network routers and switches map to a set of nodes V
• Links map to a set of edges E
• Problem:
  – given a graph (V, E) and a node set G, where G ⊆ V,
  – create a tree that spans (interconnects) all nodes in G

Steiner trees
[Figure: example graph with link costs]
• All links in the tree have an associated cost
• The minimum-cost tree that spans G is called a Steiner tree
• Constructing this tree is a well-known NP-complete problem

Approximate algorithms
[Figure: example graph with link costs, tree built step by step]

Approximate algorithms (2)
[Figure: resulting tree. Cost: 17. Optimal cost: 14]

Approximate algorithms (3)
[Figure: further example graphs with link costs]

Spanning tree in practice
• Creating a minimum-cost tree for a group (a Steiner tree) is an NP-complete problem
• More than one metric adds to the complexity (cost/delay optimization)
• Research shows that “best effort trees” are often quite good (about 20% costlier than the optimum on average)
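
A minimal sketch of one possible “best effort tree”: merge the shortest paths from a root node to every group member and compare the cost with the optimum. The graph, the node names and the helper functions (`dijkstra`, `shortest_path_tree`, `tree_cost`) are hypothetical and are not taken from the slides' figures.

```python
import heapq


def dijkstra(graph, source):
    """Shortest-path distances and predecessors from `source`.

    `graph` maps each node to a dict {neighbour: link cost}.
    """
    dist = {source: 0}
    prev = {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue  # stale heap entry, a shorter path was already found
        for v, w in graph[u].items():
            nd = d + w
            if nd < dist.get(v, float("inf")):
                dist[v] = nd
                prev[v] = u
                heapq.heappush(heap, (nd, v))
    return dist, prev


def shortest_path_tree(graph, root, group):
    """Union of the shortest paths from `root` to each member of `group`."""
    _, prev = dijkstra(graph, root)
    edges = set()
    for member in group:
        node = member
        while node != root:
            parent = prev[node]
            edges.add(frozenset((node, parent)))
            node = parent
    return edges


def tree_cost(graph, edges):
    """Sum of the link costs of all edges in the tree."""
    return sum(graph[u][v] for u, v in map(tuple, edges))


# Hypothetical toy network: S is the sender, R1 and R2 are the receivers.
graph = {
    "S": {"A": 2, "B": 2},
    "A": {"S": 2, "B": 1, "R1": 1},
    "B": {"S": 2, "A": 1, "R2": 1},
    "R1": {"A": 1},
    "R2": {"B": 1},
}

tree = shortest_path_tree(graph, "S", ["R1", "R2"])
print(tree_cost(graph, tree))  # 6
```

On this toy graph the merged shortest paths cost 6, while the cheapest tree connecting S, R1 and R2 (S–A, A–R1, A–B, B–R2) costs 5. The simple construction never has to solve the NP-complete problem, at the price of a somewhat costlier tree.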

Shortest Path Trees (SPT)
• Use communication delay as the metric
• Find the (unicast) shortest path between all node pairs in the group G
• Remove possible cycles
• Result: delay-optimized spanning tree!

Core Based Trees (CBT)
• Removing loops in the SPT algorithm leads to a sub-optimal tree for some nodes
• Alternative 1: keep all n trees (n = |G|)
• Alternative 2:
  – select a “central” node in the graph (tree core)
  – find the shortest paths from this node to all other nodes in G
  – merge the paths, creating a CBT

More problems
• In the Internet, nodes do not know the network state → we need a distributed calculation
• network dynamics
• group dynamics (members joining and leaving)

IP multicast concepts
• In the Internet, one-to-many communication is called IP multicast
• a multicast group is a set of receivers interested in the same data
• group membership is dynamic, everybody can join and leave as desired
• senders can but need not be members of the group

IP multicast history
• Introduced in 1988 (!)
• Flood-and-Prune mechanisms (DVMRP)
• Mbone established in 1991
• Audio broadcast from IETF meeting in 1992
• Exciting but limited
  – manageability (flat structure, bandwidth consumption)
  – scalability (many routes in the routing tables, BW)
• Mbone collapse in ~1997

Modern IP multicast
• New (PIM) protocols being introduced from ~1996
• Due to its explicit control messages, PIM-SM is the most popular intra-domain multicast routing protocol today
• Inter-domain multicast:
  – Multicast Source Discovery Protocol (MSDP)
  – Border Gateway Multicast Protocol (BGMP)

Host/network and network protocols
• How does a sender learn where the receivers are?
  – all receivers make up a multicast group
  – receivers use IGMP (Internet Group Management Protocol) to signal their interest to the closest router
  – routers construct a multicast distribution tree rooted in the source, spanning all receivers
  – the tree is constructed using a multicast routing protocol
  – (+ inter-domain issues)

IGMP
[Figure: a router (labelled CISCO xxxx) asks “Group report?”; hosts that are members of groups G4, G5 and G7 answer “Have G4”, “Have G5”, “Have G7”; a host whose group has already been reported concludes “OK, I do not need to send”]
On a local, well-known multicast address:
• Query interval: 125 s
• Response: within (0, 10 s)

IGMP in switched LANs
• IGMP was designed for multiple-access LANs
• There are three approaches for switched LAN optimization:
  – proprietary (CGMP, Cisco)
  – IGMP snooping (the switch reads IP headers searching for IGMP control messages)
  – “Generic Attribute Registration Protocol” (GARP) with GMRP: has the drawback that all hosts need modified kernels, but is otherwise able to handle complex switched LANs

Multicast addresses
• In IPv4: “Class D” addresses (prefix 1110)
• 224.0.0.0 – 239.255.255.255
• as of today, random address assignment
• scope through the TTL field in the IP header
• IPv6: prefix 1111 1111
  – (+ 4-bit flags + 4-bit scope) → 112-bit group address
• Scope: how far to transmit the packet (e.g. link-local, organization-local scope)

Flood-and-Prune
[Figure sequence: Sender, routers and Receivers; candidate branches are marked]
• Sender(s) start transmitting packets on all adjacent links
• Routers forward data on all links except the one on which the data arrived
• Routing loops must be avoided!
• Prune control messages are sent upstream (towards the sender) on links that lead to no receivers
• A router also sends a prune upstream once it has received prunes on all of its downstream interfaces
[Figure: Sender, Receivers, candidate branches and the remaining tree branches]
• What if a new receiver wants to join?
  – flooding is periodically repeated

Flood-and-Prune problem
• Can we flood the Internet in order to reach 5 receivers?
  – NO!
• this is one of the reasons why the original Mbone collapsed

PIM-SM
• “Protocol Independent Multicast – Sparse Mode” does not flood the network
• it is based on explicit Join/Prune messages
• receivers wanting to join the group send a Join message … where?
• to the well-known core router (“Rendezvous Point”)

PIM-SM operation
[Figure sequence: Receivers, Sender, Core (RP); tree branches and a p2p stream]
• Receivers send a Join message to the Rendezvous Point
• The sender uses unicast to reach the core
• If the traffic generated by the source is large enough, the group switches to a Shortest Path Tree (instead of the CBT)

Comparison
• Control-driven tree construction (PIM-SM) saves a significant amount of resources compared to data-driven construction (DVMRP)
• this is particularly true in large networks with small multicast groups

Sample multicast group
[Figures: sample multicast group]

Hierarchical multicast
• We still cannot cover the Internet with PIM-SM:
  – state and control traffic (periodic state refreshing) overhead in transit routers
  – where to place the RPs?
  – policing
• solution: divide the Internet into domains, much as in the unicast case

MSDP
• PIM-SM RPs act as the MSDP speakers (note: simplified view in this presentation)
• The speakers announce local active sources to the other domains, which forward these Source-Active (SA) announcements
• If an announcement is received and local receivers exist, a PIM Join is sent towards the announced source (in the other domain)

MSDP operation
[Figure sequence: domains D1 to D4, each with an RP; source S1, receivers R1 and R2. Frame labels: MSDP/TCP between the RPs; MSDP_SA (RP1, S1, G); PIM_join (S1, G); final state]

BGMP
• Main idea: build a global shared tree of domains
• The root domain can be determined from the group address
• Pure inter-domain protocol
• Needs the support of an address allocation scheme/system/protocol (MASC, ++)

BGMP operation
[Figure sequence: domains D1 (root), D2, D3 and D4, each with an RP; source S1, receivers R1 and R2. Frame labels: PIM_join, BGMP_join; final state]

Comparison (MSDP vs. BGMP)
• Intra-domain generality
  – MSDP: PIM-SM only
  – BGMP: full
• Control information
  – MSDP: flooding of SAs over the whole MSDP tree
  – BGMP: Joins/Prunes where needed
• Forwarding the control information
  – MSDP: bidirectional, between the MSDP peers, periodic
  – BGMP: bidirectional, between the border gateways, triggered
• Data forwarding
  – MSDP: unidirectional, PIM-SM
  – BGMP: bidirectional between the domains, any within
• Join latency
  – MSDP: low when caching, high when not
  – BGMP: low
• State
  – MSDP: medium to high
  – BGMP: low to medium

Reliable multicast
• Needed for many new applications, but difficult:
  – ACK implosion
  – The end-to-end argument does not seem to work with multicast
• more intelligence in the network?
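
The ACK implosion problem can be made concrete with a back-of-the-envelope count. The sketch below compares per-packet feedback when every receiver acknowledges directly to the sender with a scheme in which each node of the distribution tree merges the ACKs of its children into one. The aggregation scheme, the fan-out parameter and all function names are illustrative assumptions, not a protocol described in the slides.

```python
def acks_at_sender_flat(receivers: int) -> int:
    """ACKs per data packet arriving at the sender when every
    receiver acknowledges end-to-end."""
    return receivers


def acks_per_node_aggregated(receivers: int, fanout: int) -> int:
    """ACKs per data packet arriving at any single node when each
    tree node merges its children's ACKs (assumed scheme)."""
    return min(receivers, fanout)


def aggregation_levels(receivers: int, fanout: int) -> int:
    """Tree levels needed so that `fanout`-ary aggregation covers
    all receivers; computed with integers to avoid rounding errors."""
    levels, capacity = 0, 1
    while capacity < receivers:
        capacity *= fanout
        levels += 1
    return levels


print(acks_at_sender_flat(1000))           # 1000
print(acks_per_node_aggregated(1000, 10))  # 10
print(aggregation_levels(1000, 10))        # 3
```

With a thousand receivers the sender handles a thousand ACKs per packet in the end-to-end case, but only ten under the assumed aggregation, which is one way of reading “more intelligence in the network”.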

Future of multicast
• Multicast will be an integral part of the Internet
  – a ubiquitous service
  – if there is use for it
• multicast will become widespread if a functional, scalable, reliable multicast transport protocol is ever constructed
• use today:
  – Real-time media streaming (unreliable streams)
• use tomorrow:
  – reliable services (e.g. auctions, distribution lists)

Summary
• Algorithms:
  – optimal multicast tree construction is an NP-complete problem
  – good behavior in practice using simple means
  – SPT vs. CBT
• Protocols:
  – host/network, network and inter-domain protocols
  – flooding vs. explicit messaging
  – IP multicast addressing issues
• Scalability, manageability and hierarchical solutions
• References: IEEE Network, Jan/Feb 2000, thematic issue on multicasting!
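
As a companion to the addressing points in the summary, here is a minimal sketch that checks the IPv4 “Class D” prefix (1110) and splits an IPv6 multicast address into its prefix (1111 1111), 4-bit flags, 4-bit scope and 112-bit group address. It uses Python's standard `ipaddress` module; the function name `describe_multicast` and the returned dictionary layout are assumptions made for illustration.

```python
import ipaddress


def describe_multicast(text: str) -> dict:
    """Classify an address string and, for IPv6 multicast,
    extract the flags, scope and group fields."""
    addr = ipaddress.ip_address(text)
    if addr.version == 4:
        # Class D: the top four bits of the 32-bit address are 1110.
        return {"version": 4, "multicast": (int(addr) >> 28) == 0b1110}
    if addr.packed[0] != 0xFF:
        # IPv6 multicast addresses start with the byte 1111 1111.
        return {"version": 6, "multicast": False}
    return {
        "version": 6,
        "multicast": True,
        "flags": addr.packed[1] >> 4,      # upper 4 bits of the second byte
        "scope": addr.packed[1] & 0x0F,    # lower 4 bits of the second byte
        "group": int(addr) & ((1 << 112) - 1),  # remaining 112 bits
    }


print(describe_multicast("224.0.0.1"))
print(describe_multicast("192.0.2.1"))
print(describe_multicast("ff02::1"))
```

For `ff02::1` the scope field is 2, which is the link-local scope; organization-local scope uses the value 8.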