IP Multicast Overview

IP Multicast
Tarik Čičić
University of Oslo
December 2001
Overview
• One-to-many communication, why and how
• Algorithmic approach
• (IP) multicast protocols:
– host-router
– intra-domain (router-router)
– inter-domain
• Scalability and manageability issues
“Casts”
• Unicast: one-to-one
• Broadcast: one-to-all
• Multicast: one-to-group
– the group is a subset of “all”
– most general notion ☺
• (Anycast is a “special” unicast)
One-to-many communication
• Sender needs to send the same information to all three receivers
• Unicast: 3 transmissions over the same link!
• Multicast: only one transmission
[Figure: Sender, Router, Receiver 1, Receiver 2, Receiver 3]
Link utilization
• Three receivers:
– unicast case: (3+3)/4 = 1.5 msg./link
– multicast case (1+3)/4 = 1 msg./link
• Thousand receivers:
– unicast case: (1000+1000)/1001 ≈ 2 msg./link
– multicast case: (1+1000)/1001 = 1 msg./link
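The arithmetic above can be reproduced with a short sketch. It assumes a star topology with one sender link and one link per receiver, which is what the formulas imply; the function name `avg_messages_per_link` is illustrative only.

```python
def avg_messages_per_link(receivers: int, multicast: bool) -> float:
    """Average number of messages per link for one piece of information."""
    links = receivers + 1                        # one sender link plus one link per receiver
    sender_link = 1 if multicast else receivers  # unicast repeats the message per receiver
    receiver_links = receivers                   # each receiver link carries exactly one copy
    return (sender_link + receiver_links) / links


for n in (3, 1000):
    print(n,
          avg_messages_per_link(n, multicast=False),
          avg_messages_per_link(n, multicast=True))
```

The multicast figure stays at 1 msg./link regardless of group size, while the unicast figure approaches 2.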
Graph-algorithmic background
• Network routers and
switches map to a set of
nodes (V)
• Links map to edges E
• Problem:
– given a graph (V, E) and a node set
G, where G ⊆ V,
– create a spanning tree that
interconnects G
Steiner trees
[Figure: example graph with link costs]
• All links have an
associated cost
• The minimum-cost tree that
interconnects G is called a Steiner tree
• Constructing this tree is a
well-known NP-complete
problem
Approximate algorithms
[Figure: example graph with link costs]
Approximate algorithms (2)
[Figure: approximate tree on the example graph]
Cost: 17
Optimal cost: 14
Approximate algorithms (3)
[Figure: example graphs with link costs]
Spanning tree in practice
• Creating a minimum-cost tree that spans the group (a Steiner tree) is an NP-complete problem
• More than one metric adds to the
complexity (cost/delay optimization)
• Research shows that “best effort trees” are
often quite good (20% costlier on average)
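As an illustration of a simple tree-building heuristic, the sketch below grows a tree by repeatedly attaching the group member that is closest to the tree built so far. The graph, its costs and all names are made up for the example, and this is not necessarily the algorithm shown in the figures.

```python
import heapq

# Hypothetical graph: three group members A, B, C and one non-member node S.
GRAPH = {
    "A": {"S": 3, "B": 5, "C": 5},
    "B": {"S": 3, "A": 5, "C": 5},
    "C": {"S": 3, "A": 5, "B": 5},
    "S": {"A": 3, "B": 3, "C": 3},
}


def nearest_member_path(graph, tree_nodes, targets):
    """Multi-source Dijkstra from the current tree to the closest remaining member."""
    dist = {v: 0 for v in tree_nodes}
    prev = {}
    heap = [(0, v) for v in tree_nodes]
    heapq.heapify(heap)
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist.get(u, float("inf")):
            continue                          # stale heap entry
        if u in targets:
            path = [u]
            while path[-1] in prev:           # walk back until a tree node is reached
                path.append(prev[path[-1]])
            return d, path
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    raise ValueError("group is not connected")


def steiner_heuristic(graph, group):
    """Greedy approximation: attach the nearest member until all are connected."""
    group = list(group)
    tree_nodes, tree_edges, cost = {group[0]}, set(), 0
    remaining = set(group[1:])
    while remaining:
        d, path = nearest_member_path(graph, tree_nodes, remaining)
        cost += d
        tree_edges.update(frozenset(e) for e in zip(path, path[1:]))
        tree_nodes.update(path)
        remaining -= set(path)
    return cost, tree_edges


cost, edges = steiner_heuristic(GRAPH, ["A", "B", "C"])
print(cost, sorted(tuple(sorted(e)) for e in edges))
```

On this graph the heuristic uses two direct links of cost 5, whereas the tree through S would cost 3 + 3 + 3 = 9, so the result is close to the optimum but not optimal.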
Shortest Path Trees (SPT)
• Use communication delay as the metric
• Find the (unicast) shortest path between all
node pairs in the group G
• Remove possible cycles
• Result: delay-optimized spanning tree!
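A minimal sketch of the idea, assuming a single sender so that the tree is rooted at that sender: Dijkstra's algorithm yields the shortest (lowest-delay) path to every node, and the tree is the union of the paths to the group members. The graph, the delays and the function names are hypothetical.

```python
import heapq

# Hypothetical topology; the numbers are link delays.
GRAPH = {
    "S":  {"R1": 1, "R2": 3},
    "R1": {"S": 1, "A": 1, "R2": 1},
    "R2": {"S": 3, "R1": 1, "B": 1, "C": 1},
    "A":  {"R1": 1},
    "B":  {"R2": 1},
    "C":  {"R2": 1},
}


def dijkstra(graph, source):
    """Return (delay to every node, predecessor of every node on its shortest path)."""
    dist, prev = {source: 0}, {}
    heap = [(0, source)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                          # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, prev


def shortest_path_tree(graph, source, group):
    """Merge the shortest paths from the source to every group member."""
    dist, prev = dijkstra(graph, source)
    edges = set()
    for member in group:
        node = member
        while node != source:
            edges.add((prev[node], node))     # shared path prefixes collapse in the set
            node = prev[node]
    return edges, {m: dist[m] for m in group}


edges, delays = shortest_path_tree(GRAPH, "S", ["A", "B", "C"])
print(sorted(edges), delays)
```

Because every member is reached over its own shortest path, no member can get a lower delay from this sender on any other tree.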
Core Based Trees (CBT)
• Removing loops in the SPT algorithm leads
to sub-optimal tree for some nodes
• Alternative 1: keep all n trees (n=|G|)
• Alternative 2:
– select a “central” node in the graph (tree core)
– find the shortest paths from this node to all
other nodes in G
– merge the paths, creating a CBT
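Alternative 2 can be sketched as follows. How the "central" node is chosen is left open above; this example assumes it is the node with the smallest total delay to the group members. The topology and all names are hypothetical.

```python
import heapq

# Hypothetical topology; the numbers are link delays.
GRAPH = {
    "S":  {"R1": 1, "R2": 3},
    "R1": {"S": 1, "A": 1, "R2": 1},
    "R2": {"S": 3, "R1": 1, "B": 1, "C": 1},
    "A":  {"R1": 1},
    "B":  {"R2": 1},
    "C":  {"R2": 1},
}


def distances_from(graph, start):
    """Dijkstra: delays from start and each node's predecessor towards start."""
    dist, prev = {start: 0}, {}
    heap = [(0, start)]
    while heap:
        d, u = heapq.heappop(heap)
        if d > dist[u]:
            continue                          # stale heap entry
        for v, w in graph[u].items():
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    return dist, prev


def core_based_tree(graph, group):
    # Assumption: the core is the node with the smallest total delay to the group.
    core = min(graph, key=lambda n: sum(distances_from(graph, n)[0][m] for m in group))
    _, prev = distances_from(graph, core)
    edges = set()
    for member in group:                      # merge the shortest paths core -> member
        node = member
        while node != core:
            edges.add((prev[node], node))
            node = prev[node]
    return core, edges


core, edges = core_based_tree(GRAPH, ["A", "B", "C"])
print(core, sorted(edges))
```

A single tree of this kind serves every sender, at the price of paths that detour through the core.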
More problems
• In the Internet, nodes do not know the
network state → we need a distributed
calculation
• network dynamics
• group dynamics (members joining and
leaving)
IP multicast concepts
• In the Internet, one-to-many communication
is called IP multicast
• multicast group is a set of receivers
interested in the same data
• group membership is dynamic, everybody
can join and leave as desired
• senders can but need not be members of the
group
IP multicast history
• Introduced in 1988 (!)
• Flood-and-Prune mechanisms (DVMRP)
• Mbone established in 1991
• Audio broadcast from IETF meeting in 1992
• Exciting but limited
– manageability (flat structure, bandwidth consumption)
– scalability (many routes in the routing tables, BW)
• Mbone collapse in ~1997
Modern IP multicast
• New (PIM) protocols being introduced from
~1996
• Due to its explicit control messages, PIM-SM is the most popular intra-domain
multicast routing protocol today
• Inter-domain multicast:
– Multicast Source Discovery Protocol (MSDP)
– Border Gateway Multicast Protocol (BGMP)
Host/network and network protocols
• How does a sender learn where the receivers are?
– all receivers make up a multicast group
– receivers use IGMP (Internet Group Management Protocol) to
signal their interest to the closest router
– routers construct a multicast distribution tree rooted in
the source, spanning all receivers
– the tree is constructed using a multicast routing
protocol
– (+ inter-domain issues)
IGMP
[Figure: a router asks "Group report?"; hosts answer "Have G5", "Have G7" and "Have G4"; one host concludes "OK, I do not need to send"]
On a local, well-known multicast address:
• Query interval: 125s
• Response: (0, 10s)
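The query/response exchange can be sketched as a small simulation. It assumes IGMPv2-style report suppression: every member draws a random delay in (0, 10 s) per group and stays silent if another member reports the same group first. Host names and the function name are hypothetical.

```python
import random


def igmp_query_round(members, max_response=10.0, rng=random):
    """Simulate the answers to one query.

    members maps host -> set of groups it has joined.
    Returns the reports actually sent as (time, host, group), in time order.
    """
    timers = [
        (rng.uniform(0, max_response), host, group)
        for host, groups in members.items()
        for group in groups
    ]
    reported, sent = set(), []
    for t, host, group in sorted(timers):
        if group in reported:
            continue                 # "OK, I do not need to send": someone already reported
        reported.add(group)
        sent.append((t, host, group))
    return sent


MEMBERS = {"h1": {"G5"}, "h2": {"G7"}, "h3": {"G7"}, "h4": {"G4"}}
for t, host, group in igmp_query_round(MEMBERS, rng=random.Random(1)):
    print(f"{t:5.2f}s  {host} reports {group}")
```

The router only needs to know that at least one member exists per group, so one report per group is enough.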
IGMP in switched LANs
• IGMP was designed for multiple access LANs
• There are three approaches for switched LAN
optimization:
– proprietary (CGMP, Cisco)
– IGMP snooping (the switch reads IP headers searching
for IGMP control messages)
– “Generic Attribute Registration Protocol” (GARP), and
GMRP, having drawback that all hosts need to have
modified kernels, but otherwise able to handle complex
switched LANs
Multicast addresses
• In IPv4: “Class D” addresses (prefix 1110)
• 224.0.0.0 – 239.255.255.255
• as of today, random address assignment
• scope through TTL field in the IP header
• IPv6: prefix 1111 1111
– (+ 4-bit flags + 4-bit scope) + 112-bit group address
• Scope: how far to transmit the packet (e.g. link
local, organization local scope)
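The address layout can be checked with Python's standard `ipaddress` module. The sketch below is illustrative; the helper name `describe_multicast` and the returned dictionary keys are assumptions made for the example.

```python
import ipaddress


def describe_multicast(address: str) -> dict:
    """Split a multicast address into the fields listed above."""
    addr = ipaddress.ip_address(address)
    if not addr.is_multicast:        # IPv4: 224.0.0.0/4, IPv6: ff00::/8
        raise ValueError(f"{address} is not a multicast address")
    if addr.version == 4:
        return {"version": 4}
    value = int(addr)                # 128-bit integer: 8-bit prefix, flags, scope, group
    return {
        "version": 6,
        "flags": (value >> 116) & 0xF,
        "scope": (value >> 112) & 0xF,
        "group_id": value & ((1 << 112) - 1),
    }


print(describe_multicast("224.0.0.1"))
print(describe_multicast("ff02::1"))
```

For `ff02::1` the scope field is 2, which is the link-local scope.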
Flood-and-Prune
• Sender(s) start transmitting packets on all adjacent links
• Routers forward data on all links except where the data arrived
• Routing loops must be avoided!
• Prune control messages are sent upstream (towards the sender) on inactive links
• A router also sends a prune upstream if prunes are received on all its downstream interfaces
• What if a new receiver wants to join?
– flooding is periodically repeated
[Figure sequence: sender and receivers in a router mesh; candidate branches are flooded and then pruned until only the tree branches remain]
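The flood and prune phases can be sketched on a small graph. The sketch assumes that loops are avoided by accepting only the first copy of the data that reaches a router and dropping later copies; the topology and names are hypothetical, and real protocols such as DVMRP are considerably more involved.

```python
from collections import deque

# Hypothetical topology: one sender S, one receiver A, routers R1..R4.
GRAPH = {
    "S":  ["R1"],
    "R1": ["S", "R2", "R3"],
    "R2": ["R1", "A", "R3"],
    "R3": ["R1", "R2", "R4"],
    "R4": ["R3"],
    "A":  ["R2"],
}


def flood_and_prune(graph, sender, receivers):
    """Return (number of transmissions during flooding, tree links left after pruning)."""
    parent, order, queue = {sender: None}, [], deque([sender])
    transmissions = 0
    while queue:                              # flood phase
        node = queue.popleft()
        order.append(node)
        for neighbour in graph[node]:
            if neighbour == parent[node]:
                continue                      # never send back where the data arrived
            transmissions += 1
            if neighbour not in parent:       # first copy wins, later copies are dropped
                parent[neighbour] = node
                queue.append(neighbour)
    on_tree = set(receivers)                  # prune phase
    for node in reversed(order):              # children are handled before their parents
        if node in on_tree and parent[node] is not None:
            on_tree.add(parent[node])         # a node off the tree has pruned upstream
    tree = {(parent[n], n) for n in on_tree if parent[n] is not None}
    return transmissions, tree


transmissions, tree = flood_and_prune(GRAPH, "S", ["A"])
print(transmissions, sorted(tree))
```

Even in this tiny network the flood touches every link to find a single receiver, which hints at why the approach does not scale.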
Flood-and-Prune problem
• Can we flood the Internet in order to reach 5
receivers?
– NO!
• this is one of the reasons why the original
Mbone collapsed
PIM-SM
• “Protocol Independent Multicast – Sparse
Mode” does not flood the network
• it is based on explicit Join/Prune messages
• receivers wanting to join the group send a
Join message … where?
• to the well-known core router (“Rendezvous
Point”)
PIM-SM operation
• Receivers send a Join message to the Rendezvous Point
• The sender uses unicast to reach the core
• If the traffic generated by the source is large enough, the
group switches to Shortest Path Tree (instead of CBT)
[Figure sequence: sender, receivers and core (RP); tree branches and the p2p stream]
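The join phase can be sketched as follows. The example assumes that a Join travels hop by hop along the unicast shortest path towards the RP and stops at the first router that is already on the tree; receivers stand in for their last-hop routers. The topology and all names are hypothetical.

```python
from collections import deque

# Hypothetical topology: sender S, core RP, routers R1..R3, receivers A, B, C.
GRAPH = {
    "S":  ["R1"],
    "R1": ["S", "RP"],
    "RP": ["R1", "R2"],
    "R2": ["RP", "R3", "C"],
    "R3": ["R2", "A", "B"],
    "A":  ["R3"],
    "B":  ["R3"],
    "C":  ["R2"],
}


def next_hops_toward(graph, target):
    """BFS from target: next_hop[n] is n's neighbour on a shortest path to target."""
    next_hop, queue = {target: None}, deque([target])
    while queue:
        u = queue.popleft()
        for v in graph[u]:
            if v not in next_hop:
                next_hop[v] = u
                queue.append(v)
    return next_hop


def build_shared_tree(graph, rp, receivers):
    """Return (per-router downstream interface sets, number of Join hops sent)."""
    toward_rp = next_hops_toward(graph, rp)
    oifs, joins_sent = {}, 0
    for receiver in receivers:
        node = receiver
        while node != rp:
            upstream = toward_rp[node]
            joins_sent += 1
            had_state = upstream in oifs
            oifs.setdefault(upstream, set()).add(node)
            if had_state:
                break                # upstream is already on the tree; the Join stops here
            node = upstream
    return oifs, joins_sent


oifs, joins_sent = build_shared_tree(GRAPH, "RP", ["A", "B", "C"])
print(oifs, joins_sent)
```

Only routers between the receivers and the RP hold state; R1 and S are never touched by the Joins.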
Comparison
• Control-driven tree construction (PIM-SM)
saves a significant amount of resources
compared to data-driven (DVMRP)
• this is particularly true in large networks in
the case of small multicast groups
Sample multicast group
Hierarchical multicast
• We still cannot cover the Internet by PIM-SM:
– state and control traffic (periodic state
refreshing) overhead in transit routers
– where to place the RPs?
– policing
• solution: divide the Internet into domains,
much as in the unicast case
MSDP
• PIM-SM RPs act as the MSDP speakers
(note: simplified view in this presentation)
• The speakers announce local active sources to the other domains
in Source-Active (SA) messages, which those domains forward
• If an announcement is received, and if local
receivers exist, send a PIM-Join towards the
announced source (in the other domain)
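The announce-and-join behaviour can be sketched as below. It is a simplification: each RP forwards an SA it has not seen before to all its peers, whereas real MSDP uses stricter forwarding checks. The peerings, the receiver table and all names are hypothetical.

```python
from collections import deque

# Hypothetical MSDP peerings between the RPs of four domains.
PEERINGS = {
    "RP1": ["RP2", "RP3"],
    "RP2": ["RP1", "RP4"],
    "RP3": ["RP1", "RP4"],
    "RP4": ["RP2", "RP3"],
}
# Hypothetical table: groups that have receivers in each RP's domain.
LOCAL_RECEIVERS = {"RP3": {"G"}, "RP4": {"G"}}


def msdp_announce(peerings, origin_rp, source, group, local_receivers):
    """Flood an SA for (source, group); return the SA and the resulting PIM Joins."""
    sa = (origin_rp, source, group)
    seen, queue, joins = {origin_rp}, deque([origin_rp]), []
    while queue:
        rp = queue.popleft()
        if rp != origin_rp and group in local_receivers.get(rp, set()):
            joins.append((rp, (source, group)))   # Join towards the announced source
        for peer in peerings[rp]:
            if peer not in seen:                  # forward each SA only once per RP
                seen.add(peer)
                queue.append(peer)
    return sa, joins


sa, joins = msdp_announce(PEERINGS, "RP1", "S1", "G", LOCAL_RECEIVERS)
print(sa, joins)
```

Every RP receives the SA, but only the domains with local receivers react with a Join.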
MSDP operation
[Figure sequence: domains D1 to D4, each with an RP; source S1, receivers R1 and R2. Labels by frame: 1. MSDP/TCP; 2. MSDP_SA (RP1, S1, G); 3. PIM_join (S1, G)]
BGMP
• Main idea: build a global shared tree of
domains
• The root domain can be determined from
the group address
• Pure inter-domain protocol
• Needs support of an address allocation
scheme/system/protocol (MASC, ++)
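How a root domain might be derived from the group address can be sketched as a prefix lookup. The allocation table below is entirely hypothetical, and the use of longest-prefix matching is an assumption made for the example.

```python
import ipaddress

# Hypothetical allocations of group-address ranges to root domains.
ALLOCATIONS = {
    "224.2.0.0/16": "D1",
    "224.2.128.0/17": "D3",
    "224.3.0.0/16": "D2",
}


def root_domain(group: str, allocations=ALLOCATIONS) -> str:
    """Return the domain owning the most specific range that contains the group."""
    addr = ipaddress.ip_address(group)
    best = None
    for prefix, domain in allocations.items():
        net = ipaddress.ip_network(prefix)
        if addr in net and (best is None or net.prefixlen > best[0]):
            best = (net.prefixlen, domain)
    if best is None:
        raise LookupError(f"no root domain known for {group}")
    return best[1]


print(root_domain("224.2.1.1"), root_domain("224.2.200.9"))
```

With such a table every border router can find the root of a group's shared tree without any per-group signalling.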
BGMP operation
[Figure sequence: domains D1 (root), D2, D3 and D4, each with an RP; source S1, receivers R1 and R2. Labels: PIM_join, BGMP_join (twice)]
Comparison
Property | MSDP | BGMP
Intra-domain generality | PIM-SM only | Full
Control information | Flooding of SA over the whole MSDP tree | Joins/Prunes where needed
Forwarding the control information | Bidirectional, between the MSDP peers, periodic | Bidirectional, between the border gateways, triggered
Data forwarding | Unidirectional, PIM-SM | Bidirectional between the domains, any within
Join latency | Low when caching, high when not | Low
State | Medium to high | Low to medium
Reliable multicast
• Needed for many new applications, but
difficult:
– ACK implosion
– End-to-end argument does not seem to work
with multicast
• more intelligence in the network?
Future of multicast
• Multicast will be an integral part of the Internet –
a ubiquitous service – if there is use for it
• multicast will become widespread if a functional,
scalable reliable multicast transport protocol is
ever to be constructed
• use today:
– Real-time media streaming (unreliable streams)
• use tomorrow:
– reliable services (e.g. auctions, distribution lists)
Summary
• Algorithms:
– optimal multicast tree construction is an NP-complete problem
– good behavior in practice using simple means
– SPT vs. CBT
• Protocols:
– host/network, network and inter-domain protocols
– flooding vs. explicit messaging
– IP multicast addressing issues
• Scalability, manageability and hierarchical solutions
• References: IEEE Network, Jan/Feb 2000, thematic issue
on multicasting!