L6 - Switching

Today’s Lecture
• Admini-trivia
• Switching
• Examples of Switching Technologies
– Bridging (datagram)
• Spanning-Tree
– ATM (Virtual Circuits)
Assignment 1
Collision Detection
• Me: check to see if anyone is tx; NO collisions
• You (other): check to see if anyone is tx; NO collisions
Why Should You Care About Switching?
• All Networks use switches
– Facebook/Google/Microsoft
• Use Spanning-Tree in data centers
– ATT/Verizon/Sprint/Comcast
• Use MPLS in Wide-Area-Networks
Basic Problem
• Direct-link networks don’t scale
• Solution: use switches to connect network
segments
Switching
• Switches must be able to, given a packet,
determine the outgoing port
• 3 ways to do this:
– Virtual Circuit Switching
– Datagram Switching
– Source Routing
Virtual Circuit Switching
• Explicit set-up and tear down phases
– Establishes Virtual Circuit Identifier on each link
– Each switch stores VC table
• Subsequent packets follow same path
– Switches map [in-port, in-VCI] : [out-port, out-VCI]
• Also called connection-oriented model
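A minimal Python sketch of the per-switch VC table described above. The class name `VCSwitch` and its methods are illustrative only (not a real signaling protocol); the port and VCI numbers are made up.

```python
class VCSwitch:
    """Toy virtual-circuit switch: maps (in_port, in_vci) to (out_port, out_vci)."""

    def __init__(self):
        self.vc_table = {}

    def setup(self, in_port, in_vci, out_port, out_vci):
        # Installed during the set-up phase, before any data flows.
        self.vc_table[(in_port, in_vci)] = (out_port, out_vci)

    def teardown(self, in_port, in_vci):
        # Removed during the tear-down phase.
        self.vc_table.pop((in_port, in_vci), None)

    def forward(self, in_port, in_vci, payload):
        # Data packets carry only the VCI; it is rewritten at every hop.
        out_port, out_vci = self.vc_table[(in_port, in_vci)]
        return out_port, out_vci, payload


sw = VCSwitch()
sw.setup(in_port=2, in_vci=5, out_port=1, out_vci=11)
print(sw.forward(2, 5, b"hello"))  # (1, 11, b'hello')
```

Note that the table has one entry per circuit through the switch, not one per host.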
Virtual Circuit Model
• Requires one RTT before sending first packet
• Connection request contains full destination
address; subsequent packets carry only a small VCI
• Setup phase allows reservation of resources,
such as bandwidth or buffer space
– Any problems here?
• If a link or switch fails, must re-establish
whole circuit
• Example: ATM
Datagram Switching
• Each packet carries destination address
• Switches maintain address-based tables
– Maps [destination address]:[out-port]
• Also called connectionless model
Forwarding table at Switch 2:
Addr | Port
A | 3
B | 0
C | 3
D | 3
E | 2
F | 1
G | 0
H | 0
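As a rough sketch, the datagram lookup is a single table access keyed by destination address. The table below copies the Switch 2 entries from the slide; the function name `forward` is an illustrative choice.

```python
# Forwarding table for Switch 2, copied from the slide: destination to out-port.
SWITCH2_TABLE = {"A": 3, "B": 0, "C": 3, "D": 3, "E": 2, "F": 1, "G": 0, "H": 0}


def forward(table, dst):
    """Return the out-port for dst, or None if the switch has no entry."""
    return table.get(dst)


print(forward(SWITCH2_TABLE, "E"))  # 2
```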
Datagram Switching
• No delay for connection setup
• Source can’t know if network can deliver a
packet
• Possible to route around failures
• Higher overhead per-packet
• Potentially larger tables at switches
Source Routing
• Packets carry entire route: ports
• Switches need no tables!
– But end hosts must obtain the path information
• Variable packet header
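A small sketch of source routing, assuming the header is simply a list of out-ports that each switch consumes from the front (one of several possible header encodings; the route values are made up).

```python
def switch_forward(packet):
    """Pop the next out-port from the header; no table lookup is needed."""
    route, payload = packet
    out_port, rest = route[0], route[1:]
    return out_port, (rest, payload)


packet = ([1, 0, 3], b"data")  # the sender chose ports 1, 0, 3 for the three hops
hops = []
while packet[0]:
    port, packet = switch_forward(packet)
    hops.append(port)
print(hops)  # [1, 0, 3]
```

The header shrinks by one entry per hop, which is why its size is variable.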
Comparisons
 | Virtual Circuit | Datagram Switching | Source Routing
Start-up latency | Yes | No | No
Size of forwarding tables (in switch) | O(# of circuits) | O(# of hosts) | 0
Packet header contents | Circuit ID | Src & dst addresses | The WHOLE PATH
Packet header size | Fixed-size | Fixed-size | Variable-size
Recover from failure | No | Yes | No
Delivery guarantee | Yes | No | No
Constant Oscillation between
paradigms as New Technologies Arise
Bridges and Extended LANs
• LANs have limitations
– E.g. Ethernet < 1024 hosts, < 2500m
• Connect two or more LANs with a bridge
– Operates on Ethernet addresses
– Forwards packets from one LAN to the
other(s)
• Ethernet switch is just a multi-way bridge
Learning Bridges
• Idea: don’t forward a packet where it isn’t needed
– If you know recipient is not on that port
• Learn hosts’ locations based on source addresses
– Build a table as you receive packets
– Table is a cache: if full, evict old entries. Why is this fine?
• Table says when not to forward a packet
– Doesn’t need to be complete for correctness
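The learning rule can be sketched in a few lines of Python. The class `LearningBridge`, the default `capacity`, and the oldest-first eviction policy are assumptions made for illustration.

```python
from collections import OrderedDict


class LearningBridge:
    def __init__(self, ports, capacity=1024):
        self.ports = list(ports)
        self.capacity = capacity
        self.table = OrderedDict()  # address to port, oldest entry first

    def receive(self, in_port, src, dst):
        """Learn src, then return the list of ports to forward the frame on."""
        # Learn (or refresh) the sender's location from the source address.
        self.table.pop(src, None)
        self.table[src] = in_port
        if len(self.table) > self.capacity:
            self.table.popitem(last=False)  # table is a cache: evict the oldest entry
        out = self.table.get(dst)
        if out is None:
            # Unknown destination: flood on every port except the arrival port.
            return [p for p in self.ports if p != in_port]
        # Known destination: filter the frame if it is on the arrival port.
        return [] if out == in_port else [out]


bridge = LearningBridge(ports=[0, 1, 2])
print(bridge.receive(0, "A", "B"))  # [1, 2]  (B unknown: flood)
print(bridge.receive(1, "B", "A"))  # [0]     (A was learned on port 0)
print(bridge.receive(0, "A", "B"))  # [1]     (B was learned on port 1)
```

A missing entry only causes extra flooding, never a wrong delivery, which is why eviction is safe.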
Attack on a Learning Switch
• Eve: wants to sniff all packets sent to Bob
• Same segment: easy (shared medium)
• Different segment on a learning bridge: hard
– Once bridge learns Bob’s port, stop broadcasting
• How can Eve force the bridge to keep
broadcasting?
– Flood the network with frames with spoofed src
addr!
Bridges
• Unicast: forward with filtering
• Broadcast: always forward
• Multicast: always forward or learn groups
• Difference between bridges and repeaters?
– Bridges: same broadcast domain; copy frames
– Repeaters: same broadcast and collision domain;
copy signals
Dealing with Loops
• Problem: people may create loops in LAN!
– Accidentally, or to provide redundancy
– Don’t want to forward packets indefinitely
[Figure: LAN 1 (with host A) and LAN 2 connected by both Bridge A and Bridge B; copies of a frame (F) appear on every link]
Spanning Tree
• Need to disable ports, so that there are no loops in the network
• Like creating a spanning tree in a graph
– View switches and networks as nodes, ports as edges
Distributed Spanning Tree Algorithm
• Every bridge has a unique ID (Ethernet address)
• Goal:
– Bridge with the smallest ID is the root
– Each segment has one designated bridge, responsible
for forwarding its packets towards the root
• Bridge closest to root is designated bridge
• If there is a tie, bridge with lowest ID wins
Spanning Tree Protocol
• Send message when you think you are the root
• Otherwise, forward messages from best known root
– Add one to distance before forwarding
– Don’t forward over discarding ports (see next slide)
• Spanning Tree messages contain:
– ID of bridge sending the message
– ID sender believes to be the root
– Distance (in hops) from sender to root
• Bridges remember best config msg on each port
• In the end, only root is generating messages
Spanning Tree Protocol (cont.)
• Forwarding and Broadcasting
• Port states*:
– Root port: a port the bridge uses to reach the root
– Designated port: the lowest-cost port attached to
a single segment
– If a port is not a root port or a designated port, it
is a discarding port.
* In a later protocol RSTP, there can be ports configured as backups and alternates.
[Figure legend: Root Port, Designated Port, Discarding Port]
The protocol
• IEEE 802.1D has an algorithm that organizes
the bridges as a spanning tree in a dynamic
environment
– Note: Trees don’t have loops
• Bridges exchange messages (Configuration Bridge
Protocol Data Units, Configuration BPDUs) to
configure themselves and build the tree
– Select ports they use to forward packets
Configuration BPDUs
Frame layout: Destination MAC address | Source MAC address | Configuration Message
Configuration Message fields, in order:
– protocol identifier: set to 0
– version: set to 0
– message type: set to 0
– flags: lowest bit is the "topology change" bit (TC bit)
– root ID: ID of root
– Cost: cost of the path from the bridge sending this message to the root bridge
– bridge ID: ID of bridge sending this message
– port ID: ID of port from which message is sent
– message age: time since root sent a message on which this message is based
– maximum age
– hello time: time between BPDUs from the root (default: 2 sec)
– forward delay: time between recalculations of the spanning tree (default: 15 secs)
What do the BPDUs do?
• Elect a single bridge as the root bridge
• Calculate the distance of the shortest path to the root bridge
• Each bridge can determine a root port, the port that gives the
best path to the root
• Each LAN can determine a designated bridge, which is the
bridge closest to the root. A LAN's designated bridge is the
only bridge allowed to forward frames to and from the LAN
for which it is the designated bridge.
• A LAN's designated port is the port that connects it to the
designated bridge
• Select ports to be included in the spanning tree.
Terms
• Each bridge has a unique identifier: Bridge ID
Bridge ID = {Priority : 2 bytes; Bridge MAC address: 6 bytes}
• Priority is configured
• Bridge MAC address is the lowest MAC address of all ports
• Each port within a bridge has a unique identifier (port ID)
• Root Bridge: The bridge with the lowest identifier is the
root of the spanning tree
• Root Port: Each bridge has a root port which identifies
the next hop from a bridge to the root
Terms
• Root Path Cost: For each bridge, the cost of the
min-cost path to the root
– Assume it is measured in #hops to the root
• Designated Bridge, Designated Port: Single
bridge on a LAN that is closest to the root for this
LAN:
– If two bridges have the same cost, select the one with
the highest priority (lowest priority value); if they have
the same priority, select the one with the lowest bridge MAC address
– If the min-cost bridge has two or more ports on the
LAN, select the port with the lowest identifier
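A short sketch of how the 8-byte Bridge ID can be built and compared. The helper name `bridge_id` and the priority and MAC values are made up for illustration.

```python
def bridge_id(priority, mac):
    """8-byte Bridge ID: 2-byte priority followed by the 6-byte bridge MAC address."""
    mac_bytes = bytes.fromhex(mac.replace(":", ""))
    if len(mac_bytes) != 6:
        raise ValueError("MAC address must be 6 bytes")
    return priority.to_bytes(2, "big") + mac_bytes


a = bridge_id(32768, "00:00:0c:aa:aa:aa")
b = bridge_id(32768, "00:00:0c:bb:bb:bb")
c = bridge_id(4096, "00:00:0c:ff:ff:ff")
# bytes compare lexicographically, so the priority field dominates the MAC address.
print(min(a, b, c) == c)  # True: lowest priority value wins despite the largest MAC
```

Because priority sits in the high-order bytes, an operator can force a particular bridge to become root by configuring a small priority value.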
Spanning Tree Algorithm
• Each bridge is sending out BPDUs that contain the following
information:
– root ID: root bridge (what the sender thinks it is)
– cost: root path cost for sending bridge
– bridge ID: identifies sending bridge
– port ID: identifies the sending port
• The transmission of BPDUs results in the distributed
computation of a spanning tree
• The convergence of the algorithm is very quick
Ordering of Messages
• We define an ordering of BPDU messages
(lexicographically)
M1 = (ID R1, C1, ID B1, ID P1)
M2 = (ID R2, C2, ID B2, ID P2)
We say M1 advertises a better path than M2
(“M1<<M2”) if
(R1 < R2),
Or (R1 == R2) and (C1 < C2),
Or (R1 == R2) and (C1 == C2) and (B1 < B2),
Or (R1 == R2) and (C1 == C2) and (B1 == B2) and
(P1 < P2)
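This ordering is exactly lexicographic tuple comparison, which Python provides natively. A sketch, assuming IDs and ports are represented as integers (the `BPDU` type and `better` function are illustrative names):

```python
from typing import NamedTuple


class BPDU(NamedTuple):
    root: int    # R: ID of the bridge believed to be root
    cost: int    # C: root path cost of the sender
    bridge: int  # B: ID of the sending bridge
    port: int    # P: ID of the sending port


def better(m1, m2):
    """True if m1 advertises a better path than m2 (M1 << M2)."""
    return tuple(m1) < tuple(m2)


print(better(BPDU(1, 1, 3, 0), BPDU(1, 1, 4, 1)))  # True: same root and cost, lower bridge ID
```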
Initializing the Spanning Tree Protocol
• Initially, all bridges assume they are the root bridge.
• Each bridge B sends BPDUs of this form on its LANs
from each port P: (B, 0, B, P), i.e. root ID = B, cost = 0, bridge ID = B, port ID = P
• Each bridge looks at the BPDUs received on all its ports
and its own transmitted BPDUs.
• The root bridge is the one with the smallest root ID received
so far (whenever a smaller ID arrives, the root is updated)
Spanning Tree Protocol
• Each bridge B looks on all its ports for BPDUs that are better than its own
BPDUs
• Suppose a bridge with BPDU
M1 = (R1, C1, B1, P1)
receives a “better” BPDU:
M2 = (R2, C2, B2, P2)
Then it will update the BPDU to:
(R2, C2+1, B1, P1)
• However, the new BPDU is not necessarily sent out
• On each bridge, the port where the “best BPDU” (via relation “<<”) was received
is the root port of the bridge
– No need to send out updated BPDUs to root port
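A minimal sketch of the update rule, with BPDUs as `(root, cost, bridge, port)` tuples of integers. One assumption goes beyond the slide: the function keeps the bridge's current BPDU when the candidate built from the received message would not actually improve it.

```python
def update(own, received):
    """Return the bridge's BPDU after hearing `received`.

    Both arguments are (root, cost, bridge, port) tuples.
    """
    root, cost, _, _ = received
    _, _, bridge, port = own
    # Adopt the sender's root and add one hop, but keep our own bridge and port IDs.
    candidate = (root, cost + 1, bridge, port)
    # Assumption: only switch if the candidate beats what we already advertise.
    return min(own, candidate)


print(update((3, 0, 3, 0), (1, 0, 1, 1)))  # (1, 1, 3, 0)
```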
When to send a BPDU
• Say, B has generated a BPDU for each port x:
(R, Cost, B, x)
• B will send this BPDU on port x only if its BPDU is
better (via relation “<<”) than any BPDU that B
received from port x.
• In this case, B also assumes that it
is the designated bridge for the
LAN to which the port connects
• And port x is the designated port of that LAN
[Figure: Bridge B with ports A, B, C, and x]
Selecting the Ports for the Spanning
Tree
• Each bridge makes a local decision which of its ports
are part of the spanning tree
• Now B can decide which ports are in the spanning
tree:
• B’s root port is part of the spanning tree
• All designated ports are part of the spanning tree
• All other ports are not part of the spanning tree
• B’s ports that are in the spanning tree will forward
packets (=forwarding state)
• B’s ports that are not in the spanning tree will not
forward packets (=blocking state)
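The local decision can be sketched as a function of the best BPDU heard on each port. The function name `port_roles` and the representation (integer bridge IDs, BPDUs as `(root, cost, bridge, port)` tuples, `None` for a silent port) are assumptions for illustration.

```python
def port_roles(bridge, best_rx):
    """Classify each port as 'root', 'designated', or 'blocked'.

    best_rx maps port to the best BPDU heard on it, or None if nothing was heard.
    """
    heard = {p: m for p, m in best_rx.items() if m is not None}
    if heard and min(heard.values())[0] < bridge:
        # The port with the best BPDU overall is the root port.
        root_port = min(heard, key=heard.get)
        root, cost = heard[root_port][0], heard[root_port][1] + 1
    else:
        root_port, root, cost = None, bridge, 0  # we believe we are the root
    roles = {}
    for p, m in best_rx.items():
        if p == root_port:
            roles[p] = "root"
        elif m is None or (root, cost, bridge, p) < m:
            roles[p] = "designated"  # our BPDU is the best one on this LAN
        else:
            roles[p] = "blocked"
    return roles


# Bridge 4 hears the root (bridge 1) on port A and bridge 3's BPDU on port B.
print(port_roles(4, {"A": (1, 0, 1, "B"), "B": (1, 1, 3, "A")}))
# {'A': 'root', 'B': 'blocked'}
```

Root and designated ports are in the forwarding state; blocked ports are not part of the spanning tree.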
Building the Spanning Tree
• Consider the network in the figure.
• Assume that the bridges have
calculated the designated ports
(D) and the root ports (R) as
indicated.
• What is the spanning tree?
– On each LAN, connect D ports to
the R ports on this LAN
– Which bridge is the root bridge?
• Suppose a packet is originated in
LAN 5. How is the packet
flooded?
[Figure: Bridge1 through Bridge5 attached to LAN 1 through LAN 5, with each port labeled D or R]
Example
• Assume that all bridges send out their BPDU’s once per
second, and assume that all bridges send their BPDUs at the
same time
• Assume that all bridges are turned on simultaneously at time
T=0 sec.
[Figure: Bridge1 through Bridge5, each with ports A and B, attached to LAN 1 through LAN 4]
Example: BPDUs sent
T=1sec
Bridge1
Send: A: (B1,0,B1,A); B: (B1,0,B1,B)
Recv: A: (B5,0,B5,A), (B2,0,B2,B); B: (B3,0,B3,B), (B4,0,B4,A)
Bridge2
Send: A: (B2,0,B2,A); B: (B2,0,B2,B)
Recv: A: none; B: (B1,0,B1,A), (B5,0,B5,A)
Bridge3
Send: A: (B3,0,B3,A); B: (B3,0,B3,B)
Recv: A: (B5,0,B5,B), (B4,0,B4,B); B: (B1,0,B1,B), (B4,0,B4,A)
Bridge4
Send: A: (B4,0,B4,A); B: (B4,0,B4,B)
Recv: A: (B3,0,B3,B), (B1,0,B1,B); B: (B3,0,B3,A), (B5,0,B5,B)
Bridge5
Send: A: (B5,0,B5,A); B: (B5,0,B5,B)
Recv: A: (B2,0,B2,B), (B1,0,B1,A); B: (B3,0,B3,A), (B4,0,B4,B)
Example: BPDU’s sent
T=2sec
Bridge1
D-port: A,B
Send: A: (B1,0,B1,A); B: (B1,0,B1,B)
Recv: none
Bridge2
R-port: B; D-port: A
Send: A: (B1,1,B2,A)
Recv: A: none; B: (B1,0,B1,A)
Bridge3
R-port: B; D-port: A
Send: A: (B1,1,B3,A)
Recv: A: (B1,1,B4,B), (B1,1,B5,B); B: (B1,0,B1,B)
Bridge4
R-port: A; D-port: B
Send: B: (B1,1,B4,B)
Recv: A: (B1,0,B1,B); B: (B1,1,B3,A), (B1,1,B5,B)
Bridge5
R-port: A; D-port: B
Send: B: (B1,1,B5,B)
Recv: A: (B1,0,B1,A); B: (B1,1,B3,A), (B1,1,B4,B)
Example: BPDU’s sent
T=3sec
Bridge1
D-port: A,B
Send: A: (B1,0,B1,A); B: (B1,0,B1,B)
Recv: none
Bridge2
R-port: B; D-port: A
Send: A: (B1,1,B2,A)
Recv: A: none; B: (B1,0,B1,A)
Bridge3
R-port: B; D-port: A
Send: A: (B1,1,B3,A)
Recv: A: none; B: (B1,0,B1,B)
Bridge4
R-port: A; Blocked: B
Recv: A: (B1,0,B1,B); B: (B1,1,B3,A)
Bridge5
R-port: A; Blocked: B
Recv: A: (B1,0,B1,A); B: (B1,1,B3,A)
Example: the spanning tree
 | Bridge1 | Bridge2 | Bridge3 | Bridge4 | Bridge5
Root port | - | B | B | A | A
Designated bridge for | LAN2,3 | LAN1 | LAN4 | - | -
Designated ports | A,B | A | A | - | -
[Figure: the example topology with the resulting spanning tree; a packet is sent from LAN2]
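The example can be replayed with a short synchronous simulation. This is a sketch under stated assumptions: bridge IDs are the integers 1 to 5, the LAN membership in `LANS` is inferred from the Send/Recv lists in the slides (the original figure is not available), BPDUs are `(root, cost, bridge, port)` tuples, and message aging is ignored.

```python
LANS = {  # which (bridge, port) pairs attach to each LAN (inferred from the example)
    "LAN1": [(2, "A")],
    "LAN2": [(1, "A"), (2, "B"), (5, "A")],
    "LAN3": [(1, "B"), (3, "B"), (4, "A")],
    "LAN4": [(3, "A"), (4, "B"), (5, "B")],
}
BRIDGES = sorted({b for members in LANS.values() for b, _ in members})
PORTS = {b: sorted(p for m in LANS.values() for bb, p in m if bb == b) for b in BRIDGES}


def simulate(rounds):
    """Run `rounds` synchronous BPDU exchanges; return the best BPDU heard per port."""
    best = {b: {p: None for p in PORTS[b]} for b in BRIDGES}
    for _ in range(rounds):
        sends = {}
        for b in BRIDGES:
            heard = [m for m in best[b].values() if m is not None]
            top = min(heard) if heard else None
            root, cost = (top[0], top[1] + 1) if top and top[0] < b else (b, 0)
            for p in PORTS[b]:
                own = (root, cost, b, p)
                # Send on p only if our BPDU beats everything heard on p.
                if best[b][p] is None or own < best[b][p]:
                    sends[(b, p)] = own
        for members in LANS.values():
            for sender in members:
                if sender in sends:
                    for b, p in members:
                        cur = best[b][p]
                        if (b, p) != sender and (cur is None or sends[sender] < cur):
                            best[b][p] = sends[sender]
    return best


def roles(best):
    """Derive root, designated, and blocked ports from the best BPDUs heard."""
    out = {}
    for b in BRIDGES:
        heard = {p: m for p, m in best[b].items() if m is not None}
        top_port = min(heard, key=heard.get) if heard else None
        is_root = top_port is None or heard[top_port][0] >= b
        root, cost = (b, 0) if is_root else (heard[top_port][0], heard[top_port][1] + 1)
        out[b] = {}
        for p in PORTS[b]:
            m = best[b][p]
            if not is_root and p == top_port:
                out[b][p] = "root"
            elif m is None or (root, cost, b, p) < m:
                out[b][p] = "designated"
            else:
                out[b][p] = "blocked"
    return out


print(roles(simulate(3))[4])  # {'A': 'root', 'B': 'blocked'}
```

After three rounds the roles match the table: Bridge1 is root, Bridge2 and Bridge3 each have one designated port, and port B is blocked on Bridge4 and Bridge5.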
Limitations of Bridges
• Scaling
– Spanning tree algorithm doesn’t scale
– Broadcast does not scale
– No way to route around congested links, even if
path exists
• May violate assumptions
– Could confuse some applications that assume
single segment
• Much more likely to drop packets
• Makes latency between nodes non-uniform
– Beware of transparency
VLANs
• Company network, A and B departments
– Broadcast traffic does not scale
– May not want traffic between the two
departments
– Topology has to mirror physical locations
– What if employees move between offices?
VLANs
• Solution: Virtual LANs
– Assign switch ports to a VLAN ID (color)
– Isolate traffic: only same color
– Trunk links may belong to multiple VLANs
– Encapsulate packets: add 12-bit VLAN ID
• Easy to change, no need to rewire
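A sketch of the encapsulation step, assuming the IEEE 802.1Q tag layout (the slide does not name the standard): a 4-byte tag after the two MAC addresses, made of a 16-bit tag protocol ID `0x8100` and a 16-bit field holding 3 priority bits, 1 drop-eligible bit, and the 12-bit VLAN ID. The function names are illustrative.

```python
import struct

TPID_8021Q = 0x8100  # EtherType value that marks an 802.1Q tag


def add_vlan_tag(frame, vlan_id, priority=0):
    """Insert a 4-byte tag after the 12 bytes of destination + source MAC."""
    if not 0 <= vlan_id < 4096:
        raise ValueError("VLAN ID must fit in 12 bits")
    tci = (priority << 13) | vlan_id  # 3-bit priority, 1-bit DEI (0), 12-bit VLAN ID
    return frame[:12] + struct.pack("!HH", TPID_8021Q, tci) + frame[12:]


def vlan_of(frame):
    """Return the VLAN ID of a tagged frame, or None if the frame is untagged."""
    tpid, tci = struct.unpack("!HH", frame[12:16])
    return tci & 0x0FFF if tpid == TPID_8021Q else None


untagged = bytes(12) + b"\x08\x00" + b"payload"  # dummy MACs, EtherType, data
tagged = add_vlan_tag(untagged, vlan_id=42)
print(vlan_of(tagged), vlan_of(untagged))  # 42 None
```

A trunk link carries tagged frames for several VLANs; the receiving switch reads the ID and forwards only to ports of the same color.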
Generic Switch Architecture
• Goal: deliver packets from input to output
ports
• Three potential performance concerns:
– Throughput in bytes/second
– Throughput in packets/second
– Latency
Shared Memory Switch
• 1st Generation – like a regular PC
– NIC DMAs packet to memory over I/O bus
– CPU examines header, sends to destination
NIC
– I/O bus is serious bottleneck
– For small packets, CPU may be limited too
– Typically < 0.5 Gbps
Shared Bus Switch
• 2nd Generation
– NIC has own processor, cache of forwarding
table
– Shared bus, doesn’t have to go to main
memory
– Typically limited to bus bandwidth
• (Cisco 5600 has a 32Gbps bus)
Point to Point Switch
• 3rd Generation: overcomes single-bus bottleneck
• Example: Cross-bar switch
– Any input-output permutation
– Multiple inputs to same output requires trickery
– Cisco 12000 series: 60Gbps
Cut through vs. Store and Forward
• Two approaches to forwarding a packet
– Receive a full packet, then send to output port
– Start retransmitting as soon as you know output
port, before full packet
• Cut-through routing can greatly decrease latency
• Disadvantage
– Can waste transmission (classic optimistic approach)
• CRC may be bad
• If Ethernet collision, may have to send runt packet on
output link
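A back-of-the-envelope comparison of the two approaches, as a sketch that ignores propagation and queuing delay and assumes all links run at the same rate. The function names and the example numbers (1500-byte frame, 14-byte header, 1 Gbps, 3 links) are illustrative.

```python
def store_and_forward_delay(frame_bytes, link_bps, hops):
    """Every hop waits for the whole frame before sending it on."""
    return hops * frame_bytes * 8 / link_bps


def cut_through_delay(frame_bytes, header_bytes, link_bps, hops):
    """Each switch waits only for the header; the full frame is serialized once."""
    return (frame_bytes + (hops - 1) * header_bytes) * 8 / link_bps


sf = store_and_forward_delay(1500, 1e9, hops=3)
ct = cut_through_delay(1500, 14, 1e9, hops=3)
print(f"{sf * 1e6:.1f} us vs {ct * 1e6:.1f} us")  # 36.0 us vs 12.2 us
```

The saving grows with the number of hops, since store-and-forward pays the full serialization time at every switch.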
Buffering
• Buffering of packets can happen at input
ports, fabric, and/or output ports
• Queuing discipline is very important
• Consider FIFO + input port buffering
– Only one packet per output port at any time
– If multiple packets arrive for port 2, they may
block packets to other ports that are free
– Head-of-line blocking: can limit throughput to ~58%
under some reasonable conditions*
[Figure: input Port 1 queues a packet for output 2; input Port 2 queues packets for outputs 1 and 2]
* For independent, uniform traffic, with same-size frames
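The throughput limit can be estimated with a small Monte Carlo sketch. Assumptions: every input is always backlogged (saturated), destinations are independent and uniform, frames are one slot long, and ties at an output are broken at random. The function name and parameters are illustrative.

```python
import random


def hol_throughput(n_ports, slots, seed=0):
    """Fraction of output capacity used by a FIFO input-queued switch."""
    rng = random.Random(seed)
    # heads[i] is the output wanted by the packet at the head of input i's queue.
    heads = [rng.randrange(n_ports) for _ in range(n_ports)]
    delivered = 0
    for _ in range(slots):
        contenders = {}
        for inp, out in enumerate(heads):
            contenders.setdefault(out, []).append(inp)
        for out, inputs in contenders.items():
            # Each output accepts one packet per slot; the losers stay blocked.
            winner = rng.choice(inputs)
            heads[winner] = rng.randrange(n_ports)  # next packet moves to the head
            delivered += 1
    return delivered / (slots * n_ports)


print(round(hol_throughput(n_ports=32, slots=2000), 2))
```

For large port counts the estimate approaches the analytical limit of 2 - sqrt(2), about 0.586; small switches do somewhat better.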
Head-of-Line Blocking
• Solution: Virtual Output Queueing
– Each input port has n FIFO queues, one for each
output
– Schedule the switch by computing a matching in a bipartite graph (inputs vs. outputs)
– Shown to achieve 100% throughput*
*MCKEOWN et al.: ACHIEVING 100% THROUGHPUT IN AN INPUT-QUEUED SWITCH, 1999
ATM Cells
• Fixed-size packets
– 5 bytes header
– 48 bytes payload
• If payload smaller than 48B, uses padding
• If greater than 48B, breaks it
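A simplified sketch of cell segmentation. It assumes plain zero padding of the last cell and ignores the adaptation-layer trailers that real ATM uses; the function names are illustrative.

```python
CELL_PAYLOAD = 48  # bytes of payload per cell
CELL_HEADER = 5    # bytes of header per cell


def segment(data):
    """Split data into 48-byte cell payloads, zero-padding the last one."""
    cells = []
    for i in range(0, len(data), CELL_PAYLOAD):
        chunk = data[i:i + CELL_PAYLOAD]
        cells.append(chunk.ljust(CELL_PAYLOAD, b"\x00"))
    return cells


def efficiency(data_len):
    """Useful bytes divided by bytes on the wire, for data_len > 0."""
    n_cells = -(-data_len // CELL_PAYLOAD)  # ceiling division
    return data_len / (n_cells * (CELL_HEADER + CELL_PAYLOAD))


print(len(segment(bytes(100))))  # 3 cells: 48 + 48 + 4 bytes of data plus padding
print(round(efficiency(48), 3))  # 0.906, the best case 48/53
```

A 100-byte payload needs three cells (159 bytes on the wire), so its efficiency is only about 63%.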
Why small, fixed-length packets?
• Cons: maximum efficiency 48/53=90.6%
• Pros:
– Suitable for high-speed hardware implementation
– Many switching elements doing the same thing in
parallel
– Reducing priority packet latency
• Good for QoS
– Reducing transmission latency
• Reducing preemption latency
• Reduce queuing latency
– Transmission + propagation + queuing
Why 48 bytes
• It’s from the telephone technology
• Thought data would be mostly voice
• A compromise
– US: 64 bytes
– Europe: 32 bytes
– (64+32)/2 = 48 bytes
Virtual paths
• 24-bit virtual circuit identifiers (VCIs)
– Discussed in our previous lecture
• Two-levels of VCIs
– 8-bit virtual path, 16-bit VCI
– Virtual paths shared by multiple connections
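A sketch of the two-level split, assuming the virtual path occupies the high 8 bits of the 24-bit identifier and the VCI the low 16 bits. The function names and the example value are illustrative.

```python
def split_vc_id(vc_id):
    """Split a 24-bit identifier into (8-bit virtual path, 16-bit VCI)."""
    if not 0 <= vc_id < 1 << 24:
        raise ValueError("identifier must fit in 24 bits")
    return vc_id >> 16, vc_id & 0xFFFF


def join_vc_id(vpi, vci):
    """Combine an 8-bit virtual path and a 16-bit VCI into one 24-bit identifier."""
    return (vpi << 16) | vci


print(split_vc_id(0x2A0064))  # (42, 100)
```

A switch in the core can forward on the 8-bit virtual path alone, so all connections sharing that path need only one table entry there.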
Summary
• Case study
– Ethernet bridges
– Spanning tree algorithm
• Asynchronous Transfer Mode (ATM)
– A fixed packet size network
– Connection oriented
• Using signaling to setup a virtual circuit
• Next lecture
– Internetworking
Coming Up
• Connecting multiple networks: IP and the
Network Layer