Today’s Lecture
• Admini-trivia
• Switching
• Examples of Switching Technologies
  – Bridging (datagram)
    • Spanning-Tree
  – ATM (Virtual Circuits)

Assignment 1

Collision Detection
• Me: check to see if anyone is transmitting; no collisions
• You (the other station): check to see if anyone is transmitting; no collisions

Why Should You Care About Switching?
• All networks use switches
  – Facebook/Google/Microsoft
    • Use Spanning-Tree in data centers
  – ATT/Verizon/Sprint/Comcast
    • Use MPLS in Wide-Area Networks

Basic Problem
• Direct-link networks don’t scale
• Solution: use switches to connect network segments

Switching
• Given a packet, a switch must be able to determine the outgoing port
• 3 ways to do this:
  – Virtual Circuit Switching
  – Datagram Switching
  – Source Routing

Virtual Circuit Switching
• Explicit set-up and tear-down phases
  – Establishes a Virtual Circuit Identifier (VCI) on each link
  – Each switch stores a VC table
• Subsequent packets follow the same path
  – Switches map [in-port, in-VCI] : [out-port, out-VCI]
• Also called the connection-oriented model

Virtual Circuit Model
• Requires one RTT before sending the first packet
• Connection request contains the full destination address; subsequent packets carry only a small VCI
• Setup phase allows reservation of resources, such as bandwidth or buffer space
  – Any problems here?
• If a link or switch fails, must re-establish the whole circuit
• Example: ATM

Datagram Switching
Forwarding table for Switch 2:
  Addr  Port
  A     3
  B     0
  C     3
  D     3
  E     2
  F     1
  G     0
  H     0
• Each packet carries the destination address
• Switches maintain address-based tables
  – Maps [destination address] : [out-port]
• Also called the connectionless model

Datagram Switching
• No delay for connection setup
• Source can’t know if the network can deliver a packet
• Possible to route around failures
• Higher per-packet overhead
• Potentially larger tables at switches

Source Routing
• Packets carry the entire route: ports
• Switches need no tables!
  – But end hosts must obtain the path information
• Variable-size packet header

Comparisons
                                     Virtual Circuit    Datagram Switching    Source Routing
  Start-up latency                   Yes                No                    No
  Forwarding table size (in switch)  O(# of circuits)   O(# of hosts)         0
  Packet header contents             Circuit ID         Src & dst addresses   The WHOLE PATH
  Packet header size                 Fixed-size         Fixed-size            Variable-size
  Recover from failure               No                 Yes                   No
  Delivery guarantee                 Yes                No                    No

Constant oscillation between paradigms as new technologies arise

Bridges and Extended LANs
• LANs have limitations
  – E.g. Ethernet: < 1024 hosts, < 2500 m
• Connect two or more LANs with a bridge
  – Operates on Ethernet addresses
  – Forwards packets from one LAN to the other(s)
• An Ethernet switch is just a multi-way bridge

Learning Bridges
• Idea: don’t forward a packet where it isn’t needed
  – If you know the recipient is not on that port
• Learn hosts’ locations based on source addresses
  – Build a table as you receive packets
  – Table is a cache: if full, evict old entries. Why is this fine?
• Table says when not to forward a packet
  – Doesn’t need to be complete for correctness

Attack on a Learning Switch
• Eve: wants to sniff all packets sent to Bob
• Same segment: easy (shared medium)
• Different segment on a learning bridge: hard
  – Once bridge learns Bob’s port, stop broadcasting
• How can Eve force the bridge to keep broadcasting?
  – Flood the network with frames with spoofed src addr!

Bridges
• Unicast: forward with filtering
• Broadcast: always forward
• Multicast: always forward or learn groups
• Difference between bridges and repeaters?
  – Bridges: same broadcast domain; copy frames
  – Repeaters: same broadcast and collision domain; copy signals

Dealing with Loops
• Problem: people may create loops in LAN!
  – Accidentally, or to provide redundancy
  – Don’t want to forward packets indefinitely
[Figure: LAN 1 (with host A) and LAN 2, connected by both Bridge A and Bridge B]

Spanning Tree
• Need to disable ports, so that no loops in network
• Like creating a spanning tree in a graph
  – View switches and networks as nodes, ports as edges

Distributed Spanning Tree Algorithm
• Every bridge has a unique ID (Ethernet address)
• Goal:
  – Bridge with the smallest ID is the root
  – Each segment has one designated bridge, responsible for forwarding its packets towards the root
    • Bridge closest to root is designated bridge
    • If there is a tie, bridge with lowest ID wins

Spanning Tree Protocol
• Send message when you think you are the root
• Otherwise, forward messages from best known root
  – Add one to distance before forwarding
  – Don’t forward over discarding ports (see next slide)
• Spanning Tree messages contain:
  – ID of bridge sending the message
  – ID sender believes to be the root
  – Distance (in hops) from sender to root
• Bridges remember best config msg on each port
• In the end, only root is generating messages

Spanning Tree Protocol (cont.)
• Forwarding and Broadcasting
• Port states*:
  – Root port: a port the bridge uses to reach the root
  – Designated port: the lowest-cost port attached to a single segment
  – If a port is not a root port or a designated port, it is a discarding port.
* In a later protocol, RSTP, there can be ports configured as backups and alternates.
[Figure legend: Root Port, Designated Port, Discarding Port]

The protocol
• IEEE 802.1D has an algorithm that organizes the bridges as a spanning tree in a dynamic environment
  – Note: Trees don’t have loops
• Bridges exchange messages to configure the bridge (Configuration Bridge Protocol Data Units, Configuration BPDUs) to build the tree
  – Select ports they use to forward packets

Configuration BPDUs
Frame layout: Destination MAC address | Source MAC address | Configuration Message
Configuration Message fields:
  protocol identifier   Set to 0
  version               Set to 0
  message type          Set to 0
  flags                 Lowest bit is the "topology change" bit (TC bit)
  root ID               ID of root
  cost                  Cost of the path from the bridge sending this message to the root
  bridge ID             ID of bridge sending this message
  port ID               ID of port from which message is sent
  message age           Time since root sent a message on which this message is based
  maximum age
  hello time            Time between BPDUs from the root (default: 2 secs)
  forward delay         Time between recalculations of the spanning tree (default: 15 secs)

What do the BPDUs do?
• Elect a single bridge as the root bridge
• Calculate the distance of the shortest path to the root bridge
• Each bridge can determine a root port, the port that gives the best path to the root
• Each LAN can determine a designated bridge, which is the bridge closest to the root. A LAN's designated bridge is the only bridge allowed to forward frames to and from that LAN.
• A LAN's designated port is the port that connects it to the designated bridge
• Select ports to be included in the spanning tree.

Terms
• Each bridge has a unique identifier: Bridge ID
  Bridge ID = {Priority: 2 bytes; Bridge MAC address: 6 bytes}
  • Priority is configured
  • Bridge MAC address is the lowest MAC address of all ports
• Each port within a bridge has a unique identifier (port ID)
• Root Bridge: the bridge with the lowest identifier is the root of the spanning tree
• Root Port: each bridge has a root port, which identifies the next hop from the bridge to the root

Terms
• Root Path Cost: for each bridge, the cost of the min-cost path to the root
  – Assume it is measured in #hops to the root
• Designated Bridge, Designated Port: the single bridge on a LAN that is closest to the root for this LAN:
  – If two bridges have the same cost, select the one with the highest priority (numerically lowest priority value); if they have the same priority, select based on the bridge ID
  – If the min-cost bridge has two or more ports on the LAN, select the port with the lowest identifier

Spanning Tree Algorithm
• Each bridge sends out BPDUs that contain the following information:
  root ID     root bridge (what the sender thinks it is)
  cost        root path cost for sending bridge
  bridge ID   identifies the sending bridge
  port ID     identifies the sending port
• The transmission of BPDUs results in the distributed computation of a spanning tree
• The convergence of the algorithm is very quick

Ordering of Messages
• We define an ordering of BPDU messages (lexicographically)
  M1 = (root ID R1, cost C1, bridge ID B1, port ID P1)
  M2 = (root ID R2, cost C2, bridge ID B2, port ID P2)
• We say M1 advertises a better path than M2 (“M1 << M2”) if
  (R1 < R2),
  or (R1 == R2) and (C1 < C2),
  or (R1 == R2) and (C1 == C2) and (B1 < B2),
  or (R1 == R2) and (C1 == C2) and (B1 == B2) and (P1 < P2)

Initializing the Spanning Tree Protocol
• Initially, all bridges assume they are the root bridge.
• Each bridge B sends BPDUs of this form on its LANs from each port P: (B, 0, B, P)
• Each bridge looks at the BPDUs received on all its ports and its own transmitted BPDUs.
• Root bridge is the smallest root ID that has been received so far (whenever a smaller ID arrives, the root is updated)

Spanning Tree Protocol
• Each bridge B looks on all its ports for BPDUs that are better than its own BPDUs
• Suppose a bridge with BPDU M1 = (R1, C1, B1, P1) receives a “better” BPDU M2 = (R2, C2, B2, P2). Then it will update its BPDU to (R2, C2+1, B1, P1)
• However, the new BPDU is not necessarily sent out
• On each bridge, the port where the “best BPDU” (via relation “<<”) was received is the root port of the bridge
  – No need to send out updated BPDUs to root port

When to send a BPDU
• Say B has generated a BPDU for each port x: (R, Cost, B, x)
• B will send this BPDU on port x only if its BPDU is better (via relation “<<”) than any BPDU that B received from port x.
• In this case, B also assumes that it is the designated bridge for the LAN to which the port connects
• And port x is the designated port of that LAN
[Figure: Bridge B with ports A, B, C, and x]

Selecting the Ports for the Spanning Tree
• Each bridge makes a local decision which of its ports are part of the spanning tree
• Now B can decide which ports are in the spanning tree:
  • B’s root port is part of the spanning tree
  • All designated ports are part of the spanning tree
  • All other ports are not part of the spanning tree
• B’s ports that are in the spanning tree will forward packets (=forwarding state)
• B’s ports that are not in the spanning tree will not forward packets (=blocking state)

Building the Spanning Tree
• Consider the network on the right.
• Assume that the bridges have calculated the designated ports (D) and the root ports (R) as indicated.
[Figure: Bridge1 to Bridge5 and LAN 1 to LAN 5, with bridge ports marked D or R]
• What is the spanning tree?
  – On each LAN, connect D ports to the R ports on this LAN
  – Which bridge is the root bridge?
• Suppose a packet is originated in LAN 5. How is the packet flooded?

Example
• Assume that all bridges send out their BPDUs once per second, and assume that all bridges send their BPDUs at the same time
• Assume that all bridges are turned on simultaneously at time T=0 sec.
[Figure: Bridge1 to Bridge5 interconnected by LAN 1 to LAN 4; each bridge has ports A and B]

Example: BPDUs sent, T=1sec
  Bridge1  Send: A: (B1,0,B1,A)  B: (B1,0,B1,B)
           Recv: A: (B5,0,B5,A) (B2,0,B2,B)  B: (B3,0,B3,B) (B4,0,B4,A)
  Bridge2  Send: A: (B2,0,B2,A)  B: (B2,0,B2,B)
           Recv: A: (none)  B: (B1,0,B1,A) (B5,0,B5,A)
  Bridge3  Send: A: (B3,0,B3,A)  B: (B3,0,B3,B)
           Recv: A: (B5,0,B5,B) (B4,0,B4,B)  B: (B1,0,B1,B) (B4,0,B4,A)
  Bridge4  Send: A: (B4,0,B4,A)  B: (B4,0,B4,B)
           Recv: A: (B3,0,B3,B) (B1,0,B1,B)  B: (B3,0,B3,A) (B5,0,B5,B)
  Bridge5  Send: A: (B5,0,B5,A)  B: (B5,0,B5,B)
           Recv: A: (B2,0,B2,B) (B1,0,B1,A)  B: (B3,0,B3,A) (B4,0,B4,B)

Example: BPDUs sent, T=2sec
  Bridge1  D-port: A,B
           Send: A: (B1,0,B1,A)  B: (B1,0,B1,B)
           Recv: (none)
  Bridge2  R-port: B  D-port: A
           Send: A: (B1,1,B2,A)
           Recv: A: (none)  B: (B1,0,B1,A)
  Bridge3  R-port: B  D-port: A
           Send: A: (B1,1,B3,A)
           Recv: A: (B1,1,B4,B) (B1,1,B5,B)  B: (B1,0,B1,B)
  Bridge4  R-port: A  D-port: B
           Send: B: (B1,1,B4,B)
           Recv: A: (B1,0,B1,B)  B: (B1,1,B3,A) (B1,1,B5,B)
  Bridge5  R-port: A  D-port: B
           Send: B: (B1,1,B5,B)
           Recv: A: (B1,0,B1,A)  B: (B1,1,B3,A) (B1,1,B4,B)

Example: BPDUs sent, T=3sec
  Bridge1  D-port: A,B
           Send: A: (B1,0,B1,A)  B: (B1,0,B1,B)
           Recv: (none)
  Bridge2  R-port: B  D-port: A
           Send: A: (B1,1,B2,A)
           Recv: A: (none)  B: (B1,0,B1,A)
  Bridge3  R-port: B  D-port: A
           Send: A: (B1,1,B3,A)
           Recv: A: (none)  B: (B1,0,B1,B)
  Bridge4  R-port: A  Blocked: B
           Recv: A: (B1,0,B1,B)  B: (B1,1,B3,A)
  Bridge5  R-port: A  Blocked: B
           Recv: A: (B1,0,B1,A)  B: (B1,1,B3,A)

Example: the spanning tree
Table columns: Bridge1, Bridge2, Bridge3, Bridge4,
Bridge5; rows: Root Port, Designated bridge, Designated ports.

                     Bridge1   Bridge2   Bridge3   Bridge4   Bridge5
  Root Port          -         B         B         A         A
  Designated bridge  LAN2,3    LAN1      LAN4      -         -
  Designated ports   A,B       A         A         -         -
[Figure: Bridge1 to Bridge5 interconnected by LAN 1 to LAN 4, ports labeled A and B. A packet is sent from LAN2.]

Limitations of Bridges
• Scaling
  – Spanning tree algorithm doesn’t scale
  – Broadcast does not scale
  – No way to route around congested links, even if path exists
• May violate assumptions
  – Could confuse some applications that assume single segment
    • Much more likely to drop packets
    • Makes latency between nodes non-uniform
  – Beware of transparency

VLANs
[Figure: hosts a1, a2, b1, b2]
• Company network, A and B departments
  – Broadcast traffic does not scale
  – May not want traffic between the two departments
  – Topology has to mirror physical locations
  – What if employees move between offices?

VLANs
[Figure: hosts a1, a2, b1, b2]
• Solution: Virtual LANs
  – Assign switch ports to a VLAN ID (color)
  – Isolate traffic: only same color
  – Trunk links may belong to multiple VLANs
  – Encapsulate packets: add 12-bit VLAN ID
• Easy to change, no need to rewire

Generic Switch Architecture
• Goal: deliver packets from input to output ports
• Three potential performance concerns:
  – Throughput in bytes/second
  – Throughput in packets/second
  – Latency

Shared Memory Switch
• 1st Generation: like a regular PC
  – NIC DMAs packet to memory over I/O bus
  – CPU examines header, sends to destination NIC
  – I/O bus is serious bottleneck
  – For small packets, CPU may be limited too
  – Typically < 0.5 Gbps

Shared Bus Switch
• 2nd Generation
  – NIC has own processor, cache of forwarding table
  – Shared bus, doesn’t have to go to main memory
  – Typically limited to bus bandwidth
    • (Cisco 5600 has a 32Gbps bus)

Point to Point Switch
• 3rd Generation: overcomes single-bus bottleneck
• Example: Cross-bar switch
  – Any input-output
permutation
  – Multiple inputs to same output requires trickery
  – Cisco 12000 series: 60Gbps

Cut through vs. Store and Forward
• Two approaches to forwarding a packet
  – Receive a full packet, then send to output port
  – Start retransmitting as soon as you know output port, before full packet
• Cut-through routing can greatly decrease latency
• Disadvantage
  – Can waste transmission (classic optimistic approach)
    • CRC may be bad
    • If Ethernet collision, may have to send runt packet on output link

Buffering
• Buffering of packets can happen at input ports, fabric, and/or output ports
• Queuing discipline is very important
• Consider FIFO + input port buffering
  – Only one packet per output port at any time
  – If multiple packets arrive for port 2, they may block packets to other ports that are free
  – Head-of-line blocking: can limit throughput to ~58% under some reasonable conditions*
[Figure: input queues holding packets destined for ports 1 and 2; outputs Port 1 and Port 2]
* For independent, uniform traffic, with same-size frames

Head-of-Line Blocking
[Figure: input queues holding packets destined for ports 1 and 2; outputs Port 1 and Port 2]
• Solution: Virtual Output Queueing
  – Each input port has n FIFO queues, one for each output
  – Switch using matching in a bipartite graph
  – Shown to achieve 100% throughput*
* McKeown et al., "Achieving 100% Throughput in an Input-Queued Switch", 1999

ATM Cells
• Fixed-size packets
  – 5 bytes header
  – 48 bytes payload
• If payload smaller than 48B, uses padding
• If greater than 48B, breaks it

Why small, fixed-length packets?
• Cons: maximum efficiency 48/53 = 90.6%
• Pros:
  – Suitable for high-speed hardware implementation
  – Many switching elements doing the same thing in parallel
  – Reducing priority packet latency
    • Good for QoS
  – Reducing transmission latency
    • Reducing preemption latency
    • Reducing queuing latency
  – Transmission + propagation + queuing

Why 48 bytes
• It’s from the telephone technology
• Thought data would be mostly voice
• A compromise
  – US: 64 bytes
  – Europe: 32 bytes
  – (64+32)/2 = 48 bytes

Virtual paths
• 24-bit virtual circuit identifiers (VCIs)
  – Discussed in our previous lecture
• Two levels of VCIs
  – 8-bit virtual path, 16-bit VCI
  – Virtual paths shared by multiple connections

Summary
• Case study
  – Ethernet bridges
  – Spanning tree algorithm
• Asynchronous Transfer Mode (ATM)
  – A fixed packet size network
  – Connection oriented
    • Using signaling to set up a virtual circuit
• Next lecture
  – Internetworking

Coming Up
• Connecting multiple networks: IP and the Network Layer
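
Worked example: ATM cell arithmetic
The ATM Cells slides give a 5-byte header, a 48-byte payload, padding for short payloads, and a best-case efficiency of 48/53. The sketch below may help make that arithmetic concrete. It is only an illustration: the names segment and efficiency are hypothetical, the header is a placeholder of five zero bytes (real ATM headers carry the virtual path and VCI fields, which are not modeled here), and zero-byte padding is an assumption.

```python
from typing import List

HEADER_BYTES = 5    # from the slides
PAYLOAD_BYTES = 48  # from the slides
CELL_BYTES = HEADER_BYTES + PAYLOAD_BYTES  # 53


def segment(data: bytes) -> List[bytes]:
    """Split data into 53-byte cells, zero-padding the final payload.

    Hypothetical helper: the header is five zero bytes, not a real ATM header.
    """
    cells = []
    for start in range(0, len(data), PAYLOAD_BYTES):
        chunk = data[start:start + PAYLOAD_BYTES]
        chunk = chunk.ljust(PAYLOAD_BYTES, b"\x00")  # pad a short last chunk
        cells.append(bytes(HEADER_BYTES) + chunk)
    return cells


def efficiency(data_len: int) -> float:
    """Fraction of transmitted bytes that are user data."""
    if data_len <= 0:
        return 0.0
    n_cells = -(-data_len // PAYLOAD_BYTES)  # ceiling division
    return data_len / (n_cells * CELL_BYTES)


if __name__ == "__main__":
    cells = segment(b"x" * 100)
    print(len(cells), len(cells[0]))   # 3 53
    print(round(efficiency(48), 3))    # 0.906 (the 48/53 maximum)
    print(round(efficiency(100), 3))   # 0.629 (padding in the last cell)
```

A 100-byte payload needs three cells, and the padding in the last cell pulls efficiency well below the 90.6% maximum, which is the cost side of the small fixed-size cell trade-off.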