Virtual LAN as a Network Control Mechanism
Tzi-cker Chiueh
Computer Science Department, Stony Brook University
EdgeNet2006 Summit

Ethernet Routing
- Spanning tree topology
- Source learning to populate the forwarding table
- Broadcast (flood) when the destination is unknown
- Question: How can we control the routes on large L2 networks built from commodity Ethernet switches?
- Answer: VLAN

Virtual LAN (IEEE 802.1Q)
- Originally proposed to support multiple IP subnets on an L2 network without L3 routers
- A VLAN limits the scope of a broadcast packet
- A 4-byte 802.1Q header is inserted between the source MAC address and the Type/Length field:
  - 2-byte 802.1Q tag type = 0x8100
  - 3 bits for priority (IEEE 802.1p)
  - 1 bit for the Canonical Format Indicator
  - 12 bits for the VLAN ID

VLAN in Practice
- The 802.1Q tag is added at the hosts or at edge switches
- Packets are exchanged between two VLANs through a router
- Conceptually, each VLAN is like a physical LAN that has its own:
  - spanning tree
  - L2 routing table
- 802.1s allows per-VLAN spanning trees
- Real switches support only hundreds of VLANs, although the 12-bit VLAN ID allows up to 4,094
- VLAN membership is port-based or host-based
- Configuration can be done through SNMP, web requests, or a CLI

Viking Project
- Goal: a network resource management system for a campus-wide L2 network backbone or Metro Ethernet services
- A large number of low-port-density switches vs. a small number of high-port-density switches:
  - Larger geographic coverage
  - More cost-effective (economies of scale)
  - More redundancy at the physical connectivity level
  - Higher aggregate backplane throughput

Problem with Existing Ethernet
- Main problem: a single spanning tree
  - Inefficient
  - Inflexible routing
  - Longer failure recovery

Traffic Engineering
- Constantly measure the traffic load matrix
- Compute an active path and a backup path for each node pair to balance loads among links and to use shorter links whenever possible
  - Mesh rather than tree
- Force a path's route by setting up a dedicated logical VLAN for it
  - ATM-like behavior on Ethernet
- Because switches support only hundreds of VLANs, multiple logical VLANs must be combined into one physical VLAN, which corresponds to a spanning tree; a pair's active and backup paths belong to different VLANs

Big Picture
- Each host in a single IP subnet participates in multiple VLANs and uses different VLANs to reach different destinations
- Fast failure recovery: switch to a different 802.1s VLAN to reach a destination when the current VLAN fails
- The failure recovery time of the Viking prototype is less than 500 ms, most of which is due to SNMP trap handling
- Next step: edge-based traffic shaping and 802.1p for QoS guarantees

IGMP Snooping
- Why: avoid using L2 broadcast when supporting L3 multicast
- How: snoop on IGMP packets to infer an L2 distribution tree for an IP multicast group on top of the L2 network's spanning tree
- Supported by most commodity Ethernet switches
- Real switches can track only a small number of IP multicast groups
- Configuration: send IGMP packets to the root, which acts as the default router

Cassini Project
- Goal: leverage commodity Ethernet switches as building blocks for a storage area network
- Multicast is an important primitive
- Idea: use VLANs and IGMP snooping to support tree-based L2 multicast
- Transparent Reliable Multicast:
  - Multiple L3 connections (e.g., TCP) layered on top of an L2 multicast connection
  - ACKs and retransmissions are handled on each individual L3 unicast connection

Conclusion
- Commodity Ethernet switches have many innovative features that are largely unexploited
- CLI, SNMP, or HTTP makes on-the-fly reconfiguration possible according to workloads and/or hardware health status
- Interesting application scenarios:
  - Large-scale L2 networks
  - Storage area networks
  - Compute cluster interconnects: program-specific topology

Thank You! Questions?

Mariner Project
- Goal: leverage advanced features of commodity Gigabit Ethernet switches to build scalable compute cluster interconnects (~1000 nodes)
  - Programmable, application-specific interconnect topology
  - Fault management: asynchronous state checkpointing and pessimistic message logging
  - Scalable multicast state management
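The Ethernet Routing and Virtual LAN slides describe source learning, flooding of unknown destinations, and the way a VLAN limits the scope of a broadcast. The toy model below combines these ideas under stated assumptions. `LearningSwitch` and the static port-to-VLAN map are hypothetical. There is one forwarding table per VLAN, and unknown or broadcast destinations are flooded only to ports in the frame's VLAN. It is a sketch, not a model of any real switch's firmware.

```python
from collections import defaultdict


class LearningSwitch:
    """Toy source-learning switch with per-VLAN forwarding tables.

    port_vlans is a hypothetical port-based membership map:
    port number -> set of VLAN IDs the port belongs to.
    """

    def __init__(self, port_vlans: dict[int, set[int]]):
        self.port_vlans = port_vlans
        # One forwarding table per VLAN: MAC address -> output port.
        self.tables: dict[int, dict[str, int]] = defaultdict(dict)

    def receive(self, in_port: int, vlan: int, src: str, dst: str) -> list[int]:
        """Return the ports a frame is forwarded to (empty list = dropped)."""
        if vlan not in self.port_vlans.get(in_port, set()):
            return []  # ingress port is not a member of this VLAN
        table = self.tables[vlan]
        table[src] = in_port  # source learning
        if dst in table:
            out = table[dst]
            return [] if out == in_port else [out]
        # Unknown (or broadcast) destination: flood within this VLAN only.
        return sorted(
            p for p, vlans in self.port_vlans.items()
            if vlan in vlans and p != in_port
        )
```

For example, with ports 1 and 2 in VLAN 10, port 3 in VLAN 20, and port 4 in both, a frame to an unknown MAC arriving on port 1 in VLAN 10 is flooded to ports 2 and 4 but never to port 3.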
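The Virtual LAN (IEEE 802.1Q) slide lists the tag layout. The sketch below shows how those fields pack into the 4-byte tag: TPID 0x8100 followed by a 16-bit control field (3-bit priority, 1-bit CFI, 12-bit VLAN ID). The tag is inserted after the 12 bytes of destination and source MAC addresses. The helper names are illustrative. The sketch ignores the frame check sequence, which a real switch must recompute after inserting the tag.

```python
import struct

TPID_8021Q = 0x8100  # 802.1Q tag type


def insert_vlan_tag(frame: bytes, vlan_id: int, priority: int = 0, cfi: int = 0) -> bytes:
    """Insert an 802.1Q tag between the source MAC and the Type/Length field."""
    if not 0 <= vlan_id < 4096:
        raise ValueError("VLAN ID must fit in 12 bits")
    if not 0 <= priority < 8:
        raise ValueError("priority (802.1p) must fit in 3 bits")
    if cfi not in (0, 1):
        raise ValueError("CFI is a single bit")
    tci = (priority << 13) | (cfi << 12) | vlan_id
    tag = struct.pack("!HH", TPID_8021Q, tci)  # network byte order
    return frame[:12] + tag + frame[12:]


def parse_vlan_tag(frame: bytes):
    """Return (priority, cfi, vlan_id) for a tagged frame, else None."""
    if len(frame) < 16:
        return None
    tpid, tci = struct.unpack("!HH", frame[12:16])
    if tpid != TPID_8021Q:
        return None
    return tci >> 13, (tci >> 12) & 0x1, tci & 0x0FFF
```

Usage: `parse_vlan_tag(insert_vlan_tag(frame, vlan_id=100, priority=5))` recovers `(5, 0, 100)`, and the original Type/Length field now starts 4 bytes later.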
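The Traffic Engineering slide computes an active and a backup path per node pair and then combines path-specific logical VLANs into physical VLANs, each of which must be a spanning tree. The sketch below is one possible approach, not Viking's actual algorithm. Link costs are a hypothetical input that could encode measured load. The active path is a Dijkstra shortest path, and the backup path is the shortest path that avoids the active path's links. Paths are packed greedily (first fit) into VLANs whose link sets must stay acyclic. Acyclicity matters because in a tree the route between two switches is unique, so the VLAN forces the chosen path.

```python
import heapq
from itertools import combinations

Edge = frozenset  # an undirected link {u, v}


def shortest_path(adj, src, dst, banned=frozenset()):
    """Dijkstra over adj[u] = {v: cost}, skipping links in `banned`."""
    dist, prev, heap = {src: 0}, {}, [(0, src)]
    while heap:
        d, u = heapq.heappop(heap)
        if u == dst:
            break
        if d > dist[u]:
            continue  # stale heap entry
        for v, w in adj[u].items():
            if Edge((u, v)) in banned:
                continue
            if d + w < dist.get(v, float("inf")):
                dist[v], prev[v] = d + w, u
                heapq.heappush(heap, (d + w, v))
    if dst not in dist:
        return None
    path = [dst]
    while path[-1] != src:
        path.append(prev[path[-1]])
    return path[::-1]


def path_edges(path):
    return {Edge(pair) for pair in zip(path, path[1:])}


def is_forest(edges):
    """True if the undirected link set has no cycle (union-find)."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    for e in edges:
        u, v = tuple(e)
        ru, rv = find(u), find(v)
        if ru == rv:
            return False
        parent[ru] = rv
    return True


def pack_into_vlans(paths):
    """First fit: a VLAN accepts a path if its links stay a forest and it
    does not already carry the other path of the same node pair."""
    vlans, assignment = [], {}
    for pair, role, edges in paths:
        for i, vlan in enumerate(vlans):
            if pair not in vlan["pairs"] and is_forest(vlan["edges"] | edges):
                break
        else:
            vlans.append({"edges": set(), "pairs": set()})
            i, vlan = len(vlans) - 1, vlans[-1]
        vlan["edges"] |= edges
        vlan["pairs"].add(pair)
        assignment[(pair, role)] = i
    return vlans, assignment


def plan(adj):
    """Active + link-disjoint backup path per node pair, packed into VLANs."""
    paths = []
    for s, t in combinations(sorted(adj), 2):
        active = shortest_path(adj, s, t)
        if active is None:
            continue
        paths.append(((s, t), "active", path_edges(active)))
        backup = shortest_path(adj, s, t, banned=path_edges(active))
        if backup is not None:
            paths.append(((s, t), "backup", path_edges(backup)))
    return pack_into_vlans(paths)
```

Greedy first fit is only a heuristic. Minimizing the number of VLANs, or balancing load across them, would need a more careful assignment.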
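The Big Picture slide describes fast failure recovery in which a host switches to a different VLAN when the current one fails. A minimal host-side model of that choice follows. It assumes each destination has an ordered list of candidate VLAN IDs (active first, then backup) and that failure and repair notifications arrive from elsewhere, for example as derived from SNMP traps. `VlanSelector` and its methods are hypothetical names.

```python
class VlanSelector:
    """Hypothetical per-destination VLAN choice at a host.

    candidates maps a destination to VLAN IDs in preference order,
    e.g. {"h2": [active_vlan, backup_vlan]}.
    """

    def __init__(self, candidates: dict[str, list[int]]):
        self.candidates = candidates
        self.failed: set[int] = set()

    def on_vlan_failure(self, vlan_id: int) -> None:
        """Record a failure notification for a VLAN."""
        self.failed.add(vlan_id)

    def on_vlan_repair(self, vlan_id: int) -> None:
        """Record that a previously failed VLAN is usable again."""
        self.failed.discard(vlan_id)

    def vlan_for(self, dst: str):
        """First non-failed VLAN for dst, or None if every candidate failed."""
        for vlan_id in self.candidates.get(dst, []):
            if vlan_id not in self.failed:
                return vlan_id
        return None
```

Because the backup VLAN is precomputed, recovery reduces to a table lookup at the host. This is consistent with the slide's observation that most of the recovery time goes to failure notification rather than rerouting.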
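The IGMP Snooping slide says switches infer per-group L2 distribution state from IGMP packets and can track only a few groups. The sketch below models that state for one VLAN on one switch. Membership reports add a port to a group, leaves remove it, and multicast frames go only to member ports plus the router port. The `max_groups` capacity is an assumed parameter. Flooding frames for untracked groups is also an assumed policy; real switches differ in how they treat unregistered multicast.

```python
class IgmpSnoopingTable:
    """Toy IGMP-snooping state for one VLAN on one switch (assumptions:
    capacity limit `max_groups`, flood-on-untracked policy)."""

    def __init__(self, ports, router_port, max_groups=4):
        self.ports = set(ports)
        self.router_port = router_port
        self.max_groups = max_groups
        self.members: dict[str, set[int]] = {}

    def on_report(self, group: str, port: int) -> None:
        """IGMP membership report seen on `port`: add the port to the group."""
        if group in self.members:
            self.members[group].add(port)
        elif len(self.members) < self.max_groups:
            self.members[group] = {port}
        # else: table full; the group stays untracked and is flooded

    def on_leave(self, group: str, port: int) -> None:
        """IGMP leave seen on `port`: remove it; drop empty groups."""
        ports = self.members.get(group)
        if ports is not None:
            ports.discard(port)
            if not ports:
                del self.members[group]

    def output_ports(self, group: str, in_port: int) -> list[int]:
        """Ports a multicast frame for `group` arriving on `in_port` goes to."""
        if group in self.members:
            targets = self.members[group] | {self.router_port}
        else:
            targets = self.ports  # untracked group: fall back to flooding
        return sorted(targets - {in_port})
```

The overflow branch shows why a small group capacity matters. Once the table is full, further groups fall back to the broadcast behavior that snooping was meant to avoid.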
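The Cassini Project slide describes transparent reliable multicast. Data travels once over the L2 multicast tree, while ACKs and retransmissions use each receiver's own L3 unicast connection. The simulation below illustrates that split under simplifying assumptions. The per-receiver loss pattern is a hypothetical input, ACKs are cumulative in the style of TCP, and unicast retransmissions are assumed lossless. It is not Cassini's implementation.

```python
def reliable_multicast(num_segments, receivers, lost):
    """Simulate one transfer of segments 0..num_segments-1.

    lost maps receiver -> set of segment numbers dropped on the L2 multicast
    tree (hypothetical). Returns, per receiver, the segments retransmitted
    over that receiver's unicast connection.
    """
    # Phase 1: each segment is sent once on the L2 multicast tree.
    received = {r: set(range(num_segments)) - lost.get(r, set()) for r in receivers}

    # Phase 2: recovery happens independently on each unicast connection.
    retransmitted = {r: [] for r in receivers}
    for r in receivers:
        ack = 0  # cumulative ACK: next segment the receiver still needs
        while ack < num_segments:
            if ack in received[r]:
                ack += 1
                continue
            # ACK stalls at `ack`; the sender retransmits that segment on
            # this receiver's unicast connection only (assumed lossless).
            retransmitted[r].append(ack)
            received[r].add(ack)
    return retransmitted
```

For instance, with 5 segments and receiver `r2` missing segments 1 and 3, only `r2`'s connection carries those two retransmissions. The other receivers see no extra traffic.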