TRILL Routing Scalability Considerations
Alex Zinin <zinin@psg.com>
IETF-62 TRILL BOF

General scalability framework
- About growth functions for:
  - Data overhead (adjacencies, LSDB, MAC entries)
  - BW overhead (Hellos, Updates, refreshes/sec)
  - CPU overhead (computation complexity, frequency)
- Scaling parameters:
  - N: total number of stations
  - L: number of VLANs
  - F: relocation frequency
- Types of devices:
  - Edge switch (attached to a fraction of N and L)
  - Core switch (most of L)

Scenarios for analysis
- Single stationary bcast domain
  - No practical station mobility
  - N = O(1K) by natural bcast limits
- Bcast domain with mobile stations
- Multiple stationary VLANs
  - L = O(1K) total, O(100) visible to a switch
  - N = O(10K) total
- Multiple VLANs with mobile stations

Protocol params of interest
- What:
  - Amount of data (topology, leaf entries)
  - Number of LSPs
  - LSP refresh rate
  - LSP update rate
  - Flooding complexity
  - Route calculation complexity & frequency
- Why:
  - Required memory [increase] as the network grows
  - Required memory & CPU to keep up with protocol dynamics
  - Link BW overhead to control the network
- How:
  - Absolute: big-O notation
  - Relative: compare to e.g. bridging & IP routing

Why is this important
- If data-inefficient:
  - Increased memory requirements
  - Frequent memory upgrades as the network grows
  - Much more info to flood
- If computationally inefficient:
  - Substantial compute power increase == marginal network size increase
  - High CPU utilization
  - Inability to keep up with protocol dynamics

Link-state protocol dynamics
- Network events are visible everywhere
- Rinp: input update rate (network event frequency); Rinp = f(protocol design, network)
- Rprc: update processing rate; Rprc = f(protocol design, CPU, implementation)
- Main assumption for stationary networks, for each node: Rprc >> Rinp
- Long-term convergence condition:
  - Network change is temporary
  - Topology stabilizes within finite T
  - Micro-bursts are buffered by queues
- What if (Rprc < Rinp)? (see the sketch below)
  - Short-term (normal for stationary nets): update drops, retransmits, convergence
  - Long-term/permanent: the net never converges, a CPU upgrade is needed
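The Rprc >> Rinp assumption above is essentially a queueing-stability condition. The following Python sketch is an illustration added here, not part of the original slides; the rates, durations, and the single-FIFO-queue model are assumptions. It shows why a temporary burst is absorbed while a sustained Rinp > Rprc means the node never catches up.

```python
# Minimal sketch of the long-term convergence condition Rprc >> Rinp.
# Assumption: a single FIFO update queue per node; rates are illustrative.

def backlog_after(duration_s, r_inp, r_prc, burst=0.0):
    """Return the update backlog after duration_s seconds, given an average
    input rate r_inp (updates/sec), a processing rate r_prc, and an optional
    initial burst of queued updates."""
    backlog = burst
    for _ in range(int(duration_s)):
        backlog += r_inp                 # network events arriving this second
        backlog -= min(backlog, r_prc)   # the node drains at most r_prc/sec
    return backlog

# Temporary burst with Rprc >> Rinp: the queue drains, the node converges.
print(backlog_after(60, r_inp=1, r_prc=30, burst=100.0))   # -> 0.0

# Sustained Rinp > Rprc: the backlog grows without bound; no convergence
# until Rprc (CPU, implementation) is upgraded.
print(backlog_after(60, r_inp=40, r_prc=30))                # -> 600.0
```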
Data-plane parameters
- Data overhead: number of MAC entries in the CAM table
- Why worry?
  - The CAM table is expensive
    - 1-8K entries for small switches
    - 32K-128K for core switches
  - Shared among VLANs
  - Entries expire when stations go silent

Single bcast domain (CP)
- Total of O(1K) MAC addresses
- IS-IS update packing:
  - Each address: 12-bit VLAN tag + 48-bit MAC = 60 bits
  - 4 addr's per TLV (TLV is 255B max)
  - 20 addr's per LSP fragment (1470B default)
  - ~5K addr's per node (256 frags total)
- LSP refresh rate:
  - 1K MACs = 50 LSPs
  - 1h renewal = 1 update every 72 secs
- MAC update rate:
  - Depends on the MAC learning & dead-detection procedure

MAC learning
- Traffic + expiration (5-15 min):
  - Announces station activity
  - 1K stations, 30-min fluctuations = 1 update every 1.8 seconds on average
  - Likely bursts due to the "start-of-day" phenomenon
- Reachability-based:
  - Start announcing a MAC when first heard from the station
  - Assume it is there, even if silent, until evidence otherwise (presumption of reachability)
  - Removes activity-sensitive fluctuations

Single bcast domain (DP)
- Number of entries:
  - Bridges: f(traffic)
    - Limited by local config and location within the network
  - RBridge: all attached stations
    - No big change for core switches (they see most MACs)
    - May be a problem for smaller ones

Single bcast: summary
- With reachability-based MAC announcements…
- CP is well within the limits of current link-state routing protocols
  - Can comfortably handle O(10K) routes
  - Dynamics are very similar
  - There is an existence proof that this works
- CP data overhead is O(N)
  - Worse than IP routing: O(log N)
  - However, net size is upper-bounded by bcast limits
  - Small switches will need to store & compute more
- Data plane may require bigger MAC tables in smaller switches

Note: comfort limit
- Always possible to overload a neighbor with updates
  - Update flow control is employed
- Experience-based heuristic: pace updates at 30/sec max
  - Dynamic pacing is possible, yet…
  - Not a hard rule, a ballpark
  - Limits burst Rinp for the neighbor
  - Prevents drops during flooding storms
- Given the (Rprc >> Rinp) condition, want the average to be an order of magnitude lower, e.g. O(1) upd/sec

Note: protocol upper bound
- LSP generation is paced: normally not more frequent than once each 5 secs
- Each LSP fragment has its own timer
- With equal distribution, max node origination rate == 51 upd/sec
- Does not address long-term stability

Single bcast + mobility
- Same number of stations, different dynamics
- Take the IETF wireless network, worst case (see the back-of-envelope sketch below):
  - ~700 stations
  - New location within 10 minutes
  - Average 1 MAC every 0.86 sec, or 1.16 MAC/sec
  - Note: every small switch in the VLAN will see the updates
- How does it work now?
  - Same data efficiency for CP and DP
  - Bridges (APs + switches) relearn MACs and expire old ones
- Summary: dynamics barely fit within the comfort range
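The rates quoted in the last few slides follow from simple arithmetic. Below is a minimal back-of-envelope check in Python, assuming the figures stated on the slides (20 MACs per LSP fragment, 1-hour refresh, 256 fragments, 5-second per-fragment pacing); it reproduces the slide numbers and is not an implementation of anything.

```python
# Back-of-envelope check of the update rates quoted in the slides above.
# Assumes the figures stated there: 20 MACs per LSP fragment, 1-hour refresh,
# 256 fragments max, and 5-second per-fragment generation pacing.

MACS_PER_FRAGMENT = 20
REFRESH_INTERVAL_S = 3600      # 1-hour LSP renewal
MAX_FRAGMENTS = 256
MIN_LSP_INTERVAL_S = 5         # per-fragment generation pacing

# Single stationary bcast domain: 1K MACs.
macs = 1000
lsps = macs / MACS_PER_FRAGMENT                 # 50 LSPs
refresh_gap = REFRESH_INTERVAL_S / lsps         # one refresh every 72 s
print(f"{lsps:.0f} LSPs, one refresh update every {refresh_gap:.0f} s")

# Traffic+expiration MAC learning: 1K stations fluctuating every ~30 minutes.
fluctuation_s = 30 * 60
print(f"one activity update every {fluctuation_s / macs:.1f} s")    # ~1.8 s

# IETF wireless worst case: ~700 mobile stations, new location within 10 min.
stations, relocation_s = 700, 10 * 60
print(f"{stations / relocation_s:.2f} MAC moves/sec")               # ~1.17

# Protocol upper bound: every fragment re-originated at its pacing limit.
print(f"max {MAX_FRAGMENTS / MIN_LSP_INTERVAL_S:.0f} updates/sec")  # ~51
```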
Multiple VLANs
- Real networks have VLANs
- Assuming the current proposal is used
- Two possibilities with standard IS-IS flooding:
  - A single IS-IS instance for the whole network
  - A separate IS-IS instance per VLAN
- Similar scaling challenges as with VR-based L3 VPNs

VLANs: single IS-IS
- Assuming reachability-based MAC announcement
- Adjacencies and convergence scale well
- However…
  - Easily hit the 5K MAC/node limit (solvable)
  - Every switch sees every MAC in every VLAN, even if it does not need it
- Clear scaling issue

VLANs: multiple instances
- MAC announcements scale well
- Good resource separation
- However…
  - N adjacencies for a VLAN trunk
  - N times more processing for a single topological event
  - N times more data structures (neighbors, timers, etc.)
  - N = 100…1000 for a core switch
- Clear scaling issue for core switches

VLANs: data plane
- Core switches:
  - Not a big difference
  - Exposed to most MACs in VLANs anyway
- Smaller switches:
  - Have to install all of a VLAN's MACs even if a single port on the switch belongs to that VLAN
  - May require bigger MAC tables than available today

VLANs: summary
- Control plane: currently available solutions have scaling issues
- Data plane: smaller switches may have to pay

VLANs + mobility
- Assuming some VLANs will have mobile stations
- Data plane: same as with stationary VLANs
- All scaling considerations for VLANs apply
- Mobility dynamics get multiplied:
  - Single IS-IS: updates hit the same adjacency
  - Multiple IS-IS: updates hit the same CPU
- Activity is not bounded naturally anymore
  - Update rate easily goes outside the comfort range
- Clear scaling issues

Resolving scaling concerns
- The 5K MAC/node limit in IS-IS could be solved with RFC 3786
- Don't use per-VLAN (multi-instance) routing
- Use reachability-based MAC announcement
- Scaling MAC distribution requires VLAN-aware flooding (see the sketch after this list):
  - Each node and link is associated with a set of VLANs
  - Only information needed by the remote neighbor is flooded to it
  - Not present in the current IS-IS framework
- Forget about mobility ;-)
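The VLAN-aware flooding rule proposed above is not part of today's IS-IS, but the filter it implies is straightforward. The Python sketch below uses a hypothetical data model (the MacAnnouncement and Link classes and their fields are illustrative, not the proposal's encoding): each link carries the set of VLANs its remote neighbor needs, and a MAC announcement is flooded over a link only if its VLAN is in that set.

```python
# Minimal sketch of the VLAN-aware flooding rule from the last slide:
# associate each link with a set of VLANs and flood a MAC announcement over a
# link only if the remote neighbor needs that VLAN. The data model is
# hypothetical; current IS-IS flooding has no such per-link filter.

from dataclasses import dataclass

@dataclass(frozen=True)
class MacAnnouncement:
    mac: str
    vlan: int

@dataclass
class Link:
    neighbor: str
    vlans: frozenset      # VLANs the remote neighbor actually needs

def flood(announcement: MacAnnouncement, links: list[Link]) -> list[str]:
    """Return the neighbors the announcement is flooded to."""
    return [l.neighbor for l in links if announcement.vlan in l.vlans]

links = [
    Link("core-1", frozenset(range(1, 1001))),    # core trunk: most VLANs
    Link("edge-7", frozenset({10, 20})),          # edge switch: two VLANs
]

print(flood(MacAnnouncement("00:11:22:33:44:55", vlan=20), links))
# ['core-1', 'edge-7']
print(flood(MacAnnouncement("00:11:22:33:44:66", vlan=300), links))
# ['core-1']  -- the edge switch never sees MACs from VLANs it does not carry
```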