High-Performance Routing
Tarik Cicic
University of Oslo
December 2001

Overview
• What is routing and what is switching
• Router and switch architecture in short
• Address structure of the Internet
• Routing and forwarding procedure
• Performance issues
• Routing lookups and packet classification
• Switch interconnect

Router
• "L3 device"
• does L3 (IP) packet forwarding
• supports L3 (IP) routing protocols
• can interconnect different L2 technologies (e.g. IP/ATM with IP/SDH)

Switch
• "L2 device"
• L2 packet (or cell) forwarding
• forwarding decision based e.g. on a flow ID
• all links (interfaces / ports) use the same L2 technology
• the distinction between routers and switches is fluid; some argue it is disappearing

Basic architectural components
[Figure: control functions (routing, reservation, admission control, congestion control) above the datapath, which does the per-packet processing (switching, output scheduling, policing)]
(As we shall see, not all routers are switching routers.)

Per-packet processing
[Figure: each line card makes a forwarding decision using its own forwarding table; packets then cross the interconnect and pass through output scheduling]

ATM forwarding
• Forwarding procedure:
  – look up the cell's VPI/VCI in the VC table
  – replace the old VPI/VCI with the new one
  – forward the cell to the outgoing interface
  – transmit the cell onto the link
• VPI/VCI tables are built using a separate routing protocol (PNNI)

Ethernet switch forwarding
• Look up the frame's destination address in the forwarding table
  – if known, forward to the correct port
  – if unknown, flood to all other ports
• learn the source address of the incoming frame
• forward the frame to the outgoing interface
• transmit the frame onto the link

IP router forwarding
• Look up the packet's destination address in the forwarding table
  – if known, forward to the correct port
  – if unknown (and there is no default route), drop the packet
• decrement the TTL and update the header checksum
• forward the packet to the outgoing interface
• transmit the packet onto the link

Additional delay in routing
[Figure: a routing kernel and switch controller sitting on top of an ATM switch]
• An extra queue must be passed.
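The IP router forwarding steps can be sketched in Python for a bare 20-byte IPv4 header without options. This is a minimal illustration under stated assumptions, not router code: the forwarding table is a hypothetical exact-match dictionary from the 4-byte destination address to a port (a real router does a longest-prefix match), and the checksum is recomputed from scratch with the RFC 1071 algorithm rather than updated incrementally.

```python
import struct

def ipv4_checksum(header: bytes) -> int:
    """RFC 1071 Internet checksum over an even-length IPv4 header."""
    total = 0
    for i in range(0, len(header), 2):
        total += (header[i] << 8) | header[i + 1]   # add 16-bit big-endian words
    while total >> 16:                              # fold carries back in (one's complement)
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF

def forward(header: bytearray, table: dict):
    """Return the output port for a 20-byte IPv4 header, or None to drop it.

    `table` is a hypothetical exact-match map: 4-byte destination -> port.
    """
    port = table.get(bytes(header[16:20]))  # destination address is bytes 16-19
    if port is None:
        return None                   # unknown destination (no default route): drop
    if header[8] <= 1:
        return None                   # TTL would reach 0: drop
    header[8] -= 1                    # decrement TTL (byte 8)
    header[10:12] = b"\x00\x00"       # checksum field must be zero while summing
    header[10:12] = struct.pack("!H", ipv4_checksum(header))
    return port

if __name__ == "__main__":
    hdr = bytearray(struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, 64, 17, 0,
                                bytes([10, 0, 0, 1]), bytes([129, 240, 64, 2])))
    hdr[10:12] = struct.pack("!H", ipv4_checksum(hdr))
    port = forward(hdr, {bytes([129, 240, 64, 2]): 3})
    print(port, hdr[8], ipv4_checksum(hdr))  # prints: 3 63 0
```

Recomputing the whole checksum touches all ten 16-bit words of the header; since only the TTL changed, a router can instead adjust the old checksum for that single word (RFC 1624), which is cheaper per packet.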
Comparison
                      Switch                   Router
Lookup                simple table             hierarchical addresses, CIDR
Header modification   label swapping or none   TTL, checksum (CRC)
Queues                n                        n+1
Not so big a difference, after all?

First-generation IP routers
[Figure: a central CPU and buffer memory on a shared backplane; the line interfaces (MAC) move every packet to and from the central buffer memory by DMA]

Second-generation IP routers
[Figure: a central CPU and buffer memory on a shared backplane; each line card (MAC) has its own local buffer memory and DMA engine]

Third-generation switches/routers
[Figure: line cards, each with local buffer memory, and a CPU card connected by a switched backplane]
• Third-generation routers/switches further obscure the difference
• recall that ATM is considered a complex technology that induces too much overhead
• a hybrid routing-switching technology is introduced

Addressing basics
• Each node has a unique address
• flat addressing:
  – twenty nodes need twenty entries in the routing table
  – two million nodes need two million entries
• hierarchical addressing:
  – addresses are composed of
    • a network address
    • a node address

IP addresses
Class-based (address space 0 to 2^32 - 1; the network mask is fixed by the class):
• Class A: leading bit 0, 8-bit network part: 127 networks
• Class B: leading bits 10, 16-bit network part: ~16 000 networks
• Class C: leading bits 110, 24-bit network part: ~2 million networks

IP routers: Class-based addresses
[Figure: the IP address space split into Class A, B, C and D ranges; the destination 212.17.9.4 falls in the Class C range]
Routing table (exact match): 212.17.9.0 → Port 4

Forwarding decision (class-based)
Classless notation: a prefix such as 128.9/16 stands for the 2^16 addresses starting at 128.9.0.0
[Figure: the address line from 0 to 2^32 - 1 with the networks 65/24, 128.9/16 and 142.12/19; the destination 128.9.16.14 falls inside 128.9/16]
• Only the shown networks are known to the router
• other packets are sent to a default interface

Problems with Class-Based Addressing
• Fixed net-id/host-id boundaries are too inflexible: rapid depletion of the address space
• Exponential growth of routing table size
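The class-based scheme can be illustrated with a small Python sketch: the class, and hence the network mask, follows from the leading bits of the address, and the router then does an exact match on the network address. `classful_network` and `route` are illustrative names, and the one-entry table mirrors the 212.17.9.0 → Port 4 example.

```python
import ipaddress

def classful_network(addr: str):
    """Split an address into (class, network) using the pre-CIDR class rules."""
    ip = int(ipaddress.IPv4Address(addr))
    first_octet = ip >> 24
    if first_octet < 128:          # leading bit 0    -> class A, 8-bit network part
        cls, net_bits = "A", 8
    elif first_octet < 192:        # leading bits 10  -> class B, 16-bit network part
        cls, net_bits = "B", 16
    elif first_octet < 224:        # leading bits 110 -> class C, 24-bit network part
        cls, net_bits = "C", 24
    else:                          # class D (multicast) and E: no network/host split
        return "D/E", None
    mask = (0xFFFFFFFF << (32 - net_bits)) & 0xFFFFFFFF
    return cls, str(ipaddress.IPv4Address(ip & mask))

def route(addr: str, table: dict):
    """Exact-match lookup on the classful network; None means 'use the default'."""
    _, network = classful_network(addr)
    return table.get(network)

if __name__ == "__main__":
    table = {"212.17.9.0": 4}
    print(classful_network("212.17.9.4"), route("212.17.9.4", table))
    # prints: ('C', '212.17.9.0') 4
```

Because the mask is implied by the class, a site that needs, say, 300 addresses cannot use a single class C network (254 hosts) and ends up with a whole class B network of about 65 000 addresses, which is the inflexibility listed under Problems with Class-Based Addressing.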
Classless Inter-Domain Routing
[Figure: the nested prefixes 128.9/16, 128.9.16/20, 128.9.176/20, 128.9.19/24 and 128.9.25/24 on the address line from 0 to 2^32 - 1; the destination 128.9.16.14 lies inside both 128.9/16 and 128.9.16/20]
Most specific route = "longest matching prefix"

CIDR routing table
Destination address: 128.9.16.14
Prefix         Port
65/24          3
128.9/16       5
128.9.16/20    2
128.9.19/24    7
128.9.25/24    10
128.9.176/20   1
142.12/19      3
128.9.16.14 matches both 128.9/16 and 128.9.16/20; the longer prefix wins, so the packet leaves on port 2.
CIDR saved IPv4 from premature address exhaustion.

CIDR: Hierarchical Route Aggregation
[Figure: ISP P holds 192.2.0/22 and serves Site T (192.2.1/24) and Site S (192.2.2/24); ISP Q holds 200.11.0/22. The backbone routers R1–R4 need only one route for both sites, 192.2.0/22 via R2. On the IP number line, 192.2.1/24 and 192.2.2/24 lie inside 192.2.0/22.]

Problems with Route Aggregation
• Change of provider
• Multi-homed networks

Multi-Homed Networks
[Figure: a site with 192.2.2/24 is connected to both ISP P and ISP Q; next to the aggregate 192.2.0/22 via R2, the backbone must also carry the more specific route 192.2.2/24 via R3]

Change of Provider
[Figure: Site S (192.2.2/24) moves from ISP P to ISP Q (200.11.0/22) but keeps its addresses; the backbone now needs 192.2.2/24 via R3 in addition to 192.2.0/22 via R2]

Active BGP Entries
[Figure: growth in the number of active BGP routing table entries; source: http://www.telstra.net/ops/bgp/index.html]

Global Internet routing
• The Internet is divided into routing domains
  – it is too large to be routed using OSPF alone
  – routing policies must be enforced
• The Border Gateway Protocol (BGP) is used to interconnect the domains
[Figure: Domain1, Domain2 and Domain3 interconnected]

Border Gateway Protocol
• Dominant inter-domain routing protocol
• Domains (in this context) = administrative units in the Internet (Autonomous Systems)
• BGP "speakers" in a domain announce the routes this domain uses to reach networks in the Internet (and much more, e.g. willingness to be used as transit)

Name Service (DNS)
• A way to map hard-to-remember IP addresses to human-readable names
• largely orthogonal to addressing
• name hierarchy, comparable to e.g. a file system
[Figure: name tree with the top-level domains edu, com, org, uk and no, and lower-level names such as ucla, mit, apple, cisco, uio, vg, www and cs]
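Going back to the CIDR routing table, the longest-matching-prefix rule can be checked with Python's standard `ipaddress` module. This sketch uses a plain linear scan over all prefixes (O(N) per lookup), which is correct but far too slow for a core router; the abbreviated prefixes are written out in full (e.g. 65/24 as 65.0.0.0/24), and `longest_prefix_match` is an illustrative name.

```python
import ipaddress

# The CIDR routing table, with abbreviated prefixes written out in full.
ROUTES = [
    (ipaddress.ip_network("65.0.0.0/24"), 3),
    (ipaddress.ip_network("128.9.0.0/16"), 5),
    (ipaddress.ip_network("128.9.16.0/20"), 2),
    (ipaddress.ip_network("128.9.19.0/24"), 7),
    (ipaddress.ip_network("128.9.25.0/24"), 10),
    (ipaddress.ip_network("128.9.176.0/20"), 1),
    (ipaddress.ip_network("142.12.0.0/19"), 3),
]

def longest_prefix_match(dst: str, routes=ROUTES, default=None):
    """Return the port of the most specific prefix containing dst (linear scan)."""
    addr = ipaddress.ip_address(dst)
    best_len, best_port = -1, default
    for network, port in routes:
        if addr in network and network.prefixlen > best_len:
            best_len, best_port = network.prefixlen, port
    return best_port

if __name__ == "__main__":
    print(longest_prefix_match("128.9.16.14"))  # prints: 2  (via 128.9.16/20)
```

An address such as 128.9.200.1 matches only 128.9/16 and goes to port 5, while an address covered by no prefix returns `default`, which is where a default interface would be plugged in.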
DNS
• End hosts are the leaves of this tree
• there is no single central server: the database is distributed!
• the hierarchy is divided into zones
• each zone runs a name server
• each server can resolve all names in its zone
• it also "talks" to other name servers
  – caches name information
  – learns by querying other servers

[Figure: lookup of www.ifi.uio.no. The client asks its local name server, which queries the root name server, the UIO name server and the IFI name server in turn (one referral names ifi.uio.no at 129.240.64.16) and finally returns 129.240.64.2 to the client]

Challenges of Modern IP Routing
• A high-performance core IP router coming to the market today should
  – have capacity for 200 000+ routes
  – support 10–40 Gb/s lines
  – support all new, advanced functionality (policing, service differentiation, etc.)

Lookup Time
                                  Capacity (Mpps) for packet size
Year      Line    Rate (Gb/s)     40 B     84 B     354 B
1997-98   OC3     0.155           0.48     0.23     0.054
1998-99   OC12    0.622           1.94     0.92     0.22
1999-00   OC48    2.5             7.81     3.72     0.88
2000-01   OC192   10              31.25    15       3.53
2002-03   OC768   40              125      60       14.12
Capacity = line rate / (8 × packet size in bytes); e.g. at OC192 with 40-byte packets, 10^10 / 320 = 31.25 million lookups per second, i.e. 32 ns per lookup.

Lookup Time (cont.)
• Basic functionality requires only address lookup
• Advanced functions require
  – flow classification (e.g. IntServ), or
  – other header analysis (e.g. TOS or policing)
• We first discuss address lookups

Three Stages of Packet Processing
[Figure: (1) forwarding decision on each line card, using its forwarding table; (2) interconnect; (3) output scheduling]
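The capacity figures in the Lookup Time table follow from dividing the line rate by the packet size in bits. The sketch below reproduces them and converts them into a per-lookup time budget; it assumes back-to-back packets at the nominal line rate and ignores framing overhead, as the table appears to. The function names and the `LINES` dictionary are illustrative.

```python
def lookups_per_second(line_rate_bps: float, packet_bytes: int) -> float:
    """Packets (hence lookups) per second on a link full of back-to-back packets."""
    return line_rate_bps / (packet_bytes * 8)

def lookup_budget_ns(line_rate_bps: float, packet_bytes: int) -> float:
    """Time available for one lookup, in nanoseconds."""
    return 1e9 / lookups_per_second(line_rate_bps, packet_bytes)

# Nominal line rates from the Lookup Time table, in bit/s.
LINES = {"OC3": 0.155e9, "OC12": 0.622e9, "OC48": 2.5e9, "OC192": 10e9, "OC768": 40e9}

if __name__ == "__main__":
    for name, rate in LINES.items():
        mpps = [lookups_per_second(rate, size) / 1e6 for size in (40, 84, 354)]
        print(f"{name:6s}" + "".join(f"{m:9.2f}" for m in mpps)
              + f"   {lookup_budget_ns(rate, 40):7.1f} ns/lookup at 40 B")
```

The last column makes the challenge concrete: at OC768 with minimum-size packets, each lookup must finish in 8 ns, only a few memory accesses.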
Forwarding Decision
[Figure: on the line card, the destination address of a packet in the input buffer is looked up in the routing table, and the forwarding decision moves the packet to an output buffer]

IP Router Lookup
[Figure: the forwarding engine takes the destination address from the header of an incoming packet, looks it up in the forwarding table (destination → next hop) and returns the next hop]

Lookup Algorithms
• A clever data structure is needed
• Optimize:
  – lookup time
  – memory requirements
  – incremental update time

Trivial Schemes
• Address caching: remember the most recent addresses, hoping that more packets to the same destinations will arrive soon
• List of (address, next hop) pairs:
  – O(N) entries
  – O(N) lookup time
  – O(1) update time

Example Forwarding Table (5-bit addresses)
     Prefix   Next Hop
P1   111*     H1
P2   10*      H2
P3   1010*    H3
P4   10101    H4

Radix Trie
[Figure: binary trie holding P1–P4; each level tests one address bit, so P2 sits at depth 2, P1 at depth 3, P3 at depth 4 and P4 at depth 5]
• O(W) lookup
• O(NW) storage
• O(W) update
(N = number of prefixes, W = address length in bits)
Lookup of 10111: the search follows 1 and 0 (passing P2), then 1, and finds no branch for the fourth bit; the last prefix seen, P2 (H2), is the longest match.

PATRICIA Trie
[Figure: path-compressed version of the same trie; internal nodes store the position of the next bit to test (2, 3, 5), so chains of one-way branches are skipped]
• O(W^2) lookup
• O(N) storage
• O(W) update
Lookup of 10111: skipped bits are not compared on the way down, so the mismatch is only discovered at a leaf and the search must backtrack to find P2. Backtracking!

Multi-bit Tries
[Figure: a binary trie of depth W compared with a multi-bit trie that consumes k bits per level and has depth W/k]

Quaternary (4-ary) Trie (k = 2)
[Figure: stride-2 trie for P1–P4; prefixes whose length is not a multiple of 2 are expanded into longer prefixes: P1 into P11 and P12, P4 into P41 and P42]
• O(W/2) lookup
• O(4N) storage
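A binary (radix) trie lookup on the 5-bit example forwarding table can be sketched in Python. Prefixes are written as bit strings, the class names are illustrative, and the lookup simply remembers the last prefix it passes on the way down, which is the longest match; no path compression or multi-bit strides are attempted.

```python
from typing import Optional

class TrieNode:
    """One node of a binary (radix) trie; next_hop is set if a prefix ends here."""
    __slots__ = ("children", "next_hop")

    def __init__(self) -> None:
        self.children: list = [None, None]   # index 0 = bit '0', index 1 = bit '1'
        self.next_hop: Optional[str] = None

class BinaryTrie:
    def __init__(self) -> None:
        self.root = TrieNode()

    def insert(self, prefix: str, next_hop: str) -> None:
        """Insert a prefix given as a bit string, e.g. '1010' for 1010*."""
        node = self.root
        for bit in prefix:
            b = int(bit)
            if node.children[b] is None:
                node.children[b] = TrieNode()
            node = node.children[b]
        node.next_hop = next_hop

    def lookup(self, address: str) -> Optional[str]:
        """Walk one bit per level, remembering the last prefix passed (longest match)."""
        node, best = self.root, self.root.next_hop
        for bit in address:
            node = node.children[int(bit)]
            if node is None:          # no deeper branch: the last match is the longest
                break
            if node.next_hop is not None:
                best = node.next_hop
        return best

if __name__ == "__main__":
    trie = BinaryTrie()
    for prefix, hop in [("111", "H1"), ("10", "H2"), ("1010", "H3"), ("10101", "H4")]:
        trie.insert(prefix, hop)
    print(trie.lookup("10111"))   # prints: H2
```

Each lookup visits at most W nodes, matching the O(W) bound; storage grows with N·W because every prefix may create up to W nodes.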
Hardware Lookups
• Content-Addressable Memory (CAM)
• Data is not read from a fixed physical address in a RAM chip; instead, the chip compares a search key against all stored entries at once and returns the matching one
• If the destination address is used as the search key, O(1) lookup time is achieved!
• Ternary CAM (each bit can also be "don't care", so prefixes can be stored) is commercially available
• Power consumption is proportional to the routing table size: 6–8 W
• 0.5 MB at 66 MHz costs about $100

Lookup Comparison
Algorithm                       Lookup   Storage    Update
Binary trie                     W        NW         W
Patricia                        W^2      N          W
Multiary trie                   W/k      N·2^k      -
Binary search on trie levels    log W    N log W    -
T-CAM                           1        N          W

Providing Value-Added Services
• Differentiated services
  – Regard traffic from AS#33 as "platinum-grade"
• Access Control Lists
  – Deny UDP host 194.72.72.33
• Committed Access Rate
  – Rate-limit WWW traffic from sub-interface #739 to 10 Mb/s
• Policy-based Routing
  – Route all voice traffic through the ATM network
• Peering Arrangements
  – Restrict the total amount of traffic of precedence 7 from MAC address N to 20 Mb/s between 10 am and 5 pm
• Accounting and Billing
  – Generate hourly reports of traffic from MAC address M

Flow Classification
[Figure: the forwarding engine passes the header of an incoming packet to flow classification, which searches a policy database of (predicate, action) entries and returns a flow index]

A Packet Classifier
          Field 1              Field 2               …   Field k   Action
Rule 1    152.163.190.69/21    152.163.80.11/32      …   UDP       A1
Rule 2    152.168.3.0/24       152.163.200.157/16    …   TCP       A2
…         …                    …                     …   …         …
Rule N    152.168.3.0/16       152.163.80.11/32      …   Any       An
Given a classifier, find the action associated with the highest-priority rule (here, the lowest-numbered rule) that matches an incoming packet.

Geometric Interpretation in 2D
[Figure: rules R1–R7 drawn as regions in the plane spanned by Field #1 and Field #2; a rule such as (144.24/16, 64/24) is a rectangle, (128.16.46.23, *) is a line, and each packet is a point]

Proposed Schemes
Scheme                 Pros                                  Cons
Sequential evaluation  Small storage, scales well with       Slow classification rates
                       number of fields
Ternary CAMs           Single-cycle classification           Cost, density, power consumption
Grid of tries          Small storage requirements and fast   Not easily extendible to more
                       lookup rates for two fields;          than two fields
                       suitable for big classifiers
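Sequential evaluation, the simplest of the proposed schemes, can be sketched in Python on the example classifier. Only three fields are modelled (Field 1, Field 2 and the protocol); treating Field 1 and Field 2 as source and destination prefixes is an assumption, since the classifier does not name them, and `Rule`, `make_rule` and `classify` are hypothetical names.

```python
import ipaddress
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Rule:
    field1: ipaddress.IPv4Network   # assumed to be the source prefix
    field2: ipaddress.IPv4Network   # assumed to be the destination prefix
    proto: str                      # "udp", "tcp" or "any"
    action: str

def make_rule(f1: str, f2: str, proto: str, action: str) -> Rule:
    # strict=False masks off host bits, e.g. 152.163.190.69/21 -> 152.163.184.0/21
    return Rule(ipaddress.ip_network(f1, strict=False),
                ipaddress.ip_network(f2, strict=False), proto, action)

# The example classifier, in priority order (Rule 1 first).
CLASSIFIER = [
    make_rule("152.163.190.69/21", "152.163.80.11/32", "udp", "A1"),
    make_rule("152.168.3.0/24", "152.163.200.157/16", "tcp", "A2"),
    make_rule("152.168.3.0/16", "152.163.80.11/32", "any", "An"),
]

def classify(src: str, dst: str, proto: str, rules=CLASSIFIER) -> Optional[str]:
    """Sequential evaluation: return the action of the first (highest-priority) matching rule."""
    s, d = ipaddress.ip_address(src), ipaddress.ip_address(dst)
    for rule in rules:
        if s in rule.field1 and d in rule.field2 and rule.proto in ("any", proto):
            return rule.action
    return None   # no rule matches

if __name__ == "__main__":
    print(classify("152.168.3.7", "152.163.80.11", "tcp"))  # prints: A2
```

That packet matches both Rule 2 and Rule N; the loop returns A2 because rules are checked in priority order. The cost is one comparison per rule, which is why sequential evaluation gives slow classification rates for large classifiers.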
Three Stages of Packet Processing
[Figure: (1) forwarding decision, (2) interconnect, (3) output scheduling]

Interconnects: Two basic techniques
• Input queuing: usually a non-blocking switch fabric (e.g. a crossbar)
• Output queuing: usually a fast bus

Interconnects: Output Queuing
• Individual output queues: memory bandwidth = (N + 1)·R, since up to N packets may arrive for one output while one departs
• Centralized shared memory: memory bandwidth = 2N·R, since all N inputs write to and all N outputs read from the same memory
(N = number of ports, R = line rate)

"Ideal" Output Queuing
[Figure: packets arriving at inputs 1 and 2 are placed immediately in the queue of their output, so they never wait at the input]

How fast can we make centralized shared memory?
• Shared memory built from 5 ns SRAM, with a 200-byte-wide bus
• 5 ns per memory operation
• Two memory operations per packet (one write, one read)
• Therefore, up to 200 × 8 bit / (2 × 5 ns) = 160 Gb/s
• In practice, closer to 80 Gb/s

Input Queuing with Crossbar
• Memory bandwidth = 2R
[Figure: input queues (data in) feed a crossbar configured by a scheduler (data out)]

Head of Line Blocking
[Figure: delay versus load; with FIFO input queues the delay grows without bound at about 58.6 % load instead of 100 %]
• The packet at the head of an input queue waits for a busy output and blocks the packets behind it, even when their outputs are idle
• 58.6 % = 2 − √2, the saturation throughput for uniform random traffic with many ports

Virtual Output Queues
[Figure: each input keeps a separate queue for every output, so a blocked packet no longer holds up packets bound for other outputs]
[Figure: delay versus load with virtual output queues; the load can approach 100 %]

Input Queuing
• Memory bandwidth = 2R
• The scheduler can be quite complex!

Three Stages of Packet Processing
[Figure: (1) forwarding decision, (2) interconnect, (3) output scheduling]

Summary
• Extreme requirements for modern routing equipment
• Scalability: 110 000+ routing entries
• Performance: 30 000 000+ lookups per second
• Modern services demand far more processing power
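The memory-bandwidth figures from the interconnect slides can be checked with a few lines of Python. The per-architecture formulas are the ones given there, with N ports at line rate R; the function names and the 16-port, 10 Gb/s configuration in the usage lines are hypothetical examples.

```python
def shared_memory_gbps(bus_bytes: int, access_ns: float) -> float:
    """Throughput of a shared memory that needs one write and one read per bus-wide packet."""
    return bus_bytes * 8 / (2 * access_ns)      # bits per ns == Gb/s

def required_memory_bw(n_ports: int, rate_gbps: float) -> dict:
    """Memory bandwidth (Gb/s) each architecture needs per memory, for N ports at rate R."""
    return {
        "output queue (per port)": (n_ports + 1) * rate_gbps,  # up to N writes + 1 read
        "shared memory": 2 * n_ports * rate_gbps,              # N writes + N reads
        "input queue (per port)": 2 * rate_gbps,               # 1 write + 1 read
    }

if __name__ == "__main__":
    print(shared_memory_gbps(200, 5))        # prints: 160.0
    for arch, bw in required_memory_bw(16, 10).items():
        print(f"{arch:24s} {bw:6.0f} Gb/s")
```

For that configuration a shared memory would need 320 Gb/s, twice the 160 Gb/s ceiling of the 5 ns SRAM, while each input queue needs only 20 Gb/s; this is what makes input queuing attractive despite its more complex scheduler.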