Broadband Network Architectures: Router Design
TEMangir Sp02
TEM 497

Outline
- Introduction
- Router Fundamentals
- Routing Algorithms and Protocols
- Fast Forwarding
- Layer-3 Switching
- IP over WDM

Introduction

A Fine Distinction
- Imprecision surrounds the terms "routing" and "forwarding"
- Forwarding is the act of transferring a packet from one interface of a router to another, after consulting a forwarding table
- Routing is the act of building routing tables by means of a routing algorithm
- We frequently abuse this convention

What is a Router?
- A packet forwarder
  - Multiprotocol: IP, IPX, AppleTalk
- A routing-protocol execution machine
  - Multiprotocol: IGRP, RIP, OSPF, IS-IS
- A packet monitor
- A general-purpose computer
- A firewall
- A switch

Internet Forwarder Functions
- Parse the datagram header
- Checksum actions
- Select the network protocol
- Decrement the TTL field
- Use the TOS field to prioritize the datagram
- Process the options fields
- Forward (route) the datagram to next hop
- Fragment the datagram

Internet Router Functions
- Execute one or more routing protocols
  - Exchange state information with other routers
  - Use a transport protocol
  - Authentication
- Collect network-management statistics
  - Packet counts, lengths, and types
  - Source-destination matrix
- Configuration support
  - User interface
- Tunnel management

Internet Firewall Functions
- Filtering of destinations
  - Source
  - Destination
- Filtering of services
  - Block protocols
  - Block transport port numbers
- Virtual private networks
  - Encrypted tunnels
[Figure: protocol stack with IP (protocol ID), UDP and TCP (port numbers), and FTP, HTTP, and X on top]

Control and Data Planes
[Figure: a router divided into a control plane (route determination function) and a data plane (data forwarding function). The control plane exchanges control packets with other control-plane entities; the data plane exchanges data packets with other data-plane entities.]

Router Fundamentals

ARP
- Address Resolution Protocol translates an IP address to a media (link) address
- Simple request-response protocol
  - First host broadcasts a request packet containing desired IP address
  - Second host recognizes its IP address
  - Second host sends a response packet to first host containing its media (link) address
  - First host caches address mapping for later use

ARP Header
  Bits 0-15                  Bits 16-31
  Hardware Type              Protocol Type
  HLen | PLen                Operation
  Source Hardware Address
  Source Protocol Address
  Target Hardware Address
  Target Protocol Address

ARP Header Fields
- Hardware type: e.g. Ethernet = 1
- Protocol type: e.g. IPv4 = 0x0800
- HLen: hardware address length in bytes (e.g. Ethernet = 6 bytes, i.e. 48 bits)
- PLen: protocol address length in bytes (e.g. IPv4 = 4 bytes, i.e. 32 bits)
- Operation: a request (1) or a reply (2)
- Source: where packet came from
- Target: system it is querying about

ARP Operation (1)
[Figure: protocol stacks of the communicating nodes (DNS and FTP over TCP, IP, ARP, Ethernet driver), annotated with steps (1) through (8) listed on the next slide]

ARP Operation (2)
1. IP datagram with destination address
2. Next-hop address is passed to ARP
3. ARP request passed to Ethernet driver
4. ARP request broadcast in Ethernet frame
5. ARP request recognized by next-hop node
6. ARP reply sent by next-hop node
7. ARP reply updates ARP cache
8.
   IP datagram sent through next-hop node

Proxy ARP
- Allows a router to answer ARP requests from one of its networks for a host on another of its networks
- Router substitutes its link address for the responding host's
- Proxy gives the illusion that the host is connected to another network

RARP
- Reverse ARP translates a media (link) address to an IP address
- Used by systems without nonvolatile storage
- Requires a network-wide RARP server
- Similar to BOOTP (Bootstrap Protocol)

Router Advertisement (1)
- Routers announce presence by broadcasting ICMP router advertisements
  - All-hosts multicast address: 224.0.0.1
  - Limited broadcast address: 255.255.255.255
- Advertisements are periodic
  - 7-minute period
  - Advertisement becomes stale after 30 minutes

Router Advertisement (2)
- Advertisements contain a list of addresses
  - Router IP addresses
  - Preference level of each address
- Higher values are preferred
  - Highest value is the normal router
  - Lower value is a backup router
  - Lowest values do not wish to receive default traffic

Router Solicitation (1)
- A host should not have to wait 7 minutes for the next ICMP router advertisement
- ICMP router solicitation messages allow the host to request the identity of a router
- The host broadcasts the solicitation
  - All-routers multicast address: 224.0.0.2
  - Limited broadcast address: 255.255.255.255
- The host receives many advertisements
- The host chooses the router on its subnet

Router Solicitation (2)
- Host bootstrap operation
  - Broadcasts 3 solicitations
  - Broadcasts 1 message every 3 seconds
  - Broadcasting stops as soon as a valid router advertisement is received

Broadcast Storms
- Mechanisms that rely on broadcasting messages within a LAN are vulnerable to broadcast storms, i.e. long, uncontrolled exchanges of broadcast packets.
- Because everyone must process a broadcast, storms put a heavy load on uninvolved nodes.
- Therefore, protocol exchanges such as ARP, RARP, DHCP, Router Solicitation, and Router Advertisement must control broadcasts with timers and by limiting message counts.

Redirect
- ICMP redirect error is sent by a router to a host to indicate that the host should send its datagrams through another router
[Figure: message sequence]
  1. First datagram
  2. Redirect
  3. First datagram
  4. Successive datagrams
- Security concern!

A Simple Router
[Figure: CPU, DMA controller, and main memory on a system bus; NICs (Fast Ethernet, FDDI, ATM) on an I/O bus; DMA transfers between NICs and main memory; steps 1 to 3 mark the packet path]
1. Packet input
2. Header processing
   - Routing table lookup
   - DMA transaction
3. Packet output
NIC = Network Interface Controller
DMA = Direct Memory Access

IP-Layer Processing
[Figure: IP-layer processing. Boxes: Network Input(s), IP Input Queue, Process IP Options, Addressed Here? (yes: UDP, TCP, ICMP; no: Forwarded Packet), Source Routed Packet, Calculate Next Hop, Routing Table, IP Output, Network Output(s). On the control path, the Routing Algorithm performs Routing Table Mgmt.]

Routing Table Structure
- Destination IPv4 address
  - Host address (32 bits)
  - Network address (<32 bits)
- Next-hop router IP address
  - Router on a directly connected network
- Flags
  - Network or host
  - Router or interface
- Network interface

Routing Table
zap % netstat -rn
Routing tables
Destination      Gateway          Flags  Refcnt  Use      Interface
128.9.192.24     128.9.112.24     UGH    0       0        myri0
128.9.192.72     128.9.112.72     UGH    9       54173    myri0
128.9.192.73     128.9.112.73     UGH    0       0        myri0
224.0.0.9        127.0.0.1        UH     1       118606   lo0
127.0.0.1        127.0.0.1        UH     8       3541986  lo0
128.9.192.146    128.9.112.146    UGH    0       0        myri0
128.9.192.100    128.9.112.100    UGH    0       0        myri0
128.9.192.69     128.9.112.69     UGH    0       0        myri0
128.9.192.126    128.9.112.126    UGH    0       0        myri0
default          128.9.112.72     UG     22      8601210  myri0
128.9.192.0      128.9.192.151    U      7       2109258  le0
128.9.112.0      128.9.112.151    U      0       51       myri0

Flags: U = route is up; G = route is via gateway; H = route is to a host; D = route was redirected
Callouts: host addresses (e.g. 128.9.192.24), multicast address (224.0.0.9), loopback address (127.0.0.1), next-hop router (Gateway column), network addresses (128.9.192.0, 128.9.112.0)
Interface labels: le0 = Ethernet, lo0 = Loopback, myri0 = Myrinet

IP Output Processing
1. Search table for match of host address
   - If found, then send datagram to next-hop router or directly connected interface
2. Search table for match of network address
   - If found, then send datagram to next-hop router or directly connected interface
   - Use subnet mask, if necessary
3. Search table for default entry
   - If found, then send the datagram to next-hop router

Routing Assumptions
- Router knows the addresses of all other routers
- Router knows the "costs" to reach its neighbors
- Network viewed as a collection of nodes and (bidirectional) links
- From any given router find next hop on shortest path to any other router
- Tolerance of failures

Distance-Vector Routing
- Based on the sharing of distance vectors
  - A router's distance vector is a list of its "distances" to every other router in the routing domain
- Router tells its neighbors its distance (cost) to every other router in the network
- Cost = Distance
  - Usually we assume that cost = distance = hops
  - Other metrics: bandwidth, delay, charging

Distance-Vector Algorithm
- Router maintains a distance vector
  - List of <dest, cost> entries
- Router periodically sends a copy of its distance vector to all neighboring routers
- Upon receipt of a distance vector from a neighbor w, the router determines its new distance vector
  - $\mathrm{cost}(v) \leftarrow \min\{\mathrm{cost}(v),\ \mathrm{cost}_w(v) + \mathrm{cost}(w)\}$
  - Here $\mathrm{cost}_w(v)$ is the cost to v advertised by w, and $\mathrm{cost}(w)$ is the router's own cost to reach w
- Converges to shortest-path routes
  - O(MN), M = num_links, N = num_nodes

Distance-Vector Problems
- Slow convergence
- Packet bouncing after link failure
- Counting to infinity
  - Race condition after network partition
  - Algorithm keeps adding to current cost, never reaching infinity
  - Solution: represent infinity by a large number
  - Large number is 16 in RIP
- Caused by routers repeating information that was valid before failures

Link-State Routing
- Based on sharing of link state
  - Link-state packets: <ID, Nbr_ID, cost>
- Link-state information is flooded throughout the network
- Each router
  computes shortest paths independently
- Router tells every other router its distance (cost) to its neighbors
- Cost = distance = hops

Link-State Algorithm
- Router maintains a database of link-state packets that describe its links
- Router floods a copy of every link-state packet throughout the network
  - Uses sequence numbers and duplicate elimination to control the flood
- Router applies Dijkstra's algorithm to find shortest paths
- Converges to shortest-path routes
  - O(M log M), M = num_links

Two Routing Schemes
[Figure: a DV router sends DV messages to its neighbors; an LS router sends LS messages to all other routers]
- Distance-vector routing: router sends a large amount of information to a few recipients
- Link-state routing: router sends a small amount of information to many recipients

Link-State & Distance-Vector Routing
- Link-state
  - Loopless routing
  - Fast convergence
  - Precise, multiple metrics (costs)
- Distance-vector
  - Simplicity
  - Less memory required
- Both in use in today's Internet

Internet Routing Hierarchy
- Interior routing
  - Within an AS
  - Intradomain routing
- Exterior routing
  - Between ASs
  - Interdomain routing

Internet Routing Protocols
- Interior Gateway Protocols (IGPs)
  - RIP
    - RIPv2 is the current standard
  - IGRP
  - EIGRP
  - OSPF
  - IS-IS
- Exterior Gateway Protocols (EGPs)
  - Border Gateway Protocol (BGP)
    - BGP-4 is the current standard

Routing Protocol Comparison
  Routing Protocol  Supported Protocols  Strengths                          Limitations
  Enhanced IGRP     IP, IPX, AppleTalk   load balancing, metrics
  IGRP              IP, OSI-IP
  RIPv2             IP                   simplicity, improved convergence
  OSPF              IP                   rapid convergence                  complexity
  IS-IS             IP, OSI-IP
  RIP               IP                   simplicity                         count to infinity

IGP Example
[Figure: Rtr A (interfaces e0, s1, s2), Rtr B, and Rtr C. Networks with link costs in parentheses: 128.9.1.0/24 (10), 128.9.2.0/24 (2000), 128.9.3.0/24 (60), 128.9.4.0/24 (60), 128.9.5.0/24 (10), 128.9.6.0/24 (10). Address labels in the figure: 128.9.1.2, 128.9.2.2, 128.9.3.2, 128.9.5.2, 128.9.6.2.]

RIP Routing Table at Rtr A
  Destination   Next Hop
  128.9.1.0     e0
  128.9.2.0     s1
  128.9.3.0     s2
  128.9.4.0     128.9.3.2 (s2)
  128.9.5.0     128.9.2.2 (s1)
  128.9.6.0     128.9.3.2 (s2)
  Hop Count column (row alignment lost in the source): 1 1 1 1

OSPF Routing Table at Rtr A
  Destination   Next Hop          Cost
  128.9.1.0     e0                -
  128.9.2.0     s1                -
  128.9.3.0     s2                -
  128.9.4.0     128.9.3.2 (s2)    120
  128.9.5.0     128.9.2.2 (s1)    130
  128.9.6.0     128.9.3.2 (s2)    70

Lollipop Sequence Space
- Problem: sequence numbers of link-state packets wrap around or are restarted
[Figure: lollipop-shaped sequence space running from -N/2 through 0 to N/2 - 1; two sequence numbers a and b lie a distance d apart on the circular part]
- Sequence numbers start at the bottom of the stem (bootup) and then circle around repeatedly
- If d < N/4 (half circumference) then b is the newer sequence number, otherwise a is newer
- Sequence numbers in the stem subspace are generated only after bootup, and recipients notify the booting router of the last sequence number received

Routing in the Internet
- Autonomous System (AS)
  - Set of routers and hosts administered by a single entity
    - Customer network (e.g., 128.9.0.0)
    - ISP
    - Backbone provider
  - Assigned a unique 16-bit number
  - AS represents a routing domain

Classification of ASs (1)
- Stub AS
  - Single connection to another AS
  - All traffic is local (i.e., originates or terminates at the AS)
  - E.g., a typical corporation
- Multihomed AS
  - Multiple connections to other ASs
  - Refuses to carry nonlocal (transit) traffic
  - E.g., a well-connected corporation

Classification of ASs (2)
- Transit AS
  - Multiple connections to other ASs
  - Accepts local and nonlocal (transit) traffic
  - E.g., ISP or backbone operator

Types of ASs
[Figure: interconnected ASs: AS 1 (transit), AS 2 (transit), AS 3 (transit), AS 4 (stub), AS 5 (stub), AS 6 (multihomed)]

First 20 AS Numbers
  AS Number  Name               Handle
  1          GNTY-1             [CS15-ARIN]
  2          DCN-AS             [DW19-ARIN]
  3          MIT-GATEWAYS       [RH164-ARIN]
  4          ISI-AS             [JKR1-ARIN]
  5          SYMBOLICS          [SG52-ARIN]
  6          BULL-HN            [JLM23-ARIN]
  7          UK-MOD             [RNM1-ARIN]
  8          RICE-AS            [RUH-ORG-ARIN]
  9          CMU-ROUTER         [HC-ORG-ARIN]
  10         CSNET-EXT-AS       [CS15-ARIN]
  11         HARVARD            [WJO3-ARIN]
  12         NYU-DOMAIN         [ZN68-ARIN]
  13         BRL-AS             [RR33-ARIN]
  14         COLUMBIA-GW        [ZC26-ARIN]
  15         NET-DYNAMICS-EXP   [ZSU-ARIN]
  16         LBL                [CAL3-ARIN]
  17         PURDUE             [JRS8-ARIN]
  18         UTEXAS             [DLN12-ARIN]
  19         CSS-DOMAIN         [CR11-ARIN]
  20         UR                 [LB16-ARIN]
Source: http://www.arin.net/library/internet_info/asn.txt

CIDR: Problems
- Classless Interdomain Routing (CIDR)
- Class A networks are too large (16M hosts)
- Class C networks are too small (256 hosts)
- Class B networks are just right (64K hosts), but we are running out of class B addresses
- Routing table explosion
  - Core routers act upon network numbers
  - Routing tables grow as number of networks increases

CIDR: Solutions
- Allocate the class C address space among geographical regions
  - Europe, the Americas, Asia, Africa
  - Eases routing problems
- Assign blocks of class C addresses to users
  - Can attach more than 256 hosts
  - Allows for the aggregation of routes

CIDR: Rules
- User may ask for $2^n$ contiguous class C address blocks ($0 \le n \le 5$)
  - Yields $2^{n+8}$ host addresses
- A block of class C addresses is listed in a core routing table by address prefix
  - Like a subnet mask
  - E.g., the prefix 192.4.16.0/20 specifies addresses 192.4.16.0 through 192.4.31.255 (class C networks 192.4.16.0 through 192.4.31.0)

CIDR Aggregation
[Figure: customer, ISP, and backbone provider; the routing table holds the single entry 192.4.16.0/20]
- One routing prefix stands in for 4096 customer addresses (192.4.16.0 - 192.4.31.255), i.e. 16 class C networks
- "192.4.16.0/20" is shorthand notation for "192.4.16.0 - 192.4.31.255"

CIDR Block Allocations
- 194.0.0.0 - 195.255.255.255: Europe
- 198.0.0.0 - 199.255.255.255: North America
- 200.0.0.0 - 201.255.255.255: Central and South America
- 202.0.0.0 - 203.255.255.255: Asia and the Pacific

  Addresses needed       Allocation
  fewer than 256         1 class C network
  fewer than 512         2 class C networks
  fewer than 1024        4 class C networks
  fewer than 2048        8 class C networks
  fewer than 4096        16 class C networks
  fewer than 8192        32 class C networks
  fewer than 16384       64 class C networks

Network Address Translation
- A form of IP masquerading
- Used when a large customer network can obtain only a small IP address allocation
  - For example, a corporation with thousands of hosts receives only a class C address space
- Private network address space used internally
  - 10.0.0.0/8
  - 172.16.0.0/12
  - 192.168.0.0/16

User Tools for Routing
- netstat
  - Unix and MS-DOS
  - Display routing table with -rn
- arp
  - Unix and MS-DOS
  - Examine or modify the ARP cache
- ifconfig
  - Unix
  - Report details of network interfaces with -a

Evolution of Router Design
- Generation 1: shared backplane and shared buffer memory
- Generation 2: shared backplane and local buffer memory
- Generation 3: switched backplane and local buffer memory
- Generation 4: clusters of routers

Generation 1
[Figure: CPU and shared packet buffers on a backplane bus; link interface cards each contain DMA and MAC]

Generation 2
[Figure: CPU on a backplane bus; link interface cards each contain DMA, local packet buffers, and MAC]

Generation 3
[Figure: CPU attached to a switch; link interface cards each contain DMA, local packet buffers, and MAC]

Generation 4
[Figure: several routers, each with its own link interfaces, joined by a fast interconnect]

Fast Forwarding

Cisco Forwarding Performance
  Switching Path  Cisco 2500   Cisco 4500   Cisco 7000    Cisco 7500
  Process         1000 pps     10,000 pps   2500 pps      10,000 pps
  Fast            6000 pps     45,000 pps   30,000 pps    150,000 pps
  Hardware        N/A          N/A          271,000 pps   275,000 pps

Cisco Performance Notes
- Process
- Fast
- Hardware

Importance of Lookups
- The routing table must have an entry for every possible
  Internet address
- Routing-table size has grown steadily
- The problem is to match the destination address of an incoming packet to a routing-table entry in a small amount of time
  - Entry is usually an aggregated prefix
  - Best (longest) prefix match

Routing Table Growth
[Figure: plot of routing-table growth over time; source: www.telstra.net/ops/bgptable.html]

Address Lookup
- Router must be able to look up all assigned IPv4 addresses
  - Millions of addresses are assigned
- There is not enough high-speed memory to store all assigned IPv4 (and IPv6) addresses
- We must aggregate addresses to compress the routing table as much as possible

Address Aggregation
Original Table
  Address           Interface
  128.9.160.38      8
  128.9.191.7       8
  154.23.16.134     4
  194.47.10.72      4
  128.125.50.89     1
  130.39.213.66     1
  171.9.160.38      5
  193.77.50.7       3
  193.9.14.38       5
  202.197.160.67    3

Compressed Table
  Address           Interface
  128.9.0.0/16      8
  128.0.0.0/1       4
  128.0.0.0/6       1
  171.9.0.0/16      5
  193.0.0.0/4       3
  193.9.14.0/24     5

A Simple Scheme
- In IPv4 at most the first 24 bits are used by core routers
  - Those bits specify the network number toward which the packet is headed
- Given a fast random-access memory of $2^{24}$ locations (16 Mword), we can store the next hop of net address x.y.z.* in memory location x.y.z
- Only one memory access per lookup is needed

Updating Routing Tables
- Compressed routing tables must be updated periodically
- New information about routes can affect address aggregation
- The compression effort can be significant
- Compression must be computationally efficient

Hash Tables for Fast Address Lookup
  Length   Hash: Lists of Prefixes
  8        10
  12       10.128, 10.64
  16       10.1, 10.2
  24       10.1.1, 10.1.2, 10.2.1

Level-1 Lookup Scheme
[Figure: level-1 lookup on the upper bits of the 32-bit IP address. The fields ix, bix, bit, six, and ten index the arrays code[4K], base[1K], and maptable[676]; the selected values are added to form the pointer.]

Level-2/3 Lookup
- Level-1 pointer points to either:
  - Next hop, or
  - Indicator to continue search at levels 2/3
- Levels 2/3 use the lower 16
  bits of the address to look up the next hop

Performance of Scheme
- Data structures fit in data cache memory
- Fewer than 100 instructions per address are required for lookup
- Therefore, can forward several million packets per second through a conventional CPU-based router

Layer-3 Switching

Tag Switching
- Sometimes called layer-3 or IP switching
- Combines a switch with a router
  - Fast switch
  - Slower router
- Attempts to detour around the slow routing path by taking a fast switching path

Observations (ca. 1997)
- Routers are expensive and slow
  - $187,000 for 1-Gb/s router
- Switches are cheap and fast
  - $41,000 for 5-Gb/s switch
- It costs roughly 20 times as much to route a bit as to switch it ($187,000 per Gb/s versus $8,200 per Gb/s, a ratio of about 23)

IP Flows
- A flow is a stream of IP packets that follow the same route for several hops
- Common flow types
  - Streams from a specific source address to a specific destination address
  - Streams from a specific source address/port to a specific destination address/port
- Flows have limited lifetimes
- Analogous to a VC

Flow Classification
- Flows should be long-lived
  - Disregard DNS packets
  - Disregard ICMP packets (e.g., ping)
  - Disregard most HTTP packets
- Flows should be high-throughput
  - Disregard Telnet sessions
- Detect a flow if the number of packets received in a specified time interval exceeds a threshold

Flow Statistics
- Count packets and flows over a period of time
  - Flow is defined by IP source and destination addresses
- Measure the duration of each flow
- Count the number of packets in each flow

Flow Statistics Illustrated
[Figure: percentage of flows and percentage of packets (0 to 100) versus flow duration (0 to 300 seconds)]

Flows and Packet Traces
  Protocol                 Packets/s  Flows/s  Flow Duration (s)  Packets/Flow
  News (NNTP+TCP)          1096       0.7      177                627
  Mbone (IP in IP)         456        0.1      173                2307
  X Windows (TCP)          111        0.2      161                276
  FTP Data (TCP)           2018       2.2      118                525
  Rlogin (Telnet+TCP)      803        4.2      114                114
  Web (HTTP+TCP)           6717       73.0     57                 74
  Mail (POP+TCP)           9          0.4      27                 21
  Mail (SMTP+TCP)          802        49.5     18                 15
  Management (SNMP+TCP)    43         6.1      18                 6
  Name Server (DNS+UDP)    929        216.6    15                 4

Flow Classifier
- X/Y flow classifier
- Flow recognition by stream characteristics
  - X packets
  - Y seconds
  - Flow is declared switchable
- Flow deletion by stream characteristics
  - W packets
  - Z seconds
  - Flow is declared unswitchable
- Analogous to calculating first derivative df/dt

Basic Tag-Switch Strategy
- Determine whether a flow exists
- Use normal hop-by-hop IP forwarding for short-lived flows
- Use "short-cut" ATM switching for long-lived, high-throughput flows

Tag-Switch Architecture
[Figure: an IP switch controller attached to an ATM switch]
- Remove from ATM switch
  - Signaling
  - LANE
  - MPOA
  - IS-IS routing
- Add to ATM switch
  - Flow management protocol
  - Switch management protocol
  - Flow classifier
- Claim: added software is 10% the size of removed software!

Default Mode
[Figure: controller above a switch with upstream and downstream links]
- IP flow is initially forwarded
- Two default VCs are used

Flows Detected
[Figure: controller above a switch with upstream and downstream links]
- Controller detects a flow
  - Instructs upstream switch to use a new VC
  - Upstream flow is now labelled by a new VC
- Downstream controller detects a flow
  - Instructs this switch to use a new VC
  - Downstream flow is now labelled by a new VC

Cut-Through Action
[Figure: controller above a switch with upstream and downstream links]
- Controller directs the switch to reconfigure
  - Two VCs are joined into one VC
  - Flow is now carried at switching speeds
- Periodic messages to maintain new VC
  - Timeout of inactive flows

Features of Tag Switching
- IP header and LLC/SNAP encapsulation header can be removed
  - Compression benefits throughput
  - Added back later at the exit tag switch
- TTL is adjusted at the exit tag switch
  - Preserve the value that it would have had in default mode
  - Update the IP checksum too
  - Avoids mismatches in TTL for a flow

Tag-Switching Performance
- Analysis based on San Francisco NAP packet
  traces
- Evaluate switching gain, i.e. the fraction of all packets that are directly switched
- Simulations of Ipsilon IP switch
  - 86% of packets are switched
  - 92% of bytes are switched
- Switching gain is maximized at a detection threshold of about 10 packets

Layer-3 Switching
- Data-driven approaches: use only packet statistics
  - Ipsilon IP Switching
  - Cisco Tag Switching
- Topology-driven approaches: use routing-table or other topological information
  - IBM ARIS
  - Multiprotocol Label Switching (MPLS)

MPLS
- Multiprotocol Label Switching (MPLS)
- Generalized MPLS (GMPLS)

IP over WDM
- Place IP flows on their own lightpaths
  - Lightpath is formed by the concatenation of wavelengths
  - Lightpath is all-optical
- Idea is similar to IP switching
  - Wavelength-selective crossconnect (vs. ATM cell switch)
  - There are only a few wavelengths to carry flows (vs. many ATM virtual channels)
  - A signaling protocol is required to set up lightpaths
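
Sketch: X/Y flow classification
Both IP switching and IP over WDM depend on deciding which flows deserve a dedicated path, whether that path is an ATM VC or a lightpath. The recognition half of the X/Y flow classifier described under Layer-3 Switching (a flow is declared switchable once X packets arrive within Y seconds) might be sketched in Python as below. This is only an illustration under stated assumptions: a sliding window over arrival times, and a flow keyed by IP source and destination address. The class name XYFlowClassifier, the method observe, and the parameter names are hypothetical and are not taken from any real switch implementation. Flow deletion (W packets in Z seconds) is left out.

```python
from collections import defaultdict, deque


class XYFlowClassifier:
    """Hypothetical X/Y classifier: a flow becomes switchable once
    x_packets arrive within any window of y_seconds."""

    def __init__(self, x_packets, y_seconds):
        self.x_packets = x_packets
        self.y_seconds = y_seconds
        # Flow key -> arrival times of packets still inside the window.
        self.arrivals = defaultdict(deque)
        # Flows already declared switchable.
        self.switched = set()

    def observe(self, src, dst, now):
        """Record one packet seen at time `now` (seconds).

        Returns True if the flow (src, dst) is switchable after this packet.
        """
        key = (src, dst)  # assumption: flow = source/destination address pair
        if key in self.switched:
            return True
        times = self.arrivals[key]
        times.append(now)
        # Discard arrivals that have fallen out of the Y-second window.
        while times and now - times[0] > self.y_seconds:
            times.popleft()
        if len(times) >= self.x_packets:
            self.switched.add(key)
            del self.arrivals[key]  # per-packet history is no longer needed
            return True
        return False
```

With x_packets=3 and y_seconds=1.0, packets of one flow arriving at t = 0.0, 0.4, and 0.8 make the flow switchable on the third packet, while packets spaced 2 seconds apart never do. A real controller would also need the deletion rule so that idle flows release their VC or wavelength.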