CS 356: Computer Network Architectures
Lecture 10: IP forwarding
Xiaowei Yang
xwy@cs.duke.edu

Overview
• IP addressing
• IP forwarding
  – Forwarding algorithm
  – Fragmentation
• Address resolution protocol (ARP)
• Internet Control Message protocol (ICMP)
  – Error reporting

Global IP addresses

What is an IP Address?
• An IP address is a unique global identifier for a network interface
  – An IP address uniquely identifies a network location
• Routers forward a packet based on the destination address of the packet
• Uniqueness ensures global reachability

IP Addressing
• Addressing defines how addresses are allocated and the structure of addresses
• IPv4 (32-bit)
  – Classful IP addresses (obsolete)
  – Classless inter-domain routing (CIDR) (RFC 4632, current standard)
• IP Version 6 addresses (128-bit)

An IPv4 address is often written in dotted decimal notation
• Each byte is identified by a decimal number in the range [0…255]:

            1st Byte   2nd Byte   3rd Byte   4th Byte
  Binary    10000000   10001111   10001001   10010000
  Decimal   128        143        137        144

  Dotted decimal: 128.143.137.144

Structure of an IP address

  bit 31                                bit 0
  |   network prefix   |   host number   |

• An IP address encodes both a network number (network prefix) and an interface number (host number).
  – The network prefix identifies a network
  – The host number identifies a specific host (actually, an interface on the network).
• The structure is designed to improve the scalability of routing
  – Scales better than flat addresses

How long is a network prefix?
• Before 1993: The network prefix is implicitly defined (class-based addressing)
• After 1993: The network prefix is indicated by a netmask

Before 1993: Class-based addressing
• The Internet address space was divided up into classes:
  – Class A: Network prefix is 8 bits long
  – Class B: Network prefix is 16 bits long
  – Class C: Network prefix is 24 bits long
  – Class D is multicast address
  – Class E is reserved

Classful IP Addresses (Until 1993)
• Each IP address contained a key which identifies the class:
  – Class A: IP address starts with “0”
  – Class B: IP address starts with “10”
  – Class C: IP address starts with “110”
  – Class D: IP address starts with “1110”
  – Class E: IP address starts with “11110”

The old way: Internet Address Classes

  Class   Leading bits   Network prefix        Host number
  A       0              bits 0–7 (8 bits)     bits 8–31 (24 bits)
  B       10             bits 0–15 (16 bits)   bits 16–31 (16 bits)
  C       110            bits 0–23 (24 bits)   bits 24–31 (8 bits)
  D       1110           multicast group id (bits 4–31)
  E       11110          reserved for future use (bits 5–31)

Problems with Classful IP Addresses
• Fast growing routing table size
  – Each router must have an entry for every network prefix
  – ~ 2^21 = 2,097,152 class C networks
  – In 1993, the size of routing tables started to outgrow the capacity of routers
• Local admins must request another network number before installing a new network at their site

Solution: Classless Inter-domain routing (CIDR)
• Network prefix is of variable length
  – No rigid class boundary
• Addresses are allocated hierarchically
• Routers aggregate multiple address prefixes into one routing entry to minimize routing table size

Hierarchical IP Address Allocation
• Internet Assigned Numbers Authority
• Regional Internet Registries (five of them)
  – American Registry for Internet Numbers (ARIN)
  – RIPE, APNIC, LACNIC, AfriNIC
• Internet Service Providers

CIDR network prefix has variable length

          1st Byte   2nd Byte   3rd Byte   4th Byte
  Addr    128        143        137        144
          10000000   10001111   10001001   10010000
  Mask    255        255        255        0
          11111111   11111111   11111111   00000000

• A network mask specifies the number of bits used to identify a network in an IP address.

CIDR notation
• CIDR notation of an IP address:
  – 128.143.137.144/24
  – /24 is the prefix length. It states that the first 24 bits are the network prefix of the address (and the remaining 8 bits are available for specific host addresses)
• CIDR notation can nicely express blocks of addresses
  – An address block [128.195.0.0, 128.195.255.255] can be represented by an address prefix 128.195.0.0/16
  – How many IP addresses are there in a /x address block?
    • 2^(32-x)

IP Forwarding

Forwarding of IP datagrams
• There are two distinct processes in delivering IP datagrams:
  1. Forwarding (data plane): How to pass a packet from an input interface to the output interface?
  2. Routing (control plane): How to find and set up the forwarding tables?

Key points
• Each IP datagram contains the IP destination address
• The “network part” of an IP address identifies a single physical network
• All hosts and routers that share the same network part of their address are connected to the same physical network
• Each physical network on the Internet has at least one router that connects this network to other physical networks

Forwarding algorithm
• Is dst on the same physical network?
  – Yes: Deliver the packet to the network directly
  – No: Forward to next-hop router
1. How to determine whether a dst is on the same physical network?
2. How to determine the next hop router?
  – Routing

Detailed forwarding algorithm
• If (NetworkNum == NetworkNum of one of my interfaces) then
  – Deliver packet over the interface
• Else
  – If (NetworkNum is in my forwarding table) then
    • Deliver to the NextHop router
  – Else
    • Deliver packet to the default router

Forwarding table lookup
• When a router or host needs to transmit an IP datagram, it performs a routing table lookup
• Forwarding table lookup: Use the IP destination address as a key to search the routing table
• Result of the lookup is the IP address of a next hop router, and/or the name of a network interface

  Destination address                       Next hop / interface
  network prefix, or host IP address,       IP address of next hop router, or
  or loopback address, or default route     name of a network interface

Type of forwarding table entries
• Network route
  – Destination address is a network address (e.g., 10.0.2.0/24)
  – Most entries are network routes
• Host route
  – Destination address is an interface address (e.g., 10.0.1.2/32)
  – Used to specify a separate route for certain hosts
• Default route
  – Used when no network or host route matches
• Loopback address
  – Routing table entry for the loopback address (127.0.0.1)
  – The next hop lists the loopback (lo0) interface as outgoing interface

Simplified forwarding algorithm
• Observation:
  – A directly connected physical network can be an entry in the forwarding table
  – A default route can be an entry
• Simplified algorithm
  1. Look up the destination address in the forwarding table using longest prefix match
  2. Forward the packet to the next hop indicated by the matched entry

Longest prefix match
• Longest Prefix Match: Search for the forwarding table entry whose prefix has the longest match with the destination IP address
  1. Search for a match on all 32 bits
  2. Search for a match on 31 bits
  …..
  33. Search for a match on 0 bits
• Host route, loopback entry: 32-bit prefix match
• Default route is represented as 0.0.0.0/0: 0-bit prefix match

Example: destination 128.143.71.21

  Destination address    Next hop
  10.0.0.0/8             eth0
  128.143.0.0/16         R2
  128.143.64.0/20        R3
  128.143.192.0/20       R3
  128.143.71.0/24        R4
  128.143.71.55/32       R3
  0.0.0.0/0 (default)    R5

• The longest prefix match for 128.143.71.21 is for 24 bits with entry 128.143.71.0/24
• Datagram will be sent to R4

• Ex: H1 → H2, next hop: eth0; H1 → H6, next hop: R1
• Q: How is an IP packet sent from H1 to H2 or H1 to R6?
  – Encapsulated into an Ethernet frame
• How to find out a host’s Ethernet address after knowing its IP address?

Address Resolution Protocol

ARP and RARP
• Note:
  – The Internet is based on IP addresses
  – Data link protocols (Ethernet, FDDI, ATM) may have different (MAC) addresses
• The ARP and RARP protocols perform the translation between IP addresses and MAC layer addresses
• We will discuss ARP for broadcast LANs, particularly Ethernet LANs
  – RFC 826
• RARP is obsolete

  IP address (32 bit)   → ARP →    Ethernet MAC address (48 bit)
  IP address (32 bit)   ← RARP ←   Ethernet MAC address (48 bit)

Address Translation with ARP
• ARP Request: Argon broadcasts an ARP request to all stations on the network: “What is the hardware address of 128.143.137.1?”

  Argon: 128.143.137.144, 00:a0:24:71:e4:44
  ARP Request: What is the MAC address of 128.143.137.1?
  Router137: 128.143.137.1, 00:e0:f9:23:a8:20

Address Translation with ARP
• ARP Reply: Router137 responds with an ARP Reply which contains the hardware address

  Argon: 128.143.137.144, 00:a0:24:71:e4:44
  Router137: 128.143.137.1, 00:e0:f9:23:a8:20
  ARP Reply: The MAC address of 128.143.137.1 is 00:e0:f9:23:a8:20

ARP Packet Format

  Ethernet II frame carrying ARP:

  Field                       Size (bytes)
  Destination address         6
  Source address              6
  Type (0x0806)               2
  ARP Request or ARP Reply    28
  Padding                     18
  CRC                         4

  ARP Request or ARP Reply:

  Hardware type (2 bytes)
  Protocol type (2 bytes)
  Hardware address length (1 byte)
  Protocol address length (1 byte)
  Operation code (2 bytes)
  Source hardware address*
  Source protocol address*
  Target hardware address*
  Target protocol address*

  * Note: The length of the address fields is determined by the corresponding address length fields

• Hardware type: ether (1)
• Protocol type: taken from the set ether_type
  – IP: 0x0800
• Opcode
  – ARP request: 1
  – ARP reply: 2
• Check RFC for implementation details

Example
• ARP Request from Argon is broadcast:
  – Source addr in Ethernet header: 00:a0:24:71:e4:44
  – Destination addr in Ethernet header: FF:FF:FF:FF:FF:FF

  Source hardware address: 00:a0:24:71:e4:44
  Source protocol address: 128.143.137.144
  Target hardware address: 00:00:00:00:00:00
  Target protocol address: 128.143.137.1

• ARP Reply from Router137 is unicast:
  – Source addr: 00:e0:f9:23:a8:20
  – Dst addr: 00:a0:24:71:e4:44

  Source hardware address: 00:e0:f9:23:a8:20
  Source protocol address: 128.143.137.1
  Target hardware address: 00:a0:24:71:e4:44
  Target protocol address: 128.143.137.144

ARP Cache
• Since sending an ARP request/reply for each IP datagram is inefficient, hosts maintain a cache (ARP Cache) of current entries. The entries expire after a time interval.
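As a rough illustration of the ARP packet format and the Argon example above, the 28-byte ARP request can be packed with Python's struct module. This is only a sketch under stated assumptions: the helper names mac_bytes and build_arp_request are made up for this example, and only Ethernet/IPv4 (hardware type 1, protocol type 0x0800) is handled.

```python
import socket
import struct


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes (hypothetical helper)."""
    return bytes(int(part, 16) for part in mac.split(":"))


def build_arp_request(src_mac: str, src_ip: str, target_ip: str) -> bytes:
    """Pack an ARP request for Ethernet/IPv4; the result is 28 bytes long."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,                            # hardware type: Ethernet
        0x0800,                       # protocol type: IP
        6,                            # hardware address length
        4,                            # protocol address length
        1,                            # opcode: 1 = request, 2 = reply
        mac_bytes(src_mac),           # source hardware address
        socket.inet_aton(src_ip),     # source protocol address
        b"\x00" * 6,                  # target hardware address: not yet known
        socket.inet_aton(target_ip),  # target protocol address
    )


# Values taken from the Argon / Router137 example.
request = build_arp_request("00:a0:24:71:e4:44", "128.143.137.144", "128.143.137.1")
print(len(request))  # 28
```

To go on the wire, these 28 bytes would still need to be wrapped in an Ethernet frame with destination FF:FF:FF:FF:FF:FF and type 0x0806; that step is left out here.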
• Contents of the ARP Cache:

  (128.143.71.37) at 00:10:4B:C5:D1:15 [ether] on eth0
  (128.143.71.36) at 00:B0:D0:E1:17:D5 [ether] on eth0
  (128.143.71.35) at 00:B0:D0:DE:70:E6 [ether] on eth0
  (128.143.136.90) at 00:05:3C:06:27:35 [ether] on eth1
  (128.143.71.34) at 00:B0:D0:E1:17:DB [ether] on eth0
  (128.143.71.33) at 00:B0:D0:E1:17:DF [ether] on eth0

Putting it together

IP Forwarding Implementation Logistics

[Figure: IP output path. If the IP destination is multicast or broadcast, or equals a local IP address, the packet is put on the IP input queue (via the loopback driver). Otherwise the MAC address is obtained with ARP and the Ethernet driver sends the Ethernet frame. Incoming Ethernet frames are demultiplexed into ARP packets and IP datagrams.]

[Figure: IP input path in the IP module. Packets arrive from the data link layer on the input queue. If the destination address is local, the packet is demultiplexed to UDP, TCP, or ICMP. Otherwise, if IP forwarding is enabled, the next hop is looked up in the routing table (filled by a routing protocol or static routing) and the datagram is sent; if forwarding is not enabled, the packet is discarded.]

IP Forwarding Logistics (Lab 2)
1. Sanity-check
   • Check that the packet meets the minimum length and has a correct checksum.
2. Update header
   • Decrement the TTL by 1, and compute the packet checksum over the modified header.
3. Next hop IP lookup
   • Find out which entry in the routing table has the longest prefix match with the destination IP address.
4. Next hop MAC lookup
   • Check the ARP cache for the next-hop MAC address corresponding to the next-hop IP. If it's there, send it. Otherwise, send an ARP request for the next-hop IP (if one hasn't been sent within the last second), and add the packet to the queue of packets waiting on this ARP request.
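The four forwarding steps above can be sketched in a few lines of Python. This is a minimal illustration, not the Lab 2 reference code: it assumes a 20-byte IPv4 header without options, uses the forwarding table from the longest prefix match example (with destinations and next hops paired in the order listed), keys the ARP cache by next-hop name instead of next-hop IP, and all function names plus the MAC address are made up for the example.

```python
import ipaddress
import struct

# Forwarding table from the longest prefix match example (pairing assumed).
TABLE = [
    ("10.0.0.0/8", "eth0"), ("128.143.0.0/16", "R2"),
    ("128.143.64.0/20", "R3"), ("128.143.192.0/20", "R3"),
    ("128.143.71.0/24", "R4"), ("128.143.71.55/32", "R3"),
    ("0.0.0.0/0", "R5"),
]
ARP_CACHE = {"R4": "00:00:5e:00:53:04"}  # made-up MAC for illustration


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words, as used for the IPv4 header."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def make_header(src: str, dst: str, ttl: int) -> bytes:
    """Hypothetical helper: 20-byte IPv4 header, no options, valid checksum."""
    raw = struct.pack("!BBHHHBBH4s4s", 0x45, 0, 20, 0, 0, ttl, 17, 0,
                      ipaddress.IPv4Address(src).packed,
                      ipaddress.IPv4Address(dst).packed)
    return raw[:10] + struct.pack("!H", internet_checksum(raw)) + raw[12:]


def longest_prefix_match(table, dst: str):
    """Return the next hop of the matching entry with the longest prefix."""
    addr = ipaddress.IPv4Address(dst)
    best_len, best_hop = -1, None
    for prefix, next_hop in table:
        net = ipaddress.IPv4Network(prefix)
        if addr in net and net.prefixlen > best_len:
            best_len, best_hop = net.prefixlen, next_hop
    return best_hop


def forward(header: bytes, table, arp_cache):
    # 1. Sanity check: minimum length; a valid header checksums to 0.
    if len(header) < 20 or internet_checksum(header[:20]) != 0:
        return ("drop", "bad header")
    if header[8] <= 1:  # TTL would reach 0 (an ICMP error would be sent)
        return ("drop", "ttl expired")
    # 2. Update header: decrement TTL, recompute checksum over the new header.
    hdr = bytearray(header[:20])
    hdr[8] -= 1
    hdr[10:12] = b"\x00\x00"
    hdr[10:12] = struct.pack("!H", internet_checksum(bytes(hdr)))
    # 3. Next hop IP lookup by longest prefix match on the destination.
    dst = str(ipaddress.IPv4Address(bytes(hdr[16:20])))
    next_hop = longest_prefix_match(table, dst)
    if next_hop is None:
        return ("drop", "no route")
    # 4. Next hop MAC lookup; on a miss the packet waits for an ARP reply.
    mac = arp_cache.get(next_hop)
    if mac is None:
        return ("queue", next_hop)
    return ("send", mac, bytes(hdr))


print(forward(make_header("10.1.1.1", "128.143.71.21", ttl=64), TABLE, ARP_CACHE)[:2])
```

A linear scan over the table is the simplest way to show longest prefix match; real routers typically use a trie or hardware lookup instead. Sending the ARP request and the ICMP error messages for the drop cases are left out of this sketch.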
Error reporting
• Internet Control Message Protocol (ICMP)
  – Ill-formatted packets
  – TTL == 0
  – ARP receives no reply
  – No protocol or application running at the destination
  – No routing table match
  – …

Location in the protocol stack
• The IP (Internet Protocol) relies on several other protocols to perform necessary control and routing functions:
  – Control functions (ICMP)
  – Multicast signaling (IGMP)
  – Setting up forwarding tables (RIP, OSPF, BGP, PIM, …)

[Figure: protocols surrounding IP. Routing: RIP, OSPF, BGP, PIM. Control: ICMP, IGMP.]

Overview
• The Internet Control Message Protocol (ICMP) is a helper protocol that supports IP with facility for
  – Error reporting
  – Simple queries
• ICMP messages are encapsulated as IP datagrams:

  | IP header | ICMP message (IP payload) |

ICMP message format

  bits 0–7    bits 8–15    bits 16–31
  type        code         checksum
  additional information or 0x00000000

4 byte header:
• Type (1 byte): type of ICMP message
• Code (1 byte): subtype of ICMP message
• Checksum (2 bytes): similar to IP header checksum. Checksum is calculated over the entire ICMP message
If there is no additional data, there are 4 bytes set to zero.
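The message format above can be made concrete with a short Python sketch that packs the 4-byte header, the 4 bytes of additional information, and optional data, then fills in the checksum over the whole message. The function name build_icmp and its parameters are made up for this example; the Echo Request values (type 8, code 0, identifier and sequence number in the additional-information field) are used only as sample input.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """One's-complement sum of 16-bit words (same scheme as the IP header)."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack(f"!{len(data) // 2}H", data))
    while total >> 16:  # fold carries back into 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_icmp(icmp_type: int, code: int,
               rest: bytes = b"\x00" * 4, data: bytes = b"") -> bytes:
    """Pack type, code, checksum, 4 bytes of additional info, then data."""
    if len(rest) != 4:
        raise ValueError("additional information must be exactly 4 bytes")
    # The checksum field is zero while the checksum is being computed.
    unsummed = struct.pack("!BBH", icmp_type, code, 0) + rest + data
    checksum = internet_checksum(unsummed)
    return unsummed[:2] + struct.pack("!H", checksum) + unsummed[4:]


# Sample: Echo Request with identifier 0x1234 and sequence number 1.
echo = build_icmp(8, 0, struct.pack("!HH", 0x1234, 1), b"ping")
print(len(echo))                 # 12
print(internet_checksum(echo))   # 0 for a message with a valid checksum
```

Running the checksum over a complete message yields 0, which is how a receiver can verify it.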
• Each ICMP message is at least 8 bytes long

ICMP Query message
ICMP query:
• Request sent by host to a router or host
• Reply sent back to querying host

Example of ICMP Queries

  Type/Code   Description
  8/0         Echo Request
  0/0         Echo Reply
  13/0        Timestamp Request
  14/0        Timestamp Reply

• The ping command uses Echo Request/Echo Reply

Extension (RFC 1256):

  Type/Code   Description
  10/0        Router Solicitation
  9/0         Router Advertisement

ICMP Error message
• ICMP error messages report error conditions
• Typically sent when a datagram is discarded
• Error message is often passed from ICMP to the application program

ICMP Error message

  | IP header | ICMP header | IP header + 8 bytes of payload (from the IP datagram that triggered the error) |
  ICMP header: type, code, checksum, unused (0x00000000)

• ICMP error messages include the complete IP header and the first 8 bytes of the payload (typically: UDP, TCP)

Example: ICMP Port Unreachable
• RFC 792: If, in the destination host, the IP module cannot deliver the datagram because the indicated protocol module or process port is not active, the destination host may send a destination unreachable message to the source host.
• Scenario: No process is waiting at port 80

[Figure: exchange between Client and Server]

Common ICMP Error messages

  Type   Code   Description               Meaning
  3      0–5    Destination unreachable   Notification that an IP datagram could not be forwarded and was dropped. The code field contains an explanation. (traceroute)
  5      0–3    Redirect                  Informs about an alternative route for the datagram and should result in a routing table update. The code field explains the reason for the route change.
  11     0, 1   Time exceeded             Sent when the TTL field has reached zero (Code 0) or when there is a timeout for the reassembly of fragments (Code 1). (traceroute)
  12     0, 1   Parameter problem         Sent when the IP header is invalid (Code 0) or when an IP header option is missing (Code 1).

Some subtypes of “Destination Unreachable”

  Code   Description                           Reason for Sending
  0      Network Unreachable                   No routing table entry is available for the destination network.
  1      Host Unreachable                      Destination host should be directly reachable, but does not respond to ARP Requests.
  2      Protocol Unreachable                  The protocol in the protocol field of the IP header is not supported at the destination.
  3      Port Unreachable                      The transport protocol at the destination host cannot pass the datagram to an application.
  4      Fragmentation Needed and DF Bit Set   IP datagram must be fragmented, but the DF bit in the IP header is set. (MTU discovery)
  5      Source route failed                   The source routing option has failed.

ICMP applications
• Ping
• Traceroute
• MTU discovery

Ping: Echo Request and Reply

[Figure: a host or router sends an ICMP Echo Request to another host or router, which answers with an ICMP Echo Reply]

  Type (= 8 or 0)   Code (= 0)   Checksum
  Identifier                     Sequence number
  32-bit sender timestamp
  Optional data

• Pings are handled directly by the kernel
• Each ping is translated into an ICMP Echo Request
• The pinged host responds with an ICMP Echo Reply

Traceroute
• xwy@linux20$ traceroute -n 18.26.0.1

  traceroute to 18.26.0.1 (18.26.0.1), 30 hops max, 60 byte packets
   1 152.3.141.250 4.968 ms 4.990 ms 5.058 ms
   2 152.3.234.195 1.479 ms 1.549 ms 1.615 ms
   3 152.3.234.196 1.157 ms 1.171 ms 1.238 ms
   4 128.109.70.13 1.905 ms 1.885 ms 1.943 ms
   5 128.109.70.138 4.011 ms 3.993 ms 4.045 ms
   6 128.109.70.102 10.551 ms 10.118 ms 10.079 ms
   7 18.3.3.1 28.715 ms 28.691 ms 28.619 ms
   8 18.168.0.23 27.945 ms 28.028 ms 28.080 ms
   9 18.4.7.65 28.037 ms 27.969 ms 27.966 ms
  10 128.30.0.246 27.941 ms * *

Traceroute algorithm
• Sends out three UDP packets for each TTL = 1, 2, …, n, destined to a high port
• Routers on the path send ICMP Time exceeded messages with their IP addresses until n reaches the distance to the destination
• Destination replies with port unreachable ICMP messages

Fragmentation and Reassembly (not required for Lab 2)
• Different networks have different Maximum Transmission Units (MTUs)

IP Fragmentation and Reassembly
• What if the size of an IP datagram exceeds the MTU? The IP datagram is fragmented into smaller units.
• What if the route contains networks with different MTUs?

[Figure: Host A on an FDDI ring, a router, and Host B on an Ethernet. MTUs: FDDI 4352, Ethernet 1500.]

• Fragmentation:
  – IP router splits the datagram into several datagrams

Design question: Where is fragmentation/reassembly done?
• Fragmentation can be done at the sender or at intermediate routers
• The same datagram can be fragmented several times.
• Reassembly of original datagram is only done at destination hosts !! (why?)

[Figure: a router splits an IP datagram (header H) into Fragment 1 and Fragment 2, each with its own header (H1, H2)]

What’s involved in Fragmentation?
• The following fields in the IP header are involved:

  | version | header length | DS | ECN | total length (in bytes)            |
  | Identification                     | 0 | DF | MF | Fragment offset (13-bit) |
  | time-to-live (TTL) | protocol      | header checksum                    |

• Identification
  – When a datagram is fragmented, the identification is the same in all fragments
  – Used to reassemble the original packet
• Flags
  – DF bit is set: datagram cannot be fragmented and must be discarded if MTU is too small
    • ICMP sent
  – MF bit:
    • 1: this is not the last fragment
    • 0: last fragment
• Fragment offset
  – Offset of the payload of the current fragment in the original datagram, in units of 8 bytes
  – Why?
    • Because the field is only 13 bits long, while the total length is 16 bits.
• Total length
  – Total length of the current fragment

Example of Fragmentation
• A datagram with size 2400 bytes must be fragmented according to an MTU limit of 1000 bytes

  Field             IP datagram   Fragment 1   Fragment 2   Fragment 3
  Header length     20            20           20           20
  Total length      2400          996          996          448
  Identification    0xa428        0xa428       0xa428       0xa428
  DF flag           0             0            0            0
  MF flag           0             1            1            0
  Fragment offset   0             0            122          244

[Figure: the IP datagram reaches the router over a link with MTU 4000 and leaves as Fragments 1, 2, and 3 over a link with MTU 1000]

Determining the length of fragments
• Maximum payload length = 1000 – 20 = 980 bytes
• Offset specifies the bytes in multiples of 8 bytes. So the payload must be a multiple of 8 bytes.
• 980 - 980 % 8 = 976 (the largest number that is no greater than 980 and divisible by 8)
• The payload for the first fragment is 976 bytes and has bytes 0 ~ 975 of the original IP datagram's payload. The offset is 0.
• The payload for the second fragment is 976 bytes and has bytes 976 ~ 1951 of the original IP datagram's payload. The offset is 976 / 8 = 122.
• The original datagram carries 2400 – 20 = 2380 bytes of payload, so the payload of the last fragment is 2380 – 976 * 2 = 428 bytes and has bytes 1952 ~ 2379 of the original IP datagram's payload. The offset is 1952 / 8 = 244.
• Total length of three fragments: 996 + 996 + 448 = 2440 > 2400
  – Why?
  – Two additional IP headers.

Path MTU discovery
• Fragmentation slows down the router
  – Should be done by end hosts
• How does a sender know the MTU of a path?
  – A host only knows the MTU of its links
• Solution
  – Send large packets with DF set
  – If ICMP Fragmentation needed messages are received, reduce the maximum segment size

Summary
• IP addressing
• IP forwarding
  – Forwarding algorithm
  – Fragmentation
• Address resolution protocol (ARP)
• Internet Control Message protocol (ICMP)
  – Error reporting
• Next: DHCP, NAT, IPv6, VPN and Tunneling
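As a check on the fragmentation example in this lecture (a 2400-byte datagram and an MTU of 1000 bytes), the fragment sizes and offsets can be computed with a short Python sketch. It assumes an unfragmented input datagram and a 20-byte header without options on every fragment; the function name fragment and its parameters are made up for this example.

```python
def fragment(total_length: int, mtu: int, header_len: int = 20):
    """Return (total length, offset in 8-byte units, MF flag) per fragment."""
    # Largest payload that fits in the MTU, rounded down to a multiple of 8.
    max_payload = (mtu - header_len) // 8 * 8
    if max_payload <= 0:
        raise ValueError("MTU too small to carry any payload")
    payload = total_length - header_len
    fragments = []
    start = 0
    while start < payload:
        size = min(max_payload, payload - start)
        more = 1 if start + size < payload else 0  # MF = 0 only on the last one
        fragments.append((size + header_len, start // 8, more))
        start += size
    return fragments


for frag in fragment(2400, 1000):
    print(frag)
# (996, 0, 1)
# (996, 122, 1)
# (448, 244, 0)
```

The three total lengths add up to 2440 bytes, 40 more than the original datagram, which is the cost of the two additional IP headers.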