Internetworking
Lecture 2: Basic routing, ARP, and basic IP

lecture_2

Basic Routing
• Literature: Delivery, Forwarding, and Routing of IP packets
  – Forouzan, TCP/IP Protocol Suite: Ch 6 - 8

Connection-oriented vs Connectionless
• Connection-Oriented Services
  – The network layer establishes a connection between a source and a destination
  – Packets are sent along the connection.
  – The decision about the route is made once at connection establishment
  – Routers/switches in connection-oriented networks are stateful
• Connectionless Services
  – The network layer treats each packet independently
  – Route lookup for each packet (routing table)
  – IP is connectionless
  – IP routers are stateless

Direct vs Indirect delivery
• Direct delivery
  – The final destination is connected to the same physical network as the sender.
  – The destination IP address and the local interface are on the same network (same netid under the interface's netmask)
  – Map IP address to physical address: ARP
• Indirect delivery
  – From router to router, last delivery is direct
  – Destination address and routing table: Routing
[Figure: host A, routers R1, R2, R3 and host B; hops are labelled indirect delivery, and the final hop to B is labelled direct delivery]

Next-hop Routing
• How do you hold information about route from A to all other hosts?
  – A → R1 → R2 → R3 → B
• Store table of host/network address and nexthop in every node
[Figure: hosts A to F, routers R1 to R4 and networks N1 to N4; each node holds a table of (destination network, next hop) entries such as "N3, R1" and "N4, -"]

Routing Table Search - Classful
• Determine class from destination address
• Search within class
• Routing table often divided into "buckets"
[Figure: the destination IP address selects the Class A bucket, Class B bucket or Class C bucket]

Routing Table Search - Classless
• Longest prefix first
• Conceptually: divide the table into "buckets", one for each netmask length (0 to 32), and match the destination against the longest prefixes first
• SW algorithms: tree, binary trees, tries (different data structures)
• HW support: TCAMs – Ternary Content Addressable Memory
[Figure: the destination IP address is matched against buckets ordered by Masklen 32, 31, ..., 1, 0, each holding Netid entries]

Routing Tables
• The basic idea with IP addressing (and CIDR) is to aggregate addresses
  – more specific networks (with longer prefixes) → less specific networks (with shorter prefixes)
  – Effective address assignment policy
• The ideal situation is to have domains publishing (exporting) only a small set of prefixes
• More aggregation leads to smaller routing tables
• Some mechanisms lead to increased fragmentation
  – # of available addresses decreasing → distribution of long prefixes (/24)
  – Multihoming - sites having several subnetworks from different providers
• Current routing table size is ~150000 entries (~60% are /24 prefixes)

Routing Table – Common Fields
Mask | Network address | Next-hop address | Interface | Flags | Reference count | Use
• Mask – netmask applied for the entry [255.255.255.0]
• Network address – destination network [192.168.15.0]
• Next-hop address – next router [130.237.15.1]
• Interface – outgoing interface [eth0]
• Flags – status/info [U(p), G(ateway), H(ost-specific)...]
• Reference count – # of users using this route
• Use – # of packets transmitted for this destination

IP Router Model
[Figure: Router with a Control Plane (IP Routing, RIB – Routing Information Base) and a Data Plane (IP Forwarding, FIB – Forwarding Information Base), with an Ethernet Interface and an FDDI Interface]
• A Router can be partitioned into a dataplane and a controlplane
  – The dataplane is fast and special purpose – handles packet forwarding in real-time
  – The control plane is general purpose – handles routing in the background

IP Forwarding
• A router switches packets between network interfaces
• Extracts header information from the incoming datagram
  – Destination IP address
• Makes a lookup in the forwarding information base by making a match against networks
  – Next-Hop IP address,
  – Outgoing interface,...
• Modifies datagram header
• Sends on outgoing interface
• But a router performs much more than IPv4 lookup
  – Access lists, filtering
  – Traffic management
  – Other protocols: Bridging, MPLS, IPv6, ...

ARP
Mapping between logical IP addresses and physical addresses
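A minimal sketch of the classless lookup described in the routing slides (one bucket per netmask length, longest prefixes tried first), followed by the step that leads into ARP: choosing which IP address has to be mapped to a physical address. The route entries, the use of None for a directly attached network, and the names build_buckets, lookup and arp_target are assumptions made for illustration. Real routers typically use tries or TCAMs rather than a linear scan over prefix lengths.

```python
import ipaddress

# Hypothetical forwarding table: (prefix, next hop, outgoing interface).
# A next hop of None marks a directly attached network (direct delivery).
ROUTES = [
    ("0.0.0.0/0", "130.237.15.1", "eth0"),
    ("192.168.0.0/16", "192.168.15.254", "eth1"),
    ("192.168.15.0/24", None, "eth1"),
]


def build_buckets(routes):
    """Group routes into one bucket per prefix length (0 to 32)."""
    buckets = {}
    for prefix, next_hop, interface in routes:
        network = ipaddress.ip_network(prefix)
        bucket = buckets.setdefault(network.prefixlen, {})
        bucket[int(network.network_address)] = (next_hop, interface)
    return buckets


def lookup(buckets, destination):
    """Return (next_hop, interface) for the longest matching prefix, or None."""
    address = int(ipaddress.ip_address(destination))
    # Try the longest prefixes first: /32, /31, ..., /0.
    for masklen in range(32, -1, -1):
        bucket = buckets.get(masklen)
        if bucket is None:
            continue
        mask = (0xFFFFFFFF << (32 - masklen)) & 0xFFFFFFFF
        entry = bucket.get(address & mask)
        if entry is not None:
            return entry
    return None


def arp_target(buckets, destination):
    """Return (ip_to_resolve, interface): the address ARP must map to a MAC.

    Direct delivery resolves the destination itself; indirect delivery
    resolves the next-hop router. Returns None when no route matches.
    """
    entry = lookup(buckets, destination)
    if entry is None:
        return None
    next_hop, interface = entry
    return (next_hop if next_hop is not None else destination, interface)
```

With these example routes, a destination inside 192.168.15.0/24 is delivered directly (ARP for the destination itself), while any other destination is delivered indirectly (ARP for the next-hop router).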
Logical and Physical Addresses
Name   MAC addr          IP addr
bsdi   0:0:c0:6f:2d:40   140.252.13.35
sun    8:0:20:3:f6:42    140.252.13.33
svr4   0:0:c0:c2:9b:26   140.252.13.34
• A host's network interface card (NIC) has:
  – a hardcoded, physical MAC address
    • e.g., 48-bit Ethernet address
  – a configured, logical IP address
  – a configured name

Communicating with a next-hop
• Problem: bsdi wants to send an IP packet to svr4
  – No routers between sender and receiver – directly connected host
• Getting the IP address of svr4
  – Static configuration
  – DNS: Name → Address (Later lectures)
• Getting the MAC address of svr4
  – Static configuration
  – Dynamic Address Resolution - ARP

ARP - Address Resolution Protocol
• Problem: we are to send a packet to an interface on a directly attached network - we know the IP-address of the destination but not the MAC address.
• Idea: Broadcast a request - "On which MAC address can IP-address X be reached?".
  – ARP request
• The host/router with the destination replies with its MAC address
  – ARP reply
• This is the basic functionality of ARP

ARP Example
bsdi intends to send an IP datagram to svr4 (140.252.13.34)
1. Send an ARP request on broadcast to all stations:
  – who has 140.252.13.34?
2. svr4 identifies it as its own address and sends an ARP reply on unicast back to bsdi
  – I have 140.252.13.34 and its mac address is 0:0:c0:c2:9b:26
3. bsdi sends the datagram to svr4 using the resolved mac address
4. Note that sun and svr4 can update their ARP caches with bsdi!
[Figure: bsdi, sun and svr4 on one network, with arrows for steps 1, 2 and 3]

ARP Packet
• Two length fields
  – Hardware (Ethernet address length: 6)
  – Protocol (IP address length: 4)
• Sender Ethernet and IP address
• Target Ethernet and IP address
• ARP is encapsulated directly into a data link frame (e.g., Ethernet)
Field                        Size (bytes)
hw type                      2
prot type                    2
hw len (hardware size)       1
prot len (protocol size)     1
op                           2
sender Ethernet addr         6
sender IP addr               4
target Ethernet addr         6
target IP addr               4

ARP Optimizations
• ARP cache
  – Resolved addresses are saved in a cache.
  – Works because of correlations in use of addresses
  – Limits ARP traffic
• Entries in the ARP cache time out
• Network is snooped
  – Since the sender's Internet-to-Physical address binding is in every ARP broadcast, (all) receivers update their caches before processing an ARP packet

ARP Timeouts
• If there is no reply to an ARP request
  – The machine is down or not responding
  – Request was lost, therefore retry (but not too often)
  – Eventually give up (When?)
• ARP cache timeouts
  – completed entry in 20 minutes (BSD Unix)
  – incomplete entry in 3 minutes (BSD Unix)

Indirect/Direct Delivery and ARP
• A sends an IP packet to B through router R
• Ethernet links to connect A and B to R
[Figure: IP A (MAC a), IP R (MAC r1 towards A, MAC r2 towards B), IP B (MAC b)]
                  Indirect delivery (A to R)   Direct delivery (R to B)
IP Header         Src: A, Dst: B               Src: A, Dst: B
Ethernet Header   Src: a, Dst: r1              Src: r2, Dst: b

Proxy ARP (RFC 1027; ARP itself is RFC 826)
• Proxy ARP - someone responds to ARP requests on someone else's behalf
• Example: sun is hidden behind netb: netb responds on behalf of sun.
• Allows sub-networks to be hidden
[Figure: gemini broadcasts an arp request for 140.252.1.129; the arp reply comes from netb (140.252.1.183); sun (140.252.1.129) sits behind netb on a slip link]

Gratuitous ARP
• Host sends an ARP request of its own address
  – Generally done at boot time to inform other machines of its address (possibly a new address) - they get a chance to update their cache entries immediately
  – Lets hosts check to see if there is another machine claiming the same address ⇒ "duplicate IP address sent from Ethernet address a:b:c:d:e:f"
• As noted before, hosts have paid the price by servicing the broadcast, so they can cache this information - this is one of the ways the proxy ARP server could know the mapping
• Note that faking that you are another machine can be used to provide failover for servers

RARP: Reverse Address Resolution Protocol (RFC 903)
• How to get your own IP address, when all you know is your link address
• Necessary if you don't have a disk or other stable storage
• RARP request - broadcast to every host on the network (i.e., EtherDST=0xFFFFFFFFFFFF), TYPE=0x8035
• RARP server: "I know that address!" and sends an RARP reply
• Source host - receives the RARP reply, and now knows its own IP addr
• RARP packet has exactly the same format as ARP packet
• BOOTP/DHCP is a more powerful alternative to RARP

RARP Server
• Someone has to know the mappings - quite often this is in the file "/etc/ethers"
• Since this information is generally in a file, RARP servers are generally implemented as user processes
  – Unlike ARP responses which are generally part of the TCP/IP implementation (often part of the kernel)
• How does the process get the packets - since they aren't IP and won't come across a socket?
  – PCAP – Packet Capture (used by Tcpdump/Ethereal)
  – BPF – Berkeley Packet Filter (older)
• RARP requests are sent as hardware level broadcasts and are therefore not forwarded across routers

IP
Basic functionality and the IP packet header

Issues in IP
• Following the end2end argument, only the absolutely necessary functionality is in IP
  – Best Effort Service: Unreliable and Connectionless
  – Application or Transport layer handles reliability
• How to deliver datagrams over multiple links (hops) in an internetwork?
  – Addressing
  – Best-effort delivery service
    • Forwarding of packets from one link to another
  – Error handling

IPv4 Header – RFC 791
• Version
• HLEN – Header Length
• Type of Service
• Total Length
  – Header + Payload
• Fragmentation
  – ID, Flags, Offset
• TTL – Time To Live
  – Limits lifetime
• Protocol
  – Higher level protocol
• Header checksum
• IP Addresses
  – Source, Destination
• Options
©The McGraw-Hill Companies, Inc., 2000

The Version Field
• Version 3 (IEN 21)
  – Stems from when TCP was being split into one component handling hop-by-hop communication (IP) and one component handling end-to-end communication (TCP). IEN 21, 1 February 1978.
• Version 4 (RFC 791)
  – IPv4
• Version 5 (RFC 1190)
  – ST-II - Multimedia streaming protocol
• Version 6 (RFC 2460)
  – IPv6

The Length Fields
• Header Length (4 bits)
  – Size of IPv4 header including options.
  – Expressed in number of 32-bit words (4-byte words)
  – Min is 5 words (=20 bytes)
  – Max is 15 words (=60 bytes)
    • limited size → limited use
• Total Length (16 bits)
  – Total length of datagram including header.
  – If datagram is fragmented: length of fragment.
  – Expressed in bytes.
    • Max: 65535 bytes. (This is IP's length limit)
    • Many systems only accept 8K bytes.

The Type of Service Field
• Type of Service (ToS): 8 bits
• Intended as a field for specifying Quality of Service on a per-packet basis.
• Few applications set the TOS field.
  – Unless there is an added cost, policy check, ... associated with the usage of a precedence level, it is very likely going to be abused.
• Long history of experimental use
  – RFC 791 – original
  – RFC 1122, 1349, 1455 modified the meaning of the ToS field
  – Current proposal: RFC 2474
    • Differentiated Services
  – Explicit Congestion Notification (ECN): RFC 2481, 3168

The ToS Byte – Original proposal
[Layout: Bit 0 | Precedence | TOS | Bit 7]
• Original Proposal – RFC 791
• Bits 0-2: Precedence
  – Defines priority e.g., when packets must be dropped
• Bits 3-5: TOS
  – Bit 3: 0 = Normal Delay, 1 = Low Delay
  – Bit 4: 0 = Normal Throughput, 1 = High Throughput
  – Bit 5: 0 = Normal Reliability, 1 = High Reliability.

DSField – Current Proposal
[Layout: Bit 0 | DSCP | ECN | Bit 7]
• Differentiated Services (DiffServ) proposes to use 6 of these bits to provide 64 priority levels - calling it the Differentiated Service (DS) field
  – RFC 2474
  – Bits 0-5: Differentiated Services CodePoint (DSCP)
• The DSCP is set when entering an area and determines the QoS handling of the IP datagram in the routers within that area
  – Scheduling
  – Shaping
  – Queue Dropping
• Bits 6-7: Explicit Congestion Notification (ECN)
  – ECN Capable Transport (ECT)
  – Congestion Experienced (CE)

Fragmentation – MTU
• Physical networks have a maximum frame size – MTU, Maximum Transmission Unit.
• If the IP datagram is larger than the MTU of the link layer, it must be divided into several pieces to fit the MTU – this is called fragmentation
• 3 fields of the IP header concern fragmentation: ID, Flags, Offset

Fragmentation cont'd
• A host or router transmitting a datagram larger than the MTU of the link must divide it into smaller pieces - fragments.
• Both hosts and routers may fragment
  – But only the destination host reassembles!
  – Each fragment is routed separately as an independent datagram
• In effect, only datagram services (e.g. UDP) are fragmented
  – TCP uses 576 byte MTU or path MTU discovery

The Fragmentation Fields
• Identification: 16 bits
  – ID + src IP addr uniquely identifies each datagram sent by a host
  – The ID is copied to all fragments of a datagram upon fragmentation
• Flags: 3 bits
  – RF (Reserved Fragment) – for future use (set to 0)
  – DF (Dont Fragment).
    • Set to 1 if datagram should not be fragmented.
    • If set and fragmentation needed, datagram will be discarded and an error message will be returned to the sender
  – MF (More Fragments)
    • Set to 1 for all fragments, except the last.
• Fragmentation Offset: 13 bits
  – 8-byte units: (ip->ip_frag << 3)
  – Shows relative position of a fragment with respect to the whole datagram

Fragmentation Example
[Figure: fragmentation example]

Fragmentation Example – Detailed
MTU = 1500 bytes
Original datagram: IPv4 hdr (20 bytes; id=0, DF=0) | UDP hdr (8 bytes) | Data (1473 bytes)
Fragment 1: IPv4 hdr (20 bytes; id=n, DF=0, MF=1, off=0) | UDP hdr (8 bytes) | Data (1472 bytes)
Fragment 2: IPv4 hdr (20 bytes; id=n, DF=0, MF=0, off=185) | Data (1 byte)
Offset = 185 → 185x8 = 1480 bytes

The TTL field
• TTL - Time To Live: 8 bits
• Limit the lifetime of a datagram - avoid infinite loops
• A router receiving a TTL>1 decrements the TTL and forwards it
• A TTL <= 1 shall not be forwarded
  – ICMP "time exceeded" is returned to the sender (later slide)
• Recommended value is 64
• Should really be called Hop Limit (as in IPv6)
  – Historically: Every router holding a datagram for more than 1 second should decrement the TTL by the number of seconds.

The Protocol Field
• Demultiplexing to higher layers
• Assigned by IANA
  – Internet Assigned Numbers Authority
• A subset of the assigned numbers (out of 134):
decimal   keyword   protocol
1         ICMP      Internet Control Message
4         IP        IP in IP (encapsulation)
6         TCP       Transmission Control
17        UDP       User Datagram
41        IPv6      IPv6 in IPv4
46        RSVP      Reservation Protocol

Header Checksum
• Ensures integrity of header fields
  – Hop-by-hop (not end-to-end)
  – The header fields must be correct for proper and safe processing.
  – The payload is not covered.
• Other checksums
  – Link-level CRC. IP assumes a strong L2 checksum/CRC. Hop-by-hop.
  – L4 checksums, e.g. TCP/ICMP/UDP checksums cover payload. End-to-end.
• Internet Checksum Algorithm, RFC 1071
  – Treat header as sequence of 16-bit integers.
  – Add them together (one's complement sum).
  – Take the one's complement of the result.

Summary
• Basic Routing
  – Connectionless, next-hop routing
  – Routing tables: RIBs and FIBs
  – Longest prefix match
• Address resolution
  – ARP
  – RARP
• IP – Internet Protocol
  – Basic functionality
  – Header fields
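The arithmetic in the detailed fragmentation example (MTU 1500, offset 185) can be reproduced with a short sketch. The function name fragment and its return format are assumptions for illustration. It only computes offsets and lengths, assumes a fixed 20-byte header without options, and does not build real packets.

```python
def fragment(payload_len, mtu, header_len=20):
    """Split an IP payload of payload_len bytes to fit a link MTU.

    Returns a list of (offset_in_8_byte_units, data_len, more_fragments).
    The payload is everything after the IP header (e.g. UDP header + data).
    """
    # Every fragment except the last must carry a multiple of 8 data bytes,
    # because the Fragmentation Offset field counts 8-byte units.
    max_data = (mtu - header_len) // 8 * 8
    if max_data <= 0:
        raise ValueError("MTU too small to carry any data")

    fragments = []
    offset = 0
    while True:
        data_len = min(max_data, payload_len - offset)
        more = offset + data_len < payload_len  # MF flag
        fragments.append((offset // 8, data_len, more))
        offset += data_len
        if not more:
            break
    return fragments
```

Calling fragment(8 + 1473, 1500) models the slide's datagram (8-byte UDP header plus 1473 data bytes) and yields two fragments: offset 0 with 1480 bytes and MF=1, then offset 185 with 1 byte and MF=0.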
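The three steps listed for the Internet Checksum Algorithm (16-bit words, one's complement sum, one's complement of the result) can be sketched as follows. The function name internet_checksum is an assumption for illustration, and the caller is assumed to have set the checksum field to zero before computing.

```python
import struct


def internet_checksum(header: bytes) -> int:
    """Compute the RFC 1071 Internet checksum over the given bytes."""
    # Pad to a whole number of 16-bit words (an IPv4 header never needs this,
    # since its length is a multiple of 4 bytes).
    if len(header) % 2:
        header += b"\x00"

    # Step 1 and 2: treat the data as big-endian 16-bit integers and add them.
    total = 0
    for (word,) in struct.iter_unpack("!H", header):
        total += word

    # One's complement addition: fold any carry back into the low 16 bits.
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)

    # Step 3: take the one's complement of the result.
    return ~total & 0xFFFF
```

A receiver can verify a header by running the same function over the header with the checksum field filled in: the result is 0 when the header is intact.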