CS 356: Computer Network Architectures Lecture 10: IP forwarding Xiaowei Yang

xwy@cs.duke.edu
Overview
• IP addressing
• IP forwarding
– Forwarding algorithm
– Fragmentation
• Address resolution protocol (ARP)
• Internet Control Message protocol (ICMP)
– Error reporting
Global IP addresses
What is an IP Address?
• An IP address is a unique global identifier for a
network interface
– An IP address uniquely identifies a network location
• Routers forward a packet based on the destination
address of the packet
• Uniqueness ensures global reachability
IP Addressing
• Addressing defines how addresses are
allocated and the structure of addresses
• IPv4 (32-bit)
– Classful IP addresses (obsolete)
– Classless inter-domain routing (CIDR) (RFC 4632,
current standard)
• IP Version 6 addresses (128-bit)
An IPv4 address is often written in dotted
decimal notation
• Each byte is identified by a decimal number in
the range [0…255]:
10000000
10001111
10001001
10010000
1st Byte
2nd Byte
3rd Byte
4th Byte
= 128
= 143
= 137
= 144
128.143.137.144
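The byte-by-byte conversion above can be sketched in a few lines of Python. The helper names `from_bits` and `to_dotted_decimal` are made up for this illustration.

```python
def from_bits(bits: str) -> int:
    """Parse 32 binary digits (spaces between bytes allowed) into an integer."""
    return int(bits.replace(" ", ""), 2)


def to_dotted_decimal(addr: int) -> str:
    """Render a 32-bit address as four decimal numbers, one per byte."""
    octets = [(addr >> shift) & 0xFF for shift in (24, 16, 8, 0)]
    return ".".join(str(octet) for octet in octets)


addr = from_bits("10000000 10001111 10001001 10010000")
print(to_dotted_decimal(addr))  # 128.143.137.144
```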
Structure of an IP address
31
0
network prefix
host number
• An IP address encodes both a network number
(network prefix) and an interface number (host
number).
– network prefix identifies a network
– the host number identifies a specific host (actually, an
interface on the network).
• The structure is designed to improve the scalability
of routing
– Scales better than flat addresses
How long is a network prefix?
• Before 1993: The network prefix is implicitly
defined (class-based addressing)
• After 1993: The network prefix is indicated by
a netmask
Before 1993: Class-based addressing
• The Internet address space was divided up into
classes:
– Class A: Network prefix is 8 bits long
– Class B: Network prefix is 16 bits long
– Class C: Network prefix is 24 bits long
– Class D is multicast address
– Class E is reserved
Classful IP Addresses (Until 1993)
• Each IP address contained a key which
identifies the class:
– Class A: IP address starts with “0”
– Class B: IP address starts with “10”
– Class C: IP address starts with “110”
– Class D: IP address starts with “1110”
– Class E: IP address starts with “11110”
The old way: Internet Address Classes
• Class A: bit 0 is “0”; network prefix is bits 0–7 (8 bits), host number is bits 8–31 (24 bits)
• Class B: bits 0–1 are “10”; network prefix is bits 0–15 (16 bits), host number is bits 16–31 (16 bits)
• Class C: bits 0–2 are “110”; network prefix is bits 0–23 (24 bits), host number is bits 24–31 (8 bits)
• Class D: bits 0–3 are “1110”; the remaining bits up to bit 31 are the multicast group id
• Class E: bits 0–4 are “11110”; the remaining bits up to bit 31 are reserved for future use
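As a rough illustration of how the class key was read off the leading bits, the sketch below classifies an address by its first byte. The function name `address_class` is hypothetical, and this version simply treats everything that is not class A to D as class E.

```python
def address_class(dotted: str) -> str:
    """Return the historical class (A-E) of a dotted-decimal IPv4 address."""
    first = int(dotted.split(".")[0])
    if first & 0b10000000 == 0b00000000:   # leading bit 0
        return "A"
    if first & 0b11000000 == 0b10000000:   # leading bits 10
        return "B"
    if first & 0b11100000 == 0b11000000:   # leading bits 110
        return "C"
    if first & 0b11110000 == 0b11100000:   # leading bits 1110
        return "D"
    return "E"                             # remaining addresses (assumption: all treated as E)


print(address_class("128.143.137.144"))  # B
```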
Problems with Classful IP Addresses
• Fast growing routing table size
– Each router must have an entry for every network prefix
– ~ 2^21 = 2,097,152 class C networks
– In 1993, the size of routing tables started to outgrow the capacity of
routers
• Local admins must request another network number
before installing a new network at their site
Solution: Classless Inter-domain routing (CIDR)
• Network prefix is of variable length
– No rigid class boundary
• Addresses are allocated hierarchically
• Routers aggregate multiple address prefixes
into one routing entry to minimize routing
table size
Hierarchical IP Address Allocation
Internet Assigned Numbers Authority
Regional Internet Registries
(Five of them)
Internet Service Providers
• American Registry for Internet Numbers
(ARIN)
• RIPE, APNIC, LACNIC, AfriNIC
CIDR network prefix has variable length
Addr 128.143.137.144 = 10000000 10001111 10001001 10010000
Mask 255.255.255.0 = 11111111 11111111 11111111 00000000
• A network mask specifies the number of bits
used to identify a network in an IP address.
CIDR notation
• CIDR notation of an IP address:
– 128.143.137.144/24
– /24 is the prefix length. It states that the first 24 bits are the
network prefix of the address (and the remaining 8 bits are
available for specific host addresses)
• CIDR notation can nicely express blocks of addresses
– An address block
[128.195.0.0, 128.195.255.255]
can be represented by an address prefix
128.195.0.0/16
– How many IP addresses are there in a /x address block?
• 2^(32-x)
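The CIDR examples above can be checked with Python's standard `ipaddress` module. This is only a sketch; `block_size` is a helper name introduced here for the 2^(32-x) formula.

```python
import ipaddress


def block_size(prefix_len: int) -> int:
    """Number of addresses in a /prefix_len IPv4 block: 2^(32 - prefix_len)."""
    return 2 ** (32 - prefix_len)


iface = ipaddress.ip_interface("128.143.137.144/24")
print(iface.network)   # 128.143.137.0/24
print(iface.netmask)   # 255.255.255.0

block = ipaddress.ip_network("128.195.0.0/16")
print(block[0], block[-1])                    # 128.195.0.0 128.195.255.255
print(block.num_addresses == block_size(16))  # True
```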
IP Forwarding
Forwarding of IP datagrams
• There are two distinct processes in delivering
IP datagrams:
1. Forwarding (data plane): How to pass a
packet from an input interface to the output
interface?
2. Routing (control plane): How to find and
set up the forwarding tables?
Key points
• Each IP datagram contains the IP destination address
• The “network part” of an IP address identifies a
single physical network
• All hosts and routers that share the same network part
of their address are connected to the same physical
network
• Each physical network on the Internet has at least one
router that connects this network to other physical
networks
Forwarding algorithm
Is dst on the same physical network?
– Yes: Deliver the packet to the network directly
– No: Forward to next-hop router
1. How to determine whether a dst is on the same physical network?
2. How to determine the next hop router?
– Routing
Detailed forwarding algorithm
• If (NetworkNum of destination == NetworkNum of one of my
interfaces) then
– Deliver packet to the destination over that interface
• Else
– If (NetworkNum of destination is in my forwarding table) then
• Deliver packet to the NextHop router
– Else
• Deliver packet to the default router
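The three cases above can be sketched as follows. The function name `forward`, the table layout (plain dictionaries of prefix strings), and the sample addresses are all assumptions for illustration. The table search here takes the first matching entry; longest prefix match is not yet applied.

```python
import ipaddress


def forward(dst, interfaces, table, default_router):
    """Return ("deliver", interface) or ("next-hop", router) for a destination."""
    dst_ip = ipaddress.ip_address(dst)
    # Case 1: destination is on a network attached to one of my interfaces.
    for name, network in interfaces.items():
        if dst_ip in ipaddress.ip_network(network):
            return ("deliver", name)
    # Case 2: destination network is listed in the forwarding table.
    for network, next_hop in table.items():
        if dst_ip in ipaddress.ip_network(network):
            return ("next-hop", next_hop)
    # Case 3: fall back to the default router.
    return ("next-hop", default_router)


interfaces = {"eth0": "10.0.1.0/24"}   # hypothetical directly attached network
table = {"10.0.2.0/24": "R1"}          # hypothetical forwarding table
print(forward("10.0.1.7", interfaces, table, "R9"))   # ('deliver', 'eth0')
print(forward("10.0.2.7", interfaces, table, "R9"))   # ('next-hop', 'R1')
print(forward("8.8.8.8", interfaces, table, "R9"))    # ('next-hop', 'R9')
```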
Forwarding table lookup
• When a router or host needs to
transmit an IP datagram, it
performs a routing table lookup
• Forwarding table lookup: Use
the IP destination address as a key
to search the routing table
• Result of the lookup is the IP
address of a next hop router,
and/or the name of a network
interface
Forwarding table columns:
– Destination address: network prefix, or host IP address, or loopback address, or default route
– Next hop/interface: IP address of next hop router, or name of a network interface
Type of forwarding table entries
• Network route
– Destination address is a network address (e.g., 10.0.2.0/24)
– Most entries are network routes
• Host route
– Destination address is an interface address (e.g., 10.0.1.2/32)
– Used to specify a separate route for certain hosts
• Default route
– Used when no network or host route matches
• Loopback address
– Routing table entry for the loopback address (127.0.0.1)
– The next hop lists the loopback (lo0) interface as outgoing interface
Simplified forwarding algorithm
• Observation:
– A directly connected physical network can be an entry in the
forwarding table
– A default route can be an entry
• Simplified algorithm
1. Look up the destination address in the forwarding
table using longest prefix match
2. Forward the packet to the next hop indicated by
the matched entry
Longest prefix match
• Longest Prefix Match: Search for the forwarding table entry that has the longest
match with the prefix of the destination IP address
1. Search for a match on all 32 bits
2. Search for a match on 31 bits
…
33. Search for a match on 0 bits
• Host route, loopback entry → 32-bit prefix match
• Default route is represented as 0.0.0.0/0 → 0-bit prefix match
Destination IP address of the datagram: 128.143.71.21
Destination address    Next hop
10.0.0.0/8             eth0
128.143.0.0/16         R2
128.143.64.0/20        R3
128.143.192.0/20       R3
128.143.71.0/24        R4
128.143.71.55/32       R3
0.0.0.0/0 (default)    R5
• The longest prefix match for 128.143.71.21 is for 24 bits with entry 128.143.71.0/24
• Datagram will be sent to R4
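A minimal sketch of longest prefix match over the example table, using a linear scan rather than the trie-style structures real routers use. Pairing each destination with a next hop in the order they are listed is an assumption of this sketch, and `longest_prefix_match` is a name chosen here.

```python
import ipaddress

# (destination prefix, next hop), paired in listing order (assumption).
TABLE = [
    ("10.0.0.0/8", "eth0"),
    ("128.143.0.0/16", "R2"),
    ("128.143.64.0/20", "R3"),
    ("128.143.192.0/20", "R3"),
    ("128.143.71.0/24", "R4"),
    ("128.143.71.55/32", "R3"),
    ("0.0.0.0/0", "R5"),
]


def longest_prefix_match(dst, table):
    """Return (prefix_length, next_hop) of the most specific matching entry."""
    dst_ip = ipaddress.ip_address(dst)
    best = None
    for prefix, next_hop in table:
        network = ipaddress.ip_network(prefix)
        if dst_ip in network and (best is None or network.prefixlen > best[0]):
            best = (network.prefixlen, next_hop)
    return best


print(longest_prefix_match("128.143.71.21", TABLE))  # (24, 'R4')
```

Because the default route 0.0.0.0/0 matches every address, the function always finds an entry for this table.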
• Ex: H1 → H2
– Nexthop: eth0
• H1 → H6
– Nexthop: R1
• Q: How is an IP packet sent from H1 to H2 or
H1 to H6?
– Encapsulated into an
Ethernet frame
How to find out a host’s Ethernet
address after knowing its IP address?
→ Address Resolution Protocol
ARP and RARP
• Note:
– The Internet is based on IP addresses
– Data link protocols (Ethernet, FDDI, ATM) may have different (MAC)
addresses
• The ARP and RARP protocols perform the translation between
IP addresses and MAC layer addresses
• We will discuss ARP for broadcast LANs, particularly
Ethernet LANs
– RFC 826
• RARP obsolete
IP address (32 bit) → ARP → Ethernet MAC address (48 bit)
Ethernet MAC address (48 bit) → RARP → IP address (32 bit)
Address Translation with ARP
ARP Request:
Argon broadcasts an ARP request to all
stations on the network: “What is the
hardware address of 128.143.137.1?”
Argon: 128.143.137.144, 00:a0:24:71:e4:44
Router137: 128.143.137.1, 00:e0:f9:23:a8:20
ARP Request: What is the MAC address of 128.143.137.1?
Address Translation with ARP
ARP Reply:
Router137 responds with an ARP Reply
which contains the hardware address
Argon: 128.143.137.144, 00:a0:24:71:e4:44
Router137: 128.143.137.1, 00:e0:f9:23:a8:20
ARP Reply: The MAC address of 128.143.137.1 is 00:e0:f9:23:a8:20
ARP Packet Format
Ethernet frame carrying an ARP message:
• Ethernet II header: Destination address (6 bytes), Source address (6 bytes), Type 0x0806 (2 bytes)
• ARP Request or ARP Reply (28 bytes)
• Padding (18 bytes, to reach the minimum Ethernet frame size)
• CRC (4 bytes)
ARP Request or ARP Reply fields:
• Hardware type (2 bytes)
• Protocol type (2 bytes)
• Hardware address length (1 byte)
• Protocol address length (1 byte)
• Operation code (2 bytes)
• Source hardware address*
• Source protocol address*
• Target hardware address*
• Target protocol address*
* Note: The length of the address fields is determined by the corresponding address length fields
• Hardware type: ether (1)
• Protocol type: taken from the set ether_type
– IP: 0x0800
• Opcode
– ARP request: 1
– ARP reply: 2
• Check RFC for implementation details
Example
• ARP Request from Argon is broadcast:
– Source addr in Ethernet header: 00:a0:24:71:e4:44
– Destination addr in Ethernet header: FF:FF:FF:FF:FF:FF
– Source hardware address: 00:a0:24:71:e4:44
– Source protocol address: 128.143.137.144
– Target hardware address: 00:00:00:00:00:00
– Target protocol address: 128.143.137.1
• ARP Reply from Router137 is unicast:
– Source addr in Ethernet header: 00:e0:f9:23:a8:20
– Destination addr in Ethernet header: 00:a0:24:71:e4:44
– Source hardware address: 00:e0:f9:23:a8:20
– Source protocol address: 128.143.137.1
– Target hardware address: 00:a0:24:71:e4:44
– Target protocol address: 128.143.137.144
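To make the field layout concrete, the sketch below packs the 28-byte ARP message for Ethernet and IPv4 with Python's `struct` module and fills it with the Argon/Router137 values from the example. The function and constant names are illustrative only, and the Ethernet header, padding, and CRC are left out.

```python
import socket
import struct

ARP_REQUEST, ARP_REPLY = 1, 2


def mac_bytes(mac: str) -> bytes:
    """Convert 'aa:bb:cc:dd:ee:ff' into 6 raw bytes."""
    return bytes.fromhex(mac.replace(":", ""))


def build_arp(opcode, src_mac, src_ip, target_mac, target_ip) -> bytes:
    """Pack an ARP message for Ethernet (hardware type 1) and IPv4 (0x0800)."""
    return struct.pack(
        "!HHBBH6s4s6s4s",
        1,        # hardware type: Ethernet
        0x0800,   # protocol type: IP
        6,        # hardware address length
        4,        # protocol address length
        opcode,
        mac_bytes(src_mac), socket.inet_aton(src_ip),
        mac_bytes(target_mac), socket.inet_aton(target_ip),
    )


request = build_arp(ARP_REQUEST, "00:a0:24:71:e4:44", "128.143.137.144",
                    "00:00:00:00:00:00", "128.143.137.1")
reply = build_arp(ARP_REPLY, "00:e0:f9:23:a8:20", "128.143.137.1",
                  "00:a0:24:71:e4:44", "128.143.137.144")
print(len(request), len(reply))  # 28 28
```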
ARP Cache
• Since sending an ARP request/reply for each IP datagram is
inefficient, hosts maintain a cache (ARP Cache) of current
entries. The entries expire after a time interval.
• Contents of the ARP Cache:
(128.143.71.37) at 00:10:4B:C5:D1:15 [ether] on eth0
(128.143.71.36) at 00:B0:D0:E1:17:D5 [ether] on eth0
(128.143.71.35) at 00:B0:D0:DE:70:E6 [ether] on eth0
(128.143.136.90) at 00:05:3C:06:27:35 [ether] on eth1
(128.143.71.34) at 00:B0:D0:E1:17:DB [ether] on eth0
(128.143.71.33) at 00:B0:D0:E1:17:DF [ether] on eth0
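A minimal sketch of an ARP cache whose entries expire. The class name `ArpCache`, the 60-second default lifetime, and the injectable clock are assumptions made for the example; actual expiry times vary by system.

```python
import time


class ArpCache:
    """Maps IP addresses to MAC addresses; entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds=60.0, clock=time.monotonic):
        self._ttl = ttl_seconds      # assumed lifetime of an entry
        self._clock = clock          # injectable so expiry can be tested
        self._entries = {}           # ip -> (mac, expiry time)

    def insert(self, ip, mac):
        self._entries[ip] = (mac, self._clock() + self._ttl)

    def lookup(self, ip):
        """Return the cached MAC, or None if absent or expired."""
        entry = self._entries.get(ip)
        if entry is None:
            return None
        mac, expires_at = entry
        if self._clock() >= expires_at:
            del self._entries[ip]    # stale entry: forget it
            return None
        return mac


cache = ArpCache()
cache.insert("128.143.71.37", "00:10:4B:C5:D1:15")
print(cache.lookup("128.143.71.37"))  # 00:10:4B:C5:D1:15
print(cache.lookup("128.143.71.99"))  # None
```

A miss (`None`) is the point at which a host would send an ARP request.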
Putting it together
IP Forwarding Implementation Logistics
[Figure: flowchart of the IP output and IP input paths; the decision boxes and labels are listed below]
• IP Output: Is the IP destination multicast or broadcast? Is the IP destination of the packet a local IP address? Yes: put on the IP input queue (loopback driver). No: get the MAC address with ARP and hand the datagram to the Ethernet driver.
• Ethernet driver: demultiplexes each Ethernet frame into an ARP packet (to ARP) or an IP datagram (to the input queue). This is the Lab 2 input.
• IP module: Is the destination address local? Yes: demultiplex to UDP, TCP, or ICMP. No: Is IP forwarding enabled? Yes: look up the next hop in the routing table and send the datagram. No: discard.
• The routing table is filled by a routing protocol or by static routing.
IP Forwarding Logistics (Lab 2)
1. Sanity-check
• Check that the packet meets minimum length and has correct checksum
2. Update header
• Decrement the TTL by 1, and compute the packet checksum over the modified
header.
3. Next hop IP lookup
• Find out which entry in the routing table has the longest prefix match with the
destination IP address.
4. Next hop MAC lookup
• Check the ARP cache for the next-hop MAC address corresponding to the next-hop
IP. If it's there, send it. Otherwise, send an ARP request for the next-hop IP (if one
hasn't been sent within the last second), and add the packet to the queue of packets
waiting on this ARP request.
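The sanity-check and header-update steps can be sketched as below. This is not the Lab 2 reference code: the function names are invented here, and returning `None` stands in for dropping the packet (a real router would also send the appropriate ICMP message). The sample header is a commonly used test vector with TTL 64.

```python
import struct
from typing import Optional


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum used for the IP header."""
    if len(data) % 2:
        data += b"\x00"                      # pad to whole 16-bit words
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:                       # fold carries into the low 16 bits
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def prepare_for_forwarding(packet: bytes) -> Optional[bytes]:
    """Sanity-check, decrement TTL, recompute checksum. None means drop."""
    if len(packet) < 20:                     # shorter than a minimal IP header
        return None
    header_len = (packet[0] & 0x0F) * 4      # IHL field, in bytes
    if header_len < 20 or len(packet) < header_len:
        return None
    if internet_checksum(packet[:header_len]) != 0:
        return None                          # a valid header sums to zero
    ttl = packet[8]
    if ttl <= 1:
        return None                          # TTL would reach 0
    out = bytearray(packet)
    out[8] = ttl - 1
    out[10:12] = b"\x00\x00"                 # zero the checksum field first
    struct.pack_into("!H", out, 10, internet_checksum(bytes(out[:header_len])))
    return bytes(out)


header = bytes.fromhex("45000073000040004011b861c0a80001c0a800c7")
updated = prepare_for_forwarding(header)
print(updated[8])  # 63
```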
Error reporting
• Internet Control Message Protocol (ICMP)
– Ill-formatted packets
– TTL == 0
– ARP receives no reply
– No protocol or application running at the destination
– No routing table match
–…
Location in the protocol stack
• The IP (Internet Protocol) relies on several
other protocols to perform necessary control
and routing functions:
• Control functions (ICMP)
• Multicast signaling (IGMP)
• Setting up forwarding tables (RIP, OSPF, BGP, PIM, …)
[Figure: protocols that support IP. Routing: RIP, OSPF, BGP, PIM. Control: ICMP, IGMP]
Overview
• The Internet Control Message Protocol (ICMP) is
a helper protocol that supports IP with facilities for
– Error reporting
– Simple queries
• ICMP messages are encapsulated as IP datagrams:
IP header | ICMP message (the IP payload)
ICMP message format
bits 0–7: type | bits 8–15: code | bits 16–31: checksum
next 32 bits: additional information or 0x00000000
4 byte header:
• Type (1 byte): type of ICMP message
• Code (1 byte): subtype of ICMP message
• Checksum (2 bytes): similar to IP header checksum. Checksum is
calculated over the entire ICMP message
If there is no additional data, there are 4 bytes set to zero.
→ each ICMP message is at least 8 bytes long
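A small sketch of the message layout: the 4-byte header (type, code, checksum), followed by 4 bytes that are zero when there is no additional information, with the checksum taken over the whole ICMP message. `build_icmp` and its parameter names are assumptions for this example.

```python
import struct


def internet_checksum(data: bytes) -> int:
    """16-bit one's-complement checksum over the given bytes."""
    if len(data) % 2:
        data += b"\x00"
    total = sum(struct.unpack("!%dH" % (len(data) // 2), data))
    while total >> 16:
        total = (total & 0xFFFF) + (total >> 16)
    return ~total & 0xFFFF


def build_icmp(msg_type: int, code: int,
               rest: bytes = b"\x00\x00\x00\x00", data: bytes = b"") -> bytes:
    """Build an ICMP message; `rest` is the 4 bytes after the header."""
    if len(rest) != 4:
        raise ValueError("rest must be exactly 4 bytes")
    # Checksum is computed with the checksum field set to zero.
    unchecked = struct.pack("!BBH", msg_type, code, 0) + rest + data
    checksum = internet_checksum(unchecked)
    return struct.pack("!BBH", msg_type, code, checksum) + rest + data


minimal = build_icmp(8, 0)                       # type 8, code 0, no extra data
print(len(minimal))                              # 8
echo = build_icmp(8, 0, struct.pack("!HH", 1, 7), b"ping")  # id 1, sequence 7
print(internet_checksum(echo))                   # 0 for a valid message
```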
ICMP Query message
ICMP query:
• Request sent by host to a router or host
• Reply sent back to querying host
Example of ICMP Queries
Type/Code   Description
8/0         Echo Request
0/0         Echo Reply
13/0        Timestamp Request
14/0        Timestamp Reply
Extension (RFC 1256):
10/0        Router Solicitation
9/0         Router Advertisement
• The ping command uses Echo Request/Echo Reply
ICMP Error message
• ICMP error messages report error conditions
• Typically sent when a datagram is discarded
• Error message is often passed from ICMP to the
application program
ICMP Error message
Layout: IP header | ICMP header (type, code, checksum) | Unused (0x00000000) | IP header and 8 bytes of payload from the IP datagram that triggered the error
• ICMP error messages include the complete IP header
and the first 8 bytes of the payload (typically: UDP,
TCP)
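One reason the first 8 payload bytes are useful: for UDP and TCP they contain the source and destination ports, so the sender can tell which application the failed datagram belonged to. The sketch below extracts the quoted part and reads the ports back. The function names and the sample datagram bytes are made up for illustration.

```python
import struct


def quoted_part(datagram: bytes) -> bytes:
    """IP header plus first 8 payload bytes, as carried in an ICMP error."""
    header_len = (datagram[0] & 0x0F) * 4
    return datagram[:header_len + 8]


def ports_from_quote(quote: bytes):
    """Recover (source port, destination port) from the quoted bytes."""
    header_len = (quote[0] & 0x0F) * 4
    return struct.unpack("!HH", quote[header_len:header_len + 4])


# Hypothetical datagram: 20-byte IP header (only the first byte matters here),
# then a UDP header (src port 40000, dst port 80, length, checksum) and data.
ip_header = bytes([0x45]) + bytes(19)
udp = struct.pack("!HHHH", 40000, 80, 8 + 5, 0) + b"hello"
quote = quoted_part(ip_header + udp)
print(len(quote))               # 28
print(ports_from_quote(quote))  # (40000, 80)
```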
Example: ICMP Port Unreachable
• RFC 792: If, in the destination host, the IP module cannot
deliver the datagram because the indicated protocol module or
process port is not active, the destination host may send a
destination unreachable message to the source host.
• Scenario: a Client sends to port 80 on a Server, but no process
is waiting at port 80
Common ICMP Error messages
Type | Code | Description | Explanation
3 | 0–5 | Destination unreachable | Notification that an IP datagram could not be forwarded and was dropped. The code field contains an explanation. (traceroute)
5 | 0–3 | Redirect | Informs about an alternative route for the datagram and should result in a routing table update. The code field explains the reason for the route change.
11 | 0, 1 | Time exceeded | Sent when the TTL field has reached zero (Code 0) or when there is a timeout for the reassembly of fragments (Code 1) (traceroute)
12 | 0, 1 | Parameter problem | Sent when the IP header is invalid (Code 0) or when an IP header option is missing (Code 1)
Some subtypes of the “Destination Unreachable”
Code | Description | Reason for Sending
0 | Network Unreachable | No routing table entry is available for the destination network.
1 | Host Unreachable | Destination host should be directly reachable, but does not respond to ARP Requests.
2 | Protocol Unreachable | The protocol in the protocol field of the IP header is not supported at the destination.
3 | Port Unreachable | The transport protocol at the destination host cannot pass the datagram to an application.
4 | Fragmentation Needed and DF Bit Set | IP datagram must be fragmented, but the DF bit in the IP header is set. (MTU discovery)
5 | Source route failed | The source routing option has failed.
ICMP applications
• Ping
• Traceroute
• MTU discovery
Ping: Echo Request and Reply
[Figure: Echo Request and Echo Reply exchanged between a host or router and another host or router]
Message fields: Type (= 8 or 0), Code (= 0), Checksum, identifier, sequence number, 32-bit sender timestamp, Optional data
• Pings are handled directly by the kernel
• Each Ping is translated into an ICMP Echo Request
• The Ping’ed host responds with an ICMP Echo Reply
Traceroute
• xwy@linux20$ traceroute -n 18.26.0.1
– traceroute to 18.26.0.1 (18.26.0.1), 30 hops max, 60 byte
packets
– 1 152.3.141.250 4.968 ms 4.990 ms 5.058 ms
– 2 152.3.234.195 1.479 ms 1.549 ms 1.615 ms
– 3 152.3.234.196 1.157 ms 1.171 ms 1.238 ms
– 4 128.109.70.13 1.905 ms 1.885 ms 1.943 ms
– 5 128.109.70.138 4.011 ms 3.993 ms 4.045 ms
– 6 128.109.70.102 10.551 ms 10.118 ms 10.079 ms
– 7 18.3.3.1 28.715 ms 28.691 ms 28.619 ms
– 8 18.168.0.23 27.945 ms 28.028 ms 28.080 ms
– 9 18.4.7.65 28.037 ms 27.969 ms 27.966 ms
– 10 128.30.0.246 27.941 ms * *
Traceroute algorithm
• Sends out three UDP packets with
TTL=1,2,…,n, destined to a high port
• Routers on the path send ICMP Time exceeded
message with their IP addresses until n reaches
the destination distance
• Destination replies with port unreachable
ICMP messages
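The algorithm can be illustrated with a small simulation that needs no network access. Here the path is just a list of router addresses, and `probe` stands in for sending one UDP packet with a given TTL and waiting for the ICMP response. All names are hypothetical; a real traceroute sends three probes per TTL and measures round-trip times.

```python
def probe(path, destination, ttl):
    """Simulate one probe: return (responder, ICMP message) for this TTL."""
    if ttl <= len(path):
        # TTL runs out at the ttl-th router, which reports Time exceeded.
        return path[ttl - 1], "time exceeded"
    # The probe reaches the destination; no process listens on the high port.
    return destination, "port unreachable"


def traceroute(path, destination, max_hops=30):
    """Increase the TTL from 1 until the destination answers."""
    hops = []
    for ttl in range(1, max_hops + 1):
        responder, message = probe(path, destination, ttl)
        hops.append((ttl, responder, message))
        if message == "port unreachable":
            break
    return hops


routers = ["152.3.141.250", "152.3.234.195", "152.3.234.196"]
for hop in traceroute(routers, "18.26.0.1"):
    print(hop)
```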
Fragmentation and Reassembly
(not required for Lab 2)
Different networks have different
Maximum Transmission Units (MTUs)
IP Fragmentation and Reassembly
• What if the size of an IP datagram exceeds the MTU?
IP datagram is fragmented into smaller units.
• What if the route contains networks with different MTUs?
[Figure: Host A on an FDDI ring, connected through a router to Host B on an Ethernet]
MTUs: FDDI: 4352, Ethernet: 1500
• Fragmentation:
• IP router splits the datagram into several datagrams
Design question: Where is
Fragmentation/reassembly done?
• Fragmentation can be done at the sender or at
intermediate routers
• The same datagram can be fragmented several
times.
• Reassembly of original datagram is only done at
destination hosts !! (why?)
[Figure: an IP datagram sent by H1 is split by a router into Fragment 1 and Fragment 2, which travel on to H2]
What’s involved in Fragmentation?
• The following fields in the IP header are involved:
[Figure: IP header fields: version, header length, DS, ECN, total length (in bytes), Identification, flags (0, DF, MF), Fragment offset, time-to-live (TTL), protocol, header checksum]
• Identification
– When a datagram is fragmented, the identification is the same in all fragments
– Used to reassemble the original packet
• Flags
– DF bit is set: datagram cannot be fragmented and must be discarded if MTU is too
small
• ICMP sent
– MF bit:
• 1: this is not the last fragment
• 0: last fragment
What’s involved in Fragmentation?
• The following fields in the IP header are involved:
[Figure: IP header fields: version, header length, DS, ECN, total length (in bytes), Identification, flags (0, DF, MF), Fragment offset (13-bit), time-to-live (TTL), protocol, header checksum]
• Fragment offset
– Offset of the payload of the current fragment in the original
datagram in units of 8 bytes
– Why? Because the field is only 13 bits long, while the total length
is 16 bits.
• Total length
– Total length of the current fragment
Example of Fragmentation
• A datagram with size 2400 bytes must be fragmented according to an MTU
limit of 1000 bytes
[Figure: the IP datagram arrives at a Router over a link with MTU 4000 and leaves as Fragments 1, 2, and 3 over a link with MTU 1000]
Field             IP datagram   Fragment 1   Fragment 2   Fragment 3
Header length     20            20           20           20
Total length      2400          996          996          448
Identification    0xa428        0xa428       0xa428       0xa428
DF flag           0             0            0            0
MF flag           0             1            1            0
Fragment offset   0             0            122          244
Determining the length of fragments
• Maximum payload length = 1000 – 20 = 980 bytes
• Offset specifies the bytes in multiples of 8 bytes. So the payload of every fragment
except the last must be a multiple of 8 bytes.
• 980 – 980 % 8 = 976 (the largest number that is at most 980 and divisible by 8)
• The payload for the first fragment is 976 bytes and has bytes 0 ~ 975 of the original IP
datagram payload. The offset is 0.
• The payload for the second fragment is 976 bytes and has bytes 976 ~ 1951 of the
original IP datagram payload. The offset is 976 / 8 = 122.
• The payload of the last fragment is (2400 – 20) – 976 * 2 = 428 bytes and has bytes
1952 ~ 2379 of the original IP datagram payload. The offset is 1952 / 8 = 244.
• Total length of three fragments: 996 + 996 + 448 = 2440 > 2400
– Why?
– Two additional IP headers.
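The calculation above can be reproduced with a short sketch. It assumes a fixed 20-byte header with no options, and the function name and dictionary keys are chosen for this example only.

```python
def fragment(total_length: int, mtu: int, header_len: int = 20):
    """Split a datagram into fragments that fit the MTU.

    Returns one dict per fragment with its total length, MF flag, and
    fragment offset (in units of 8 bytes).
    """
    payload = total_length - header_len
    # Largest payload that fits the MTU and is a multiple of 8 bytes.
    max_payload = (mtu - header_len) // 8 * 8
    fragments = []
    start = 0
    while start < payload:
        size = min(max_payload, payload - start)
        more = start + size < payload        # MF = 1 unless this is the last
        fragments.append({
            "total_length": size + header_len,
            "mf": 1 if more else 0,
            "offset": start // 8,
        })
        start += size
    return fragments


for frag in fragment(2400, 1000):
    print(frag)
# {'total_length': 996, 'mf': 1, 'offset': 0}
# {'total_length': 996, 'mf': 1, 'offset': 122}
# {'total_length': 448, 'mf': 0, 'offset': 244}
```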
Path MTU discovery
• Fragmentation slows down the router
• → should be done by end hosts
• How does a sender know the MTU of a path?
– A host only knows the MTU of its links
• Solution
– Send large packets with DF set
– If an ICMP “Fragmentation needed” message is received, reduce the
maximum segment size
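The probing loop can be sketched as a simulation. It assumes that the router reporting “Fragmentation needed” also tells the sender the MTU of the link that was too small (as RFC 1191 specifies); without that, the sender would have to guess the next smaller size. All names and the sample MTU lists are illustrative.

```python
def send_with_df(size, link_mtus):
    """Simulate sending a packet with DF set along the path.

    Returns None if the packet fits every link, otherwise the MTU of the
    first link that is too small (the value carried in the ICMP error).
    """
    for mtu in link_mtus:
        if size > mtu:
            return mtu      # router drops the packet and sends the ICMP error
    return None


def discover_path_mtu(first_link_mtu, link_mtus):
    """Start from the sender's own link MTU and shrink until packets get through."""
    size = first_link_mtu
    while True:
        reported = send_with_df(size, link_mtus)
        if reported is None:
            return size
        size = reported     # retry with the smaller, reported MTU


print(discover_path_mtu(4352, [4352, 1500]))  # 1500
```

Each retry uses a strictly smaller size, so the loop ends after at most one round per link.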
Summary
• IP addressing
• IP forwarding
– Forwarding algorithm
– Fragmentation
• Address resolution protocol (ARP)
• Internet Control Message protocol (ICMP)
– Error reporting
• Next: DHCP, NAT, IPv6, VPN and Tunneling