William Stallings Data and Computer Communications - NYU

Data Communication and
Networks
Lecture 9/10
Internet Protocols
November 6, 2003
Joseph Conron
Computer Science Department
New York University
jconron@cs.nyu.edu
What’s the Internet: Components view
 millions of connected computing devices: hosts, end-systems
PCs, workstations, servers
PDAs, phones, toasters
running network apps
 communication links
fiber, copper, radio, satellite
 routers: forward packets (chunks) of data thru network
What’s the Internet:
Components view
 protocols: control sending, receiving of msgs
e.g., TCP, IP, HTTP, FTP, PPP
 Internet: “network of networks”
loosely hierarchical
public Internet versus private intranet
 Internet standards
RFC: Request for comments
IETF: Internet Engineering Task Force
What’s the Internet: a service view
 communication infrastructure enables
distributed applications:
WWW, email, games, e-commerce, databases, voting, more?
 communication services provided:
connectionless
connection-oriented
Internet structure: network of networks
 roughly hierarchical
 national/international backbone
providers (NBPs)
e.g. BBN/GTE, Sprint, AT&T,
IBM, UUNet
interconnect (peer) with each
other privately, or at public
Network Access Points (NAPs)
 regional ISPs
connect into NBPs
 local ISP, company
connect into regional ISPs
[Figure: local ISPs connect to regional ISPs, regional ISPs connect to NBP A and NBP B, and the NBPs interconnect at NAPs]
Connectionless Operation
 Corresponds to datagram mechanism in packet switched
network
 Each NPDU treated separately
 Network layer protocol common to all DTEs and routers
Known generically as the internet protocol
 Internet Protocol
One such internet protocol developed for ARPANET
RFC 791 (Get it and study it)
 Lower layer protocol needed to access particular
network
Connectionless Internetworking
Advantages
Flexibility
Robust
No unnecessary overhead
Unreliable
Not guaranteed delivery
Not guaranteed order of delivery
Packets can take different routes
Reliability is responsibility of next layer up (e.g. TCP)
Internet protocol stack
 application: supporting network applications
 ftp, smtp, http
 transport: host-host data transfer
 tcp, udp
 network: routing of datagrams from source to
destination
 ip, routing protocols
 link: data transfer between neighboring network
elements
 ppp, ethernet
 physical: bits “on the wire”
[Figure: protocol stack, top to bottom: application, transport, network, link, physical]
Protocol layering and data
Each layer takes data from above
 adds header information to create new data unit
 passes new data unit to layer below
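[Figure: encapsulation at the source, layer by layer]
layer        data unit  contents
application  message    M
transport    segment    Ht M
network      datagram   Hn Ht M
link         frame      Hl Hn Ht M
At the destination each layer strips its own header (Hl, then Hn, then Ht) and passes the rest up, until the application receives M.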
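The layering rule above (each layer prepends its own header and hands the result down) can be sketched in a few lines. This is only an illustration: the header byte strings `Ht|`, `Hn|`, `Hl|` and the function names are made up here and stand in for real TCP, IP, and link headers.

```python
def encapsulate(message: bytes) -> bytes:
    """Wrap an application message in transport, network, and link headers."""
    segment = b"Ht|" + message    # transport layer adds Ht -> segment
    datagram = b"Hn|" + segment   # network layer adds Hn -> datagram
    frame = b"Hl|" + datagram     # link layer adds Hl -> frame
    return frame


def decapsulate(frame: bytes) -> bytes:
    """Strip the headers in the opposite order, as the destination does."""
    data = frame
    for header in (b"Hl|", b"Hn|", b"Ht|"):
        if not data.startswith(header):
            raise ValueError(f"expected header {header!r}")
        data = data[len(header):]
    return data


frame = encapsulate(b"M")
print(frame)               # b'Hl|Hn|Ht|M'
print(decapsulate(frame))  # b'M'
```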
Internet Protocol (IP)
 Only protocol at Layer 3
 Defines
Internet addressing
Internet packet format
Internet routing
 RFC 791 (1981)
IP Address Details
 32 Bits - divided into two parts
Prefix identifies network
Suffix identifies host
 Global authority assigns unique prefix to network (IANA)
 Local administrator assigns unique suffix to host
IP Addresses
given notion of “network”, let’s examine IP addresses:
“classful” addressing (each address is 32 bits):
class  leading bits  remaining bits     address range
A      0             network, host      1.0.0.0 to 127.255.255.255
B      10            network, host      128.0.0.0 to 191.255.255.255
C      110           network, host      192.0.0.0 to 223.255.255.255
D      1110          multicast address  224.0.0.0 to 239.255.255.255
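Because the class is fixed by the leading bits, it can be read off the first octet alone. A minimal sketch, where the function name `address_class` is hypothetical and the "reserved" case (leading bits 1111) is not covered on the slide:

```python
def address_class(dotted: str) -> str:
    """Return the classful address class for a dotted-decimal IPv4 address."""
    first_octet = int(dotted.split(".")[0])
    if first_octet < 128:   # leading bit 0
        return "A"
    if first_octet < 192:   # leading bits 10
        return "B"
    if first_octet < 224:   # leading bits 110
        return "C"
    if first_octet < 240:   # leading bits 1110 (multicast)
        return "D"
    return "reserved"       # leading bits 1111, outside the slide's table


for addr in ("10.1.2.3", "128.122.1.1", "200.23.16.0", "224.0.0.1"):
    print(addr, address_class(addr))
```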
Classes and Network Sizes
 Maximum network size determined by class of
address
 Class A large
 Class B medium
 Class C small
IP Addressing Example
Subnets and Subnet Masks
 Allow arbitrary complexity of internetworked LANs within
organization
 Insulate overall internet from growth of network
numbers and routing complexity
 Site looks to rest of internet like single network
 Each LAN assigned subnet number
 Host portion of address partitioned into subnet number
and host number
 Local routers route within subnetted network
 Subnet mask indicates which bits are subnet number
and which are host number
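To make the role of the mask concrete, here is a small sketch that splits an address into its (network + subnet) part and its host number with bitwise AND. The address 128.122.140.17 and the mask 255.255.255.0 are assumed values chosen for illustration, not taken from the slides.

```python
import ipaddress


def split_address(host: str, mask: str):
    """Return (subnet address, host number) for a host address and subnet mask."""
    addr = int(ipaddress.IPv4Address(host))
    m = int(ipaddress.IPv4Address(mask))
    subnet = ipaddress.IPv4Address(addr & m)   # bits covered by the mask
    host_number = addr & ~m & 0xFFFFFFFF       # bits the mask leaves free
    return str(subnet), host_number


print(split_address("128.122.140.17", "255.255.255.0"))  # ('128.122.140.0', 17)
```

A local router would compare the first element against its table of subnets; routers outside the site only look at the class B network part.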
Routing Using Subnets
IP addressing: CIDR
 classful addressing:
 inefficient use of address space, address space exhaustion
 e.g., class B net allocated enough addresses for 65K hosts, even if
only 2K hosts in that network
 CIDR: Classless InterDomain Routing
 network portion of address of arbitrary length
 address format: a.b.c.d/x, where x is # bits in network portion of
address
200.23.16.0/23 = 11001000 00010111 00010000 00000000
(first 23 bits: network part; last 9 bits: host part)
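The slide's 200.23.16.0/23 example can be checked with Python's standard `ipaddress` module. The two test addresses (200.23.17.42 and 200.23.18.1) are made up for illustration.

```python
import ipaddress

net = ipaddress.ip_network("200.23.16.0/23")

print(net.prefixlen)       # 23 bits of network part
print(net.netmask)         # 255.255.254.0
print(net.num_addresses)   # 512, i.e. 2**(32 - 23)

# Binary form of the network address, matching the slide
print(format(int(net.network_address), "032b"))

# Membership is decided by the first 23 bits only
print(ipaddress.ip_address("200.23.17.42") in net)  # True
print(ipaddress.ip_address("200.23.18.1") in net)   # False
```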
Internet Packets
 Contains sender and destination addresses
 Size depends on data being carried
 Called IP datagram
Two Parts Of An IP Datagram
 Header
Contains source and destination address
Fixed-size fields
 Data Area (Payload)
 Variable size up to 64K
 No minimum size
IP datagram format
Each row is 32 bits wide:
ver | head. len | type of service | length
16-bit identifier | flgs | fragment offset
time to live | upper layer | Internet checksum
32 bit source IP address
32 bit destination IP address
Options (if any)
data (variable length, typically a TCP or UDP segment)
Field notes:
ver: IP protocol version number
head. len: header length (stored in 32-bit words; multiply by 4 for bytes)
type of service: “type” of data
length: total datagram length (bytes)
16-bit identifier, flgs, fragment offset: for fragmentation/reassembly
time to live: max number remaining hops (decremented at each router)
upper layer: upper layer protocol to deliver payload to
Options: e.g. timestamp, record route taken, specify list of routers to visit
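As a rough sketch of how the fixed 20-byte part of this header maps onto bytes, the following unpacks it with `struct`. The function name, the dictionary keys, and the sample values (addresses 10.0.0.1 and 10.0.0.2, a zero checksum) are assumptions made for illustration; options, if present, are ignored.

```python
import socket
import struct

IPV4_FIXED = "!BBHHHBBH4s4s"   # network byte order, 20 bytes in total


def parse_ipv4_header(raw: bytes) -> dict:
    """Decode the fixed part of an IPv4 header (options are not parsed)."""
    (ver_ihl, tos, total_length, ident, flags_frag,
     ttl, proto, checksum, src, dst) = struct.unpack(IPV4_FIXED, raw[:20])
    return {
        "version": ver_ihl >> 4,
        "header_bytes": (ver_ihl & 0x0F) * 4,           # field counts 32-bit words
        "type_of_service": tos,
        "total_length": total_length,
        "identifier": ident,
        "flags": flags_frag >> 13,
        "fragment_offset": (flags_frag & 0x1FFF) * 8,   # field counts 8-byte units
        "time_to_live": ttl,
        "upper_layer": proto,                           # 6 = TCP, 17 = UDP
        "checksum": checksum,
        "source": socket.inet_ntoa(src),
        "destination": socket.inet_ntoa(dst),
    }


# Hand-built sample header; the checksum is left as 0 and is NOT valid.
sample = struct.pack(IPV4_FIXED, 0x45, 0, 40, 7, 0, 64, 6, 0,
                     socket.inet_aton("10.0.0.1"), socket.inet_aton("10.0.0.2"))
header = parse_ipv4_header(sample)
print(header["version"], header["header_bytes"], header["time_to_live"])
```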
IP Fragmentation & Reassembly
 network links have MTU (max. transfer size): largest possible link-level frame
 different link types, different MTUs
 large IP datagram divided (“fragmented”) within net
 one datagram becomes several datagrams
 “reassembled” only at final destination
 IP header bits used to identify, order related fragments
[Figure: fragmentation (in: one large datagram, out: 3 smaller datagrams), then reassembly at the destination]
IP Fragmentation and Reassembly
One large datagram becomes several smaller datagrams:
datagram    length  ID  fragflag  offset
original    4000    x   0         0
fragment 1  1500    x   1         0
fragment 2  1500    x   1         1480
fragment 3  1040    x   0         2960
Offsets are shown in bytes; the header field stores them in 8-byte units (0, 185, 370).
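The numbers in this example can be reproduced with a short calculation. The sketch assumes a 20-byte header with no options and an MTU of 1500 bytes (inferred from the fragment lengths, not stated on the slide); the function name `fragment` is hypothetical.

```python
def fragment(total_length: int, mtu: int, header_len: int = 20):
    """List the fragments produced when a datagram crosses a link with this MTU."""
    payload = total_length - header_len
    # Every fragment but the last must carry a multiple of 8 data bytes.
    max_data = (mtu - header_len) // 8 * 8
    fragments = []
    offset = 0
    while offset < payload:
        size = min(max_data, payload - offset)
        more = 1 if offset + size < payload else 0   # fragflag: more fragments follow
        fragments.append({"length": size + header_len,
                          "fragflag": more,
                          "offset": offset})          # offset in bytes
        offset += size
    return fragments


for frag in fragment(4000, 1500):
    print(frag)
```

With these inputs the three fragments have lengths 1500, 1500, 1040 and byte offsets 0, 1480, 2960, as on the slide.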
IP Semantics
IP is connectionless
Datagram contains identity of destination
Each datagram sent/ handled independently
Routes can change at any time
IP Semantics (continued)
IP allows datagrams to be
Delayed
Duplicated
Delivered out-of-order
Lost
Called best effort delivery
Motivation: accommodate all possible
networks
Datagram Lifetime
Datagrams could loop indefinitely
Consumes resources
Transport protocol may need upper bound on
datagram life
Datagram marked with lifetime
Time To Live field in IP
Once lifetime expires, datagram discarded (not
forwarded)
Hop count
Decrement time to live on passing through each router
Time count
Need to know how long since last router
ICMP
Internet Control Message Protocol
RFC 792
Transfer of (control) messages from routers and
hosts to hosts
Feedback about problems
e.g. time to live expired
Encapsulated in IP datagram
Not reliable
ICMP Error Messages
When an ICMP error message is sent, the
message always contains the IP header and the
first 8 bytes of the IP datagram that caused the
problem
ICMP has rules regarding error message
generation to prevent broadcast storms
ICMP Echo Command
Used by “ping” and “tracert”
When a destination IP host receives an ICMP
echo command, it returns an ICMP “echo
reply”
Ping uses this to determine if a path to a
destination (and its return path) are “up”
Tracert uses echo in a clever way to determine
the identities of the routers along the path (by
“scoping” TTL).
Address Resolution Problem
Suppose we know the IP Address of a local
system (one to which we are connected)
We would like to send an IP packet to that
system.
The link layer (ethernet, for instance) only
knows about MAC addresses!
How do we determine the MAC address
associated with the IP address?
ARP
Address resolution provides a mapping between
two different forms of addresses
32-bit IP addresses and whatever the data link uses
ARP (address resolution protocol) is a protocol
used to do address resolution in the TCP/IP
protocol suite (RFC826)
ARP provides a dynamic mapping from an IP
address to the corresponding hardware address
ARP Protocol
 A knows B's IP address, wants to learn physical address
of B
 A broadcasts ARP query pkt, containing B's IP address
all machines on LAN receive ARP query
 B receives ARP packet, replies to A with its (B's) physical
layer address
 A caches (saves) IP-to-physical address pairs until
information becomes old (times out)
soft state: information that times out (goes away) unless
refreshed
ARP Cache
The cache maintains the recent IP to physical
address mappings
Each entry is aged (usually the lifetime is 20
minutes) forcing periodic updates of the cache
ARP replies are often broadcast so that all hosts
can update their caches
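The aging behaviour described above (soft state that disappears unless refreshed) can be sketched as a dictionary with timestamps. The class name `ArpCache`, the injectable `clock` parameter, and the sample IP and MAC values are illustrative assumptions; real ARP implementations differ in detail.

```python
import time


class ArpCache:
    """IP-to-hardware-address cache whose entries expire after a lifetime."""

    def __init__(self, lifetime_s: float = 20 * 60, clock=time.monotonic):
        self.lifetime_s = lifetime_s   # 20 minutes, the usual value per the slide
        self.clock = clock             # injectable so the demo can fake time
        self.entries = {}              # ip -> (mac, time the entry was stored)

    def learn(self, ip: str, mac: str) -> None:
        """Store or refresh a mapping, e.g. after seeing an ARP reply."""
        self.entries[ip] = (mac, self.clock())

    def lookup(self, ip: str):
        """Return the cached MAC, or None if unknown or timed out."""
        entry = self.entries.get(ip)
        if entry is None:
            return None
        mac, stored = entry
        if self.clock() - stored >= self.lifetime_s:
            del self.entries[ip]       # entry became old: forget it
            return None
        return mac


now = [0.0]                                   # fake clock for the demo
cache = ArpCache(clock=lambda: now[0])
cache.learn("10.0.0.2", "00:11:22:33:44:55")
print(cache.lookup("10.0.0.2"))               # 00:11:22:33:44:55
now[0] = 20 * 60                              # 20 minutes later
print(cache.lookup("10.0.0.2"))               # None
```

A `None` result is the point at which host A would broadcast a fresh ARP query.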
ARP Packet Format
Each row is 32 bits wide (bit positions 8, 16, 31 marked):
Hardware Type | Protocol Type
Hardware Size | Protocol Size | Operation
Sender’s Hardware Address (for Ethernet 6 bytes)
Sender’s Protocol Address (for IP 4 bytes)
Target Hardware Address
Target Protocol Address (destination IP address)
Internet Transport Protocols
 Two Transport Protocols Available
 Transmission Control Protocol (TCP)
 connection oriented
most applications use TCP
RFC 793
 User Datagram Protocol (UDP)
 Connectionless
RFC 768
Transport layer addressing
Communications endpoint addressed by:
IP address (32 bit) in IP Header
Port number (16 bit) in TP Header (TP = Transport Protocol, UDP or TCP)
Transport protocol (TCP or UDP) in IP Header
Standard services and port numbers
service     tcp   udp
echo        7     7
daytime     13    13
netstat     15
ftp-data    20
ftp         21
telnet      23
smtp        25
time        37    37
domain      53    53
finger      79
http        80
pop-2       109
pop         110
sunrpc      111   111
uucp-path   117
nntp        119
talk              517
TCP: Overview
RFCs: 793, 1122, 1323, 2018, 2581
 point-to-point:
one sender, one receiver
 reliable, in-order byte stream:
no “message boundaries”
 pipelined:
TCP congestion and flow control set window size
 send & receive buffers
 full duplex data:
bi-directional data flow in same connection
MSS: maximum segment size
 connection-oriented:
handshaking (exchange of control msgs) init’s sender, receiver state before data exchange
 flow controlled:
sender will not overwhelm receiver
[Figure: application writes data through the socket door into the TCP send buffer; segments travel to the TCP receive buffer; application reads data through its socket door]
TCP Header
TCP segment structure (each row is 32 bits wide):
source port # | dest port #
sequence number
acknowledgement number
head len | not used | U A P R S F | rcvr window size
checksum | ptr urgent data
Options (variable length)
application data (variable length)
Field notes:
sequence number, acknowledgement number: counting by bytes of data (not segments!)
URG: urgent data (generally not used)
ACK: ACK # valid
PSH: push data now (generally not used)
RST, SYN, FIN: connection estab (setup, teardown commands)
rcvr window size: # bytes rcvr willing to accept
checksum: Internet checksum (as in UDP)
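A sketch of decoding the fixed 20-byte TCP header, with the six flag bits pulled out by name. The function name, dictionary keys, and the sample SYN segment (ports 12345 and 80, initial sequence number 1000, zero checksum) are assumptions for illustration; options are not parsed.

```python
import struct

TCP_FIXED = "!HHIIHHHH"   # network byte order, 20 bytes in total
FLAG_NAMES = ["FIN", "SYN", "RST", "PSH", "ACK", "URG"]   # bit 0 up to bit 5


def parse_tcp_header(raw: bytes) -> dict:
    """Decode the fixed part of a TCP header (options are not parsed)."""
    (src_port, dst_port, seq, ack, offset_flags,
     window, checksum, urgent) = struct.unpack(TCP_FIXED, raw[:20])
    flags = [name for bit, name in enumerate(FLAG_NAMES)
             if offset_flags & (1 << bit)]
    return {
        "source_port": src_port,
        "dest_port": dst_port,
        "sequence_number": seq,
        "ack_number": ack,
        "header_bytes": (offset_flags >> 12) * 4,   # head len counts 32-bit words
        "flags": flags,
        "window": window,
        "checksum": checksum,
        "urgent_pointer": urgent,
    }


# Hand-built SYN segment header; the checksum is left as 0 and is NOT valid.
syn = struct.pack(TCP_FIXED, 12345, 80, 1000, 0, (5 << 12) | 0x02, 65535, 0, 0)
print(parse_tcp_header(syn)["flags"])   # ['SYN']
```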
Reliability in an Unreliable World
 IP offers best-effort (unreliable) delivery
 TCP uses IP
 TCP provides completely reliable transfer
 How is this possible? How can TCP realize:
Reliable connection startup?
Reliable data transmission?
Graceful connection shutdown?
Reliable Data Transmission
 Positive acknowledgment
 Receiver returns short message when data arrives
 Called acknowledgment
 Retransmission
 Sender starts timer whenever message is transmitted
 If timer expires before acknowledgment arrives, sender retransmits
message
 THIS IS NOT A TRIVIAL PROBLEM! – more on this later.
TCP Flow Control
 Receiver
Advertises available buffer space
Called window
This is known as a CREDIT policy
 Sender
Can send up to entire window before ACK arrives
 Each acknowledgment carries new window
information
Called window advertisement
Can be zero (called closed window)
 Interpretation: I have received up through X, and can
take Y more octets
Credit Scheme
Decouples flow control from ACK
May ACK without granting credit and vice versa
Each octet has sequence number
Each transport segment has seq number, ack
number and window size in header
Use of Header Fields
When sending, seq number is that of first octet
in segment
ACK includes AN=i, W=j
All octets through SN=i-1 acknowledged
Next expected octet is i
Permission to send additional window of W=j
octets
i.e. octets through i+j-1
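The AN=i, W=j rule translates directly into arithmetic on octet numbers. In this sketch the function name `sendable_range` and the sample values (AN=1001, W=1400, next unsent octet 1601) are illustrative assumptions.

```python
def sendable_range(an: int, w: int, next_seq: int):
    """Return (first, last) octet numbers the sender may still transmit.

    an       -- acknowledgment number i: all octets through i-1 are acknowledged
    w        -- window (credit) j granted together with that acknowledgment
    next_seq -- sequence number of the next octet the sender has not sent yet
    Returns None when no credit is left (closed window or window used up).
    """
    last_allowed = an + w - 1        # octets through i+j-1 may be sent
    first = max(next_seq, an)
    if first > last_allowed:
        return None
    return first, last_allowed


print(sendable_range(an=1001, w=1400, next_seq=1601))   # (1601, 2400)
print(sendable_range(an=1001, w=0, next_seq=1001))      # None: closed window
```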
Credit Allocation
TCP Flow Control
flow control
sender won’t overrun
receiver’s buffers by
transmitting too much,
too fast
RcvBuffer = size of TCP Receive Buffer
RcvWindow = amount of spare room in Buffer
receiver buffering
receiver: explicitly informs
sender of (dynamically
changing) amount of
free buffer space
RcvWindow field in
TCP segment
sender: keeps the amount
of transmitted, unACKed
data less than most
recently received
RcvWindow
TCP seq. #’s and ACKs
Seq. #’s:
byte stream
“number” of first
byte in segment’s
data
ACKs:
seq # of next byte
expected from other
side
cumulative ACK
Q: how receiver handles
out-of-order segments
A: TCP spec doesn’t
say, - up to
implementor
[Figure: simple telnet scenario, time running downward. User at Host A types ‘C’; Host B ACKs receipt of ‘C’ and echoes back ‘C’; Host A ACKs receipt of echoed ‘C’.]
TCP ACK generation
[RFC 1122, RFC 2581]
Event: in-order segment arrival, no gaps, everything else already ACKed
TCP Receiver action: delayed ACK. Wait up to 500ms for next segment. If no next segment, send ACK
Event: in-order segment arrival, no gaps, one delayed ACK pending
TCP Receiver action: immediately send single cumulative ACK
Event: out-of-order segment arrival, higher-than-expected seq. #, gap detected
TCP Receiver action: send duplicate ACK, indicating seq. # of next expected byte
Event: arrival of segment that partially or completely fills gap
TCP Receiver action: immediate ACK if segment starts at lower end of gap
TCP: retransmission scenarios
[Figure: two Host A / Host B timelines. Lost ACK scenario: the ACK is lost (X) and the sender times out. Premature timeout, cumulative ACKs: timeouts shown for Seq=92 and Seq=100.]
Why Startup/ Shutdown Difficult?
 Segments can be
Lost
Duplicated
Delayed
Delivered out of order
Either side can crash
Either side can reboot
 Need to avoid duplicate ‘‘shutdown’’ message from affecting
later connection
TCP Connection Management
Recall: TCP sender, receiver
establish “connection” before
exchanging data segments
 initialize TCP variables:
seq. #s
buffers, flow control info
(e.g. RcvWindow)
 client: connection initiator
Socket clientSocket = new Socket("hostname", portNumber);
 server: contacted by client
Socket connectionSocket =
welcomeSocket.accept();
Three way handshake:
Step 1: client end system sends
TCP SYN control segment to
server
specifies initial seq #
Step 2: server end system
receives SYN, replies with
SYNACK control segment
ACKs received SYN
allocates buffers
specifies server-to-client initial seq. #
Step 3: client receives SYNACK, replies with ACK segment
TCP Connection Management (OPEN)
[Figure: client/server timeline for opening a connection; states shown: closed, opening (client), opening (server), established]
TCP Connection Management (cont.)
Closing a connection:
client closes socket:
clientSocket.close();
Step 1: client end system sends TCP FIN control segment to server
Step 2: server receives FIN, replies with ACK. Closes connection, sends FIN.
[Figure: client/server timeline; labels: close (client), close (server), timed wait, closed]
TCP Connection Management (cont.)
Step 3: client receives FIN, replies with ACK.
Enters “timed wait” - will respond with ACK to received FINs
Step 4: server, receives ACK. Connection closed.
Note: with small modification, can handle simultaneous FINs.
[Figure: client/server timeline; labels: closing (client), closing (server), timed wait, closed (client), closed (server)]
TCP Connection Management
(cont)
[Figure: state diagrams of the TCP server lifecycle and the TCP client lifecycle]
Timing Problem!
The delay required for data to reach a destination and an
acknowledgment to return depends on traffic in the internet as
well as the distance to the destination. Because it allows
multiple application programs to communicate with multiple
destinations concurrently, TCP must handle a variety of delays
that can change rapidly.
How does TCP handle this .....
Solving Timing Problem
 Keep estimate of round trip time on each connection
 Use current estimate to set retransmission timer
 Known as adaptive retransmission
 Key to TCP’s success
TCP Round Trip Time and Timeout
Q: how to set TCP
timeout value?
 longer than RTT
note: RTT will vary
 too short: premature
timeout
unnecessary
retransmissions
 too long: slow reaction
to segment loss
Q: how to estimate RTT?
 SampleRTT: measured time from
segment transmission until ACK
receipt
ignore retransmissions,
cumulatively ACKed segments
 SampleRTT will vary, want
estimated RTT “smoother”
use several recent
measurements, not just current
SampleRTT
TCP Round Trip Time and Timeout
EstimatedRTT = (1-x)*EstimatedRTT + x*SampleRTT
 Exponential weighted moving average
 influence of given sample decreases exponentially fast
 typical value of x: 0.1
Setting the timeout
 EstimatedRTT plus “safety margin”
 large variation in EstimatedRTT -> larger safety margin
Timeout = EstimatedRTT + 4*Deviation
Deviation = (1-x)*Deviation + x*|SampleRTT-EstimatedRTT|
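The two update formulas and the timeout rule fit in a small class. This is a sketch under stated assumptions: the class name is hypothetical, the estimate is seeded with the first sample and a deviation of 0, and Deviation is updated before EstimatedRTT (the slide does not fix either choice). Times are in milliseconds.

```python
class RttEstimator:
    """Adaptive retransmission timer using the formulas above."""

    def __init__(self, first_sample: float, x: float = 0.1):
        self.x = x                          # typical value of x: 0.1
        self.estimated_rtt = first_sample   # assumption: seed with first sample
        self.deviation = 0.0                # assumption: start with no deviation

    def update(self, sample_rtt: float) -> float:
        """Fold in one SampleRTT and return the new timeout."""
        x = self.x
        # Deviation uses the estimate from before this sample (assumed order).
        self.deviation = (1 - x) * self.deviation + x * abs(sample_rtt - self.estimated_rtt)
        self.estimated_rtt = (1 - x) * self.estimated_rtt + x * sample_rtt
        return self.timeout()

    def timeout(self) -> float:
        """Timeout = EstimatedRTT + 4*Deviation."""
        return self.estimated_rtt + 4 * self.deviation


est = RttEstimator(first_sample=100.0)
for sample in (200.0, 100.0, 100.0):
    print(round(est.update(sample), 2))
```

A single slow sample (200 ms) raises the timeout sharply through the deviation term; steady samples afterwards let it decay.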
Implementation Policy Options
Send
Deliver
Accept
Retransmit
Acknowledge
Send
If no push or close TCP entity transmits at its
own convenience (IFF send window allows!)
Data buffered at transmit buffer
May construct segment per data batch
May wait for certain amount of data
Deliver (to application)
In absence of push, deliver data at own
convenience
May deliver as each in-order segment received
May buffer data from more than one segment
Accept
Segments may arrive out of order
In order
Only accept segments in order
Discard out of order segments
In windows
Accept all segments within receive window
Retransmit
TCP maintains queue of segments transmitted
but not acknowledged
TCP will retransmit if not ACKed in given time
First only
Batch
Individual
Acknowledgement
Immediate
as soon as segment arrives.
will introduce extra network traffic
Keeps sender’s pipe open
Cumulative
Wait a bit before sending ACK (called “delayed ACK”)
Must use timer to ensure ACK is sent
Less network traffic
May let sender’s pipe fill if not timely!
UDP: User Datagram Protocol
 “no frills,” “bare bones”
Internet transport protocol
 “best effort” service, UDP
segments may be:
lost
delivered out of order to
app
 connectionless:
no handshaking between
UDP sender, receiver
each UDP segment
handled independently of
others
[RFC 768]
Why is there a UDP?
 no connection
establishment (which
can add delay)
 simple: no connection
state at sender, receiver
 small segment header
 no congestion control:
UDP can blast away as
fast as desired
UDP: more
 often used for streaming multimedia apps
loss tolerant
rate sensitive
 other UDP uses
DNS
SNMP
 reliable transfer over UDP: add reliability at application layer
application-specific error recovery!
UDP segment format (each row is 32 bits wide):
source port # | dest port #
length | checksum
Application data (message)
length: length, in bytes, of UDP segment, including header
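The UDP header is small enough to build by hand. A sketch, with hypothetical function names and sample values (ports 5000 and 53, payload `hello`); the checksum is left at 0, which over IPv4 signals that the sender computed none (RFC 768), rather than being calculated here.

```python
import struct

UDP_HEADER = "!HHHH"   # source port, dest port, length, checksum (8 bytes)


def build_udp_segment(src_port: int, dst_port: int, payload: bytes) -> bytes:
    """Prepend a UDP header to the application data."""
    length = 8 + len(payload)   # length covers the header plus the data
    checksum = 0                # 0 = no checksum computed (allowed over IPv4)
    return struct.pack(UDP_HEADER, src_port, dst_port, length, checksum) + payload


def parse_udp_segment(segment: bytes):
    """Return (src_port, dst_port, length, checksum, payload)."""
    src_port, dst_port, length, checksum = struct.unpack(UDP_HEADER, segment[:8])
    return src_port, dst_port, length, checksum, segment[8:length]


segment = build_udp_segment(5000, 53, b"hello")
print(len(segment))                 # 13
print(parse_udp_segment(segment))   # (5000, 53, 13, 0, b'hello')
```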
UDP Uses
Inward data collection
Outward data dissemination
Request-Response
Real time application