The Transport p Service Chapter p 6 The Transport Layer Services Provided to the Upper pp Layers y • • • • Services Provided to the Upper Layers Transport Service Primitives Berkeley Sockets A E An Example l off Socket S k t Programming: P i – An Internet File Server Transport p Service Primitives The primitives for a simple transport service. The network, network transport, transport and application layers. layers Transport p Service Primitives (2) ( ) Transport p Service Primitives ((3)) The nesting of TPDUs, TPDUs packets, packets and frames. frames A state diagram for a simple connection management scheme. Transitions labeled in italics are caused by packet arrivals. The solid lines show the client's state sequence. The dashed lines show the server's state sequence. Berkeleyy Sockets Socket API a) Creating a socket int socket(int domain domain, int type type, int protocol) • domain = PF_INET, PF_UNIX • type = SOCK_STREAM, SOCK STREAM SOCK_DGRAM SOCK DGRAM b) Passive P i O Open ((on server)) int bind(int socket, struct sockaddr *addr, int addr_len) int listen(int socket socket, int backlog) int accept(int socket, struct sockaddr *addr, int addr_len) The socket primitives for TCP. TCP Sockets (cont) ( ) a) Active Open (on client) int connect(int socket, struct sockaddr *addr, int addr_len) _ ) Protocol-to-Protocol Interface a) Process Model – avoid context switches b) Buffer Model b) Sending/Receiving Messages int send(int socket, char *msg, int mlen, int flags) int recv(int socket, char *buf, int blen, int flags) – avoid data copies M Message Lib Library P Process M Model d l Strip header Add header m m abcdefg abcdefg b d f bcopy “ ( xyz", hdr , 3); msgAddHdr(m, hdr, 3); hdr =msgStripHdr(m, g p ( , 3); ); m m defg xyzabcdefg (a) Process-per-Protocol (b) Process-per-Message + hdr =“abc" Event Library M Message Lib Library (cont) ( t) a) Scheduling Timeouts & Book-keeping activity a) Fragment message a) Reassemble messages b) Operations m m1 abcdefg m m2 abcd b d msgFragment (m, new, 3); new efg f msgReassemble(new, m1, m2) Event evSchedule(EvFunc function, void *argument, int time) EvCancelReturn evCancel(Event ( event) ) new defg + abc Socket P Programming i Example: Internet File Server 6-6-1 Client code using sockets. abcdefg Socket Programming Example: Internet File Server (2) Client code using sockets. Elements of Transport p Protocols • • • • • • Transport p Protocol Addressing Connection Establishment Connection Release Flow Control and Buffering Multiplexing Crash Recovery (a) Environment of the data link layer. ((b)) Environment of the transport p layer. y Addressing g TSAPs NSAPs and transport connections. TSAPs, connections Connection Establishment How a user process in host 1 establishes a connection with a time-of-day server in host 2. Connection Establishment (2) ( ) a) b) Suppose that in a congested subnet, ACKs hardly ever get back in time and each p times out and is retransmitted two or three times. pkt Nightmare: a user established a connection with a bank, sends msg to the bank to transfer a large amount of money and then releases the connection. But each pkt p and stored in the subnet. After the connection has been released,, all is duplicated pkts pop out of the subnet and arrive at the destination in order, asking the bank to establish a new connection, transfer money (again), and release the connection. p transaction.Æ How to p prevent the pproblem The bank thinks it is a second, indep of delayed duplicates. 1. throw-away transport address: process server model impossible; 2 give each connection a connection identifier chosen by the initiating party; 2. after each connection is released, each transport entity could update a table listing obsolete connections as (peer transport entity, connection IDÆ each transport must maintain a certain amount of history information; If a machine crashes and loses its memory ??? 3. pkts lifetime can be restricted to a known maximum using (1) restricted subnet design (2) hop counter (3) timestamping each pktÆ introducing T, T some small mulltiple of the true maximum pkt lifetime; Connection Establishment (3) ( ) (a) TPDUs may not enter the forbidden region. (b) The resynchronization problem. problem Connection Establishment (2) ( ) Each host with a time-of-day clock; Basic Idea: two identically numbered TPDUs are never outstanding t t di att the th same time. ti When Wh a connection ti is i sett up, the th lowl order k bits of the clock are used as the initial seq. #. (next slide’s figure) (a) Let T=60 sec. the initial seq. # for a connection opened at time x will be x. I Imaging i that h at t=30sec, 30 an ordinary di data d TPDU bbeing i sent on connection i 5 is i given seq. # 80. Call this TPDU X. After sending TPDU X, the host crashes and then quickly restarts. At t=60, it begins reopening connections 0 ~ 4. At t=70, it reopens connection ti 55, using i iinitial iti l seq. # 70. 70 Assume A that th t att t=85 t 85 a new TPDU with seq. # 80. Unfortunately, TPDU X still exists. If it arrives before the new TPDU 80. TPDU X will be accepted. Æ preventing seq. #’s from being used for a time T before their potential use as initial seq. seq ##’ss. Æ forbidden region region. However However, two problems exists; (1) If a host sends too much data too fast on a newly-opened connection, the actual t l seq. # versus ti time curve may rise i more steeply t l than th the th initial i iti l seq. # versus time curve. Æ favor a short clock tick. (2) as (b) seq.# will run out and actual seq.# will go into the forbidden region; Æ j before just b f sending di every TPDU, TPDU the h transport entity i must check h k to see if it i is i about to enter the forbidden region, if so, either delay the TPDU for T sec or resynchronize the seq. #. Connection Establishment (3) ( ) Three p protocol scenarios for establishingg a connection usingg a three-way handshake. CR denotes CONNECTION REQUEST. (a) Normal operation, (b) Old CONNECTION REQUEST appearing out of nowhere. (c) Duplicate CONNECTION REQUEST and duplicate ACK. Connection Release Abrupt disconnection with loss of data. CR: Connection Request; DR: Disconnection Request; Connection Release (3) ( ) 6-14, a, b Four protocol scenarios for releasing a connection. connection (a) Normal case of a three-way handshake. (b) final ACK lost. Connection Release (2) ( ) The two two-army army problem. problem Connection Release (4) ( ) 6-14, c,d (c) Response lost lost. (d) Response lost and subsequent DRs lost lost. Flow Control and Buffering g ((a)) Chained Ch i d fi fixed-size d i buffers; b ff wastedd if unequall TPDU (b) Chained variable-sized buffers. (c) One large circular buffer per connection; poor if some connections are lightly loaded; Multiplexing p g (a) Upward multiplexing multiplexing. (b) Downward multiplexing multiplexing. Flow Control and Bufferingg ((2)) Dynamic buffer allocation. allocation The arrows show the direction of transmission. An ellipsis (…) indicates a lost TPDU. Crash Recoveryy Different combinations of client and server strategy. A:ack; C:crash; W:write;S1: one TPDU outstanding; S0: no TPDUs outstanding A Simple p Transport p Protocol The Example p Transport p Entityy • The Example Service Primitives • The E Example ample Transport Entity Entit • The Example as a Finite State Machine The network layer packets used in our example. example The Example p Transport p Entityy (2) ( ) Each connection is in one of seven states: 1. Idle – Connection not established yet. 2 2. W iti – CONNECT hhas bbeen executed, Waiting t d CALL REQUEST sent. t 3. Queued – A CALL REQUEST has arrived; no LISTEN yet. 4 4. E t bli h d – The Established Th connection ti has h been b established. t bli h d 5. Sending – The user is waiting for permission to send a packet. 6 6. R i i – A RECEIVE has Receiving h been b done. d 7. DISCONNECTING – a DISCONNECT has been done locally. Trace the program in page 518 by yourself. The Example p as a Finite State Machine The example Th l protocol t l as a finite state machine. Each entry has an optional predicate predicate, an optional action, and the new state. The tilde(~) indicates that no major action is taken. An overbar above a predicate di t indicate i di t the th negation ti of the predicate. Blank entries correspond to impossible or invalid events. The Example p as a Finite State Machine (2) ( ) The Internet Transport Protocols: UDP • Introduction to UDP • Remote Procedure Call • The Real-Time Transport Protocol Th example The l protocoll in i graphical hi l form. f Transitions T ii that h leave l the connection state unchanged have been omitted for simplicity. Introduction to UDP Remote Procedure Call The UDP header. header Steps in making a remote procedure call. call The stubs are shaded. shaded The Real-Time Transport p Protocol (a) The position of RTP in the protocol stack. stack (b) Packet nesting. nesting The Internet Transport p Protocols: TCP • • • • • • • • • • • • Introduction to TCP Th TCP Service The S i Model M d l The TCP Protocol Th TCP Segment The S t Header H d TCP Connection Establishment TCP C Connection ti R Release l TCP Connection Management Modeling TCP Transmission T i i Policy P li TCP Congestion Control TCP Timer Ti Management M t Wireless TCP and UDP Transactional TCP The Real-Time Transport p Protocol (2) ( ) Ver. =2; P: padded to a multiple of 4 bytes and the last padding byte tells # of padding; X:an extension header is present; CC: # of contributing sources; M: an application-specific marker, e.g., start of a video frame; Payload type: encoding algorithm, e.g. MP3; Seq. #: for detecting packet loss; timestamp: for jitter g; Sync. y Source ID: which stream the pkt p belongs g to;; Contr. Source id: reducing; Mixers are present, the mixer is the synchronizing source, and the streams being mixed are listed here; The TCP Service Model Port 21 23 25 69 79 80 110 119 Protocol FTP Telnet SMTP TFTP Finger HTTP POP 3 POP-3 NNTP Use File transfer Remote login E-mail Trivial File Transfer Protocol Lookup info about a user World Wide Web R Remote t e-mail il access USENET news Some assigned ports ports. The TCP Service Model (2) ( ) The TCP Segment g Header (a) Four 512-byte segments sent as separate IP datagrams. (b) The Th 2048 bbytes t off ddata t ddelivered li d tto th the application li ti in i a single i l READ CALL. TCP Header. The TCP Segment g Header a) b) c) d) e) f) g) h) The TCP Segment g Header Source port and Destination port: well-known ports are defined at www.iana.org; www iana org; URG: urgent pointer is in use; urgent pointer indicates a byte offset from the current seq. seq #. # ACK: Acknowledgement # is valid. PSH: Push data; receiver must deliver the data to the application and not buffer it until a full buffer. RST: reset a connection; SYN: used to establish connections; FIN: release a connection; Window size: flow control with a variable-sized sliding window; The pseudoheader included in the TCP checksum. checksum (1) detect misdelivered pkts (2) violate the protocol hierarchy TCP Connection Establishment TCP Connection Management g Modeling g 6-31 (a) TCP connection establishment in the normal case. (b) Call C ll collision; lli i only l one socket k t connection ti identified id tifi d by b (x,y); ( ) TCP Connection Management Modeling (2) The states used in the TCP connection management finite state machine. machine TCP Transmission Policyy TCP connection management finite state machine. The heavy solid line is the normal path for a client. The heavy dashed line is the normal path for a server. The light lines are unusual events. Each transition is labeled by the event causing it and the action resulting from it, separated by a slash. (1) Receiver has a 4K-byte buffer; TCP Transmission Policy y ((2)) Silly window syndrome. TCP Congestion g Control ((2)) An example of the Internet congestion algorithm. TCP Congestion g Control (a) A fast network feeding a low capacity receiver. (b) A slow network feeding a high-capacity receiver. TCP Timer Management g (a) Probability density of ACK arrival times in the data link layer. (b) Probability density of ACK arrival times for TCP. If timeout is too short (T1), unnecessary retransmission will occur. Timeout Setting g a) b) c) d) e) Wireless TCP and UDP RTT αRTT (1 α)M; RTT=αRTT+(1-α)M; M: a new measurement of RTT; typically α α=7/8; 7/8; Timeout = β RTT; typically β=2; however, inflexible; Most TCP implementation; D D=αD+(1-α)|RTT-M|; αD+(1-α)|RTT-M|; Timeout = RTT + 4D; Splitting a TCP connection into two connections. connections Transactional TCP Performance Issues • • • • • (a) RPC using normal TPC. (b) RPC using T/TCP. Performance f Problems bl in i Computer C Networks k Network Performance Measurement System Design for Better Performance Fast TPDU Processing Protocols for Gigabit Networks Performance Problems in Computer Networks Network Performance Measurement The basic loop for improving network performance. 1. Measure relevant network parameters, performance. 2 Try to understand what is going on. 2. on 3. Change one parameter. The state of transmitting one megabit from San Diego to Boston (a) At t = 0, (b) After 500 μsec, (c) After 20 msec, (d) after 40 msec. System y Design g for Better Performance System y Design g for Better Performance (2) ( ) Rules: 1. CPU speed is more important than network speed. 2 Reduce 2. R d packet k count to reduce d software f overhead. h d 3. Minimize context switches. 4. Minimize copying. 5 You can buy more bandwidth but not lower delay 5. delay. 6. Avoiding congestion is better than recovering from it. 7 Avoid 7. A id ti timeouts. t Response as a function of load. load System y Design g for Better Performance (3) ( ) Four context switches to handle one packet with a user-space network manager. Protocols for Gigabit g Networks Time to transfer and acknowledge a 11-megabit megabit file over a 4000 4000-km km line. line