トランスポートレイヤ技術 - TCP; Transmission Control Protocol (1) TCPの基本動作 (2) TCP Interactive Data Flow (3) TCP Bulk Data Flow (4) TCP Data Retransmission (5) TCP Persist Timer (6) TCP Keep Alive Timer (7) その他・将来 1 ping telnet ftp X traceroute tftp bootp smtp TCP ICMP NFS/RPC UDP IP IGMP Data-Link Ethernet FDDI SDH FR ATM 2 インターネットアーキテクチャ - TCP : Transmission Control Protocol • TCP (Transmission Control Protocol) ; end-to-end – フロー制御 – エラー制御 / 再送制御 – コネクション管理 – セッションの多重化 Application Application TCP TCP IP IP IP Network Interface Network Interface Network Interface Physical Physical Physical 3 トランスポートレイヤ技術 - TCP; Transmission Control Protocol (1) TCPの基本動作 (2) TCP Interactive Data Flow (3) TCP Bulk Data Flow (4) TCP Data Retransmission (5) TCP Persist Timer (6) TCP Keep Alive Timer (7) その他・将来 4 TCP Header Format 0 0 78 15 16 23 24 31 source port identifier destination port identifier 20 Bytes 1 sequence number 2 3 ACK number 5 6 Offset(4) Rsrvd(6) control bits UR AK PH RT SY FN 4 checksum window size Urgent Pointer Option Padding 5 TCP Header Format UR AK PH RT SY FN : Urgent Pointer Field Significant (URG) : Acknowledgement Field Significant (ACK) : Push Function : Reset the Connection : Synchronize Sequence Numbers (SYN) : No More Data From Sender (FIN) 6 TCP Features ・ “Stream” Oriented Data Transmission → Connection確立(Three-way-handshake) ・ Connection (“Stream”) Identifier = “Socket” {dst_IP_addr, dst_port, src_IP_addr, src_port} ・ “Sequence Number” ; 32 bits → バイト番号 : 0 − (2^32-1) → 2^32 でSequence NumberがWrapされる ・ “Full-Duplex”での通信 ・ Acknowledgement (ACK) ; → 次に受信すべきバイト番号(SN)の通知 ・ エラー回復: セグメント再送(Segment retransmission) by Time-out, Dupilicated-ACK ・ “Sliding Window Control” を用いたデータ転送制御 (*) Window_size ≦ 65,535 Bytes 7 TCP Port Allocation (RFC1700) 1. Well-Known Ports ; 0 - 1,023 2. Registered Ports ; 1,024 - 49,151 3. Dynamic and/or Private Ports ; 49,152 - 65,535 最新情報 : ftp://ftp.isi.edu/in-notes/iana/assignments/port-numbers 8 TCP Well-Known Ports Port Number Keyword 5 20 21 23 25 39 53 63 67 69 70 79 80 110 111 119 rje ftp-data ftp telnet smtp rlp domain whois++ bootp tftp gopher finger http pop3 sunrpc nntp Application Remote Job Entry File Transfer [Default data] File Transfer [Control] Telnet Simple Management Protocol Resource Location Protocol Domain Name Server Whois++ Bootstrap Protocol Server Trivial File Transfer Gopher Finger World Wide Web HTTP Post Office Protocol - Version 3 SUN Remote Procedure Call Network News Transfer Protocol 9 TCP Well-Known Ports Port Number Keyword 123 137 138 139 179 202 213 220 396 540 546 547 560 ntp netbios-ns netbios-dgm netbios-ssn bgp at-nbp ipx imap3 netware-ip uucp dhcpv6-client dhcpv6-server rmonitor Application Network Time Protocol NetBIOS Name Service NetBIOS Datagram Service NetBIOS Session Service Border Gateway Protocol (BGP) AppleTalk Name Binding Protocol IPX IMAP3 (Interactive Mail Access Protocol) Novell Netware over IP uucp daemon DHCPv6 Client DHCPv6 Server remote monitor daemon 10 TCP Connection確立/開放 Log on the console; svr4% telnet bsdi discard Trying 140.252.13.35 Connected to bsdi. Escape character is ‘^]’. ^] telnet> quit Connection closed. #port=“9” (server discard packet) tcpdump output 1 0.0 2 0.024 (0.0024) 3 0.007 (0.0048) 4 4.155 (4.1482) 5 4.158 (0.0013) 6 4.159 (0.0014) 7 4.189 (0.0225) svr4.1037 > bsdi.discard: S 14155.14155(0) win 4096 <mss 1024> bsdi.discard > svr4.1037: S 18239.18239(0) ack 14156 win 4096 <mss 1024> svr4.1037 > bsdi.discard: . ack 18240 win 4096 svr4.1037 > bsdi.discard: F 14156:14156(0) ack 18240 win 4096 bsdi.discard > svr4.1037: . ack 14157 win 4096 bsdi.discard > svr4.1037: F 18240.18240(0) ack 14157 win 4096 svr4.1037 > bsdi.discard: . ack 18241 win 4096 TCP Connection確立/開放 tcpdump output 1 0.0 2 0.024 (0.0024) 3 0.007 (0.0048) 4 4.155 (4.1482) 5 4.158 (0.0013) 6 4.159 (0.0014) 7 4.189 (0.0225) svr4.1037 > bsdi.discard: S 14155.14155(0) win 4096 <mss 1024> bsdi.discard > svr4.1037: S 18239.18239(0) ack 14156 win 4096 <mss 1024> svr4.1037 > bsdi.discard: . ack 18240 win 4096 svr4.1037 > bsdi.discard: F 14156:14156(0) ack 18240 win 4096 bsdi.discard > svr4.1037: . ack 14157 win 4096 bsdi.discard > svr4.1037: F 18240.18240(0) ack 14157 win 4096 svr4.1037 > bsdi.discard: . ack 18241 win 4096 [意味] source.port > destination.port : flags SN_begin.SN_end(data_size) flags : S = SYN ; Synchronize sequence_number(SN) F = FIN ; Finish data transmission R = RST ; Reset connection P = PSH ; push data to receiving process asap . = ; none of above four flags is on SN_end = SN_begin + data_size win 4096 ; window size is 4096 mss 1024 ; maximum segment size is 1024 bytes 12 TCP Connection確立/開放 svr4.1037 (client) bsdi.discard(server) segment 1 SYN 14155.14155(0) SYN 18239.18239(0) ACK 14156 segment 3 (18239+1) “次に受信すべきSN” segment 4 ACK 18240 FIN 14156:14156(0) ACK 18240 segment 5 (14156+1) “次に受信すべきSN” segment 6 ACK 14157 FIN 18240.18240(0) segment 7 (18240+1) “次に受信すべきSN” segment 2 (14155+1) “次に受信すべきSN” ACK 14157 ACK 18241 13 TCP Connection確立/開放 svr4.1037 (client) “Active open” SYN (a) bsdi.discard(server) (appli. open : telnet) SYN_ACK(a+1,b) “open” “Passive open” ACK(b+1) “open” “Active Close” (application close: quit) FIN (m,s) EOF to Application ACK (m+1) “half close” “Passive Close” FIN_ACK (m+1,s) ACK (s+1) (application close) “half close” → full close 14 CLOSED appl: passive open send: <nothing> appl: active open send: SYN LISTEN recvl: SYN send: SYN, ACK appl: close send: FIN passive open appl: send data Send : RST send: SYN Active open recv: SYN SYN_RCVD SYN_SENT appl: close send: SYN,ACK or timeout (simultaneous open) recv: SYN,ACK recv: ACK send: ACK send: <nothing> appl: close send: FIN FIN_WAIT_1 recv: ACK send: <nothing> ESTABLISHED recv: FIN send: ACK recv: FIN,ACK send: ACK FIN_WAIT_2 recv: FIN send: ACK Active close recv: FIN send: ACK simultaneous close CLOSE_WAIT appl: close send: FIN CLOSING recv: ACK send: <nothing> TIME_WAIT 2 MSL timeout LAST_ACK Passive close recv: ACK send: <nothing> TCP Layer Interfaces 開始 指示 データ 送信 OPEN SEND Session 確立 TCP IP データ 送信 データ 受信 Session 情報 RECEIVE STATUS データ 受信 異常終了 指示 正常終了 指示 ABORT CLOSE Session 廃棄 Session 開放 Send (Service_Type, TTL, 擬似ヘッダ) Recieve send receive 16 TCP Layer Interfaces (1) OPEN Call : 機能 ; コネクション開始の指示 引数 ; Local_port, Destination_socket, Open_Mode(Active/Passive), [timeout_value]、[Priority], [security], [Options] 戻り値 ; Local_Connection_Name TCP動作 ; LISTEN(Passive_Open)、 ESTABLISHED(Active_Open) System calls; - socket(pf, type, protocol) - bind(socket, localaddr, adddrlen) - connect(socket, destaddr, addrlen) (2) SEND Call : 機能 ; データの送信指示 引数 ; Local_Connection_Name, 送信データバッファアドレス、 送信データバイト数, [PUSH], [URG], [再送タイムアウト値] 戻り値 ; なし TCP動作 ; データ送信(ESTASBLISHED) System calls ; write(socket, buffer, length) send(socket, message, length, flags) 17 TCP Layer Interfaces (3) RECEIVE Call : 機能 ; データの受信指示 引数 ; Local_Connection_Name, 受信データバッファアドレス、 受信データバイト数 戻り値 ; 受信バイト数, URG, PUSH, [受信バッファアドレス] TCP動作 ; データ受信格納 (ESTASBLISHED) System calls ; read(descriptor, buffer, length) recvfrom(socket, buffer, flags, fromaddr, addlen) recvmsg(socket, messagestruct, flags) (4) STATUS Call : 機能 ; コネクション状態の取得を指示 引数 ; Local_Connection_Name 戻り値 ; Local_Socket, Destination_Socket, Local_Conenction_Name, 受信window_size, 送信window_size, Connection_state, ACK待ちバッファ数, 未受信バッファ数, URG, Priority, Security, 送信タイムアウト値 18 TCP Layer Interfaces (5) ABORT Call : 機能 ; コネクションの異常終了指示 引数 ; Local_Connection_Name 戻り値 ; なし TCP動作 ; コネクション廃棄 (CLOSED) (6) CLOSE Call : 機能 ; コネクションの正常終了指示 引数 ; Local_Connection_Name 戻り値 ; なし TCP動作 ; コネクション開放 (CLOSED) System call ; close(socket) 19 トランスポートレイヤ技術 - TCP; Transmission Control Protocol (1) TCPの基本動作 (2) TCP Interactive Data Flow (3) TCP Bulk Data Flow (4) TCP Data Retransmission (5) TCP Persist Timer (6) TCP Keep Alive Timer (7) その他・将来 20 TCP Interactive Data Flow - Default and Basic Procedure Telnet client key-stroke “d” Telnet server data byte ack of data byte echo of data byte echo to display process “d” to telnet server process “d” echo from telnet server process “d” ack of echo data byte 21 TCP Interactive Data Flow - Delayed ACK Telnet client key-stroke “d” Telnet server data byte to telnet server process “d” ack of data byte echo of data byte Aggregate message → Delayed ACK echo to display process “d” echo from telnet server process “d” ack of echo data byte 22 TCP Interactive Data Flow - Delayed ACK : Piggy-Back Telnet client Telnet server echo of data byte + ack of data byte delay window delay window data byte echo from telnet server process “d” ack of echo data byte 23 TCP Interactive Data Flow <Client> <Server> date¥n (6 bytes) => Sat Feb 6 07:52:17 MST 1993¥n (30 bytes) 1 2 “d” 3 4 “a” 5 6 7 8 “t” 9 10 “e” 11 12 13 “¥n” 14 →“CR/LF” 15 16 17 “svr4 % “ 18 19 0.0 0.016497 0.139955 0.458037 0.474386 0.539943 0.814582 0.831108 0.940112 1.191287 1.207701 1.339994 1.680646 1.697977 1.739974 1.799841 1.940176 1.944338 2.140110 (0.0165) (0.1235) (0.3181) (0.0163) (0.0656) (0.2746) (0.0165) (0.1090) (0.2512) (0.0164) (0.1323) (0.3407) (0.0173) (0.0420) (0.0599) (0.1403) (0.0042) (0.1958) bsdi.1023 > svr4.login: svr4.login > bsdi.1023: bsdi.1023 > svr4.login: bsdi.1023 > svr4.login: svr4.login > bsdi.1023: bsdi.1023 > svr4.login: bsdi.1023 > svr4.login: svr4.login > bsdi.1023: bsdi.1023 > svr4.login: bsdi.1023 > svr4.login: svr4.login > bsdi.1023: bsdi.1023 > svr4.login: bsdi.1023 > svr4.login: svr4.login > bsdi.1023: bsdi.1023 > svr4.login: svr4.login > bsdi.1023: bsdi.1023 > svr4.login: svr4.login > bsdi.1023: bsdi.1023 > svr4.login: P P . P P . P P . P P . P P . P . P . 0:1(1) ack 1 1:2(1) ack 1 ack 2 1:2(1) ack 2 2:3(1) ack 2 ack 3 2:3(1) ack 3 3:4(1) ack 3 ack 4 3:4(1) ack 4 4:5(1) ack 4 ack 5 4:5(1) ack 5 5:7(2) ack 5 ack 7 7:37(30) ack 5 ack 37 37:44(7) ack 5 ack 44 24 bsdi.1023 D-ACK D-ACK D-ACK D-ACK D-ACK D-ACK 1 svr4.login PSH 0:1(1) ack 1 (d) 2 PSH 1:2(1) ack 1 (echo d) 3 4 PSH 1:2(1) ack 2 (a) 5 PSH 2:3(1) ack 2 (echo a) 6 7 ack 3 PSH 2:3(1) ack 3 (t) 8 PSH 3:4(1) ack 3 (echo t) 9 10 Delayed ACKによる メッセージのAggregate ack 2 ack 4 PSH 3:4(1) ack 4 (e) PSH 4:5(1) ack 4 (echo e) 11 ack 5 12 13 PSH 4:5(1) ack 5 (¥n) 15 PSH 5:7(2) ack 5 (echo CR/LF) ack 7 PSH 7:37(30) ack 5 (date内容) 14 17 ack 37 19 PSH 37:44(7) ack 5 (echo svr4%) ack 44 16 18 ・ D-ACK ; 200 msec ・ Piggy-back ; → echo + ack ・ segment 13 ; 1 byte data “¥n” segment 14 ; 2 byte date “CR/LF” TCP Negle Algorithm “Large RTT” (Sender) data packet flow “¥n” “e” “t” “a” (Receiver) “d” echo & ack packet flow aggregate payload “te¥n” (e.g., ack_of_”d” & echo “d”) “¥n” + “e” + “t” : IP (20B) : TCP(20B) : Data 26 Negle Algorithm Telnet client key-stroke (sn=1) “c” 1 (sn=2) “a” (sn=3) “t” ack 2 → OK sn=1 “a”+”t” → “at” 3 Telnet server PSH 1:2(1) ack 2 2 PSH 2:3(1) ack 2 PSH 2:4(2) ack 3 4 “at” (sn=3,4) PSH 3:5(2) ack 4 ack 4 → “sn=4”を期待 (sn=3まで受信) 5 echo; “c” (sn=2) ack 5 ack 5 → “sn=5”を期待 (sn=4まで受信) 27 Disable Negle Algorithm Telnet client F1 key (sn=1) “ESC” 1 2 (sn=2) “[” (sn=3) “M” 3 Telnet server PSH 1:2(1) ack 2 PSH 2:3(1) ack 2 PSH 3:4(1) ack 2 to telnet server “ESC” (sn=2) “[” (sn=3) “^[[” “M” (sn=4) “M” 4 PSH 2:5(3) ack 3 ack 4 → OK 1,2,3 sn=“5” → missing 2,3,4 5 PSH 5:6(1) ack 4 ack 2 ack 2 → “^”を期待 “timeout” “^]]M” PSH 2:6(4) ack 4 F1 key echo結果 : 7 “^[[M” ack 6 6 ack 2 以上を受信せず ack 2以降を再送 ack 6 → OK 2,3,4,5 28 トランスポートレイヤ技術 - TCP; Transmission Control Protocol (1) TCPの基本動作 (2) TCP Interactive Data Flow (3) TCP Bulk Data Flow (4) TCP Data Retransmission (5) TCP Persist Timer (6) TCP Keep Alive Timer (7) その他・将来 29 TCP Bulk Data Transmission - Sliding Window ・ Window制御を用いたパケット転送 ①Sliding Window (Receiver設定) ②Congestion Window(Sender設定) (1) ACKなしにwindow数のパケットを転送 (2) ACKのAggregation(ACKパケットの減少) (3) Receiver側によるwindow幅の制御 (4) ACK受信でwindowをスライドさせる 30 TCP Sliding Window Offered window (advertised by receiver) Unsent window 1 2 3 4 5 sent and ACKed sent but not ACKed 6 7 Can send ASAP 8 9 10 11 … Can not send until window slides 31 TCP Sliding Window Sent “3” and “4” Offered window (advertised by receiver) Unsent window 1 2 3 4 5 6 7 8 sent and ACKed 9 10 Can send ASAP sent but not ACKed 3+window=9 Receive ack “5” from receiver 11 … Can not send until window slides 5+window=11 Receive ack “5” from receiver TCP Sliding Window Window advertise by receiver shrink enlarge window closed by ACK reception = ACKed SN Opend by ACK reception (=ack+window) Slide window by ACK from receiver 33 bsdi.1023 1 3 4 5 6 9 svr4.discard SYN 0:0(0) win4096 <mss1024> 2 SYN 3:3(0) ack 1 win4096 <mss1024> ack 4 win4096 PSH 1:1025(1024) ack 4 win4096 PSH 1025:2049(1024) ack 4 win4096 PSH 2049:3073(1024) ack 4 win4096 ack 2049 win4096 7 ack 3073 win3072 8 PSH 3073:4097(1024) ack 4 win4096 ack 4097 win4096 11 PSH 4097:5121(1024) 12 PSH 5121:6145(1024) ack 4 win4096 13 PSH 6145:7169(1024) ack 4 win4096 ack 4 win4096 15 ack 6145 win4096 PSH 7169:8193(1024) ack 4 win4096 ack 8193 win4096 17 10 20 ack 5 win4096 ・ Window Shrink ; “7”: 4096 → 3072 14 (*) aggregate ACK 16 FI 8193:8193(0) ack 4 win4096 ack 8194 win4096 FIN 4:4(0) ack 8194 win4096 ・ Window制御 - window = 4096 - mss = 1024 → 4 segments は、ACK なしに転送可能。 18 19 - “7” ← “4” & ”5” - “10” ← “6” & “9” - “14” ← “11” & “12” - “16” ← “13” & “15” 34 bsdi.1023 1 3 4 5 6 svr4.discard SYN 0:0(0) win4096 <mss1024> SYN 3:3(0) ack 1 win4096 <mss1024> ack 4 win4096 PSH 1:1025(1024) ack 4 win4096 PSH 1025:2049(1024) ack 4 win4096 PSH 2049:3073(1024) ack 4 win4096 PSH 3073:4097(1024) ack 4 win4096 ack 4097 win 0 ack 4097 win 4096 10 PSH 4097:5121(1024) 11 PSH 5121:6145(1024) ack 4 win4096 12 PSH 6145:7169(1024) ack 4 win4096 ack 4 win4096 13 2 8 9 10 FIN PSH 7169:8193(1024) ack 4 win4096 ack 8193 win 0 ack 8193 win 4096 FIN 4:4(0) ack 8194 win4096 14 15 16 [Fast Sender,Slow Receiver] ・ Window shrink - “8” : 4096 → 0 - “14” : 4096 → 0 ・ Window enlarge (= window update) - “9” : 0 → 4096 - “15” : 0→ 4096 (*) segment “13” : FINのPiggy-Back 17 ack 8193 win4096 35 TCP Congestion Window Offered window (advertised by receiver) Unsent window 1 2 sent and ACKed 3 4 5 6 7 Shall not send ASAP Congestion window (“cwnd”=1 ) 8 9 10 11 … Can not send until window slides → sent but not ACKed 36 TCP Congestion Window Sent “3” Offered window (advertised by receiver) Unsent window 1 2 3 sent and ACKed 4 5 6 7 8 9 Shall not send ASAP Shall send without ACK ASAP; cwnd=2 (cwnd←cwnd*2) 10 Can not send until window slides 3+window=9 4+window=10 Receive ack “4” from receiver 11 … Receive ack “4” from receiver TCP Congestion Window ・ Slow Start Policy (cwnd ; exponential increase) cwnd = 1 ; for (セグメント転送) { for (not congestion) { if (セグメント転送ACK受信) { cwnd = cnwd +1 } cwnd = 1 } (*)注意 : Congestion Avoidance では若干異なる。 SenderがLocal に制御することなので、変えることが容易に可能 38 TCP Congestion Window advertised_window congestion cwnd cwnd advertised_window time < Congestionなしの場合 > time < Congestion経験の場合 > (*) Duplicated ACKを使用せず 39 TCP Congestion Window(1) 1 [受] [送] [受] [送] 1 1 1 1 1 1 1 40 TCP Congestion Window(2) 3 2 [受] [送] [受] [送] 2 3 2 2 3 3 2 2 3 3 2 2 3 41 TCP Congestion Window(3) 4 7 [受] [送] 6 5 [受] [送] 4 5 4 7 4 6 5 5 7 4 4 7 6 6 5 5 6 4 4 5 6 7 42 TCP Congestion Window(4) 8 12 [受] [送] 5 6 [受] [送] 8 7 9 8 6 7 10 9 11 10 9 13 12 8 9 8 14 13 8 7 11 10 11 10 9 8 9 10 15 14 8 9 12 11 10 13 12 11 43 bsdi.1029 cwnd =1 1 svr4.discard 1:513(512) ack1 win4096 ack 513 win 8192 cwnd =2 3 4 cwnd =3 6 7 cwnd =4 9 10 513:1025(512) ack1 win4096 1025:1537(512) ack1 win4096 ack 1025 win 8192 1537:2049(512) ack1 win4096 ack 1537 win 8192 2561:3073(512) ack1 win4096 8 3037:3585(512) ack1 win4096 13 FIN PSH 3585:4097(512) ack 1 win4096 ack 3073 win 8192 ack 3585 win 8192 ack 4098 win 7680 18 5 2049:2561(512) ack1 win4096 ack 2049 win 8192 ack 2561 win 8192 cwnd =5 cwnd =6 2 FIN 1:1(0) ack 4098 win 8192 ack 2 win 4096 11 12 14 15 16 17 44 TCP Congestion Window 1. Advertised Window by Receiver 2. Congestion Window (cwnd ) defined by sender (*) ローカルに決めることが可能な値 最適なWindow(Advertised window)サイズ (1) 廃棄が起こらないくらいの大きさ → Congestion Avoidance (2) 最適パイプライン転送 → RTT x 帯域幅 トランスポートレイヤ技術 - TCP; Transmission Control Protocol (1) TCPの基本動作 (2) TCP Interactive Data Flow (3) TCP Bulk Data Flow (4) TCP Data Retransmission (5) TCP Persist Timer (6) TCP Keep Alive Timer (7) その他・将来 46 TCP Data Retransmission (1) Expire of Retransmission Timeout (RTO) Value - RTO calculation using RTT - Exponential Back-off (Max. 64 sec.) (2) Reception of Duplicated ACK - Fast Retransmission / Fast Recovery (3) Congestion Window (cwnd ) Control - Slow Start (exponential increase) - Congestion Avoidance (liner increase) 47 RTO Expired Retransmission bsdi.1023 1 3 4 再送間隔 1.5 sec 3 sec 6 7 8 SYN 0:0(0) win4096 <mss1024> SYN 3:3(0) ack 1 win4096 <mss1024> ack 4 win4096 PSH 1:15(14) ack 4 win4096 ack 15 win 4096 PSH 15:23(8) ack 4 win4096 PSH 15:23(8) ack 4 win4096 PSH 15:23(8) ack 4 win4096 6 sec 9 17 PSH 15:23(8) ack 4 win4096 PSH 15:23(8) ack 4 win4096 64 sec 18 svr4.discard 2 再送トライ (RTO; 再送タイマ) RTO = 1.5 sec /* 変更可能*/ 5 for ( 9 minutes) { if ( RTO expired) { retransmission; RTO=RTO x 2; RTO=min{64sec, RTO}; } } end /* 諦める */ PSH 15:23(8) ack 4 win4096 48 RTO Expired Retransmission 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 0.0 0.0048 0.0064 6.1022 6.2594 24.4801 25.4937 28.4937 34.4937 46.4844 70.4851 118.4864 182.4881 246.4899 310.4910 374.4934 438.4951 : ( 0.0048) ( 0.0016) ( 6.0958) ( 0.1571) (18.2207) ( 1.0136) ( 3.0001) ( 6.0002) (11.9905) (24.0007) (48.0013) (64.0018) (64.0018) (63.9917) (64.0018) (64.0015) bsdi.1029 > svr4.discard: S 1:1(0) win 4096 <mss 1024> svr4.discard > bsdi.1029: S 4:4(0) ack 2 bsdi.1029 > svr4.discard: . ack 5 bsdi.1029 > svr4.discard: P 1:15(14) ack svr4.discard > bsdi.1029: . ack 15 win bsdi.1029 > svr4.discard: P 15:23(8) ack bsdi.1029 > svr4.discard: P 15:23(8) ack bsdi.1029 > svr4.discard: P 15:23(8) ack bsdi.1029 > svr4.discard: P 15:23(8) ack bsdi.1029 > svr4.discard: P 15:23(8) ack bsdi.1029 > svr4.disacrd: P 15:23(8) ack bsdi.1029 > svr4.discard: P 15:23(8) ack bsdi.1029 > svr4.discard: P 15:23(8) ack bsdi.1029 > svr4.discard: P 15:28(8) ack bsdi.1029 > svr4.discard: P 15:23(8) ack bsdi.1029 > svr4.discard: P 15:23(8) ack bsdi.1029 > svr4.discard: P 15:23(8) ack : : : 49 5 5 5 5 5 5 5 5 5 5 5 5 5 RTO Expired Retransmission - RTO Exponential Back-Off - RTO 64 sec 48 sec 32 sec 面積(S) 16 sec 0 1 2 3 4 5 6 7 8 9 If (S ≦ 9 minutes) { continue; } else { abort; } 10 11 Retransmission回数 50 Timeout and Retransmission RTO (Retransmission TimeOut) (1) 古い計算方法 ; RTTの変動に弱い - RTO = 2 x RTT where RTT = α・RTTp + (1-α)・RTTM = 0.9xRTTp + 0.1xRTTM (2) 最近の計算方法 ; 平均偏差の利用 - RTO = 平均RTT + 4 x 平均偏差 51 Retransmission by Duplicated ACK (2) Reception of Duplicated ACK - Fast Retransmission / Fast Recovery Segment廃棄特性 ; → “single (or few) segment(S)” あるい は連続多数。 → 未ACKの同一ACK Segmentsを 複数(3回)受信したら、再送。 52 Fast Retransmission by Duplicated ACK 6401:6657(256) ack1 6657:6913(256) ack1 6913:7169(256) ack1 7169:7425(256) ack1 ack 5889 ack 6145 ack 6401 ack 6657 7425:7681(256) ack1 ack 6657 ① 7681:7937(256) ack1 ack 6657 ② 7937:8193(256) ack1 ack 6657 ③ 8193:8449(256) ack1 6657:6913(256) ack1 “Fast Retransmission” ack 6657 ack 6657 ack 6657 8449:8705(256) ack1 8705:8961(256) ack1 ack 8449 win5888 8961:9217(256) ack1 ack 8705 win5888 Congestion Window Control [目的] cwndの大きな振動を防ぎ、 適切なcwndで運用する [1] cwndの制御 (i) ssthresh以下のcwndサイズ → Exponential increase (slow start) (ii) ssthresh以上のcwndサイズ → Liner increase (congestion avoidance) [2] ssthreshの制御 (i) Timeout ; goto “1” (ii) Duplicated-ACK ; 1/2 cwnd=1; ssthresh=65KB; for () { if (“Timeout”) { cwnd=1; ssthresh = cwnd/2; } if (“duplicated ACK”) { ssthresh=cwnd / 2; cwnd=ssthresh; } if (cwnd ≦ ssthresh) { slow_start; /* exponential */ } else { congestion_avoidance; /* liner */ } } 54 Congestion Window Control (続) ・ ICMP 制御メッセージ (1) ICMP Source Quench → cwnd = 1 ; ssthresh = as is ; (2) Host unreachable → No Action ; 55 Congestion Congestion slow-start avoidance avoidance slow-start Congestion avoidance cwdn_1 cwdn_3 Timeout “ssthresh” Fast Recovery (cwnd_3) / 2 (cwnd_1) / 2 cwnd Target cnwd Fast Recovery 56 トランスポートレイヤ技術 - TCP; Transmission Control Protocol (1) TCPの基本動作 (2) TCP Interactive Data Flow (3) TCP Bulk Data Flow (4) TCP Data Retransmission (5) TCP Persist Timer (6) TCP Keep Alive Timer (7) その他・将来 57 TCP Persist Timer [機能] Advertised window size = “0” でも 1 Byte のデータは送信できるように する。 [目的] window=“0”時の デッドロックの回避。 受信側プロセスの生存確認 [Timer値制御] Exponential Back-off (Max. 60秒) 58 TCP Persist Timer 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 ( 0.1920) ( 0.0050) ( 0.0034) ( 0.0072) ( 0.0052) ( 0.0034) ( 0.0039) ( 0.0079) ( 0.0051) ( 0.0040) ( 0.0039) ( 0.1612) ( 4.9494) ( 0.0040) ( 4.9961) ( 0.0040) ( 5.9962) ( 0.0040) (11.9964) ( 0.0040) (23.9967) ( 0.0040) bsdi.1027 svr4.5555 bsdi.1027 bsdi.1027 svr4.5555 bsdi.1027 bsdi.1027 bsdi.1027 svr4.5555 bsdi.1027 bsdi.1027 bsdi.1027 svr4.5555 bsdi.1027 svr4.5555 bsdi.1027 svr4.5555 bsdi.1027 svr4.5555 bsdi.1027 svr4.5555 bsdi.1027 svr4.5555 > > > > > > > > > > > > > > > > > > > > > > > svr4.5555: bsdi.1027: svr4.5555: svr4.5555: bsdi.1027: svr4.5555: svr4.5555: svr4.5555: bsdi.1027: svr4.5555: svr4.5555: svr4.5555: bsdi.1027: svr4.5555: bsdi.1027: svr4.5555: bsdi.1027: svr4.5555: bsdi.1027: svr4.5555: bsdi.1027: svr4.5555: bsdi.1027: P . . . . . P P . P P P . . . . . . . . . . . 1:1025(1024) ack 1 win 4906 ack 1025 win 4906 1025:2049(1024) ack 1 win 4096 2049:3073(1024) ack 1 win 4096 ack 3073 win 4096 3073:4097(1024) ack 1 win 4096 4097:5121(1024) ack 1 win 4096 5121:6145(1024) ack 1 win 4096 ack 5121 win 4096 6145:7169(1024) ack 1 win 4096 7169:8193(1024) ack 1 win 4096 8193:9217(1027) ack 1 win 4096 ack 9217 win 0 9217:9218(1) ack 1 win 4096 ack 9217 win 0 9218:9219(1) ack 1 win 4096 ack 9218 win 0 9219:9220(1) ack 1 win 4096 ack 9219 win 0 9220:9221(1) ack 1 win 4096 ack 9220 win 0 9221:9222(1) ack 1 win 4096 59 ack 9221 win 0 トランスポートレイヤ技術 - TCP; Transmission Control Protocol (1) TCPの基本動作 (2) TCP Interactive Data Flow (3) TCP Bulk Data Flow (4) TCP Data Retransmission (5) TCP Persist Timer (6) TCP Keep Alive Timer (7) その他・将来 60 Keep Alive Timer [目的] ; プロセスダウンへの対応 (*) TCPコネクションの終了条件 (i) 明示的終了 (FIN segment) (ii) プロセスのダウン [機能] ; TCPコネクションの生存確認 - 2時間ごとにprobe - 10回(75秒間隔でprobe送信)応答なし → 受信プロセスのダウンと判断。 61 TCP Keepalive Timer 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 bsdi.1055 > svr4.echo: P 1:14(13) ack 1 ( 0.0061) svr4.echo > bsdi.1055: P 1:14(13) ack 14 ( 0.0087) bsdi.1055 > svr4.echo: . ack 14 (7199.8797) arp who-has svr4 tell bsdi ( 0.0021) arp reply svr4 is-at 0:0:c0:c2:9b:26 ( 0.0009) bsdi.1055 > svr4.echo: . ack 14 ( 0.0041) svr4.echo > bsdi.1055: . ack 14 (7200.1545) arp who-has svr4 tell bsdi ( 0.0021) arp reply svr4 is-at 0:0:c0:c2:9b:26 ( 0.0009) bsdi.1055 > svr4.echo: . ack 14 ( 0.0040) svr4.echo > bsdi.1055: . ack 14 (7200.1769) arp who-has svr4 tell bsdi ① ( 75.0021) arp who-has svr4 tell bsdi ② ( 75.0020) arp who-has svr4 tell bsdi ③ ( 75.0021) arp who-has svr4 tell bsdi ④ ( 75.1123) arp who-has svr4 tell bsdi ⑤ ( 75.0021) arp who-has svr4 tell bsdi ⑥ ( 75.0020) arp who-has svr4 tell bsdi ⑦ ( 74.9920) arp who-has svr4 tell bsdi ⑧ ( 75.0018) arp who-has svr4 tell bsdi ⑨ ( 75.0021) arp who-has svr4 tell bsdi ⑩ 62 トランスポートレイヤ技術 - TCP; Transmission Control Protocol (1) TCPの基本動作 (2) TCP Interactive Data Flow (3) TCP Bulk Data Flow (4) TCP Data Retransmission (5) TCP Persist Timer (6) TCP Keep Alive Timer (7) その他・将来 63 トランスポートレイヤ技術 - その他・将来 (1) Silly Window Syndrome (2) MTU Discovery (3) Window scaling for long fat-pipe (4) T/TCP (Transaction TCP) (5) Rate Control (6) ECN(Explicit Congestion Notification) 64 Path MTU Discovery [目的] 経路上でフラグメントされない最大のセグメン トサイズ(Path MTU)の検索 [方法] ICMPエラーメッセージの利用 (DF; Don’t Fragment オプション) [頻度] 10分ごとに再検索 65 1 (0.0 2 3 4 5 6 7 8 9 10 11 12 ) solaris.33016 > slip.discard: S 1:1(0) win 8760 <mss 1460> (DF) (0.1016) slip.discard > solaris.33016: S 1:1(0) ack 1 win 4096 <mss 512> (0.5290) solaris.33016 > slip.discard: P 1:513(512) ack 1 win 4096 <mss 512> (0.0038) bsdi > solaris: icmp: slip unreachable - need to frag, mtu = 296 (DF) (0.0259) solaris.33016 > slip.discard: F 513:513(0) ack 1 win 9216 (DF) (0.0923) slip.discard > solaris.33016: . ack 1 win 4096 (0.3577) solaris.33016 > slip.discard: P 1:257(256) ack 1 win 9216 (DF) (0.3290) slip.doscard > solaris.33016: . ack 257 win 3840 (0.3308) solaris.33016 > slip.discard: FP 257:513(256) ack 1 win 9216 (DF) (0.3208) slip.discard > solaris.33016: . ack win 3840 (0.0422) slip.discard > solaris.33016: F 1:1(0) ack 514 win4096 (0.1719) slip.discard > splaris.33016: . ack 2 win 9216 (DF) <mss 1460> solaris <mss 296> bsdi 512 B (DF) slip ICMP “too big (mss 296)” 66 Window Scaling for Long Fat Pipe - RFC1323 Network Ethernet T1(大陸間) T1(衛星) T3(大陸間) OC12(大陸間) Bandwidth(bps) 10.000 M 1.544 M 1,544 M 45,000 M 2,400,000 M RTT(ms) 3 60 500 60 60 BWxRTT(B) 3,750 11,580 96,500 337,500 7,500,000 ・ Max. Window Size ; 2^(16) Bytes = 64KB → Window Scaling ; “wscale” wscale=n → 64 x 2^(n) windowサイズ 67 Window Scaling for Long Fat Pipe 1 2 3 4 5 6 7 8 9 10 11 ( 0.0031) ( 0.2972) (16.6198) ( 0.0030) ( 0.2971) ( 9.4202) ( 0.0024) ( 0.0013) ( 0.2363) (17.5200) 12 ( 0.0031) 13 ( 0.2967) vangogh.4107 > bsdi.echo: S 1:1(0) win 65535 <mss 512, nop, wscale 1, nop, nop, timestamp, 995351> bsdi.echo > vangogh.4107: S 1:1(0) ack 1 win 4906 <mss 512> vangogh.4107 > bsdi.echo: . ack 1 win 65535 vangogh.4107 > bsdi.echo: P 1:14(13) ack 1 win 65535 bsdi.echo > vangogh.4107: P 1:14(13) ack 14 win 4096 vangogh.4107 > bsdi.echo: . ack 14 win 65535 vangogh.4107 > bsdi.echo: F 14:14(0) ack 14 win 65535 bsdi.echo > vangogh.4107: . ack 15 win 4096 bsdi.echo > vangogh.4107: F 14:14(0) ack 15 win 4096 vangogh.4107 > bsdi.echo: . ack 15 win 65535 vangogh.4107 > bsdi.echo: S 1:1(0) win 65535 <mss 512, nop, wscale 2, nop, nop, timestamp 995440> bsdi.echo > vangogh.4107: S 1:1(0) ack 1 win 4096 <mss 512> vangogh.4107 > bsdi.echo: . ack 1 win 65535 nop; no operation wscale ; window scale 68 RFC 1379 ; T/TCP - Transaction TCP [目的] TCPコネクションの確立・開放手続きの 速度アップ [方法] ・ CC (Connection Count) Option ・ SYNへのPiggy-back ; “half-synchronization” (1) SYN, Data, FIN, CC (2) SYN, SYN-ACK, Data, FIN, FIN-ACK, CC, CC-Echo (3) FIN-ACK 69 RFC 1379 ; T/TCP Server Client SYN (a) SYN_ACK(a+1,b) ACK(b+1) Server Client SYN,Data,FIN,CC SYN,S-ack,Data, F,F-ack FIN-ACK Data (a+2) Data_ACK(a+2,b+1) FIN (m,s) ACK (m+1) FIN_ACK (m+1,s) ACK (s+1) 9 セグメント → 3 セグメント 70 TCPレート制御 Destination Node 1 Source Node 1 Window 制御 Rate 制御 Line Speed with Window Shaped Transmission with Window 71 TCPレート制御 Window帯 平均転送速度 Window 制御 ピーク転送速度 Line Speed with Window パケット廃棄 Rate 制御 パケット廃棄 パケット廃棄 Window帯 平均転送速度 ピーク転送速度 Shaped Transmission with Window RED ECN RED ECN RED ECN 72 ECN(Explicit Congestion Notification)制御 TOS for Differentiated Service - PHB(Per-Hop-Behavior) - CU(Currently Unused) => for ECN(Explicit Congestion Notification) ? 0 1 TOSフィールド: PHB: 000000 101110 Others xxxxx0 xxxx11 xxxx01 2 3 4 PHB 5 6 7 CU DE (Default Service) EF (Expedited Forwarding) AF (Assured Forwarding) Standard Purpose Experimental Purpose Experimental Purpose 73 Explicit Congestion Notification (ECN)制御 (8) ECN=11 Congestion Node (Set ECN bit) Reduce Speed (2) ECN=01 Destination Node 1 (9) ECN=11 (3) ECN=01 (1) ECN=00 (7) ECN=10 (5) ECN=11 (4) ECN=10 Source Node 1 (6) ECN=11 Reduce Speed 輻輳を未然に防ぐ; 予防的制御 74