COMS/CSEE 4140 Networking Laboratory
Lecture 06
Salman Abdul Baset
Spring 2008

Announcements
• Lab 4 (5-7) due next week before your lab slot
• Prelab 5 due next week. There will be Lab 5 next week.
• Midterm (March 10th, duration ~1.5 hours)
• Assignment 2 issues
  • aslookup compilation?
  • ISP name: nslookup or whois for IP address
• Lab 4 (count-to-infinity issues)

Agenda
• Autonomous Systems (AS)
• Policy vs. distance based routing
• Border gateway protocol (BGP)
• Transmission control protocol (TCP)

Autonomous Systems Terminology
• local traffic = traffic with source or destination in AS
• transit traffic = traffic that passes through the AS
• Stub AS = has connection to only one AS, only carries local traffic
• Multihomed AS = has connection to >1 AS, but does not carry transit traffic
• Transit AS = has connection to >1 AS and carries transit traffic

Stub and Transit Networks
[Figure: topology of AS 1 through AS 5]
• AS 1, AS 2, and AS 5 are stub networks
• AS 2 is a multihomed stub network
• AS 3 and AS 4 are transit networks

Selective Transit
[Figure: AS 1, AS 2, AS 3, and AS 4]
Example:
• Transit AS 3 carries traffic between AS 1 and AS 4 and between AS 2 and AS 4
• But AS 3 does not carry traffic between AS 1 and AS 2
• The example shows a routing policy.

Customer/Provider
[Figure: customer/provider links among AS 2, AS 4, AS 5, and AS 6]
• A stub network typically obtains access to the Internet through a transit network.
• A transit network that is a provider may itself be a customer of another network
• Customer pays provider for service

Customer/Provider and Peers
[Figure: AS 1, AS 2, and AS 3 linked as peers, with customer/provider links down to AS 4, AS 5, and AS 6]
• Transit networks can have a peer relationship
• Peers provide transit between their respective customers
• Peers do not provide transit between peers
• Peers normally do not pay each other for service

Shortcuts through peering
[Figure: the same topology with an additional peer link between lower-tier ASes]
• Note that peering reduces upstream traffic
• Delays can be reduced through peering
• But: Peering may not generate revenue

ASNs already assigned
[Figure] Source: http://www.potaroo.net/tools/asn32/
• private ASN: 64512 to 65534

ASNs in use
[Figure]

ASN projections
[Figure]

ARDs versus ASes
Autonomous Routing Domains Don’t Always Need BGP or an ASN
[Figure: Qwest nails up routes for 130.132.0.0/16 pointing to Yale; Yale University (130.132.0.0/16) nails up default routes 0.0.0.0/0 pointing to Qwest]
• Static routing is the most common way of connecting an autonomous routing domain to the Internet.
• This helps explain why BGP is a mystery to many …

ASNs Can Be “Shared” (RFC 2270)
[Figure: AS 701 (UUNet) connects three customers that each use AS 7046: Crestar Bank, NJIT, and Hood College; the figure also shows the prefix 128.235.0.0/16]
• ASN 7046 is assigned to UUNet. It is used by customers that are single-homed to UUNet but need BGP for some reason (load balancing, etc.) [RFC 2270]

ARDs and ASes: Summary
• Most ARDs have no ASN (statically routed at Internet edge)
• Some unrelated ARDs share the same ASN (RFC 2270)
• Some ARDs are implemented with multiple ASNs (example: Worldcom)
• ASes are just an implementation detail of inter-domain routing

Agenda
• Autonomous Systems (AS)
• Policy vs. distance based routing
• Border gateway protocol (BGP)
• Transmission control protocol (TCP)

Why not minimize “AS hop Count”?
[Figure: National ISP1 and National ISP2 above Regional ISP1, ISP2, and ISP3 with customers Cust1, Cust2, and Cust3; one path is marked YES and another NO]
• Shortest path routing is not compatible with commercial relations

Customer versus Provider
[Figure: IP traffic flowing between customer and provider]
• Customer pays provider for access to the Internet

The “Peering” Relationship
[Figure legend: peer links and provider/customer links; traffic allowed versus traffic NOT allowed]
• Peers provide transit between their respective customers
• Peers do not provide transit between peers
• Peers (often) do not exchange $$$

Peering Provides Shortcuts
[Figure legend: peer links and provider/customer links]
• Peering also allows connectivity between the customers of “Tier 1” providers.

Peering Wars
Peer
• Reduces upstream transit costs
• Can increase end-to-end performance
• May be the only way to connect your customers to some part of the Internet (“Tier 1”)
Don’t Peer
• You would rather have customers
• Peers are usually your competition
• Peering relationships may require periodic renegotiation
Peering struggles are by far the most contentious issues in the ISP world! Peering agreements are often confidential.

Agenda
• Autonomous Systems (AS)
• Policy vs. distance based routing
• Border gateway protocol (BGP)
• Transmission control protocol (TCP)

The Gang of Four
              IGP            EGP
  Link State  OSPF, IS-IS
  Vectoring   RIP            BGP

BGP Overview
• BGP = Border Gateway Protocol v4. RFC 1771 (~60 pages; since obsoleted by RFC 4271).
• Note: In the context of BGP, a gateway is nothing else but an IP router that connects autonomous systems.
• Interdomain routing protocol for routing between autonomous systems.
• Uses TCP to establish a BGP session and to send routing messages over the BGP session.
• Updates carry only new routes.
• BGP is a path vector protocol. Routing messages in BGP contain complete routes.
• Network administrators can specify routing policies.

BGP Policy-based Routing
• Each AS is assigned an AS number (ASN)
• BGP’s goal is to find any AS-path (not an optimal one). Since the internals of the AS are never revealed, finding an optimal path is not feasible.
• The network administrator sets BGP’s policies to determine the best path to reach a destination network.

The Border Gateway Protocol (BGP)
BGP =
  RFC 1771
  + “optional” extensions: RFC 1997 (communities), RFC 2439 (damping), RFC 2796 (reflection), RFC 3065 (confederation), …
  + routing policy configuration languages (vendor-specific)
  + Current Best Practices in management of Interdomain Routing
BGP was not DESIGNED. It EVOLVED.

BGP Route Processing
  1. Receive BGP Updates
  2. Apply Import Policies (apply policy = filter routes & tweak attributes)
  3. Best Route Selection, based on attribute values
  4. Best Route Table
  5. Apply Export Policies (apply policy = filter routes & tweak attributes)
  6. Transmit BGP Updates
• Install forwarding entries for best routes in the IP Forwarding Table.
• Applying policy is open ended programming, constrained only by the vendor configuration language.

BGP Attributes
  Value  Code                     Reference
  -----  -----------------------  ---------
  1      ORIGIN                   [RFC1771]
  2      AS_PATH                  [RFC1771]
  3      NEXT_HOP                 [RFC1771]
  4      MULTI_EXIT_DISC          [RFC1771]
  5      LOCAL_PREF               [RFC1771]
  6      ATOMIC_AGGREGATE         [RFC1771]
  7      AGGREGATOR               [RFC1771]
  8      COMMUNITY                [RFC1997]
  9      ORIGINATOR_ID            [RFC2796]
  10     CLUSTER_LIST             [RFC2796]
  11     DPA                      [Chen]
  12     ADVERTISER               [RFC1863]
  13     RCID_PATH / CLUSTER_ID   [RFC1863]
  14     MP_REACH_NLRI            [RFC2283]
  15     MP_UNREACH_NLRI          [RFC2283]
  16     EXTENDED COMMUNITIES     [Rosen]
  ...
  255    reserved for development
[Slide callout: most important attributes]
From IANA: http://www.iana.org/assignments/bgp-parameters
• Not all attributes need to be present in every announcement

LOCAL_PREF Attribute
• Forces outbound traffic to take the primary link, unless the link is down.

NEXT_HOP Attribute
• EBGP: IP address used to reach the advertising router
• IBGP: next-hop address is carried into the local AS

AS_PATH Attribute
• Used to detect routing loops and find shortest paths

Shedding Inbound Traffic with ASPATH Prepending
[Figure: customer AS 2 announces 192.0.2.0/24 to provider AS 1 with ASPATH = 2 on the primary link and ASPATH = 2 2 2 on the backup link]
• Prepending will (usually) force inbound traffic from AS 1 to take the primary link
• Yes, this is a Glorious Hack …

… But Padding Does Not Always Work
[Figure: customer AS 2 announces 192.0.2.0/24 to provider AS 1 with ASPATH = 2 on the primary link and to provider AS 3 with ASPATH = 2 2 2 2 2 2 2 2 2 2 2 2 2 on the backup link]
• AS 3 will send traffic on the “backup” link because it prefers customer routes, and local preference is considered before ASPATH length!
• Padding in this way is often used as a form of load balancing

COMMUNITY Attribute to the Rescue!
[Figure: customer AS 2 announces 192.0.2.0/24 with ASPATH = 2 on the primary link to provider AS 1, and with ASPATH = 2, COMMUNITY = 3:70 on the backup link to provider AS 3]
• AS 3: normal customer local pref is 100, peer local pref is 90
Customer import policy at AS 3:
• If 3:90 in COMMUNITY then set local preference to 90
• If 3:80 in COMMUNITY then set local preference to 80
• If 3:70 in COMMUNITY then set local preference to 70

BGP Issues - What is a BGP Wedgie?
[Figure labels: ¾ wedgie, full wedgie]
• BGP policies make sense locally
• Interaction of local policies allows multiple stable routings
• Some routings are consistent with intended policies, and some are not
  • If an unintended routing is installed (BGP is “wedged”), then manual intervention is needed to change to an intended routing
• When an unintended routing is installed, no single group of network operators has enough knowledge to debug the problem

YouTube blocking
• Pakistan blocks YouTube
• How? (according to BBC)
  • Advertise a shorter route to reach YouTube
  • The incorrect short route gets propagated
  • Seen by two thirds of the Internet
  • Traffic to YouTube goes through Pakistan
  • Since Pakistan blocked YouTube, all traffic reaches a dead end!

Dynamic Routing Protocols: Summary
• Dynamic routing protocols: RIP, OSPF, BGP
• RIP uses a distance vector algorithm and converges slowly (the count-to-infinity problem)
• OSPF uses a link state algorithm and converges fast, but it is more complicated than RIP.
• Both RIP and OSPF find the lowest-cost path.
• BGP uses a path vector algorithm; its path selection algorithm is complicated and is influenced by policies.
• BGP has its own problems; see BGP Wedgies by Tim Griffin

More Readings (Optional)
• BGP Wedgies: Bad Routing Policy Interactions that Cannot be Debugged
• JI’s Intro to interdomain routing.
• "Interdomain Setting of PlanetLab Nodes." PlanetLab Meeting, May 14, 2004.
• Understanding the Border Gateway Protocol (BGP), ICNP 2002 Tutorial Session

Agenda
• Autonomous Systems (AS)
• Policy vs. distance based routing
• Border gateway protocol (BGP)
• Transmission control protocol (TCP)

Transmission Control Protocol (RFC)
• Reliable and in-order byte-stream service
• TCP format
• Connection establishment
• Flow control
• Reaction to congestion
• Packet corruption

TCP Format
• TCP segments have a 20 byte header with >= 0 bytes of data.
  IP header (20 bytes) | TCP header (20 bytes) | TCP data
TCP header layout (bits 0 to 31 per row; the fixed part is 20 bytes):
  Source Port Number (bits 0-15)     | Destination Port Number (bits 16-31)
  Sequence number (32 bits)
  Acknowledgement number (32 bits)
  header length | 0 | Flags          | window size
  TCP checksum                       | urgent pointer
  Options (if any)
  DATA

TCP header fields
Sequence Number (SeqNo):
• Sequence number is 32 bits long, so the range of SeqNo is 0 <= SeqNo <= 2^32 - 1 ≈ 4.3 Gbyte
• Each sequence number identifies a byte in the byte stream
• Initial Sequence Number (ISN) of a connection is set during connection establishment
• Q: What are possible requirements for ISN?

TCP header fields
Acknowledgement Number (AckNo):
• Acknowledgements are piggybacked, i.e., a segment from A -> B can contain an acknowledgement for data sent in the B -> A direction
• Q: Why is piggybacking good?
• A host uses the AckNo field to send acknowledgements.
  (If a host sends an AckNo in a segment, it sets the “ACK flag”.)
• The AckNo contains the next SeqNo that a host wants to receive
  Example: The acknowledgement for a segment with sequence numbers 0-1500 is AckNo=1501

TCP header fields
Acknowledgement Number (cont’d):
• TCP uses the sliding window flow protocol (see CS 457) to regulate the flow of traffic from sender to receiver
• TCP uses the following variation of sliding window:
  • no NACKs (Negative ACKnowledgement)
  • only cumulative ACKs
• Example: Assume the sender sends two segments with “1..1500” and “1501..3000”, but the receiver only gets the second segment. In this case, the receiver cannot acknowledge the second packet. It can only send AckNo=1.

TCP header fields
Header Length (4 bits):
• Length of header in 32-bit words
• Note that the TCP header has variable length (with a minimum of 20 bytes)

TCP header fields
Flag bits:
• URG: Urgent pointer is valid
  • If the bit is set, the following bytes contain an urgent message in the range: SeqNo <= urgent message <= SeqNo + urgent pointer
• ACK: Acknowledgement Number is valid
• PSH: PUSH flag
  • Notification from sender to the receiver that the receiver should pass all data that it has to the application.
  • Normally set by the sender when the sender’s buffer is empty

TCP header fields
Flag bits:
• RST: Reset the connection
  • The flag causes the receiver to reset the connection
  • The receiver of a RST terminates the connection and notifies the higher-layer application of the reset
• SYN: Synchronize sequence numbers
  • Sent in the first packet when initiating a connection
• FIN: Sender is finished with sending
  • Used for closing a connection
  • Both sides of a connection must send a FIN

TCP header fields
Window Size:
• Each side of the connection advertises the window size
• Window size is the maximum number of bytes that a receiver can accept.
• Maximum window size is 2^16 - 1 = 65535 bytes
TCP Checksum:
• TCP checksum covers both the TCP header and TCP data (it also covers some parts of the IP header)
• 16-bit one’s complement
Urgent Pointer:
• Only valid if URG flag is set

TCP header fields
Options:
  Option                 Layout
  End of Options         kind=0 (1 byte)
  NOP (no operation)     kind=1 (1 byte)
  Maximum Segment Size   kind=2 (1 byte) | len=4 (1 byte) | maximum segment size (2 bytes)
  Window Scale Factor    kind=3 (1 byte) | len=3 (1 byte) | shift count (1 byte)
  Timestamp              kind=8 (1 byte) | len=10 (1 byte) | timestamp value (4 bytes) | timestamp echo reply (4 bytes)

TCP header fields
Options:
• NOP is used to pad the TCP header to multiples of 4 bytes
• Maximum Segment Size
• Window Scale Options
  • Increases the TCP window from 16 to 32 bits, i.e., the window size is interpreted differently
  • Q: What is the different interpretation?
  • This option can only be used in the SYN segment (first segment) during connection establishment time
• Timestamp Option
  • Can be used for roundtrip measurements

Three-Way Handshake
  aida.poly.edu -> mng.poly.edu: S 1031880193:1031880193(0) win 16384 <mss 1460, ...>
  mng.poly.edu -> aida.poly.edu: S 172488586:172488586(0) ack 1031880194 win 8760 <mss 1460>
  aida.poly.edu -> mng.poly.edu: . ack 172488587 win 17520

Why is a Two-Way Handshake not enough?
  aida.poly.edu -> mng.poly.edu: S 1031880193:1031880193(0) win 16384 <mss 1460, ...>
  aida.poly.edu -> mng.poly.edu: S 15322112354:15322112354(0) win 16384 <mss 1460, ...>
  mng.poly.edu -> aida.poly.edu: S 172488586:172488586(0) win 8760 <mss 1460>
• The red line is a delayed duplicate packet.
• It will be discarded as a duplicate SYN.
• When aida initiates the data transfer (starting with SeqNo=15322112355), mng will reject all data.

TCP Connection Termination
  mng.poly.edu -> aida.poly.edu: F 172488734:172488734(0) ack 1031880221 win 8733
  aida.poly.edu -> mng.poly.edu: . ack 172488735 win 17484
  aida.poly.edu -> mng.poly.edu: F 1031880221:1031880221(0) ack 172488735 win 17520
  mng.poly.edu -> aida.poly.edu: . ack 1031880222 win 8733

Connection termination with tcpdump
aida issues a "telnet mng"
  1 mng.poly.edu.telnet > aida.poly.edu.1121: F 172488734:172488734(0) ack 1031880221 win 8733
  2 aida.poly.edu.1121 > mng.poly.edu.telnet: . ack 172488735 win 17484
  3 aida.poly.edu.1121 > mng.poly.edu.telnet: F 1031880221:1031880221(0) ack 172488735 win 17520
  4 mng.poly.edu.telnet > aida.poly.edu.1121: . ack 1031880222 win 8733

TCP States in “Normal” Connection Lifetime
  Active side: SYN_SENT (active open); passive side: LISTEN (passive open)
  Active -> passive: SYN (SeqNo = x); passive side enters SYN_RCVD
  Passive -> active: SYN (SeqNo = y, AckNo = x + 1); active side enters ESTABLISHED
  Active -> passive: (AckNo = y + 1); passive side enters ESTABLISHED
  Active side: FIN_WAIT_1 (active close)
  Active -> passive: FIN (SeqNo = m); passive side enters CLOSE_WAIT (passive close)
  Passive -> active: (AckNo = m + 1); active side enters FIN_WAIT_2
  Passive -> active: FIN (SeqNo = n); passive side enters LAST_ACK, active side enters TIME_WAIT
  Active -> passive: (AckNo = n + 1); passive side enters CLOSED

TCP State Transition Diagram: Opening A Connection
  CLOSED -> LISTEN: passive open; send: nothing
  CLOSED -> SYN SENT: active open; send: SYN
  LISTEN -> SYN RCVD: recv: SYN; send: SYN, ACK
  LISTEN -> SYN SENT: application sends data; send: SYN
  SYN RCVD -> LISTEN: recv: RST
  SYN SENT -> CLOSED: close or timeout
  SYN SENT -> SYN RCVD: simultaneous open; recv: SYN; send: SYN, ACK
  SYN SENT -> ESTABLISHED: recv: SYN, ACK; send: ACK
  SYN RCVD -> ESTABLISHED: recv: ACK; send: nothing
  SYN RCVD -> FIN_WAIT_1: send: FIN
  From ESTABLISHED, the closing transitions (recv: FIN; send: FIN) continue on the next slide.

TCP State Transition Diagram: Closing A Connection
  ESTABLISHED -> FIN_WAIT_1: active close (issue close()); send: FIN
  FIN_WAIT_1 -> FIN_WAIT_2: recv: ACK; send: nothing
  FIN_WAIT_1 -> CLOSING: recv: FIN; send: ACK
  FIN_WAIT_1 -> TIME_WAIT: recv: FIN, ACK; send: ACK
  FIN_WAIT_2 -> TIME_WAIT: recv: FIN; send: ACK
  CLOSING -> TIME_WAIT: recv: ACK; send: nothing
  TIME_WAIT -> CLOSED: timeout (2 MSL)
  ESTABLISHED -> CLOSE_WAIT: passive close; recv: FIN; send: ACK
  CLOSE_WAIT -> LAST_ACK: application closes; send: FIN
  LAST_ACK -> CLOSED: recv: ACK; send: nothing

2MSL Wait State
2MSL Wait State = TIME_WAIT
• When TCP does an active close and sends the final ACK, the connection must stay in the TIME_WAIT state for twice the maximum segment lifetime.
  2MSL = 2 * Maximum Segment Lifetime
• Why? TCP is given a chance to resend the final ACK. (The server will time out after sending the FIN segment and resend the FIN.)
• The MSL is set to 2 minutes or 1 minute or 30 seconds.
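
The state transition diagrams for opening and closing a connection can be read as a lookup table from (state, event) to (next state, segment sent). The following is a minimal sketch of that idea, not an implementation of TCP; the event names (`active_open`, `recv_SYN,ACK`, `timeout_2MSL`, and so on) and the helper `run` are made up for illustration.

```python
# Sketch of the TCP connection state diagram as a lookup table.
# Event names and the run() helper are illustrative assumptions,
# not part of any real TCP stack.
TRANSITIONS = {
    # Opening a connection
    ("CLOSED", "passive_open"): ("LISTEN", None),
    ("CLOSED", "active_open"): ("SYN_SENT", "SYN"),
    ("LISTEN", "recv_SYN"): ("SYN_RCVD", "SYN,ACK"),
    ("SYN_SENT", "recv_SYN"): ("SYN_RCVD", "SYN,ACK"),  # simultaneous open
    ("SYN_SENT", "recv_SYN,ACK"): ("ESTABLISHED", "ACK"),
    ("SYN_RCVD", "recv_ACK"): ("ESTABLISHED", None),
    # Closing a connection
    ("ESTABLISHED", "close"): ("FIN_WAIT_1", "FIN"),     # active close
    ("ESTABLISHED", "recv_FIN"): ("CLOSE_WAIT", "ACK"),  # passive close
    ("FIN_WAIT_1", "recv_ACK"): ("FIN_WAIT_2", None),
    ("FIN_WAIT_1", "recv_FIN"): ("CLOSING", "ACK"),
    ("FIN_WAIT_1", "recv_FIN,ACK"): ("TIME_WAIT", "ACK"),
    ("FIN_WAIT_2", "recv_FIN"): ("TIME_WAIT", "ACK"),
    ("CLOSING", "recv_ACK"): ("TIME_WAIT", None),
    ("TIME_WAIT", "timeout_2MSL"): ("CLOSED", None),
    ("CLOSE_WAIT", "close"): ("LAST_ACK", "FIN"),
    ("LAST_ACK", "recv_ACK"): ("CLOSED", None),
}


def run(start, events):
    """Apply events in order and return the list of states visited."""
    state = start
    visited = [state]
    for event in events:
        key = (state, event)
        if key not in TRANSITIONS:
            raise ValueError(f"no transition for {event!r} in state {state}")
        state, _segment_sent = TRANSITIONS[key]
        visited.append(state)
    return visited


# Side that opens and closes actively (the "normal" lifetime).
ACTIVE_SIDE = ["active_open", "recv_SYN,ACK", "close",
               "recv_ACK", "recv_FIN", "timeout_2MSL"]
# Side that opens and closes passively.
PASSIVE_SIDE = ["passive_open", "recv_SYN", "recv_ACK",
                "recv_FIN", "close", "recv_ACK"]

if __name__ == "__main__":
    print(" -> ".join(run("CLOSED", ACTIVE_SIDE)))
    print(" -> ".join(run("CLOSED", PASSIVE_SIDE)))
```

Only the side that closes actively passes through TIME_WAIT, which is why the 2MSL wait applies to that side.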
Rules for sending Acknowledgments
TCP has rules that influence the transmission of acknowledgments
• Rule 1: Delayed Acknowledgments
  • Goal: Avoid sending ACK segments that do not carry data
  • Implementation: Delay the transmission of (some) ACKs
• Rule 2: Nagle’s rule
  • Goal: Reduce transmission of small segments
  • Implementation: A sender cannot send multiple segments with a 1-byte payload (i.e., it must wait for an ACK)

Delayed Acknowledgement
• TCP delays transmission of ACKs for up to 200 ms
• Goal: Avoid sending ACK packets that do not carry data.
  • The hope is that, within the delay, the receiver will have data ready to be sent to the sender. Then, the ACK can be piggybacked on a data segment
• In the example:
  • Delayed ACK explains why the “ACK of character” and the “echo of character” are sent in the same segment
  • The duration of delayed ACKs can be observed in the example when Argon sends ACKs
• Exceptions:
  • An ACK should be sent for every second full-sized segment
  • Delayed ACK is not used when packets arrive out of order

Observing Delayed Acknowledgements
• Remote terminal applications (e.g., Telnet) send characters to a server. The server interprets the character and sends the output at the server to the client.
• For each character typed, you see three packets:
  1. Client -> Server: Send typed character
  2. Server -> Client: Echo of character (or user output) and acknowledgement for first packet
  3. Client -> Server: Acknowledgement for second packet

Observing Delayed Acknowledgements
Telnet session from Argon to Neon. This is the output of typing 3 (three) characters:
  Time 44.062449: Argon -> Neon: Push, SeqNo 0:1(1), AckNo 1
  Time 44.063317: Neon -> Argon: Push, SeqNo 1:2(1), AckNo 1
  Time 44.182705: Argon -> Neon: No Data, AckNo 2

  Time 48.946471: Argon -> Neon: Push, SeqNo 1:2(1), AckNo 2
  Time 48.947326: Neon -> Argon: Push, SeqNo 2:3(1), AckNo 2
  Time 48.982786: Argon -> Neon: No Data, AckNo 3

  Time 55.116581: Argon -> Neon: Push, SeqNo 2:3(1), AckNo 3
  Time 55.117497: Neon -> Argon: Push, SeqNo 3:4(1), AckNo 3
  Time 55.183694: Argon -> Neon: No Data, AckNo 4

Why 3 segments per character?
• We would expect four segments per character:
  1. character
  2. ACK of character
  3. echo of character
  4. ACK of echoed character
• But we only see three segments per character:
  1. character
  2. ACK and echo of character
  3. ACK of echoed character
• This is due to delayed acknowledgements

Observing Nagle’s Rule
Telnet session between argon.cs.virginia.edu and tenet.cs.berkeley.edu (3000 miles apart). This is the output of typing 7 characters:
  Time 16.401963: Argon -> Tenet: Push, SeqNo 1:2(1), AckNo 2
  Time 16.481929: Tenet -> Argon: Push, SeqNo 2:3(1), AckNo 2

  Time 16.482154: Argon -> Tenet: Push, SeqNo 2:3(1), AckNo 3
  Time 16.559447: Tenet -> Argon: Push, SeqNo 3:4(1), AckNo 3

  Time 16.559684: Argon -> Tenet: Push, SeqNo 3:4(1), AckNo 4
  Time 16.640508: Tenet -> Argon: Push, SeqNo 4:5(1), AckNo 4

  Time 16.640761: Argon -> Tenet: Push, SeqNo 4:8(4), AckNo 5
  Time 16.728402: Tenet -> Argon: Push, SeqNo 5:9(4), AckNo 8

Observing Nagle’s Rule
• Observation: Transmission of segments follows a different pattern, i.e., there are only two segments per character typed
  1. char1
  2. ACK of char1 + echo of char1
  3. ACK + char2
  4. ACK + echo of char2
  5. ACK + char3
  6. ACK + echo of char3
  7. ACK + char4-7
  8. ACK + echo of char4-7
• Delayed acknowledgment does not kick in at Argon
  • The reason is that there is always data at Argon ready to be sent when the ACK arrives
• Why is Argon not sending the data (typed character) as soon as it is available?

Resetting Connections
• Resetting connections is done by setting the RST flag
• When is the RST flag set?
  • A connection request arrives and no server process is waiting on the destination port
  • Abort (terminate) a connection: causes the receiver to throw away buffered data. The receiver does not acknowledge the RST segment

TCP Congestion Control
• TCP has a mechanism for congestion control. The mechanism is implemented at the sender
• The window size at the sender is set as follows:
  Send Window = MIN (flow control window, congestion window)
  where
  • the flow control window is advertised by the receiver
  • the congestion window is adjusted based on feedback from the network

TCP Congestion Control
• TCP congestion control is governed by two parameters:
  • Congestion Window (cwnd)
  • Slow-start threshold value (ssthresh); initial value is 2^16 - 1
• Congestion control works in two modes:
  • slow start (cwnd < ssthresh)
  • congestion avoidance (cwnd ≥ ssthresh)

Slow Start
• Initial value: set cwnd = 1
  • Note: The unit is a segment size. TCP actually is based on bytes and increments by 1 MSS (maximum segment size)
• The receiver sends an acknowledgement (ACK) for each segment
  • Note: Generally, a TCP receiver sends an ACK for every other segment.
• Each time an ACK is received by the sender, the congestion window is increased by 1 segment: cwnd = cwnd + 1
  • If an ACK acknowledges two segments, cwnd is still increased by only 1 segment.
  • Even if an ACK acknowledges a segment that is smaller than MSS bytes long, cwnd is increased by 1.
• Does Slow Start increment slowly? Not really.
  In fact, the increase of cwnd is exponential.

Slow Start Example
• The congestion window size grows very rapidly
  • For every ACK, we increase cwnd by 1 irrespective of the number of segments ACK’ed
• TCP slows down the increase of cwnd when cwnd > ssthresh
Sender timeline from the figure:
  cwnd = 1: send segment 1; receive ACK for segment 1
  cwnd = 2: send segments 2 and 3; receive ACKs for segments 2 and 3
  cwnd = 4: send segments 4, 5, and 6; receive ACKs for segments 4, 5, and 6
  cwnd = 7

Congestion Avoidance
• The congestion avoidance phase is started if cwnd has reached the slow-start threshold value
• If cwnd ≥ ssthresh then each time an ACK is received, increment cwnd as follows:
  cwnd = cwnd + 1/cwnd
• So cwnd is increased by one only if all cwnd segments have been acknowledged.

Example of Slow Start/Congestion Avoidance
Assume that ssthresh = 8
[Figure: cwnd (in segments) versus roundtrip times, t = 0 to t = 6, with ssthresh marked at 8]
  cwnd = 1, 2, 4, 8 (slow start), then cwnd = 9, 10 (congestion avoidance)

Responses to Congestion
• TCP assumes there is congestion if it detects a packet loss
• A TCP sender can detect lost packets via:
  • Timeout of a retransmission timer
  • Receipt of a duplicate ACK
• TCP interprets a timeout as a binary congestion signal. When a timeout occurs, the sender performs:
  • ssthresh is set to half the size the congestion window had when the timeout occurred: ssthresh = cwnd / 2
  • cwnd is then reset to one: cwnd = 1
  • and slow start is entered

Fast Retransmit
• If three or more duplicate ACKs are received in a row, the TCP sender believes that a segment has been lost.
• Then TCP performs a retransmission of what seems to be the missing segment, without waiting for a timeout to happen.
Segment exchange from the figure:
  Sender: 1K, SeqNo=0      Receiver: AckNo=1024
  Sender: 1K, SeqNo=1024   (lost)
  Sender: 1K, SeqNo=2048   Receiver: AckNo=1024 (1. duplicate)
  Sender: 1K, SeqNo=3072   Receiver: AckNo=1024 (2. duplicate)
  Sender: 1K, SeqNo=4096   Receiver: AckNo=1024 (3. duplicate)
  Sender: 1K, SeqNo=1024   (retransmission)
  Sender: 1K, SeqNo=5120

Fast Recovery
• Fast recovery avoids slow start after a fast retransmit
• Intuition: Duplicate ACKs indicate that data is getting through
• After three duplicate ACKs set:
  • Retransmit the packet that is presumed lost
  • ssthresh = cwnd/2
  • cwnd = cwnd+3 (note the order of operations; RFC 5681 instead specifies cwnd = ssthresh + 3 segments at this step)
  • Increment cwnd by one for each additional duplicate ACK
• When an ACK arrives that acknowledges “new data” (here: AckNo=6144), set:
  cwnd = ssthresh
  and enter congestion avoidance
Segment exchange from the figure:
  cwnd=12 ssthresh=5   Sender: 1K, SeqNo=0      Receiver: AckNo=1024
  cwnd=12 ssthresh=5   Sender: 1K, SeqNo=1024   (lost)
  cwnd=12 ssthresh=5   Sender: 1K, SeqNo=2048   Receiver: AckNo=1024 (1. duplicate)
  cwnd=12 ssthresh=5   Sender: 1K, SeqNo=3072   Receiver: AckNo=1024 (2. duplicate)
                       Sender: 1K, SeqNo=4096   Receiver: AckNo=1024 (3. duplicate)
  cwnd=15 ssthresh=6   Sender: 1K, SeqNo=1024   (retransmission)
                       Sender: 1K, SeqNo=5120   Receiver: AckNo=6144 (ACK for new data)
  cwnd=6 ssthresh=6

Flavors of TCP Congestion Control
• TCP Tahoe (1988, 4.3BSD Tahoe)
  • Slow Start
  • Congestion Avoidance
  • Fast Retransmit
• TCP Reno (1990, 4.3BSD Reno)
  • Fast Recovery
• New Reno (1996)
• SACK (1996)
• RED (Floyd and Jacobson 1993)

SACK
• SACK = Selective acknowledgment
• Issue: Reno and New Reno retransmit at most 1 lost packet per round trip time
• Selective acknowledgments: The receiver can acknowledge noncontinuous blocks of data (SACK 0-1023, 1024-2047)
• Multiple blocks can be sent in a single segment.
• TCP SACK:
  • Enters fast recovery upon 3 duplicate ACKs
  • Sender keeps track of SACKs and infers if segments are lost. Sender retransmits the next segment from the list of segments that are deemed lost.

TCP in Linux
• Congestion control algorithm is pluggable
  /proc/sys/net/ipv4/tcp_congestion_control
• TCP read and write buffer sizes
  /proc/sys/net/ipv4/tcp_rmem and /proc/sys/net/ipv4/tcp_wmem

Midterm questions
• ARP, ICMP, UDP, TCP, RIP, OSPF, BGP
• Compare and contrast design principles in protocols.
• Fragmentation
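
The slow start, congestion avoidance, and timeout rules from the congestion control slides can be sketched at round-trip granularity. This is a simplified model rather than what a real TCP stack does: it assumes every segment in a window is acknowledged within one round trip, it caps slow start at ssthresh so the trace matches the slide example, and it floors ssthresh at 1 segment. The function name `simulate_cwnd` and its parameters are made up for illustration.

```python
def simulate_cwnd(rounds, ssthresh, timeouts=()):
    """Return cwnd (in segments) at the start of each round trip.

    rounds   -- number of round trips to simulate
    ssthresh -- initial slow-start threshold, in segments
    timeouts -- round-trip indices at which a retransmission timeout occurs
    """
    cwnd = 1
    history = []
    for t in range(rounds):
        history.append(cwnd)
        if t in timeouts:
            # Timeout: halve the threshold first, then restart slow start.
            ssthresh = max(cwnd // 2, 1)  # floor of 1 is an assumption
            cwnd = 1
        elif cwnd < ssthresh:
            # Slow start: +1 per ACK doubles cwnd every round trip.
            # Capping at ssthresh is a simplification of this sketch.
            cwnd = min(2 * cwnd, ssthresh)
        else:
            # Congestion avoidance: +1/cwnd per ACK is +1 per round trip.
            cwnd += 1
    return history


if __name__ == "__main__":
    # Same setting as the slide example: ssthresh = 8.
    print(simulate_cwnd(6, 8))
    # A timeout in round 5 halves ssthresh and restarts slow start.
    print(simulate_cwnd(10, 8, timeouts={5}))
```

With ssthresh = 8 and no loss, the model gives 1, 2, 4, 8, 9, 10, the same sequence as the slide example. Fast retransmit and fast recovery are not modeled here.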