Plan Ahead
• 5th week: congestion control, TCP delay modeling; network protocols: IPv4, IPv6
• 6th week: network routing, routing in the Internet
• 7th week: midterm; broadcast and multicast routing
• Before final: data link layer, Ethernet, switches, wireless networking

5/3/05 CS118/Spring05

Congestion Control
Congestion: “too many sources sending too much data too fast for network to handle”

Scenario 1
• 2 identical senders, 2 receivers, one router w/ infinite buffer, no retransmission
• when congested: large delays; maximum achievable throughput

Congestion: scenario 2
• one router, finite buffers; senders retransmit on timeout
• data losses lead to λ'_in > λ_out
• retransmission of a delayed (not lost) packet makes λ'_in much larger than λ_out
[Figure: three plots of λ_out against λ_in with axis ticks at R/2 and R/4; the first panel is labeled λ_in = λ_out]

Congestion: scenario 3
• λ_in: original data
• λ'_in: original data, plus retransmitted data
• finite shared output link buffers
[Figure: Host A sending at rate λ_in toward Host B, with output rate λ_out]
Q: what happens as λ_in and λ'_in increase?
• long delays
• superfluous retransmissions
• when a packet is dropped, any “upstream transmission capacity” used for that packet was wasted!

Approaches towards congestion control
Network-assisted congestion control:
• routers provide feedback to end hosts
  - single bit congestion indication
  - explicit rate the sender should send at
End-end congestion control:
• no explicit feedback from the network
• congestion inferred from end-system observed loss, delay
• approach taken by TCP

TCP Congestion Control
• Add a “congestion control window” CongWin on top of the flow-control window
  [Figure: the two windows, CongWin and RcvWindow]
• Sender limits: LastByteSent - LastByteAcked ≤ CongWin
• How to adjust CongWin:
  - CongWin initialized to 1 MSS, increased quickly until loss (= congestion)
  - upon loss: decrease CongWin, then begin probing (increasing) again
  - two “phases”: (1) slow start, (2) congestion avoidance; threshold defines the boundary between the two
• How the sender infers congestion: timeout, or 3 duplicate ACKs

Basic idea: learn from observations
• when CongWin < threshold, increase CongWin exponentially
• when CongWin ≥ threshold, increase CongWin linearly
• if a packet is lost, we have gone too far: threshold = CongWin / 2
  - if 3 dup. ACKs: the network is capable of delivering some packets, so CongWin is cut in half
  - if timeout: slow-start again (CongWin = 1 MSS)
• Additive Increase, Multiplicative Decrease (AIMD)

TCP SlowStart & Congestion Avoidance
[Figure: segment exchange timeline, with RTT marked along the time axis]
    initialize: CongWin = 1 (MSS); threshold = RcvWindow
    repeat {
        if (CongWin < threshold)      /* slow start */
            for every segment ACKed: CongWin++
        else                          /* slow start is over: congestion avoidance */
            for every w segments ACKed: CongWin++    /* w = current CongWin */
    } until (loss event)
    /* loss detected */
    threshold = CongWin / 2
    if (3 dup. ACKs) CongWin = threshold
    else             CongWin = 1 (MSS)              /* timeout */

TCP sender congestion control
State | Event | TCP Sender Action | Commentary
Slow Start (SS) | ACK received for previously unacked data | CongWin = CongWin + MSS; if (CongWin > Threshold) set state to “Congestion Avoidance” | Results in a doubling of CongWin every RTT
Congestion Avoidance (CA) | ACK received for previously unacked data | CongWin = CongWin + MSS * (MSS/CongWin) | Additive increase, resulting in an increase of CongWin by 1 MSS every RTT
SS or CA | Loss event detected by 3 duplicate ACKs | Threshold = CongWin/2; CongWin = Threshold; set state to “Congestion Avoidance” | Fast recovery, implementing multiplicative decrease. CongWin will not drop below 1 MSS.
SS or CA | Timeout | Threshold = CongWin/2; CongWin = 1 MSS; set state to “Slow Start” | Enter slow start
SS or CA | Duplicate ACK | Increment duplicate ACK count for the segment being acked | CongWin and Threshold not changed

Is TCP fair?
Fairness: if N TCP sessions share the same bottleneck link, each should get 1/N of the link capacity
Example: 2 competing connections, same RTT
• additive increase gives a slope of 1
• multiplicative decrease decreases throughput proportionally
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R. The throughput plot (Connection 1 throughput on one axis, up to R) shows the “equal bandwidth share” line and a trajectory that alternates between “congestion avoidance: additive increase” and “loss: decrease window by factor of 2”.]

Fairness (more)
Fairness and UDP
• Multimedia apps often do not use TCP
  - do not want their rate throttled by congestion control
• Instead use UDP:
  - pump audio/video at a constant rate, tolerate packet loss
• Research area: TCP friendly
Fairness and parallel TCP connections
• nothing prevents an app from opening parallel connections between 2 hosts
• Web browsers do this
• Example: link of rate R supporting 9 connections;
  - new app asks for 1 TCP, gets rate R/10
  - new app asks for 11 TCPs, gets about R/2 (11 of the 20 connections)!
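
The sender state table above can be sketched as a small event-driven state machine. This is only an illustration of the table's rules, not a real TCP stack: the class name `TcpSender`, the 1460-byte `MSS`, and the 64 KB default threshold (standing in for RcvWindow) are assumptions made for the sketch.

```python
from dataclasses import dataclass

MSS = 1460  # bytes; assumed value, the slides only speak of "1 MSS"


@dataclass
class TcpSender:
    """Hypothetical sender that tracks CongWin/Threshold as in the state table."""
    cong_win: float = MSS           # CongWin starts at 1 MSS
    threshold: float = 64 * 1024    # assumed stand-in for RcvWindow
    state: str = "SS"               # "SS" = slow start, "CA" = congestion avoidance
    dup_acks: int = 0

    def on_new_ack(self) -> None:
        """ACK received for previously unacked data."""
        self.dup_acks = 0
        if self.state == "SS":
            # One MSS per ACK, which doubles CongWin every RTT.
            self.cong_win += MSS
            if self.cong_win > self.threshold:
                self.state = "CA"
        else:
            # About one MSS per RTT: additive increase.
            self.cong_win += MSS * (MSS / self.cong_win)

    def on_dup_ack(self) -> None:
        """Duplicate ACK; the third one counts as a loss event."""
        self.dup_acks += 1
        if self.dup_acks == 3:
            self.threshold = self.cong_win / 2
            # Multiplicative decrease, never below 1 MSS.
            self.cong_win = max(self.threshold, MSS)
            self.state = "CA"

    def on_timeout(self) -> None:
        """Timeout: halve the threshold and restart slow start."""
        self.threshold = self.cong_win / 2
        self.cong_win = MSS
        self.state = "SS"
        self.dup_acks = 0
```

For example, starting from `TcpSender(threshold=4 * MSS)`, four new ACKs grow CongWin to 5 MSS and move the sender into congestion avoidance; three duplicate ACKs then cut CongWin to 2.5 MSS, while a timeout would reset it to 1 MSS.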

Delay modeling
Q: How long does it take to receive an object from a Web server after sending a request?
Ignoring congestion, delay is influenced by:
• TCP connection establishment
• data transmission delay
• slow start
Assumptions:
• one link between client and server, of rate R
• no retransmissions (no loss, no corruption)
Window size:
• first assume: fixed congestion window, W segments
• then dynamic window, modeling slow start

Fixed congestion window (1)
Notation:
• S: # bits in one segment
• O: # bits in one object
• R: bandwidth
• W: window size (# segments)
• K: O/(WS), the number of windows in the object
• Q: # times the server would idle if O = ∞
• P = min(Q, K-1)
First case: WS/R > S/R + RTT
• ACK for the first segment in the window returns before a window's worth of data has been sent
• delay = 2 RTT + O/R

Fixed congestion window (2)
Second case: WS/R < RTT + S/R
• server waits for the ACK after sending a window's worth of data
• delay = 2 RTT + O/R + (K-1)[S/R + RTT - WS/R]
• the bracketed term is the server's waiting time after each window

TCP Delay Modeling: Slow Start (1)
Delay components:
• 2 RTTs for connection establishment and request
• O/R to transmit the object
• server's idle time
Server idles P = min{K-1, Q} times
Example:
• O/S = 15 segments
• K = 4 windows
• Q = 2
• P = min{K-1, Q} = 2
• server idles P = 2 times
[Figure: client/server timeline: initiate TCP connection, request object, first window = S/R, second window = 2S/R, third window = 4S/R, fourth window = 8S/R, complete transmission, object delivered; RTT marked; axes “time at client” and “time at server”]

TCP Delay Modeling: Slow Start (2)
Now suppose the window grows according to slow start. The delay for one object is:
\[ \text{delay} = 2\,RTT + \frac{O}{R} + \sum_{p=1}^{P} \text{idleTime}_p \]
\[ = 2\,RTT + \frac{O}{R} + \sum_{k=1}^{P} \left[ \frac{S}{R} + RTT - 2^{k-1}\frac{S}{R} \right] \]
\[ = 2\,RTT + \frac{O}{R} + P\left[ RTT + \frac{S}{R} \right] - (2^{P} - 1)\frac{S}{R} \]

HTTP Modeling
Assume a Web page consists of:
• 1 base HTML page (of size O bits)
• M images (each of size O bits)
Non-persistent HTTP:
• M+1 TCP connections in series
• response time = (M+1)O/R + (M+1)2RTT + sum of idle times
Persistent HTTP:
• 2 RTT to request and receive the base HTML file
• 1 RTT to request and receive M images
• response time = (M+1)O/R + 3RTT + sum of idle times
Non-persistent HTTP with X parallel connections:
• suppose M/X is an integer
• 1 TCP connection for the base file
• M/X sets of parallel connections for images
• response time = (M+1)O/R + (M/X + 1)2RTT + sum of idle times

HTTP Response time (in seconds)
RTT = 100 msec, O = 5 Kbytes, M = 10 and X = 5
[Figure: bar chart of response time (0 to 20 seconds) for non-persistent, persistent, and parallel non-persistent HTTP at 28 Kbps, 100 Kbps, 1 Mbps, and 10 Mbps]
• For low bandwidth, connection & response time is dominated by transmission time.
• Persistent connections give only a minor improvement over parallel connections.

HTTP Response time (in seconds)
RTT = 1 sec, O = 5 Kbytes, M = 10 and X = 5
[Figure: bar chart of response time (0 to 70 seconds) for non-persistent, persistent, and parallel non-persistent HTTP at 28 Kbps, 100 Kbps, 1 Mbps, and 10 Mbps]
• For larger RTT, response time is dominated by TCP establishment & slow start delays.
• Persistent connections now give an important improvement, particularly in high delay-bandwidth networks.
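
The fixed-window and slow-start delay formulas from the delay modeling slides can be checked numerically with a short sketch. The function names are hypothetical, and two details are assumptions rather than something the slides state: K is rounded up to a whole number of windows, and Q counts a window whose idle time is exactly zero.

```python
import math


def fixed_window_delay(O, S, R, W, RTT):
    """Delay for a fixed congestion window of W segments (both slide cases)."""
    base = 2 * RTT + O / R
    # Server's waiting time after each window; not positive in the first case.
    stall = S / R + RTT - W * S / R
    if stall <= 0:
        return base                 # first case: ACKs return in time
    K = math.ceil(O / (W * S))      # assumed: round up to whole windows
    return base + (K - 1) * stall   # second case


def slow_start_delay(O, S, R, RTT):
    """Delay when the window doubles each round (1, 2, 4, ... segments)."""
    # K: number of windows covering the object, i.e. the smallest K
    # with (2^K - 1) * S >= O.
    K = 1
    while (2 ** K - 1) * S < O:
        K += 1
    # Q: number of times the server would idle for an infinite object,
    # i.e. the number of k >= 1 with S/R + RTT - 2^(k-1) * S/R >= 0.
    Q = 0
    while S / R + RTT - (2 ** Q) * S / R >= 0:
        Q += 1
    P = min(Q, K - 1)
    return 2 * RTT + O / R + P * (RTT + S / R) - (2 ** P - 1) * S / R
```

With S/R = 1 and RTT = 2 (arbitrary time units) and a 15-segment object, the sketch reproduces the slide's example values K = 4, Q = 2, P = 2, and the idle times of 2 and 1 add 3 units to 2 RTT + O/R.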

Network layer
• transport segments from the sending to the receiving host
  - source host: encapsulates segments into packets
  - destination host: delivers segments to the transport layer
• network layer protocols run in every host and router
• each router examines header fields in all packets passing through it
  - routing: calculate the best path to each destination
  - forwarding: move packets from input to output
[Figure: source S passes a transport protocol segment to the network layer, which adds a network protocol header; the packet crosses routers (R) to destination D, which hands the segment to the transport protocol]

Makeup lectures on Monday June 6
• There will be no class on Thursday June 9
• To make it up: 8-9:50am in Boelter 5422, or 6-7:50pm in Boelter 5419
  - pick the lesser evil
• Additional office hours on the final exam day, Saturday June 11: 10:00AM - 1:00PM
• The final exam is 3:00 - 6:00PM

Always keep the big picture in mind
[Figure: two hosts connected through two routers. Each host runs HTTP, TCP, and IP over an Ethernet interface; the HTTP message is carried in a TCP segment, which is carried in an IP packet. The routers run IP over Ethernet and SONET interfaces and pass the IP packet along hop by hop.]

Network layer: connection vs. connection-less service
• Virtual circuit network provides connection-oriented service
  - the source-to-dest path works much like a telephone circuit
• Datagram network provides connectionless service
• The two services are analogous to TCP vs. UDP at the transport layer, but:
  - network delivery service: host-to-host
  - no choice: a given network provides one or the other, not both (the transport layer offers both)

Virtual circuit network
• uses a signaling protocol to set up the connection before data can flow
• every router on the source-dest path maintains “state” for each passing connection
• link and router resources (bandwidth, buffers) are allocated to each VC
• each packet carries a VC identifier (not the destination host address)
• the VC number can change on each link; the new VC number comes from the forwarding table
[Figure: call setup between two hosts (application, transport, network, data link, physical): 1. initiate call, 2. incoming call, 3. accept call, 4. call connected, 5. data flow begins, 6. receive data]

Forwarding table
[Figure: a router with interfaces numbered 1, 2, 3 and VC numbers 12, 22, 32 on its links]
Forwarding table in northwest router:
Incoming interface | Incoming VC # | Outgoing interface | Outgoing VC #
1                  | 12            | 2                  | 22
2                  | 63            | 1                  | 18
3                  | 7             | 2                  | 17
1                  | 97            | 3                  | 87
…                  | …             | …                  | …
Routers maintain connection state information!

Internet: A Datagram Network
• hosts are connected to subnets
• subnets are interconnected by IP routers
• all hosts and routers speak IP
  - routers also “speak” many different link layer protocols
• IP provides two basic functions
  - globally unique address for all connected points
  - best effort datagram delivery from source to destination hosts (fragmentation/reassembly of packets whenever needed)
[Figure: host H1 reaches host H8 through routers R1, R2, R3; IP runs end to end over Ethernet (ETH), FDDI, and WLAN links]

The Internet Network layer
Host, router network layer functions:
[Figure: the network layer sits between the transport layer (TCP, UDP) and the link and physical layers; a “Router function” label marks the routing side. Its components are listed below.]
• routing protocols (RIP, OSPF, BGP, …), which fill the forwarding table
• IP protocol: addressing conventions, datagram format, packet handling conventions
• ICMP protocol: error reporting, router “signaling”

IP datagram format
Basic header fields, in order (each row is 32 bits wide):
• ver (IP version number) | head. len (header length) | type of service | total length
• 16-bit identifier | flgs | fragment offset
• time to live | protocol (upper layer protocol to deliver the payload to) | IP header checksum
• source IP address
• destination IP address
How much overhead for a TCP segment? 20 bytes of TCP + 20 bytes of IP = 40 bytes
Annotation on the options field: e.g. timestamp, route recording, specify list of routers to visit.
Other annotations on the IP datagram format figure:
• the rows above form the basic header; the figure is 32 bits wide
• 3 fields (identifier, flags, fragment offset) are used for packet fragmentation/reassembly
• time to live: max number of remaining hops
• options (if any) follow the basic header
• data: variable length, typically a TCP or UDP segment

IP Address structure
• 32 bits, uniquely identifies a host or router interface
  - interface: connection between host/router and physical link
• IP address space: 2-level hierarchy; the 4 bytes split into a network-ID and a host-ID
  173.1.1.1 = 10101101 00000001 00000001 00000001
• What’s a network? (from the IP address perspective)
  - device interfaces with the same network part of the IP address
  - can physically reach each other without going through a router
[Figure: a network of three LANs with interface addresses 173.1.1.1, 173.1.1.2, 173.1.1.3, 173.1.1.4; 173.1.2.1, 173.1.2.2, 173.1.2.9; and 173.1.3.1, 173.1.3.2, 173.1.3.27]

IP Address: how many bits for net-ID
Original IP design: class-based address
Class | Leading bits | Format            | Range
A     | 0            | network, host     | 1.0.0.0 to 127.255.255.255
B     | 10           | network, host     | 128.0.0.0 to 191.255.255.255
C     | 110          | network, host     | 192.0.0.0 to 223.255.255.255
D     | 1110         | multicast address | 224.0.0.0 to 239.255.255.255
Two changes added over the last 25 years
• Subnetting: add a hidden level to the address hierarchy
  - an organization gets one address block (Network ID, Host ID), then splits the host part into two parts: subnet and host
• CIDR: Classless InterDomain Routing (today)
  - network portion of the address is of arbitrary length

Classless InterDomain Routing
• address format: a.b.c.d/x, where x is the # of bits in the network portion
  200.23.16.0/23 = 11001000 00010111 00010000 00000000 (first 23 bits: network part; last 9 bits: host part)
• Internet Service Providers get blocks of IP addresses from the Internet address authority
• Internet customers get a portion of their ISP’s addr. block
ISP's block       11001000 00010111 00010000 00000000    200.23.16.0/20
Organization 0    11001000 00010111 00010000 00000000    200.23.16.0/23
Organization 1    11001000 00010111 00010010 00000000    200.23.18.0/23
Organization 2    11001000 00010111 00010100 00000000    200.23.20.0/23
...               …..                                    ….
Organization 7    11001000 00010111 00011110 00000000    200.23.30.0/23

Hierarchical addressing: route aggregation
Hierarchical addressing allows efficient advertisement of routing information:
[Figure: Organization 0 (200.23.16.0/23), Organization 1 (200.23.18.0/23), Organization 2 (200.23.20.0/23), ..., Organization 7 (200.23.30.0/23) connect to Fly-By-Night-ISP, which tells the Internet “Send me anything with addresses beginning 200.23.16.0/20”. ISPs-R-Us tells the Internet “Send me anything with addresses beginning 199.31.0.0/16”.]

Hierarchical addressing: route aggregation
• Route aggregation helps reduce routing table size
• Multi-homing defeats address aggregation
  - ISPs-R-Us has a more specific route to Org. 7
[Figure: the same topology with Organization 7 (200.23.30.0/23) multi-homed. Fly-By-Night-ISP still advertises “Send me anything with addresses beginning 200.23.16.0/20”; ISPs-R-Us now advertises “Send me anything with addresses beginning 199.31.0.0/16, or 200.23.30.0/23”.]

IP Subnet
• subnet mask: indicates the portion of the address that is considered the “network ID” by the local site
  - the subnet mask does not need to align with a byte boundary, e.g. 11111111111111111111110000000000
[Figure: the same address viewed from outside (network ID, host ID) and viewed from inside (network ID extended by the mask, 10-bit host ID)]
• each host must be configured with both an IP address and a subnet mask
• subnets are invisible outside of the local site
  - backbone routers only know how to forward packets to the network ID
  - within the organization, routers store: [subnet, mask, next hop]
• subnet advantages: aggregate local info, keep backbone routers' table size small

An example
[Figure: a packet for 131.179.96.15 arrives from the Global Internet and is forwarded via routers A, B, and C toward the UCLA CS subnet 131.179.96.0. Each router looks up the IP address: one table is keyed on the class-B network (Network# 131.179, next-hop B), another on the subnet (Network# 131.179.96, mask 255.255.255.0, next-hop C).]
131.179.96.15 is a class-B address:
• as a class-B address: network ID 131.179, host ID 96.15
• subnet mask (255.255.255.0): 111111111111111111111111 00000000
• as a subnetted address: 131.179.96 identifies the subnet, 15 the host

Getting an IP packet from source to dest.
Source host A → destination B:
• host A checks: [A’s addr & subnet mask] = [B’s addr & subnet mask]?
  - yes: B is on the same net, use the link layer to send the pkt directly to B
Source host A → destination E:
• host A checks: [A’s addr & subnet mask] = [E’s addr & subnet mask]?
  - no: send the pkt to the default router
• router: is E on any of my directly connected subnets?
  - yes: send the pkt directly to E
  - no: forward to another router according to the routing table
[Figure: the three-LAN network (173.1.1.x, 173.1.2.x, 173.1.3.x) with hosts A, B, and E marked]

IP Fragmentation & Reassembly
• Different subnets have different MTUs (Maximum Transmission Unit)
• Sender host always uses its max MTU size
• Example: H1 sends an IP packet with 1300 bytes of data to H2 (MTU = 1500B at the sender side)
• Routers “fragment” IP packets if the next link has a smaller MTU
  - chop packets to the MTU size of the next link
  - further fragmentation down the path is possible
  - packet reassembled at dest.
    host
[Figure: H1 sends 1300B of data toward H2 through routers R1, R2, R3; a link with MTU = 532B splits it into 512B, 512B, and 276B fragments; reassembly happens at H2]

IP Fragmentation: An example
Packet     | ver | hdr len | total length | identifier | flags | offset | data
original   | 4   | 5       | 1320         | 7394       | 000   | 0      | 1300 bytes
fragment 1 | 4   | 5       | 532          | 7394       | 001   | 0      | 512 bytes
fragment 2 | 4   | 5       | 532          | 7394       | 001   | 64     | 512 bytes
fragment 3 | 4   | 5       | 296          | 7394       | 000   | 128    | 276 bytes
(each packet also carries TOS and the rest of the IP header; offsets are in units of 8 bytes)
At destination:
- identifier: tells which pieces belong to the same packet
- the last fragment: MF = 0
- the offsets tell whether any pieces are missing in the middle

ICMP: Internet Control Message Protocol
• used by hosts & routers to communicate network-level information
  - error reporting: unreachable host, network, port, protocol
  - echo request/reply
• ICMP msgs are carried in IP packets
• ICMP message format: IP header | type | code | checksum | unused (or used by certain ICMP types) | IP header and first 64 bits of data, or data (according to ICMP type)
Type | Code | description
0    | 0    | echo reply (ping)
3    | 0    | dest. network unreachable
3    | 1    | dest host unreachable
3    | 2    | dest protocol unreachable
3    | 3    | dest port unreachable
3    | 6    | dest network unknown
3    | 7    | dest host unknown
4    | 0    | source quench (congestion control - not used)
8    | 0    | echo request (ping)
9    | 0    | route advertisement
10   | 0    | router discovery
11   | 0    | TTL expired
12   | 0    | bad IP header

NAT: Network Address Translation
[Figure: a local network (e.g., home network) 10.0.0/24 with hosts 10.0.0.1, 10.0.0.2, 10.0.0.3 behind a NAT router (10.0.0.4 on the local side, 138.76.29.7 toward the rest of the Internet)]
• All datagrams leaving the local network have the same single source NAT IP address, 138.76.29.7, and different source port numbers
• Datagrams with source or destination in this network have a 10.0.0/24 address for source, destination (as usual)

NAT: Network Address Translation
1: host 10.0.0.1 sends a datagram to 128.119.40.186, 80
   S: 10.0.0.1, 3345    D: 128.119.40.186, 80
2: NAT router changes the datagram source addr from 10.0.0.1, 3345 to 138.76.29.7, 5001, and updates its table
   S: 138.76.29.7, 5001    D: 128.119.40.186, 80
3: reply arrives with dest. address 138.76.29.7, 5001
   S: 128.119.40.186, 80    D: 138.76.29.7, 5001
4: NAT router changes the datagram dest addr from 138.76.29.7, 5001 to 10.0.0.1, 3345
   S: 128.119.40.186, 80    D: 10.0.0.1, 3345
NAT translation table:
WAN side addr     | LAN side addr
138.76.29.7, 5001 | 10.0.0.1, 3345
……                | ……

NAT implementation
NAT router must do the following:
• outgoing datagrams: replace (source IP address, port #) of every outgoing datagram with (NAT IP address, new port #)
  - remote clients/servers will respond using (NAT IP address, new port #) as destination addr.
• remember (in the NAT translation table) every (source IP address, port #) to (NAT IP address, new port #) translation pair
• incoming datagrams: replace (NAT IP address, new port #) in the destination fields of every incoming datagram with the corresponding (source IP address, port #) stored in the NAT table
Problems due to NAT
• increased network complexity, reduced robustness
• cannot run services from inside a NAT box
• address shortage should instead be solved by IPv6

IPv6
• Motivation: 32-bit address space exhaustion
• Take the opportunity for some clean-up
IPv6 datagram format:
• address length changed from 32 bits to 128 bits
• fragmentation fields moved out of the base header
• IP options moved out of the base header
  - Header Length field eliminated
• Header Checksum eliminated
• Type of Service field eliminated
• Time to Live → Hop Limit, Protocol → Next Header
• Precedence → Priority, added Flow Label field
• Length field excludes the IPv6 header

IPv6 header format (each row is 32 bits wide)
IPv6 header:
• Version | Priority | Flow Label
• Payload Length | Next Header | Hop Limit
• Source Address (16 bytes, 128 bits)
• Destination Address (16 bytes)
IPv4 header:
• Version | Hdr Len | Prec | TOS | Total Length
• Identification | Flags | Fragment Offset
• Time to Live | Protocol | Header Checksum
• Source Address
• Destination Address
• Options | Padding

Changes from IPv4
• Priority: identify priority among datagrams in a flow
• Flow Label: identify datagrams in the same “flow” (the concept of “flow” is not well defined)
• Next Header: identify the upper layer protocol for data
• Options: allowed, but outside of the basic header, indicated by the “Next Header” field
• Checksum: removed entirely to reduce processing time at each hop
• ICMPv6: new version of ICMP
  - additional message types, e.g. “Packet Too Big”
  - multicast group management functions

Transition From IPv4 To IPv6
• Not all routers can be upgraded simultaneously
• To allow the Internet to operate with mixed IPv4 and IPv6 routers: tunneling
[Figure, logical view: A and B (IPv6) connect through a tunnel to E and F (IPv6)]
[Figure, physical view: A, B (IPv6), C, D (IPv4), E, F (IPv6) in a chain]
• A-to-B: IPv6 datagram (Flow: X, Src: A, Dest: F, data)
• inside the tunnel (B through E): IPv6 inside IPv4; the outer IPv4 header has Src: B, Dest: E and carries the IPv6 datagram (Flow: X, Src: A, Dest: F, data)
• E-to-F: IPv6 datagram (Flow: X, Src: A, Dest: F, data)

Interplay between routing and forwarding
[Figure: the routing algorithm fills the local forwarding table; the value in the arriving packet’s header (0111) is looked up to select one of output links 1, 2, 3]
Local forwarding table:
header value | output link
0100         | 3
0101         | 2
0111         | 2
1001         | 1

Router Architecture Overview
Two key router functions:
• run routing algorithms/protocols (RIP, OSPF, BGP)
• forward datagrams from incoming to outgoing link

Input Port Functions
• Physical layer: bit-level reception
• Data link layer: e.g., Ethernet (see chapter 5)
• Decentralized switching:
  - given the datagram dest., look up the output port using the forwarding table in input port memory
  - goal: complete input port processing at ‘line speed’
  - queuing: if datagrams arrive faster than the forwarding rate into the switch fabric

Output Ports
• Buffering is required when datagrams arrive from the fabric faster than the transmission rate
• Scheduling discipline chooses among queued datagrams for transmission
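
The forwarding-table lookup done at the input port, combined with the CIDR prefixes from the route aggregation slides, can be sketched as a longest-prefix match. This is a minimal illustration under assumptions: the slides only say that ISPs-R-Us has a “more specific route” to Org. 7, so naming the rule longest-prefix match, the `TABLE` contents, and the function name are choices made here, and the linear scan is for readability rather than a model of how a router achieves line speed.

```python
import ipaddress

# Prefixes advertised in the multi-homing slide, mapped to a next hop.
TABLE = [
    ("200.23.16.0/20", "Fly-By-Night-ISP"),
    ("199.31.0.0/16", "ISPs-R-Us"),
    ("200.23.30.0/23", "ISPs-R-Us"),   # more specific route to Organization 7
]


def longest_prefix_match(table, dest):
    """Return the next hop of the most specific prefix containing dest, or None."""
    addr = ipaddress.ip_address(dest)
    best_net, best_hop = None, None
    for prefix, next_hop in table:
        net = ipaddress.ip_network(prefix)
        # Keep the matching entry with the longest network portion.
        if addr in net and (best_net is None or net.prefixlen > best_net.prefixlen):
            best_net, best_hop = net, next_hop
    return best_hop
```

An address in Organization 7's block, such as 200.23.30.5, matches both 200.23.16.0/20 and 200.23.30.0/23; the /23 entry wins, so the packet goes to ISPs-R-Us, while 200.23.18.9 matches only the /20 and goes to Fly-By-Night-ISP.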