IT 347 Midterm 2 Review

Vocab Review
• ATM
• CBR
• ABR
• VBR
• UBR
• MSS
• MTU
• AIMD

Transport Layer

TCP: retransmission scenarios
[Figure: TCP retransmission scenarios between Host A and Host B (Seq=92, timeout, SendBase = 100 and 120): lost ACK scenario; premature timeout.]

TCP retransmission scenarios (more)
[Figure: cumulative ACK scenario between Host A and Host B, with a loss (X), a timeout, and SendBase = 120.]

TCP ACK generation [RFC 1122, RFC 2581]
Event at receiver → TCP receiver action:
• arrival of in-order segment with expected seq #; all data up to expected seq # already ACKed
  – delayed ACK: wait up to 500 ms for next segment; if no next segment, send ACK
• arrival of in-order segment with expected seq #; one other segment has ACK pending
  – immediately send single cumulative ACK, ACKing both in-order segments
• arrival of out-of-order segment with higher-than-expected seq #; gap detected
  – immediately send duplicate ACK, indicating seq # of next expected byte
• arrival of segment that partially or completely fills gap
  – immediately send ACK, provided that segment starts at lower end of gap

Fast Retransmit
• time-out period often relatively long:
  – long delay before resending lost packet
• detect lost segments via duplicate ACKs:
  – sender often sends many segments back-to-back
  – if a segment is lost, there will likely be many duplicate ACKs for that segment
• if sender receives 3 duplicate ACKs for the same data, it assumes that the segment after the ACKed data was lost:
  – fast retransmit: resend segment before timer expires
[Figure: Host A sends seq # x1 through x5; one segment is lost (X); Host B returns ACK x1 four times (the original ACK plus triple duplicate ACKs), so Host A resends before its timeout expires.]

Fast retransmit algorithm:
event: ACK received, with ACK field value of y
    if (y > SendBase) {
        SendBase = y
        if (there are currently not-yet-acknowledged segments)
            start timer
    }
    else {  /* a duplicate ACK for already ACKed segment */
        increment count of dup ACKs received for y
        if (count of dup ACKs received for y == 3) {
            resend segment with sequence number y  /* fast retransmit */
        }
    }

TCP Flow Control
• flow control: sender won't overflow receiver's buffer by transmitting too much, too fast
• receive side of TCP connection has a receive buffer
  [Figure: IP datagrams arrive into the receive buffer (TCP data in buffer plus currently unused buffer space); the application process reads from it.]
• speed-matching service: matching send rate to receiving application's drain rate
  – app process may be slow at reading from buffer

TCP Flow control: how it works
[Figure: receive buffer of size RcvBuffer; rwnd is its unused part; the application process reads TCP data from the buffer.]
(suppose TCP receiver discards out-of-order segments)
• unused buffer space:
  $\text{rwnd} = \text{RcvBuffer} - [\text{LastByteRcvd} - \text{LastByteRead}]$
• receiver: advertises unused buffer space by including rwnd value in segment header
• sender: limits # of unACKed bytes to rwnd
  – guarantees receiver's buffer doesn't overflow

TCP congestion control: bandwidth probing
• "probing for bandwidth": increase transmission rate on receipt of ACK, until eventually loss occurs, then decrease transmission rate
  – continue to increase on ACK, decrease on loss (since available bandwidth is changing, depending on other connections in network)
[Figure: TCP's "sawtooth" behavior: sending rate vs. time rises while ACKs are being received and drops at each loss (X).]
• Q: how fast to increase/decrease? Details to follow.

TCP Congestion Control: details
• sender limits rate by limiting number of unACKed bytes "in pipeline":
  $\text{LastByteSent} - \text{LastByteAcked} \le \text{cwnd}$
  – cwnd: differs from rwnd (how, why?)
  – sender limited by min(cwnd, rwnd)
• roughly, cwnd bytes are sent per RTT, so
  $\text{rate} \approx \frac{\text{cwnd}}{\text{RTT}}$ bytes/sec
• cwnd is dynamic, function of perceived network congestion
[Figure: cwnd bytes sent during one RTT, followed by the returning ACK(s).]

TCP Congestion Control: more details
• segment loss event: reducing cwnd
  – timeout: no response from receiver
    – cut cwnd to 1 MSS
  – 3 duplicate ACKs: at least some segments getting through (recall fast retransmit)
    – cut cwnd in half, less aggressively than on timeout
• ACK received: increase cwnd
  – slow start phase: increase exponentially fast (despite name) at connection start, or following timeout
  – congestion avoidance: increase linearly

TCP Slow Start
• when connection begins, cwnd = 1 MSS
  – example: MSS = 500 bytes & RTT = 200 msec
  – initial rate = 20 kbps (500 bytes × 8 bits / 0.2 s)
• available bandwidth may be >> MSS/RTT
  – desirable to quickly ramp up to respectable rate
• increase rate exponentially until first loss event or when threshold reached
  – double cwnd every RTT
  – done by incrementing cwnd by 1 MSS for every ACK received
[Figure: slow start between Host A and Host B, with one RTT marked on the time axis.]

Transitioning into/out of slow start
• ssthresh: cwnd threshold maintained by TCP
• on loss event: set ssthresh to cwnd/2
  – remember (half of) TCP rate when congestion last occurred
• when cwnd >= ssthresh: transition from slow start to congestion avoidance phase
FSM (Λ means no action):
  – initial (Λ): cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0; enter slow start
  – slow start, new ACK: cwnd = cwnd + MSS, dupACKcount = 0, transmit new segment(s), as allowed
  – slow start, duplicate ACK: dupACKcount++
  – slow start, timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment
  – cwnd > ssthresh (Λ): move to congestion avoidance
  – congestion avoidance, timeout: ssthresh = cwnd/2, cwnd = 1 MSS, dupACKcount = 0, retransmit missing segment, return to slow start
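The slow-start and ssthresh rules above can be traced round by round with a short Python sketch. It assumes a simplified per-RTT model: cwnd is counted in MSS units, doubles each RTT in slow start (capped at ssthresh, an assumption of this sketch), grows by 1 MSS per RTT in congestion avoidance, and a timeout happens only in the rounds you list. Real TCP updates cwnd per ACK, so actual traces differ in detail. The function name `cwnd_trace` and its parameters are hypothetical.

```python
def cwnd_trace(rounds, ssthresh=8, timeout_rounds=()):
    """Return cwnd (in MSS) at the start of each transmission round.

    Simplified per-RTT model for illustration (real TCP updates cwnd per ACK):
      - slow start: cwnd doubles every RTT while cwnd < ssthresh
        (capped at ssthresh here, an assumption of this sketch)
      - congestion avoidance: cwnd grows by 1 MSS per RTT
      - timeout during a round: ssthresh = cwnd/2, cwnd = 1 MSS
    """
    cwnd = 1  # connection begins with cwnd = 1 MSS
    trace = []
    for rnd in range(1, rounds + 1):
        trace.append(cwnd)
        if rnd in timeout_rounds:
            ssthresh = max(cwnd // 2, 1)  # remember half the rate at congestion
            cwnd = 1                      # back to slow start
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)  # slow start: exponential growth
        else:
            cwnd += 1                       # congestion avoidance: linear growth
    return trace


if __name__ == "__main__":
    print(cwnd_trace(8))                       # [1, 2, 4, 8, 9, 10, 11, 12]
    print(cwnd_trace(10, timeout_rounds={6}))  # [1, 2, 4, 8, 9, 10, 1, 2, 4, 5]
```

After the timeout in round 6 (cwnd = 10), ssthresh drops to 5, so the second slow-start phase hands over to linear growth much earlier than the first one did.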
TCP: congestion avoidance
• when cwnd > ssthresh, grow cwnd linearly
  – increase cwnd by 1 MSS per RTT
  – approach possible congestion more slowly than in slow start
  – implementation: cwnd = cwnd + MSS·(MSS/cwnd) for each ACK received

AIMD
• ACKs: increase cwnd by 1 MSS per RTT: additive increase
• loss: cut cwnd in half (non-timeout-detected loss): multiplicative decrease
AIMD: Additive Increase Multiplicative Decrease

TCP congestion control FSM: overview
States: slow start, congestion avoidance, fast recovery.
• slow start → congestion avoidance: cwnd > ssthresh
• any state → slow start: loss: timeout
• slow start or congestion avoidance → fast recovery: loss: 3dupACK
• fast recovery → congestion avoidance: new ACK

TCP congestion control FSM: details
Slow start (entered via Λ: cwnd = 1 MSS, ssthresh = 64 KB, dupACKcount = 0):
• new ACK: cwnd = cwnd + MSS; dupACKcount = 0; transmit new segment(s), as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment
• cwnd > ssthresh (Λ): go to congestion avoidance
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Congestion avoidance:
• new ACK: cwnd = cwnd + MSS·(MSS/cwnd); dupACKcount = 0; transmit new segment(s), as allowed
• duplicate ACK: dupACKcount++
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start
• dupACKcount == 3: ssthresh = cwnd/2; cwnd = ssthresh + 3 MSS; retransmit missing segment; go to fast recovery
Fast recovery:
• duplicate ACK: cwnd = cwnd + MSS; transmit new segment(s), as allowed
• new ACK: cwnd = ssthresh; dupACKcount = 0; go to congestion avoidance
• timeout: ssthresh = cwnd/2; cwnd = 1 MSS; dupACKcount = 0; retransmit missing segment; go to slow start

Popular "flavors" of TCP
[Figure: cwnd window size (in segments) vs. transmission round for TCP Tahoe and TCP Reno, with ssthresh marked.]

Summary: TCP Congestion Control
• when cwnd < ssthresh, sender is in slow-start phase, window grows exponentially.
• when cwnd >= ssthresh, sender is in congestion-avoidance phase, window grows linearly.
• when triple duplicate ACK occurs, ssthresh set to cwnd/2, cwnd set to ~ssthresh
• when timeout occurs, ssthresh set to cwnd/2, cwnd set to 1 MSS.

TCP Futures: TCP over "long, fat pipes"
• example: 1500-byte segments, 100 ms RTT, want 10 Gbps throughput
• requires window size W = 83,333 in-flight segments:
  $W = \frac{10 \cdot 10^{9}\ \text{bits/s} \times 0.1\ \text{s}}{1500 \times 8\ \text{bits}} \approx 83{,}333$
• throughput in terms of loss rate L:
  $\text{throughput} = \frac{1.22 \cdot \text{MSS}}{\text{RTT} \sqrt{L}}$
  ➜ 10 Gbps requires $L = 2 \cdot 10^{-10}$. Wow!
• new versions of TCP for high-speed

TCP Fairness
• fairness goal: if K TCP sessions share same bottleneck link of bandwidth R, each should have average rate of R/K
[Figure: TCP connection 1 and TCP connection 2 share a bottleneck router of capacity R.]

Why is TCP fair?
Two competing sessions:
• additive increase gives slope of 1, as throughput increases
• multiplicative decrease decreases throughput proportionally
[Figure: throughput of connection 1 vs. connection 2 (axes up to R); alternating "congestion avoidance: additive increase" and "loss: decrease window by factor of 2" steps move toward the equal bandwidth share line.]

Fairness (more)
Fairness and UDP
• multimedia apps often do not use TCP
  – do not want rate throttled by congestion control
• instead use UDP:
  – pump audio/video at constant rate, tolerate packet loss
Fairness and parallel TCP connections
• nothing prevents app from opening parallel connections between 2 hosts
• web browsers do this
• example: link of rate R supporting 9 connections;
  – new app asks for 1 TCP, gets rate R/10
  – new app asks for 11 TCPs, gets R/2!
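The arithmetic behind the "long, fat pipes" and parallel-connection examples can be checked with a short Python sketch. It assumes the throughput formula throughput = 1.22·MSS/(RTT·√L) from the slide and, for the parallel case, that every connection on the bottleneck gets exactly R/K. The function names are hypothetical.

```python
from fractions import Fraction


def window_segments(throughput_bps, rtt_s, mss_bytes):
    """Segments W that must be in flight so that W * MSS / RTT = throughput."""
    return throughput_bps * rtt_s / (mss_bytes * 8)


def loss_rate_for(throughput_bps, rtt_s, mss_bytes):
    """Solve throughput = 1.22 * MSS / (RTT * sqrt(L)) for the loss rate L."""
    return (1.22 * mss_bytes * 8 / (rtt_s * throughput_bps)) ** 2


def parallel_share(existing, new):
    """Fraction of the bottleneck rate R that an app opening `new` connections
    gets, assuming every connection receives the same share R/K."""
    return Fraction(new, existing + new)


if __name__ == "__main__":
    print(round(window_segments(10e9, 0.1, 1500)))      # 83333
    print(f"{loss_rate_for(10e9, 0.1, 1500):.1e}")      # 2.1e-10
    print(parallel_share(9, 1), parallel_share(9, 11))  # 1/10 11/20
```

The 11-connection app gets 11/20 of R, which the slide rounds to R/2.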
Chapter 3: Summary
• principles behind transport layer services:
  – multiplexing, demultiplexing
  – reliable data transfer
  – flow control
  – congestion control
• instantiation and implementation in the Internet
  – UDP
  – TCP
Next:
• leaving the network "edge" (application, transport layers)
• into the network "core"

Network Layer

Interplay between routing and forwarding
[Figure: the routing algorithm fills the local forwarding table; the value 0111 in the arriving packet's header is looked up, and the packet leaves on output link 2.]
Local forwarding table (header value → output link):
• 0100 → 3
• 0101 → 2
• 0111 → 2
• 1001 → 1

Connection setup
• 3rd important function in some network architectures:
  – ATM, frame relay, X.25
• before datagrams flow, two end hosts and intervening routers establish virtual connection
  – routers get involved
• network vs. transport layer connection service:
  – network: between two hosts (may also involve intervening routers in case of VCs)
  – transport: between two processes

Network service model
Q: What service model for "channel" transporting datagrams from sender to receiver?
Example services for individual datagrams:
• guaranteed delivery
• guaranteed delivery with less than 40 msec delay
Example services for a flow of datagrams:
• in-order datagram delivery
• guaranteed minimum bandwidth to flow
• restrictions on changes in inter-packet spacing

Network layer service models (network architecture, service model: guarantees on bandwidth, loss, order, timing; congestion feedback):
• Internet, best effort: bandwidth none; loss no; order no; timing no; congestion feedback no (inferred via loss)
• ATM, CBR: bandwidth constant rate; loss yes; order yes; timing yes; congestion feedback: no congestion
• ATM, VBR: bandwidth guaranteed rate; loss yes; order yes; timing yes; congestion feedback: no congestion
• ATM, ABR: bandwidth guaranteed minimum; loss no; order yes; timing no; congestion feedback yes
• ATM, UBR: bandwidth none; loss no; order yes; timing no; congestion feedback no

VC implementation
A VC consists of:
1. path from source to destination
2. VC numbers, one number for each link along path
3. entries in forwarding tables in routers along path
• packet belonging to VC carries VC number (rather than dest address)
• VC number can be changed on each link
  – new VC number comes from forwarding table

VC Forwarding
[Figure: northwest router with interface numbers 1, 2, 3 and VC numbers 12, 22, 32 on its links.]
Forwarding table in northwest router (incoming interface, incoming VC # → outgoing interface, outgoing VC #):
• 1, 12 → 3, 22
• 2, 63 → 1, 18
• 3, 7 → 2, 17
• 1, 97 → 3, 87
• …
Routers maintain connection state information!

Virtual circuits: signaling protocols
• used to set up, maintain, tear down VC
• used in ATM, frame relay, X.25
• not used in today's Internet
[Figure: both end hosts run application, transport, network, data link, physical; 1. initiate call, 2. incoming call, 3. accept call, 4. call connected, 5. data flow begins, 6. receive data.]

Datagram networks
• no call setup at network layer
• routers: no state about end-to-end connections
  – no network-level concept of "connection"
• packets forwarded using destination host address
  – packets between same source-dest pair may take different paths
[Figure: both end hosts run application, transport, network, data link, physical; 1. send data, 2. receive data, with no call setup.]
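The VC forwarding table shown earlier can be modeled as a dictionary keyed on (incoming interface, incoming VC #). A minimal Python sketch makes the per-circuit state explicit, which is exactly the state a datagram router does not keep. The names `VC_TABLE` and `forward_vc` are hypothetical; the table entries are the ones on the slide.

```python
# VC forwarding table of the northwest router, copied from the slide:
# (incoming interface, incoming VC #) -> (outgoing interface, outgoing VC #)
VC_TABLE = {
    (1, 12): (3, 22),
    (2, 63): (1, 18),
    (3, 7): (2, 17),
    (1, 97): (3, 87),
}


def forward_vc(in_iface, in_vc, table=VC_TABLE):
    """Return (outgoing interface, new VC #) for a packet on an established VC.

    The outgoing VC number replaces the incoming one, which is why a VC
    number can change on each link. A KeyError means no VC was set up for
    this (interface, VC #) pair: the router forwards only on stored state.
    """
    return table[(in_iface, in_vc)]


if __name__ == "__main__":
    print(forward_vc(1, 12))  # (3, 22): leave on interface 3 carrying VC # 22
```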
Datagram Forwarding table
• about 4 billion IP addresses, so rather than list individual destination addresses, list ranges of addresses (aggregate table entries)
[Figure: the routing algorithm fills the local forwarding table (dest address range → output link: address-range 1 → 3, address-range 2 → 2, address-range 3 → 2, address-range 4 → 1); the IP destination address in the arriving packet's header selects the output link.]

Datagram Forwarding table (destination address range → link interface):
• 11001000 00010111 00010000 00000000 through 11001000 00010111 00010111 11111111 → 0
• 11001000 00010111 00011000 00000000 through 11001000 00010111 00011000 11111111 → 1
• 11001000 00010111 00011001 00000000 through 11001000 00010111 00011111 11111111 → 2
• otherwise → 3
Q: but what happens if ranges don't divide up so nicely?

Longest prefix matching
When looking for a forwarding table entry for a given destination address, use the longest address prefix that matches the destination address.
Destination address range → link interface:
• 11001000 00010111 00010*** ******** → 0
• 11001000 00010111 00011000 ******** → 1
• 11001000 00010111 00011*** ******** → 2
• otherwise → 3
Examples:
• DA: 11001000 00010111 00010110 10100001. Which interface?
• DA: 11001000 00010111 00011000 10101010. Which interface?

Datagram or VC network: why? Internet (datagram) vs. ATM (VC)
Internet (datagram):
• data exchange among computers
  – "elastic" service, no strict timing req.
• "smart" end systems (computers)
  – can adapt, perform control, error recovery
  – simple inside network, complexity at "edge"
• many link types
  – different characteristics
  – uniform service difficult
ATM (VC):
• evolved from telephony
• human conversation:
  – strict timing, reliability requirements
  – need for guaranteed service
• "dumb" end systems
  – telephones
  – complexity inside network

IP Fragmentation and Reassembly
Example:
• 4000-byte datagram
• MTU = 1500 bytes
One large datagram becomes several smaller datagrams:
• original: length = 4000, ID = x, fragflag = 0, offset = 0
• fragment 1: length = 1500, ID = x, fragflag = 1, offset = 0
• fragment 2: length = 1500, ID = x, fragflag = 1, offset = 185
• fragment 3: length = 1040, ID = x, fragflag = 0, offset = 370
• each full fragment carries 1480 bytes in its data field (1500-byte MTU minus a 20-byte IP header); offset counts 8-byte units, so fragment 2 has offset = 1480/8 = 185
• the last fragment carries the remaining 3980 - 2·1480 = 1020 data bytes, so its length is 1020 + 20 = 1040

Subnets
[Figure: interconnected routers with interfaces 223.1.1.1 to 223.1.1.4, 223.1.2.1, 223.1.2.2, 223.1.2.6, 223.1.3.1, 223.1.3.2, 223.1.3.27, 223.1.7.0, 223.1.7.1, 223.1.8.0, 223.1.8.1, 223.1.9.1, 223.1.9.2.]
Q: How many subnets?

IP addressing: CIDR
CIDR: Classless InterDomain Routing
• subnet portion of address of arbitrary length
• address format: a.b.c.d/x, where x is # bits in subnet portion of address
Example: 200.23.16.0/23 = 11001000 00010111 00010000 00000000, where the first 23 bits are the subnet part and the remaining 9 bits are the host part

DHCP client-server scenario
[Figure: a DHCP server attached to a network with subnets 223.1.1.x, 223.1.2.x and 223.1.3.x; an arriving DHCP client needs an address in this network.]

DHCP: example
[Figure: DHCP/UDP/IP/Eth/Phy protocol stacks at the laptop and at the router running the DHCP server (168.1.1.1).]
• connecting laptop needs its IP address, addr of first-hop router, addr of DNS server: use DHCP
• DHCP request encapsulated in UDP, encapsulated in IP, encapsulated in 802.3 Ethernet
• Ethernet frame broadcast (dest: FFFFFFFFFFFF) on LAN, received at router running DHCP server
• Ethernet demuxed to IP, IP demuxed to UDP, UDP demuxed to DHCP
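The CIDR notation above also makes the longest-prefix-matching table from the datagram forwarding slides easy to express. This is a minimal sketch using Python's standard `ipaddress` module; the table is the slide's, with each binary prefix converted to a.b.c.d/x form (our conversion), and the helper names `TABLE`, `from_binary` and `lookup` are hypothetical.

```python
import ipaddress

# Longest-prefix-matching table from the slides, with each binary prefix
# rewritten in CIDR form, e.g. 11001000 00010111 00010*** -> 200.23.16.0/21.
TABLE = [
    (ipaddress.ip_network("200.23.16.0/21"), 0),
    (ipaddress.ip_network("200.23.24.0/24"), 1),
    (ipaddress.ip_network("200.23.24.0/21"), 2),
]
DEFAULT_LINK = 3  # the "otherwise" row


def from_binary(bits):
    """Convert '11001000 00010111 ...' (four octets) to dotted decimal."""
    return ".".join(str(int(octet, 2)) for octet in bits.split())


def lookup(dest):
    """Return the link interface whose prefix is the longest match for dest."""
    addr = ipaddress.ip_address(dest)
    matches = [(net.prefixlen, link) for net, link in TABLE if addr in net]
    if not matches:
        return DEFAULT_LINK
    return max(matches)[1]  # largest prefix length wins


if __name__ == "__main__":
    for da in ("11001000 00010111 00010110 10100001",
               "11001000 00010111 00011000 10101010"):
        print(from_binary(da), "->", lookup(from_binary(da)))
```

The first example address (200.23.22.161) matches only 200.23.16.0/21, so it goes to interface 0. The second (200.23.24.170) matches both 200.23.24.0/24 and 200.23.24.0/21; the longer /24 prefix wins, so it goes to interface 1.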
DHCP: example (continued)
• DHCP server formulates DHCP ACK containing client's IP address, IP address of first-hop router for client, name & IP address of DNS server
• DHCP server's reply is encapsulated, frame forwarded to client, demuxed up to DHCP at client
• client now knows its IP address, name and IP address of DNS server, IP address of its first-hop router

Hierarchical addressing: more specific routes
ISPs-R-Us has a more specific route to Organization 1.
[Figure: Organization 0 (200.23.16.0/23), Organization 2 (200.23.20.0/23), ..., Organization 7 (200.23.30.0/23) reach the Internet via Fly-By-Night-ISP, which advertises "Send me anything with addresses beginning 200.23.16.0/20"; Organization 1 (200.23.18.0/23) reaches it via ISPs-R-Us, which advertises "Send me anything with addresses beginning 199.31.0.0/16 or 200.23.18.0/23".]

NAT: Network Address Translation
NAT translation table (WAN side addr ↔ LAN side addr): 138.76.29.7, 5001 ↔ 10.0.0.1, 3345
1. host 10.0.0.1 sends datagram to 128.119.40.186, 80 (S: 10.0.0.1, 3345; D: 128.119.40.186, 80)
2. NAT router changes datagram source addr from 10.0.0.1, 3345 to 138.76.29.7, 5001, updates table (S: 138.76.29.7, 5001; D: 128.119.40.186, 80)
3. reply arrives, dest. address: 138.76.29.7, 5001 (S: 128.119.40.186, 80; D: 138.76.29.7, 5001)
4. NAT router changes datagram dest addr from 138.76.29.7, 5001 to 10.0.0.1, 3345 (S: 128.119.40.186, 80; D: 10.0.0.1, 3345)
[Figure: LAN hosts 10.0.0.1, 10.0.0.2, 10.0.0.3 behind a NAT router with LAN address 10.0.0.4 and WAN address 138.76.29.7.]

Comparison of LS and DV algorithms
Message complexity
• LS: with n nodes, E links, O(nE) msgs sent
• DV: exchange between neighbors only
  – convergence time varies
Speed of convergence
• LS: O(n²) algorithm requires O(nE) msgs
  – may have oscillations
• DV: convergence time varies
  – may be routing loops
  – count-to-infinity problem
Robustness: what happens if router malfunctions?
• LS:
  – node can advertise incorrect link cost
  – each node computes only its own table
• DV:
  – DV node can advertise incorrect path cost
  – each node's table used by others
    – errors propagate through network

Interconnected ASes
[Figure: AS1 (routers 1a to 1d), AS2 (2a to 2c), AS3 (3a to 3c); the forwarding table is fed by both the intra-AS and the inter-AS routing algorithm.]
• forwarding table configured by both intra- and inter-AS routing algorithm
  – intra-AS sets entries for internal dests
  – inter-AS & intra-AS set entries for external dests

Inter-AS tasks
• suppose router in AS1 receives datagram destined outside of AS1:
  – router should forward packet to gateway router, but which one?
AS1 must:
1. learn which dests are reachable through AS2, which through AS3
2. propagate this reachability info to all routers in AS1
Job of inter-AS routing!
[Figure: AS1, AS2, AS3 and other networks; gateway 1b links AS1 to AS2, gateway 1c links AS1 to AS3.]

Example: Setting forwarding table in router 1d
• suppose AS1 learns (via inter-AS protocol) that subnet x is reachable via AS3 (gateway 1c) but not via AS2.
  – inter-AS protocol propagates reachability info to all internal routers
• router 1d determines from intra-AS routing info that its interface I is on the least-cost path to 1c.
  – installs forwarding table entry (x, I)

Example: Choosing among multiple ASes
• now suppose AS1 learns from inter-AS protocol that subnet x is reachable from AS3 and from AS2.
• to configure forwarding table, router 1d must determine which gateway it should forward packets towards for dest x
  – this is also job of inter-AS routing protocol!
[Figure: subnet x is reachable through both AS2 and AS3; router 1d must choose a gateway (?).]
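One way router 1d can make this choice is hot potato routing: use intra-AS least costs to pick the closest gateway, then install (x, I) where I is the interface leading toward it. The Python sketch below runs Dijkstra's algorithm (an LS-style computation) from 1d. The topology, link costs, and interface names I1/I2 are hypothetical, since the slides do not give them.

```python
import heapq

# Hypothetical AS1 topology (the slides give no link costs):
# undirected intra-AS links as (router, router, cost).
LINKS = [("1d", "1a", 1), ("1a", "1c", 2), ("1d", "1b", 4), ("1a", "1b", 4)]
ROUTER = "1d"
# Hypothetical interface names on router 1d, keyed by the neighbor they lead to.
IFACE_OF_1D = {"1a": "I1", "1b": "I2"}


def least_cost_paths(links, source):
    """Dijkstra's algorithm: least cost from source to every router, plus the
    neighbor of source that each least-cost path starts with (the first hop)."""
    graph = {}
    for u, v, cost in links:
        graph.setdefault(u, []).append((v, cost))
        graph.setdefault(v, []).append((u, cost))
    dist = {source: 0}
    first_hop = {}
    done = set()
    heap = [(0, source, source)]  # (cost so far, router, first hop)
    while heap:
        d, node, hop = heapq.heappop(heap)
        if node in done:
            continue  # stale heap entry
        done.add(node)
        if node != source:
            first_hop[node] = hop
        for nbr, cost in graph[node]:
            new_cost = d + cost
            if nbr not in done and new_cost < dist.get(nbr, float("inf")):
                dist[nbr] = new_cost
                # Leaving the source, the neighbor itself is the first hop.
                heapq.heappush(heap, (new_cost, nbr, nbr if node == source else hop))
    return dist, first_hop


def hot_potato_entry(prefix, gateways, links=LINKS):
    """Hot potato routing at router 1d: choose the gateway with the smallest
    intra-AS least cost and return the forwarding entry (prefix, interface)."""
    dist, first_hop = least_cost_paths(links, ROUTER)
    best = min(gateways, key=lambda gw: dist[gw])
    return prefix, IFACE_OF_1D[first_hop[best]]


if __name__ == "__main__":
    print(hot_potato_entry("x", ["1c"]))        # ('x', 'I1'): only AS3 reaches x
    print(hot_potato_entry("x", ["1b", "1c"]))  # ('x', 'I1'): 1c costs 3, 1b costs 4
```

With these made-up costs, 1d picks gateway 1c even if the route beyond AS2 were shorter: hot potato routing minimizes only the cost inside AS1.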
Example: Choosing among multiple ASes: hot potato routing
• hot potato routing: send packet towards closest of two routers.
Steps:
1. learn from inter-AS protocol that subnet x is reachable via multiple gateways
2. use routing info from intra-AS protocol to determine costs of least-cost paths to each of the gateways
3. hot potato routing: choose the gateway that has the smallest least cost
4. determine from forwarding table the interface I that leads to least-cost gateway; enter (x, I) in forwarding table

Intra-AS Routing
• also known as Interior Gateway Protocols (IGP)
• most common intra-AS routing protocols:
  – RIP: Routing Information Protocol (DV)
  – OSPF: Open Shortest Path First (LS)
  – IGRP: Interior Gateway Routing Protocol (Cisco proprietary) (DV)

BGP basics: distributing path information
• using eBGP session between 3a and 1c, AS3 sends prefix reachability info to AS1.
  – 1c can then use iBGP to distribute new prefix info to all routers in AS1
  – 1b can then re-advertise new reachability info to AS2 over 1b-to-2a eBGP session
• when router learns of new prefix, it creates entry for prefix in its forwarding table.
[Figure: eBGP sessions between gateway routers of AS1, AS2, AS3 (3a to 1c, 1b to 2a); iBGP sessions among routers inside AS1.]

Path attributes & BGP routes
• advertised prefix includes BGP attributes
  – prefix + attributes = "route"
• two important attributes:
  – AS-PATH: contains ASes through which prefix advertisement has passed: e.g., AS 67, AS 17
  – NEXT-HOP: indicates specific internal-AS router to next-hop AS (may be multiple links from current AS to next-hop AS)
• gateway router receiving route advertisement uses import policy to accept/decline
  – e.g., never route through AS x
  – policy-based routing

BGP route selection
• router may learn about more than 1 route to destination AS, selects route based on:
1. local preference value attribute: policy decision
2. shortest AS-PATH
3. closest NEXT-HOP router: hot potato routing
4. additional criteria

BGP messages
• BGP messages exchanged between peers over TCP connection
• BGP messages:
  – OPEN: opens the BGP session with the peer (first message once the TCP connection is up) and authenticates sender
  – UPDATE: advertises new path (or withdraws old)
  – KEEPALIVE: keeps connection alive in absence of UPDATEs; also ACKs OPEN request
  – NOTIFICATION: reports errors in previous msg; also used to close connection

BGP routing policy
[Figure: provider networks A, B, C; customer networks W, X, Y; X is attached to both B and C.]
• A, B, C are provider networks
• X, W, Y are customers (of provider networks)
• X is dual-homed: attached to two networks
  – X does not want to route from B via X to C
  – .. so X will not advertise to B a route to C

BGP routing policy (2)
• A advertises path AW to B
• B advertises path BAW to X
• Should B advertise path BAW to C?
  – No way! B gets no "revenue" for routing CBAW since neither W nor C are B's customers
  – B wants to force C to route to W via A
  – B wants to route only to/from its customers!

Why different Intra- and Inter-AS routing?
Policy:
• Inter-AS: admin wants control over how its traffic is routed, who routes through its net.
• Intra-AS: single admin, so no policy decisions needed
Scale:
• hierarchical routing saves table size, reduced update traffic
Performance:
• Intra-AS: can focus on performance
• Inter-AS: policy may dominate over performance

Chapter 4: summary
4.1 Introduction
4.2 Virtual circuit and datagram networks
4.3 What's inside a router
4.4 IP: Internet Protocol
  – Datagram format
  – IPv4 addressing
  – ICMP
  – IPv6
4.5 Routing algorithms
  – Link state
  – Distance Vector
  – Hierarchical routing
4.6 Routing in the Internet
  – RIP
  – OSPF
  – BGP
4.7 Broadcast and multicast routing