© Graduate School, Chinese Academy of Sciences.
Network Design and Performance Analysis
Wang Wenjie
Wangwj@gucas.ac.cn
Network Design and Analysis-----Wang Wenjie Congestion Control

Flow and Congestion Control (2)

Outline
• Congestion control in data networks and the Internet
• Link-level flow and error control
• TCP traffic control

Overview (1)
• TCP = Transmission Control Protocol
• Connection-oriented protocol
• Provides a reliable unicast end-to-end byte stream over an unreliable internetwork.

Overview (2)
• Connection-oriented: before any data transfer, TCP establishes a connection:
  - One TCP entity waits for a connection ("server")
  - The other TCP entity ("client") contacts the server
• Reliable:
  - The byte stream is broken up into chunks called segments
  - The receiver sends acknowledgements (ACKs) for segments
  - TCP maintains a timer. If an ACK is not received in time, the segment is retransmitted
  - TCP has a checksum covering header and data. Segments with invalid checksums are discarded

Overview (3)
• Basic idea
  – Packet loss caused by transmission errors is assumed to be rare, so packet losses signify congestion.
  – Sources do not see packet losses directly. They detect "signs" of loss, called packet loss indications:
    • Timeouts
    • Triple duplicate ACKs

TCP Credit Allocation Mechanism
Note: the trailing edge advances each time A sends data; the leading edge advances only when B grants additional credit.
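As a rough illustration of the note above, the sender's window can be modeled with two edges. This is a minimal Python sketch, not taken from the slides: the class name `CreditWindow`, its methods, and the convention that B's acknowledgement carries an acknowledgement number plus a credit in octets are assumptions made for the example.

```python
class CreditWindow:
    """Sender-side (A) view of credit allocation, counted in octets.

    Hypothetical helper for illustration only.
    """

    def __init__(self, first_seq: int, initial_credit: int) -> None:
        self.trailing_edge = first_seq                  # next octet A will send
        self.leading_edge = first_seq + initial_credit  # first octet A may not send yet

    def available(self) -> int:
        """Octets A may still send with the credit it currently holds."""
        return self.leading_edge - self.trailing_edge

    def send(self, nbytes: int) -> int:
        """A sends data: only the trailing edge advances."""
        sent = min(nbytes, self.available())
        self.trailing_edge += sent
        return sent

    def grant(self, ack_number: int, credit: int) -> None:
        """B grants credit: only the leading edge advances.

        Assumption: B allows `credit` octets starting at `ack_number`,
        and the leading edge never moves backwards.
        """
        self.leading_edge = max(self.leading_edge, ack_number + credit)


window = CreditWindow(first_seq=1001, initial_credit=1400)
window.send(600)            # trailing edge moves to 1601
window.send(1000)           # only 800 octets of credit remain, so 800 are sent
window.grant(2401, 1000)    # B acknowledges through 2400 and grants 1000 more
print(window.available())
```

With no credit left the sender must stop, which is why the amount of credit B grants bounds the throughput A can reach.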
Credit Mechanism
• The receiver needs a policy for deciding how much credit to grant the sender
• Conservative approach: grant only as much credit as there is buffer space currently available
• This may limit the throughput of the transport connection
• Optimistic approach: estimate how much space will be freed before the data arrives, and grant credit accordingly

Effect of Window Size on Performance
• Throughput depends on the window size, the propagation delay, and the data rate. In link control, window size and sequence numbers are counted in frames; in TCP they are counted in octets.
• Notation:
  – W = TCP window size (octets)
  – R = data rate available to the TCP source on this TCP connection (bps)
  – D = propagation delay between source and destination on this TCP connection (seconds)
• After the TCP source begins transmitting, it takes D seconds for the first bits to arrive and another D seconds for the acknowledgement to return (RTT = 2D)
• In that time the TCP source could transmit at most 2RD bits, or RD/4 octets

Maximum Normalized Throughput S (1)
In practice the TCP source is constrained: it cannot send more than W octets (one window) of data before an acknowledgement arrives.
$$S = \begin{cases} 1, & W \ge RD/4 \\ \dfrac{4W}{RD}, & W < RD/4 \end{cases}$$

Maximum Normalized Throughput S (2)
• With sliding window protocols,
  – a large window allows a sender to be aggressive in transmission
  – a small one forces it to stop and wait frequently and thus curbs traffic volume (bytes per second).
• Thus, window size is our major concern.

Window Scale Parameter (optional header item)
[Figure: curves labeled W = RD/4, W = 2^20 - 1 and W = 2^16 - 1; horizontal axis RD]

Complicating Factors
• In most cases many TCP connections are multiplexed onto the same network interface, so each connection gets only part of the available capacity. This lowers R and therefore reduces the inefficiency (S rises)
• For a connection with multiple hops, D is the sum of the delays across each network plus the delays in each router, which increases D (S falls)
• R is the data rate at the source. If R is greater than the data rate on one of the hops from source to destination, that hop becomes a bottleneck and limits the throughput actually achieved
• Lost segments must be retransmitted, which lowers throughput (S falls)

Retransmission Strategy
• Retransmission in TCP has two main causes:
1.
The segment arrives damaged. The receiver detects this from the checksum and discards the segment
2. The segment never reaches the receiver (this requires a way to detect the loss).
• In modern networks few segments are lost because of transmission-line errors (wireless links excepted)
• Most losses occur in congested routers and switches

TCP Timers
• A timer is associated with every segment that is sent
• If the timer expires before the segment is acknowledged, the sender must retransmit
• Key design issue: the value of the retransmission timer
• Too small: many unnecessary retransmissions, wasting network resources
• Too large: slow response to lost segments

Two Strategies
• The timer should be set slightly larger than the round-trip time (send segment, receive ACK)
• The round-trip time is variable. In the Internet in particular, end-to-end round-trip time varies far more than the round-trip time of a point-to-point link.
Strategies:
1. Fixed timer
2. Adaptive

Problems with Adaptive Scheme
• Almost all TCP implementations use an adaptive scheme based on estimates of the round-trip time of recent segments
• Difficulties in estimating the round-trip time:
  – The TCP receiver may not acknowledge immediately; it may use cumulative acknowledgements
  – If a segment is retransmitted, the sender cannot tell whether a received ACK acknowledges the original transmission or the retransmission
  – Internet conditions can change abruptly

RFC 793 Exponential Averaging
SRTT: smoothed round-trip time estimate
SRTT(K + 1) = α × SRTT(K) + (1 – α) × RTT(K + 1)
The older an observation, the less weight it carries in the estimate.

Exponential Smoothing Coefficients
[Figure: exponential smoothing with α = 0.5 and α = 0.875]

TCP Congestion Control
• When congestion occurs: network availability and throughput fall, and response times lengthen
• A possible remedy: dynamic routing, which eases congestion by spreading load evenly across switches/routers and links
  Problem: this is effective only for unbalanced loads and short-term traffic surges
• Fundamental point of congestion control: congestion can ultimately be controlled only by limiting the total amount of data entering the network to what the network can carry
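To make the RFC 793 exponential averaging formula from the earlier slide concrete, the following minimal Python sketch applies it to a short list of RTT samples. The function name `smoothed_rtt`, the sample values, and the choice to seed SRTT with the first sample are assumptions for illustration; the two α values are the ones named on the smoothing-coefficient slide.

```python
from typing import List, Sequence


def smoothed_rtt(samples: Sequence[float], alpha: float) -> List[float]:
    """Return SRTT after each sample, using
    SRTT(K+1) = alpha * SRTT(K) + (1 - alpha) * RTT(K+1).

    Assumption: SRTT is seeded with the first RTT sample.
    """
    srtt = samples[0]
    history = [srtt]
    for rtt in samples[1:]:
        srtt = alpha * srtt + (1 - alpha) * rtt
        history.append(srtt)
    return history


# Hypothetical RTT samples (seconds): the RTT jumps from 1 s to 3 s.
rtts = [1.0, 1.0, 3.0, 3.0]
print(smoothed_rtt(rtts, alpha=0.5))    # [1.0, 1.0, 2.0, 2.5]
print(smoothed_rtt(rtts, alpha=0.875))  # [1.0, 1.0, 1.25, 1.46875]
```

A smaller α tracks the jump quickly, while a larger α gives a smoother but slower estimate.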
TCP Congestion Control is Difficult
The main difficulties of TCP congestion control:
• IP is unreliable and stateless; it provides no mechanism for detecting congestion, let alone controlling it
  – RFC 3168 adds ECN to IP, but it is not yet widely deployed
• TCP provides only end-to-end flow control. It can only infer congestion in the intermediate network by indirect means, so a TCP entity's knowledge of network conditions is unreliable.
• No cooperative distributed algorithm binds the various TCP entities together, so they cannot cooperate to hold the total traffic at a given level. In fact, they may compete for resources.
• For IP:
  – The ICMP Source Quench message provides a crude means of throttling a source, but by itself it is not an effective congestion control tool
  – RSVP may help, but it is not yet widely implemented

Implementation of TCP Congestion Control Measures
[Table not recovered; surviving entry: retransmission timer]
Note: TCP Tahoe and TCP Reno are from the Berkeley Unix TCP implementations

Retransmission Timer Management
Three techniques to calculate the retransmission timeout (RTO) value:
1. RTT variance estimation
2. Exponential RTO backoff
3. Karn's algorithm

RTT Variance Estimation (Jacobson's Algorithm)
• Setting the retransmission timer from SRTT lets a TCP entity adapt to changes in round-trip time, but it does not cope well when the round-trip time has high variance.
• Three sources of high variance in RTT:
  1. If the data rate is low, the transmission delay is large relative to the propagation time, and variation in IP datagram size makes the variance of the delay large. The SRTT estimator is then influenced by characteristics of the data rather than of the network
  2. Sudden changes in load cause sudden changes in RTT
  3. The peer TCP entity may not acknowledge every segment immediately, because of its own processing delay or because it exercises its privilege of using cumulative acknowledgements.

Jacobson's Algorithm
Idea: take the observed variability into account when relating RTO to SRTT.
$SRTT(K+1) = (1 - g) \times SRTT(K) + g \times RTT(K+1)$
$SERR(K+1) = RTT(K+1) - SRTT(K)$
$SDEV(K+1) = (1 - h) \times SDEV(K) + h \times |SERR(K+1)|$
$RTO(K+1) = SRTT(K+1) + f \times SDEV(K+1)$
Typical values: $g = 0.125$, $h = 0.25$, $f = 2$ (later 4)
$RTO(K+1) = SRTT(K+1) + \max(G, f \times SDEV(K+1))$
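The four update equations above translate almost directly into code. The following is a minimal Python sketch, not a reference implementation: the class name `JacobsonRto`, seeding SRTT with the first sample, and starting SDEV at zero are assumptions for illustration, and the clock-granularity term G of the last formula is left out.

```python
class JacobsonRto:
    """RTO estimator following the SRTT / SERR / SDEV / RTO equations.

    Hypothetical helper. Assumptions: SRTT starts at the first RTT
    sample and SDEV starts at 0.
    """

    def __init__(self, first_rtt: float, g: float = 0.125,
                 h: float = 0.25, f: float = 4.0) -> None:
        self.g, self.h, self.f = g, h, f
        self.srtt = first_rtt
        self.sdev = 0.0

    def rto(self) -> float:
        """RTO = SRTT + f * SDEV."""
        return self.srtt + self.f * self.sdev

    def update(self, rtt: float) -> float:
        """Feed one new RTT sample and return the new RTO."""
        serr = rtt - self.srtt                                    # error against SRTT(K)
        self.srtt = (1 - self.g) * self.srtt + self.g * rtt       # SRTT(K+1)
        self.sdev = (1 - self.h) * self.sdev + self.h * abs(serr)  # SDEV(K+1)
        return self.rto()


estimator = JacobsonRto(first_rtt=1.0)
print(estimator.update(1.0))  # stable RTT, RTO stays at 1.0
print(estimator.update(2.0))  # RTT jumps, RTO rises to 2.125
```

Note that SERR is computed before SRTT is updated, matching the use of SRTT(K) in the second equation.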
Jacobson's RTO Calculations
[Figure panels: Decreasing function; Increasing function]
Once the arrival times stabilize, the variation estimate SDEV drops. While RTT is changing, RTO is conservative for both f = 2 and f = 4; once RTT stabilizes, RTO begins to converge to RTT.

Two Other Factors
Jacobson's algorithm improves TCP performance, but it is incomplete by itself. Two factors are still not considered:
• What RTO value should be used for a retransmitted segment?
  ANSWER: exponential RTO backoff algorithm
• Which round-trip samples should be used as input to Jacobson's algorithm?
  ANSWER: Karn's algorithm

Exponential RTO Backoff
• Problem: when a TCP sender times out on a segment, it must retransmit that segment. RFC 793 calls for the same RTO value to be used for the retransmitted segment. If the timeout was caused by network congestion, however, keeping the same RTO is unwise.
• Approach: the TCP source increases its RTO each time the same segment is retransmitted. This is the backoff process
• Simple implementation: multiply the RTO by a constant on every retransmission of a segment
  RTO = q × RTO
• q = 2 is called binary exponential backoff

Which Round-trip Samples?
• When a segment has been retransmitted, a received ACK may be either:
  1. the ACK for the first transmission of the segment
  2. the ACK for the second transmission of the segment
• The TCP source cannot distinguish these two cases
• Karn's algorithm resolves this as follows:
  – Do not use the RTT measured on a retransmitted segment to update SRTT and SDEV
  – When a retransmission occurs, compute the backed-off RTO
  – Use the backed-off RTO for subsequent segments until an ACK arrives for a segment that was not retransmitted
  – Then use Jacobson's algorithm to compute subsequent RTOs

Window Management
• Slow start
• Dynamic window sizing on congestion
• Fast retransmit
• Fast recovery
• Limited transmit

Dynamic Window Sizing on Congestion
• Slow start works well when a connection is initialized, and it can be adapted for use when congestion occurs
• A lost segment indicates congestion
• The prudent, conservative approach is to reset cwnd to 1 and begin the slow start process
• In practice this is still not conservative enough: "It is easy to drive a network into saturation but hard for the network to recover" (Jacobson)
• Therefore: use slow start, then grow linearly once cwnd reaches a threshold.

Illustration of Slow Start and Congestion Avoidance
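The behaviour that the illustration is meant to show can be approximated with a small per-round-trip simulation. This is a simplified Python sketch under stated assumptions: cwnd is counted in whole segments, it doubles once per round trip during slow start, growth is capped at the threshold, and the threshold never drops below 2 segments. The function name `cwnd_trace` and the example parameters are hypothetical.

```python
from typing import Iterable, List


def cwnd_trace(rounds: int, ssthresh: int,
               timeout_rounds: Iterable[int] = ()) -> List[int]:
    """Return cwnd (in segments) at the start of each round trip.

    Slow start doubles cwnd until it reaches ssthresh; after that cwnd
    grows by one segment per round trip. On a timeout, ssthresh becomes
    half of cwnd and cwnd restarts at 1.
    """
    timeouts = set(timeout_rounds)
    cwnd = 1
    trace = []
    for r in range(rounds):
        trace.append(cwnd)
        if r in timeouts:
            ssthresh = max(cwnd // 2, 2)      # assumption: floor of 2 segments
            cwnd = 1                          # restart with slow start
        elif cwnd < ssthresh:
            cwnd = min(cwnd * 2, ssthresh)    # slow start: exponential growth
        else:
            cwnd += 1                         # congestion avoidance: linear growth
    return trace


# Hypothetical scenario: threshold of 8 segments, one timeout in round 6.
print(cwnd_trace(rounds=12, ssthresh=8, timeout_rounds=[6]))
# [1, 2, 4, 8, 9, 10, 11, 1, 2, 4, 5, 6]
```

After the timeout the window climbs exponentially only up to the halved threshold (5 segments here) and then grows linearly, which is the sawtooth shape usually drawn for this mechanism.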
Computing CWND
• Upon connection establishment:
  – cwnd = segsize
  – ssthresh = 65535
• When a timeout occurs:
  – ssthresh = cwnd/2
  – cwnd = segsize
• When a new ACK is received:
  – If cwnd <= ssthresh, cwnd += segsize
  – Otherwise, cwnd += segsize*segsize/cwnd

Fast Retransmit (TCP Tahoe)
• The retransmission timer RTO is often much larger than the RTT
• If a segment is lost, TCP may therefore be slow to retransmit it
• TCP rule: if a segment arrives out of order, an ACK must be sent immediately
• Tahoe/Reno fast retransmit rule: if 4 ACKs are received for the same segment (that is, 3 duplicate ACKs), the segment has very probably been lost, so the sender can retransmit immediately without waiting for the timer to expire.

Fast Retransmit
[Figure: triple duplicate ACK]

Fast Recovery (TCP Reno)
• When TCP retransmits a segment using fast retransmit, it assumes that the segment was lost
• TCP should then take congestion avoidance measures
• The simple approach is to use slow start / congestion avoidance. This is sometimes overly conservative: the very fact that multiple ACKs are returning shows that data segments are still reaching the other side fairly regularly
• Fast recovery: when the third duplicate ACK arrives, retransmit the lost segment, set the threshold to half of cwnd, set the congestion window to threshold + 3 (segments), and continue with linear growth.
• This avoids the initial slow start phase

Fast Recovery Example
[Figure labels: Reno fast recovery (simplified); Tahoe slow start]

Limited Transmit
• TCP detects segment loss in two ways: the adaptive timer and fast retransmit
• Fast retransmit is designed to overcome the slow reaction of the retransmission timeout mechanism in some situations
• Its problem: if the sender's congestion window is too small, for example cwnd = 3, fast retransmit may not be triggered. This raises several questions:
  1. Under what circumstances is the sender's congestion window too small?
  2. Is the problem common?
  3. If it is common, why not solve it by reducing the number of ACKs needed to trigger fast retransmit?

Limited Transmit Algorithm
• The limited transmit algorithm requires the TCP sender to send one new segment when the following three conditions are met:
1. Two consecutive duplicate ACKs have been received
2. The advertised window of the destination TCP entity allows the segment to be sent, that is, the source TCP entity has enough credit to send one new segment
3.
After the new segment has been sent, the amount of outstanding data does not exceed cwnd + 2

TCP Performance

Analysis of TD-Only Scenarios

Evolution of Window Size

Packets Sent during a TD Period

Average Window Size

Average TCP Throughput

Taking Timeouts into Account

Evolution of Window Size
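Returning to the limited transmit algorithm, its three conditions can be written as a single predicate. This is a minimal Python sketch that follows the wording of the slide; the function name `may_send_limited_transmit`, its parameters, and the choice to count everything in whole segments are assumptions for illustration.

```python
def may_send_limited_transmit(dup_acks: int, outstanding: int,
                              cwnd: int, advertised_window: int) -> bool:
    """Decide whether one new segment may be sent under limited transmit.

    All quantities are in segments (an assumption of this sketch).
    dup_acks          -- consecutive duplicate ACKs received so far
    outstanding       -- segments sent but not yet acknowledged
    cwnd              -- congestion window
    advertised_window -- credit granted by the receiver
    """
    after_send = outstanding + 1  # outstanding data once the new segment is sent
    return (
        dup_acks == 2                         # condition 1: two duplicate ACKs
        and after_send <= advertised_window   # condition 2: receiver credit allows it
        and after_send <= cwnd + 2            # condition 3: at most cwnd + 2 outstanding
    )


# With cwnd = 3 and three segments in flight, the new segment keeps ACKs
# flowing so that a third duplicate ACK can still trigger fast retransmit.
print(may_send_limited_transmit(dup_acks=2, outstanding=3,
                                cwnd=3, advertised_window=10))
```

The extra segment does not enlarge cwnd itself; it only lets a sender with a small window generate enough duplicate ACKs to avoid waiting for a timeout.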