Scalable Distributed Router Mechanisms to Encourage Network Congestion Avoidance

by Rena Whei-Ming Yang

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science at the MASSACHUSETTS INSTITUTE OF TECHNOLOGY

February 1998

© Massachusetts Institute of Technology 1998. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, February 9, 1998
Certified by: John T. Wroclawski, Research Scientist, Thesis Supervisor
Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Theses

Scalable Distributed Router Mechanisms to Encourage Network Congestion Avoidance

by Rena Whei-Ming Yang

Submitted to the Department of Electrical Engineering and Computer Science on February 9, 1998, in partial fulfillment of the requirements for the degree of Master of Engineering in Electrical Engineering and Computer Science

Abstract

Many forces have led to an increase in the amount of network traffic which is nonadaptive in the presence of congestion. This non-congestion-controlled network traffic is difficult to constrain because of the aggregating process which occurs within the network. Previous attempts have been made to enforce compliance with network feedback by isolating and regulating individual flows, but these come with considerable development and/or computational costs. However, several new designs are emerging which allow less costly access to information to identify traffic flows which could be considered non-congestion-controlled, and which provide ways to penalize these flows locally. This thesis studies a means of using such mechanisms to identify nonadaptive network flows, and proposes a protocol to push this information, along with penalization responsibility, towards the flows' sources. This reduces the negative effects that these flows have on adaptive network traffic competing for the same resources. We propose a design for such a pushback protocol, build a network simulation of this pushback protocol integrated with an identification and penalization scheme, and analyze its effectiveness in constraining nonadaptive network traffic in several scenarios.

Thesis Supervisor: John T. Wroclawski
Title: Research Scientist

Acknowledgments

Many people have provided valuable contributions and support during this thesis' execution and writing. I would like to thank these people, in particular, for their help:

John Wroclawski, my thesis advisor: for his patience, and insightful comments and conversations.

The conceivers and writers of the Advanced Network Architecture's latest DARPA proposal [4]: for the idea of this pushback protocol.

The real and virtual inhabitants of NE43-537: for their feedback, conversation, and ideas. I especially want to thank Wenjia Fang, Rob Cheng, Dina Katabi, Elliot Schwartz, and Dan Lee: for their passionate interest in everything they do and their keen evaluations and questions about the design of this protocol.

My family: for their trust and belief in me, and their guidance throughout my life.

All of my friends, especially Art Housinger: for their support, their words of encouragement, and their belief that I could finish this thesis, even when I had none.

The developers of RED and its penalty box extensions, FRED, and ns: for the infrastructure and ideas without which I could not have developed any of this.
The Defense Advanced Research Projects Agency, contract #DADT63-94-C-0072: which provided financial sustenance whilst I was doing all of this research.

Anne Hunter: for her vast knowledge of the answers to all questions, her patience and tolerance, and her belief in the students at MIT.

Contents

1 Introduction
  1.1 Background
  1.2 Motivation
  1.3 Goals
  1.4 Thesis Overview

2 Existing Measures
  2.1 Per-Flow or Quasi-Per-Flow Constraint
  2.2 Lightweight Identification with Penalization
    2.2.1 RED with Penalty Box
    2.2.2 FRED
  2.3 Reserved Bandwidth
    2.3.1 RSVP
    2.3.2 Expected Capacity Framework

3 Design Issues
  3.1 Protocol Expressiveness
  3.2 Propagation
    3.2.1 Path Determination
    3.2.2 Flow Granularity
  3.3 Timeouts/Time Scales
    3.3.1 Pushback Timeouts
    3.3.2 Penalty Timeouts
  3.4 Penalization
  3.5 Trust Issues
  3.6 Contiguous Propagation

4 Design Description
  4.1 High Level Operation of Integrated System
  4.2 Implementation
    4.2.1 Granularity
    4.2.2 Pushback Packet
    4.2.3 Nonadaptive Identification Test
    4.2.4 Pushback Sender
    4.2.5 Penalization
    4.2.6 Pushback Receiver
    4.2.7 Trust

5 Simulations/Evaluation
  5.1 Simple Pushback
  5.2 Robustness in Adverse Environments
    5.2.1 Lossy Environments
    5.2.2 Routing Dynamics
  5.3 Cost
    5.3.1 Overhead/Processing
    5.3.2 Effective Time Scale of Protocol

6 Conclusion
  6.1 Evaluation
  6.2 Remaining Issues/Future Work

List of Figures

3-1 Local Identification and Penalization
3-2 Identification + Pushed Back Penalization
4-1 Pushback Packet Format
5-1 Simple Pushback Network Topology
5-2 No nonadaptive constraint, on links L1 and L2
5-3 FRED nonadaptive constraint + local penalization, on links L1 and L2
5-4 FRED nonadaptive constraint + 1 pushback, on links L1 and L2
5-5 Lossy Network Topology
5-6 Loss rates 0-20%, measured on links L1, L2, and L3
5-7 Loss rates 30-50%, measured on links L1, L2, and L3
5-8 Loss rates 60-80%, measured on links L1, L2, and L3
5-9 Routing Dynamics Topology
5-10 Short term route oscillations: uptime mean=1s, downtime mean=0.1s
5-11 Medium term route oscillations: uptime mean=10s, downtime mean=1s
5-12 Longer term route oscillations: uptime mean=20s, downtime mean=20s
5-13 Longer term route oscillations: uptime mean=50s, downtime mean=5s

Chapter 1

Introduction

1.1 Background

In 1986, the Internet suffered a series of congestion collapses: periods when the network was fully or nearly fully utilized, yet little of the network traffic sent was actually arriving at its destination. These collapses occurred because the transport protocols in use at the time lacked sufficient adaptive mechanisms to deal with congestion. In 1988, several algorithms were suggested to increase Internet stability through end-to-end transport protocols' use of congestion control mechanisms [11]. These modifications by Jacobson for congestion avoidance and control, which included guidelines for adapting host sending rates and timing behavior to create a more stable system during times of congestion, corrected certain features of the original TCP specification which had been found inadequate. They were distributed and adopted widely with the Berkeley UNIX operating system, BSD. Indeed, RFCs 1122 and 1123, "Requirements for Internet Hosts" [1, 2], actually demand that all hosts implement Jacobson's modifications to the original TCP specification. Although hosts complied with these demands, it was obvious that the network architecture had no mechanism to enforce such network standards and policies.

1.2 Motivation

More recently, the expansion and commercialization of the network, poor implementations of transport protocols, new types of network services, and the ever increasing pressure for performance of network products have led to an increase in the amount of network traffic which is non-congestion-controlled. Floyd and Fall [8] identify two dangers in this trend. The first is that by ignoring congestion indications from the network, these nonadaptive network flows may bring about conditions which could lead to congestion collapse. The second is that these flows, by being nonadaptive, capture a disproportionately large fraction of the available bandwidth during times of congestion, relative to adaptive network flows. This may cause acute unfairness and disadvantage to adaptive network flows, and forms one of the incentives for being nonadaptive. To conserve resources, the network aggregates traffic, losing its ability to extract details about the behavior of any of the individual flows it contains. This lost detail is exactly the information necessary for any nonadaptive identification and constraint mechanism to operate, and so the non-congestion-controlled network traffic being deployed remains unconstrained. Given the increase in this new generation of non-congestion-controlled traffic and its dangers, there is an immediate, urgent need for moderation [3]. Previous attempts have been made to enforce compliance with network feedback by isolating and regulating individual flows. Unfortunately, although this design limits the effects of misbehavior within the network, it also incurs considerable development and/or computational cost. The two goals of enforcing network feedback and minimizing network cost may seem at odds with one another. However, several new designs are emerging which appear to address both of them.
Initial work has been done to use less costly means of extracting information to identify traffic flows which could be considered misbehaving and to penalize these locally. This thesis continues in that direction, but attempts to increase efficiency in dealing with misbehaving traffic by using the resources of upstream neighbors to push penalization closer to the flow's origin, constraining the flow before it can cause damage and freeing the resources of heavily loaded central nodes. We propose to use mechanisms which help identify network flows that may be nonadaptive in the face of congestion, to integrate such identification tests with a pushback protocol which pushes the identity of nonadaptive network flows towards their sources, and to penalize these flows at some node closer to each flow's source, decreasing the congestion in the intervening network nodes and, possibly, reducing the disadvantage to some of the adaptive network flows competing for the same resources.

1.3 Goals

In designing such a system, we must keep several goals in mind, at several layers. The global goal is to encourage end-to-end congestion control by creating incentives to do so, in the form of penalties against network flows which are non-congestion-controlled. Another goal is to minimize the negative effects of existing nonadaptive network flows. More specific to our protocol:

* In the case when no other routers respond to the pushback protocol, our system should produce network behavior similar to a system which locally identifies and penalizes nonadaptive network flows. This implies no significant delays in identification or penalization, and also no changes to the regulated flow set.

* It should also be backwards compatible with both routers which do not run a nonadaptive test and routers which do not understand this protocol. At its heart, this protocol is an optional addition to the current capabilities of the network. Because of this, its use should not noticeably degrade the functioning of a router, regardless of the presence or absence of other routers which understand the protocol.

* In the case when other routers do respond to the pushback protocol, the system should produce network behavior at least as efficient as a system which locally identifies and penalizes nonadaptive network flows, and more efficient network behavior in the common case.

* The system should be robust. The protocol should be able to adapt to a considerable amount of failure in routers or links over a reasonable period of time. It should also adapt to routing changes over time.

* The system should require minimal network and processing resources. It should provide a benefit commensurate with its overhead. The cost of the protocol should be measured according to the amount of scarce resources it requires.

1.4 Thesis Overview

In the next chapter, we discuss different approaches that have been suggested in the past to address the existence of nonadaptive network flows. Chapter 3 discusses the design space for the integrated identification, pushback protocol, and penalization system. Chapter 4 describes the system that we have designed and built in the ns environment [16]. Chapter 5 attempts to illustrate, through simulation, the actual effectiveness of our system in the presence of nonadaptive network flows in several network scenarios, and evaluates the design that we have created.
Chapter 6 draws conclusions from the simulations we have run and discusses areas that require further study.

Chapter 2

Existing Measures

The changing climate of the network, and the observation that the network architecture has no mechanism to enforce network congestion avoidance and control policies, have led to a number of proposals for dealing with nonadaptive flows. Instead of assuming that network users are universally compliant with congestion notification and trusting that endpoints will be congestion controlled, these proposals assume that users may be selfish, and that some means of enforcing or encouraging congestion control should exist within the network. In this chapter, we cover three categories of such measures.

2.1 Per-Flow or Quasi-Per-Flow Constraint

The first category of measures is based on fair queuing. The fair queuing designs attempt to partition and isolate network flows from each other, so nonadaptive flows are placed into their own queues [7, 18]. This enforces "fairness" in the system by limiting the amount of queue resources a given flow can consume. In a busy router using fair queuing, a traffic flow which attempts to gain a larger portion of the bandwidth only increases its own delay. Although this design enforces good behavior of the traffic flows, it comes at high cost to the network, either in processing cost to the router or in development of specialized hardware to optimize performance. This approach also does not scale well, because the resources needed to execute fair queuing increase linearly with the number of flows passing through a node. A number of efforts have been made to decrease the resources necessary to maintain the good behavior of fair queuing. Stochastic fairness queuing is one such effort [17]. In stochastic fairness queuing, flows are hashed into queues, instead of being mapped to a particular queue in a strict one-to-one manner. These queues are periodically rehashed, so any unfairness which exists because adaptive flows are hashed into the same queue with nonadaptive flows is transient. This design requires a constant set of resources, but as the number of flows increases, its ability to guarantee good behavior degrades.

2.2 Lightweight Identification with Penalization

The following two approaches are router mechanisms to extract information which has been lost in the aggregation process without having to resort to costly per-flow scrutiny. These lighter weight mechanisms provide a means to identify network flows which exhibit nonadaptive characteristics and separate them from the aggregated flow for closer scrutiny and/or penalization. Random Early Drop (RED) [9] forms the basis for both of the lightweight identification methods covered in the following sections. In and of itself, it is a queue management algorithm for congestion avoidance in packet switched networks. It executes this queue management function by probabilistically dropping packets depending on its estimate of a time-averaged queue length. If the time-averaged queue length lies below a certain threshold, it drops no packets. In its congestion avoidance phase, it preemptively drops packets with a probability which is proportional to the time-averaged queue length. If the time-averaged queue length rises above a certain threshold, the RED node begins to drop all packets. Although its purpose is not to detect non-congestion-controlled network flows, information gathered from its queue maintenance activities can be used for this purpose.
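To make this concrete, the following is a minimal sketch of the RED drop decision just described. It is our illustration rather than code from [9]: the parameter names (min_th, max_th, max_p, weight) are our own, and real RED also spaces drops out using a count of packets since the last drop, which we omit here.

    # Minimal sketch of the RED drop decision described above.
    # Parameter names are illustrative, not taken from the RED paper [9].
    import random

    class REDQueueSketch:
        def __init__(self, min_th=5, max_th=15, max_p=0.02, weight=0.002):
            self.min_th, self.max_th, self.max_p = min_th, max_th, max_p
            self.weight = weight  # gain of the queue-length low-pass filter
            self.avg = 0.0        # time-averaged queue length estimate

        def should_drop(self, current_queue_len):
            # Exponentially weighted moving average of the queue length.
            self.avg += self.weight * (current_queue_len - self.avg)
            if self.avg < self.min_th:
                return False      # below threshold: drop nothing
            if self.avg >= self.max_th:
                return True       # above threshold: drop everything
            # Congestion avoidance phase: drop probability grows with the average.
            p = self.max_p * (self.avg - self.min_th) / (self.max_th - self.min_th)
            return random.random() < p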
2.2.1 RED with Penalty Box

The Penalty Box extension [8] to RED uses RED's congestion avoidance and control packet drops to sample the network traffic. One very useful characteristic of the packets that the RED algorithm drops is that they are a representative sampling of the traffic flowing through that gateway; the number of dropped/marked packets from each flow is proportional to the bandwidth that those packets are consuming. RED statistical sampling works on the assumption that there is some amount of congestion in the network, and because RED drops packets with a probability which is proportional to the time-averaged queue length, its "sampling" frequency is proportionate to the congestion level at the router while RED is in its congestion avoidance phase. The penalty box extension to the RED queue management design utilizes this characteristic to sample the network traffic flowing through a node. From a limited number of samples, it is able to identify candidates for further investigation. From a window of information defined by a history of RED drops, the router calculates a normalized drop metric which has been shown to accurately estimate the arrival rate for high-bandwidth flows. The high-bandwidth flow candidates are then tested for the presence or absence of different characteristics of adaptive network flows. The first two tests, TCP-unfriendliness and unresponsiveness, estimate the arrival rate of particular suspicious flows based upon the RED packet drop history, then compare this arrival rate to what it should be for an adaptive network flow. In the TCP-unfriendliness case, a comparison is made to a computed maximum sending rate of any TCP that is conforming to the required TCP congestion avoidance and control algorithms. This test requires preconfiguration of an estimate of the minimum round trip time and maximum packet size for all flows passing through that node, and focuses specifically on detecting flows which do not behave like TCP, a widely used adaptive network protocol. The unresponsiveness test is a bit more general. It operates over time, using the fact that adaptive flows' estimated sending rates should decrease in response to increases in the long-term drop rate. A comparison is made to the bandwidth an adaptive flow should have, given the changes in drop rate in the previous epoch. The very-high-bandwidth test uses the observation that adaptive network flows generally do not occupy disproportionate fractions of the bandwidth. After a group of misbehaving flows has been identified, they must be regulated. The "penalty box" portion of the mechanism partitions network flows into a group which has failed the misbehavior tests and a group which has not, and schedules the constrained flows accordingly. In the simulations of [8], penalized flows are placed into a separate class-based queue, where the congestion level is designed to be higher than the congestion level of the partition containing unpenalized network flows. And, like the fair queuing mechanism, in times of congestion, the more a nonadaptive flow sends, the more it will congest its own limited resources along with the limited resources of other penalized flows.

2.2.2 FRED

The FRED algorithm is another mechanism to identify and constrain nonadaptive network flows using information gathered from the RED queue management system [15]. It proposes one conceptual addition to the original RED mechanism in order to deal with nonadaptive network flows.
It observes that a flow's output share is proportional to the input queue share it is able to capture. Even with RED, input queue representation is proportionate to a network flow's arrival rate. Because nonadaptive network flows tend to consume a larger proportion of the input queues, they also consume a larger proportion of the bandwidth exiting a node. From these observations, FRED proposes that the node should constrain any network flow which is consuming a disproportionate share of the node's buffer space. It accomplishes this by keeping records of all the active flows, defined as those flows which have packets buffered at the local node. By only tracking active flows, it limits the amount of information that it must maintain to a constant level based on the number of buffers it has. Per active flow, it tracks the instantaneous number of packets queued. If a flow attempts to queue more than a maximum threshold of packets, its strike factor is incremented. After a certain number of strikes, the flow is marked for constraint. Under constraint, if this flow's packets cause it to surpass the per-flow packet average, those packets will be dropped. Its estimate of the per-flow average queue length is RED's time-averaged queue estimate divided by its count of the current number of active flows. Like RED with Penalty Box, its use of buffer constraint assumes a certain level of congestion at that point in the network.

2.3 Reserved Bandwidth

The next category of measures observes that the current network architecture is inadequate to support network flows which require guarantees beyond the best-effort service of the existing network. This category offers a means for the resources of nonadaptive flows to be requested, and provides a way for the network to control and partition its own bandwidth and resources; but it still suffers from the basic problem of nonadaptive flows which do not provision for themselves, or flows which provision for a certain amount of resources and send beyond this amount.

2.3.1 RSVP

The ReSerVation Protocol (RSVP) [23] is an IP-based approach that provides end users with a means of requesting the network resources that they require. RSVP provides a way to describe any flow specification that might require a reservation of network resources and a way to describe many different styles of resources that may be reserved within that network. It also provides a way of installing state into the network which will be robust and dynamic enough to allow for changing conditions in the network, as well as a means of propagating this information so that it may form a path between a sender and a receiver.

2.3.2 Expected Capacity Framework

The Expected Capacity Framework [6] is another approach which provides end users a means of requesting a specific level of service from the network. Instead of the costly RSVP approach of reserving resources with absolute guarantees, this mechanism provides a means for networks to offer known levels of resources to end users with high assurance of availability, and it deals with end users who send excessive traffic by aggressively dropping their packets. It does this through the efforts of two processes: a tagger, located at the edge of the network, and a dropper, located more centrally within the network. The tagger's function is to tag packets with a bit indicating whether that packet is "in" or "out" of a previously agreed service profile. During times of congestion, the dropper discards packets, dropping "out" packets more severely than "in" packets.
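To illustrate the tagger/dropper split, here is a minimal sketch under our own assumptions; it is not code from [6]. We assume a token-bucket profile at the tagger and fixed drop probabilities (p_in, p_out) at the dropper, all of which are illustrative choices.

    # Illustrative sketch of an edge tagger and in-network dropper.
    # The token-bucket profile and drop probabilities are our assumptions.
    import random, time

    class ProfileTagger:
        def __init__(self, rate_bps, bucket_bytes):
            self.rate = rate_bps / 8.0      # token fill rate, bytes/sec
            self.capacity = bucket_bytes
            self.tokens = bucket_bytes
            self.last = time.time()

        def tag(self, packet_bytes):
            now = time.time()
            self.tokens = min(self.capacity,
                              self.tokens + (now - self.last) * self.rate)
            self.last = now
            if self.tokens >= packet_bytes:
                self.tokens -= packet_bytes
                return "in"                 # within the agreed service profile
            return "out"                    # excess traffic

    def should_drop(tag, congested, p_out=0.5, p_in=0.05):
        # During congestion, "out" packets are dropped much more severely.
        if not congested:
            return False
        return random.random() < (p_out if tag == "out" else p_in)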
It drops "out" packets more severely that "in" packets. Chapter 3 Design Issues Existing mechanisms suggested in earlier work show that routers may be able to provide a scalable means to identify misbehaving users, and to penalize these users at the local router. Although identifying and penalizing misbehavior locally leads to a very self-sufficient system, a certain amount of overall efficiency is lost. nodes in this direction , benefit from penalization __ R3 identifying and penalizing router wasted bandwidth on misbahving flow's traffic _ _ _ R RR2 SRC misbehaving traffic source Figure 3-1: Local Identification and Penalization It is lost in bandwidth spent carrying the nonadaptive flow's traffic through the network to a point of congestion, and is lost in penalizing these flows in congested central nodes, whose processing time could be more wisely spent forwarding packets. This thesis suggests a mechanism through which we may use such local algorithms to test for nonadaptive network flows, but through which we may use the resources from upstream routers to add efficiency to the overall system. Instead of waiting until a point of congestion to penalize a flow, we use the resources of upstream neighbors to push penalization closer to the flow's entry point into the network and constrain that flow before it can cause congestion or unfairness. This also frees the resources nodes in this direction benefit from penalization pushback identifying router pushback penalizing router misbehaving traffic source Figure 3-2: Identification + Pushed Back Penalization of heavily loaded central nodes by pushing the responsibility of penalization towards the edges of the network. This mechanism will be provided through a pushback protocol which will act as a means to transfer information from a sensor to an actuator, from the identifier's identification tests to the penalizer's penalization policy. Several issues will be covered in this chapter concerning the design of such a protocol. 3.1 Protocol Expressiveness In order to be effective, the information carried by the pushback protocol must be sufficient to effectively transfer control of penalization from the identifying node to any potential penalizing nodes. This information may be divided into two parts: * Flow identifier * Penalization information The flow identification portion must provide enough information for unique identification at the smallest granularity flow that could be reasonably penalized. Because most systems will likely use classification of packet headers as a means of identifying those flows which require penalization, this should probably include at least the flow's source address, source port, destination address, and destination port. Penalization information includes any information that a remote node would need to have in order to determine a suitable penalty. The penalization information ex- pressed by the protocol must be general enough for a majority of nodes that might accept this protocol to map into a policy that they can execute locally. Exactly what is expressed in this penalization information depends on who controls what is placed in these parameters. Control of this information may lie with the initiator of the protocol (the identifier), with the receiver of the protocol (the eventual penalizer), with some mixture of these two, or from some outside source. If this responsibility lays with the identifier, the initiator of the protocol must suggest to the penalizer a penalization action. 
The penalizer then requires no processing in order to figure out what penalization to execute. This may seem attractive at first, but with the great diversity of receivers for the protocol, expressing penalization goals in a way that all nodes are capable of executing, or of mapping to a capability that they do have, becomes complex. If the responsibility lies with the penalizer, the initiator of the protocol must express information about its internal state, and the penalizer will process this information in order to form a suitable local penalty. The strength of this is that the penalizer may have more information about its local environment than the identifier does. With the combined information about the identifier's internal state, its own capabilities, and its local environment, a penalizer node may be able to generate a more suitable penalty than the identifier could. Any information expressed by the protocol must not rely on state relative to the identifier's location. For instance, a scheduling penalty, which relies on local congestion levels, may have little effect at a remote router if the congestion level at that remote router is negligible.

3.2 Propagation

3.2.1 Path Determination

In order to propagate penalization information to a node that will be able to have an effect on a nonadaptive network flow, we may choose to send messages upstream, towards the source of that flow, or downstream, along with the nonadaptive traffic. If we send the message upstream towards the flow's source, penalization may be installed directly into one of these upstream nodes; if we send messages downstream towards the flow's destination, a higher level mechanism must exist to penalize that flow from downstream. Complications exist in either direction. In the upstream direction, there is no established way to determine the previous hop for a particular flow, and not all network paths are bi-directional. In the downstream direction, although routing tables provide an easy and established means of propagating messages through the network, some sort of higher level mechanism or policy must exist to penalize a flow from a downstream location. Because we do not wish to focus on building such a higher level policy to penalize network flows using the downstream method, and because we currently cannot rely upon the existence of such a policy, we explore in this section a number of methods for determining and propagating information in the upstream direction for a network flow, given a packet from that flow. What we require is an inverse flowpath function: a one-to-one mapping between information available in a packet and its flow's previous hop. Routing protocols today specify ways to access next hop information for packets, given the packet's destination. Some people discussing next generation routing protocols suggest that these protocols should also include ways to get previous hop information. In the absence of such protocols providing previous hop information, we must form our own ad hoc mechanisms for finding upstream routers. If one assumes symmetric paths, one can use the same routing tables which are used to forward packets to propagate them backwards. Research shows [19], however, that symmetric routing is often far from the reality of the current network. If penalization does not occur on the same path along which data is forwarded, our protocol is useless. Many link layers in the network today, like Ethernet and point-to-point links, have packet headers which include previous hop information.
If we assume that a majority of link layers within the network provide access to previous hop information, we may be able to extract previous hop addresses directly from the link layer information of a flow's packets. One of this approach's assets is its low bandwidth consumption. Its negatives include its dependence upon link layer information which may or may not be accessible, its inflexibility, and its violation of layer abstractions. Also, in some more complex routing/bridging instances, the previous hop information available from the link layer may be incorrect. On the other end of the spectrum, one can use some form of broadcast, multicast, or constrained broadcast to find an upstream path. Using plain broadcast, one could imagine a pushback message being propagated to all possible previous hop nodes for a flow, but constraining this broadcast might more wisely limit the number of nodes unnecessarily affected by the pushback protocol. This constraint on the broadcast might require testing whether a node is receiving a certain number of packets from the suspect network flow, or might relate to packet forwarding information. This approach, although requiring less bandwidth than a plain broadcast, incurs a significant amount of latency in testing the suspect flow. Its asset is that it does not need to know about link layer details, but its negatives are its latency and processing cost, or its bandwidth consumption.

3.2.2 Flow Granularity

Flow granularity is the size chosen to define a flow, such as a TCP connection or all the packets from a particular host. It dictates how network traffic will be aggregated in the different stages of our protocol: identification, propagation, and penalization. The granularity of identification defines the size of flow that is tested in a nonadaptive identification test, the granularity of propagation affects the number of messages that are sent through the network, and the granularity of penalization helps to define which packets are grouped together in a penalty. If these granularities are too small, large amounts of resources will be necessary to keep track of each of these small flows, or the limited resources allocated for handling nonadaptive flows will not have enough detail to control these flows. If the granularity is too large, there is a danger that adaptive flows may be categorized together with one or more nonadaptive flows and thereby doubly penalized: once by the penalization scheme, and once more because they will suffer excess congestion by being categorized along with nonadaptive network flows. Identification, propagation, and penalization can all operate at many granularities in this system. One can take several approaches to forming the appropriate granularity: one can start with a small granularity and conglomerate, or one can start with a large granularity and refine. One can also alter or maintain granularity at different points in the pushback process. One approach is to predefine a constant granularity and impose it upon the whole network. This has the advantage of being easiest to implement, but different regions or levels of hierarchy may operate at different granularities, and the predefined granularity may be too large for some environments or too small for others. Another approach is to control aggregation in a distributed manner, changing granularities as needed at borders within the network.
Each region in the network may then have the freedom to define its own granularity, and border nodes will be responsible for translating between the different granularities of neighboring networks. This is a very versatile system, but its complexity and latency may become issues: complexity because of the difficult task of designing such a translation, and latency due to time spent translating granularities. A final approach may include defining a certain granularity for identification and propagation within the middle of the network, but using information available at the network's edge to help guide granularity decisions for penalization. This information comes in the form of "profiles", or sizes of aggregation that have been negotiated with edge nodes. Since these profiles are negotiated and retained at the edges of the network, the central portion of the network should not have to reason about granularities. The granularity in the central portion of the network must necessarily be small in order to be able to support any granularity of a profile at the edge. The disadvantage of this small granularity is that numerous messages will have to be sent propagating information about each of the numerous small nonadaptive flows, causing higher bandwidth consumption for the protocol.

3.3 Timeouts/Time Scales

3.3.1 Pushback Timeouts

Because we cannot assume that this mechanism will be ubiquitously deployed, and because we are not guaranteed that our pushback messages will arrive at their destination, we must use some form of acknowledgment mechanism in deciding whether the previous hop that we have determined has taken up the responsibility either to push penalization back farther, or to penalize the specified flow itself. The absence of an acknowledgment, along with a timeout, will be our feedback. Such a pushback timeout should be set conservatively; it must be long enough for most packets to be queued locally, transmitted to the previous hop router, queued there, and processed, and for the acknowledgment to arrive locally and be processed. Several scenarios may have occurred if the local node does not receive acknowledgment of its pushback message. The previous hop node may have received the pushback message but may not support the protocol. Another case is that the previous hop node does understand the pushback protocol, but the pushback message that we sent was lost before it was received at that node. A last scenario is that the previous hop does understand the protocol and has sent an acknowledgment to us, but this acknowledgment was lost on its way back. Because the network can generally be considered best effort only, particularly in a congested state, we must provision for these last two scenarios by retransmitting our pushback messages. The number of times we retransmit will depend upon the network resources we are willing to consume and the latencies we are willing to endure.

3.3.2 Penalty Timeouts

There are also several reasons to provide a timeout on penalization. One constraint on our control system is our high level goal to encourage congestion control by penalizing nonadaptive network flows in such a way that their throughput is worse than if they were congestion controlled.
The execution of this goal has the following effect: once a penalization process is running at its destination, neither the original identifier nor any other node downstream of the penalizer in the flow's data path can sense the behavior or misbehavior of a constrained network flow. Because our goal is to penalize nonadaptive network flows beyond the point of being able to sense them, dynamically searching for a penalization operating point will work poorly; searching would only allow us to approach a point where these penalized nonadaptive network flows no longer exhibit nonadaptive behavior, but we require an operating point more severe than this. To encourage congestion control, we must punish severely enough that the allowed sending rate of a nonadaptive network flow is less than that of an adaptive network flow. Also, because identification of a nonadaptive network flow is merely a hypothesis, the network cannot punish a network flow indefinitely. This would require unnecessary network resources in many cases, and the network flow should be given a chance to amend its behavior and have its penalization released. The combined goals of penalizing well past the point where we can reason about the behavior of a nonadaptive flow, robustness in the face of routing changes and node failures, and fairness to penalized network flows warrant a soft state approach, with a timeout mechanism on penalization and a refreshing of state if the nonadaptive flow is sensed once again. We do this by installing penalties into penalization nodes, then timing them out after a particular period of time. This "penalization timeout" should be small enough that the system remains sufficiently reactive to network changes and failures, as well as to improved host behavior, but not so small that refreshing the pushback protocol requires a significant portion of nodes' processing time or links' bandwidth.

3.4 Penalization

A balance between resource use and penalization latency must be found in timing our penalization. Local penalization may start before the pushback protocol is even initiated, or as late as after one or more pushback timeouts. If local penalization occurs before the pushback, there is a danger that numerous nodes along the flow's path will be simultaneously penalizing the network flow, using up several nodes' processing time. Alternatively, if pushback timeouts are lengthy and penalization does not start until after a number of these timeouts, the latency for penalization of a flow may become quite large. Penalization can also take on different qualities. Initiation and termination of penalties can occur over different scales of time: one might imagine a penalty commencing gradually, or taking effect immediately and absolutely. Penalization severity may range from allowing none of the specified network flow's packets to pass through a penalizing node, to allowing all of them to pass. The node controlling penalization also controls the severity of this penalization. The most useful way of setting such a penalization severity is to make it proportional to the severity of the flow's misbehavior, as determined by the information available.

3.5 Trust Issues

In transferring control of penalization that could be done locally, we must be concerned with issues of trust within this system. A certain amount of trust is already inherently present inside the network in the form of routing information.
Routers must trust each other's updates to a certain degree in order to carry out the distributed routing algorithms of the present network. In this system, trust is required for efficiency of the protocol in addition to correct functionality. If nodes do not trust any of their neighbors and must verify all information relating to propagation and penalization, the time scale over which the protocol will operate and the processing overhead required to acquire and maintain the necessary information may be prohibitive. In the forward direction, an identifying node relies upon the cooperation and correct functioning of its neighboring nodes when it releases its penalization responsibilities. If its neighbors have incorrectly notified it that the penalization responsibility has been taken by another node, and it releases its own penalization of that flow, then a nonadaptive flow may remain unconstrained. On the other hand, the processing of numerous nodes may be temporarily devoted to penalizing a single flow if the identifying node releases this penalty only when it can no longer sense that flow's misbehavior. In the backward direction, a node must also trust the information that a supposed identifier has sent it. An abuse of this protocol or a malfunctioning node may constitute a denial-of-service attack against a network flow, or against nodes whose processing resources must be used to penalize a network flow. If those nodes must test the penalization judgments of all supposed identifiers, however, the latency and processing power required to decide whether a flow is indeed nonadaptive could cause problems. In some parts of the network, it may even be impossible to make such a judgment.

3.6 Contiguous Propagation

A possible problem with this protocol is that propagation can only progress as far as the farthest contiguous node which understands the pushback protocol. This could pose a problem for deployment. A possible enhancement to this work is to find a means of skipping nodes which do not understand the protocol: nodes could rendezvous with prospective upstream points, or protocol-compliant nodes could advertise themselves, so that if an identifier can determine that such a node is on the path towards the source of a nonadaptive flow, it can send to that node directly.

Chapter 4

Design Description

4.1 High Level Operation of Integrated System

We have implemented a particular design for an integrated identification, propagation, and penalization system within the ns network simulator environment. At a high level, the system operates in the following manner: on particular links in our network, we monitor network traffic with a slightly modified version of the FRED algorithm. If the packets queued by a particular active flow cause it to fail an adaptive test, we initiate our pushback protocol. We determine a previous hop and send a pushback message, containing penalization information and an identifier for the misbehaving flow, to an upstream router. If this upstream router implements the pushback protocol, it will acknowledge our pushback message, then continue to push this message upstream towards the misbehaving traffic's source. This recursive process terminates at the last router on this upstream path which is knowledgeable about the pushback protocol; this router will take responsibility for penalizing the misbehaving traffic source.

4.2 Implementation

4.2.1 Granularity

The original granularity of all of our nonadaptive identification tests was that of a TCP connection.
We will maintain this TCP granularity, although different nonadaptive identification tests may allow this to be set to other granularities. For propagation and penalization, we've chosen an IP source/destination address pair granularity. This choice has the advantage that it will not be fooled by applications which divide their network transactions into multiple TCP connections to increase throughput. No granularity changes occur within the network because of the uniform granularity used for propagation and penalization.

4.2.2 Pushback Packet

Our Pushback Packet, however, contains parameters to express a flow's source IP address, source port, destination IP address, and destination port. This allows the capability of expressing anything as small as a TCP connection. The Pushback Packet also contains fields for expressing penalization parameters. A type field allows for different kinds of penalization, and a variable amount of space is given for different penalization parameters. Currently our Pushback Packet can only support a penalization parameter set which includes a type descriptor, a numerical measure of the identifier's current state for a flow, a numerical measure of the identifier's target state for a flow, and a penalization timeout.

[Figure 4-1: Pushback Packet Format — FlowID (source IP address, source port, destination IP address, destination port) and Penalization Description (type, penalization parameter space).]

4.2.3 Nonadaptive Identification Test

The test that we use for identification of nonadaptive flows is what we call the FRED strike test. (In this way, we are dependent upon the correct operation of the FRED algorithm for accurate identification of high bandwidth nonadaptive flows.) The tests developed in [8] could also be substituted, as could any decisive and reliable nonadaptive tests which may be developed in the future. As explained in the Existing Measures chapter, the FRED test tracks the instantaneous queue lengths of all active flows. If this instantaneous queue length exceeds a certain threshold, the packet which pushed it over this threshold will be dropped and that flow's strike value will be incremented. After a certain number of strikes, the flow enters a constrained state. In its constrained state, the flow is allowed to queue no more packets than the average per active flow. We have modified the original FRED algorithm slightly in our implementation. This modification was made to solve a problem which does not actually occur with any frequency. The algorithm for FRED, as specified in [15], only allows flows to rid themselves of their misbehaving status within a node when all of their packets have been dequeued. Originally, it was thought that this would target both high bandwidth nonadaptive flows and long-lived adaptive flows which maintained a queue of packets at a node, when in fact it is quite unusual for an adaptive flow to maintain a queue at a node for any extended period of time. We made the following modification to the original FRED algorithm to allow flows to rid themselves of their misbehaving status more quickly: if an active flow does not misbehave for an iteration, it will be rewarded through a decrementing of its strike factor. This causes the nonadaptive constraint and pushback tests to be more lenient towards flows. This may create results slightly different from the original FRED algorithm, but since it is more lenient towards suspect flows, it should not invoke excessive penalization behavior and should not invalidate the results we present in this thesis.
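As a concrete illustration of this strike bookkeeping, consider the following sketch. It reflects our reading of the scheme, not code from the thesis implementation or from [15]; STRIKE_LIMIT, max_qlen_threshold, and the per-flow fields are illustrative names.

    STRIKE_LIMIT = 3   # strikes before a flow is constrained (illustrative)

    class FlowState:
        """Per-active-flow record: FRED only tracks flows with packets buffered."""
        def __init__(self):
            self.qlen = 0      # packets this flow currently has buffered here
            self.strikes = 0   # accumulated misbehavior count

    def on_enqueue(flow, avg_per_flow_qlen, max_qlen_threshold):
        """Decide the fate of an arriving packet; returns True to drop it."""
        if flow.qlen + 1 > max_qlen_threshold:
            flow.strikes += 1   # flow tried to buffer too many packets
            return True
        if flow.strikes >= STRIKE_LIMIT and flow.qlen + 1 > avg_per_flow_qlen:
            return True         # constrained flow held to the per-flow average
        flow.qlen += 1
        return False

    def end_of_iteration(flow, misbehaved):
        # Our modification: a well-behaved iteration decrements the strike
        # factor, so flows shed their misbehaving status more quickly.
        if not misbehaved and flow.strikes > 0:
            flow.strikes -= 1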
The actual criteria for pushback are stricter than those for dropping a packet; they are based upon the strikes that a flow has received over a period of time. We observe that at a given congestion level, high bandwidth nonadaptive network flows tend to accumulate strikes at a relatively swift rate. TCP flows, after an initial burst of strikes from slow-start overestimation, tend to accumulate strikes at a much more gradual rate while in TCP's linear congestion avoidance phase. We push back on flows which accumulate strikes several times faster than their peers.

4.2.4 Pushback Sender

The Pushback Sender is the workhorse of the pushback portion of this system. It can be triggered either by the nonadaptive identification test in the identifying node or by the arrival of a pushback message from a downstream node. It resides in a Pushback Node: any node that contains a Pushback Sender/Pushback Receiver pair and a penalization mechanism. The Pushback Sender agent has many functions. Its first is to determine a recipient for a pushback message. We use available link layer information to find the previous hop's identity. A node finds this by scanning incoming traffic for packets which fit a flow's profile. When one is found, that packet's link layer address identifies the previous hop for the next stage of the pushback protocol. Since any suspect flow is, by the nature of the nonadaptive test, a high bandwidth nonadaptive flow, the latency between initiating the pushback process and finding its previous hop should be small. This approach assumes that there is only one previous hop for any given suspect flow. It may be expanded to find multiple previous hops by caching all of the previous hops found for a flow over a period of time, and sending to these. If a previous hop cannot be determined, then the node may choose to become the penalizer for the flow, or it may decide that it is not the correct recipient of that flow's pushback message. This may occur because of routing changes or failures in the network. There are several responses to this case. One is to penalize the flow locally. Another is to refrain from ACKing, so that a downstream node's pushback times out and that node becomes the penalizer. A third, which we have not explored, is to send a negative ACK, informing the downstream node that this node is unable or unwilling to perform the responsibilities implied by ACKing. Next, if a previous hop has been found, that previous hop is used as the destination for a Pushback Packet that is constructed. A flow identifier, an appropriate penalization timeout, and some internal state information about the identifier are inserted into the packet. This penalization timeout was chosen to be a constant value of ten seconds, but this parameter's value has not been explored thoroughly; further research is necessary to find an optimum value for most network conditions, or a means of deriving such a value from measures of network conditions at the time of identification or penalization. The packet is then forwarded to the previous hop node. Meanwhile, a timer is set. This pushback timer is sufficiently long that the packet should have plenty of time to be transmitted, received, and acknowledged. Like the penalization timeout, this pushback timeout was set to a constant value. Again, this approach was taken for simplicity.
One can certainly imagine a pushback timeout set by an estimate of a flow's arrival rate or some other measure; we leave this as an area for future research. There is a decent chance that a flow, once it fails the nonadaptive test, will fail it again sometime before either we receive an acknowledgment from a neighboring pushback-aware node or our pushback timeout expires. We do not wish to flood the network with repetitive information about a particular misbehaving network flow. For this reason, although the nonadaptive test may fail repeatedly over a short span of time locally, we disable sending further pushbacks for this flow until the pushback timeout. After sending a Pushback Packet, the Pushback Sender must wait until the first of two events happens:

* Pushback Acknowledgment
* Pushback Timeout

If the first event that occurs is the Pushback Acknowledgment, the local node cancels the pending pushback timeout and any penalties which have been initiated against that flow. If it trusts the acknowledgment, it may assume that its penalizing responsibilities are over. If the timeout is the first event that occurs, we may choose to retransmit the pushback message or to give up trying to push back. Because we will be operating in a best effort environment, we choose to retransmit several times before giving up. Each node in the pushback path which understands the pushback protocol will carry out a similar pushing back process. Pushback should terminate before reaching the actual source of the traffic.

4.2.5 Penalization

One node will be the final node in the pushback path. That node must initiate penalization. In our system, the penalizer controls the penalization severity; it computes this based on information which the identifier has sent it and some long term information which it has stored about suspect nonadaptive network flows. In our detection/penalization method, the penalty severity is computed from a ratio of the time-averaged per-active-flow queue length maintained by FRED and the number of packets the penalized flow has in the queue, along with the number of offenses which a given flow has committed within the time span of our long term memory. For every offense that a particular flow has committed, the severity of the penalization is doubled. The timeout for ending the penalization is also doubled, up to a certain point; this limit is set in order to remain responsive to network changes and failures. In this way, a repetitively sensed nonadaptive flow will require progressively less network bandwidth and less overhead in terms of pushback protocol messages. We have chosen the following style of penalization: since we will be dealing with flows which are nonadaptive in the face of congestion, packet drops will not affect these flows in the same severe manner in which they would affect an adaptive network flow. For this reason, we have chosen a moderately aggressive probabilistic packet drop approach for penalization; assuming a constant sending rate, dropping a certain percentage of a flow's packets will reduce its bandwidth consumption by that percentage.
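The following sketch illustrates one plausible form of this computation. It is our interpretation, not the thesis' actual ns code: the exact mapping from the FRED queue-length ratio to a drop probability, and the cap MAX_PENALTY_TIMEOUT, are assumptions.

    import random

    MAX_PENALTY_TIMEOUT = 80.0  # illustrative cap, keeps us responsive to changes

    def penalty_for(flow_qlen, avg_per_flow_qlen, offenses, base_timeout=10.0):
        """Drop probability and timeout for a newly pushed-back flow."""
        # Base severity from the ratio described above: how far the flow's
        # buffered packets exceed FRED's time-averaged per-active-flow length.
        excess = max(0.0, 1.0 - avg_per_flow_qlen / max(flow_qlen, 1))
        # Each recorded offense doubles the severity and the timeout (capped).
        severity = min(1.0, excess * (2 ** offenses))
        timeout = min(base_timeout * (2 ** offenses), MAX_PENALTY_TIMEOUT)
        return severity, timeout

    def should_forward(severity):
        # In the forwarding path: probabilistic drop for penalized flows.
        return random.random() >= severity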
In the design section, we mentioned that initiation and termination of penalization can occur at different time scales. Because we base our penalization severity on the arrival rate of a flow at the time of the pushback, the first estimate of a nonadaptive flow's arrival rate should be accurate. Also, because we send periodic pushback messages while sensing a nonadaptive flow, we do not install our penalties gradually; this might cause multiple pushbacks for the same flow. If we were to terminate our penalties gradually, the first sensing of the nonadaptive flow might be an inaccurate measurement of its unpenalized behavior. For these reasons, we choose to install and terminate our penalties to an absolute degree, instead of gradually. In timing our penalization, if one attempt to push back upstream fails, we begin local penalization. This is a middle ground between penalizing immediately and penalizing after the maximum number of timeouts/failures. It attempts to minimize the number of nodes simultaneously penalizing a flow in the network, without incurring the excessive latency of waiting for several pushback timeouts. The penalty itself is executed by a fragment of code included in the forwarding path of the penalizing node. A packet matcher identifies packets which belong to constrained flows, and these packets are subjected to a probabilistic drop. The penalizing node is also responsible for updating a long-term data structure which records the behavior of suspect misbehaving flows. The primary use of this information is in reducing the system resources spent on continually misbehaving network flows. Although information within this data structure times out after a period of time, this timeout is on a time scale much longer than the pushback or penalty timeouts in this system. Ideally, if a nonadaptive network flow continues to misbehave, this offense information should be maintained until the flow's current penalization has expired and updated information about the nonadaptive flow has been propagated to the penalizing node. We choose this lengthy timeout to be several multiples of a flow's penalty timeout. This long term information is stored only in the penalizing router. Updates of this offense information would ideally record the number of offenses a nonadaptive network flow has incurred throughout the entire network, but dynamic conditions in the network may make this impossible. It is possible to store such information in the identifier, in all nodes in the upstream path towards the penalizer, or in the penalization node. If it resides in the identifying node, changing congestion conditions in the network may change which node identifies the nonadaptive network flow. If this information is located in the penalizing node, routing changes may cause the penalizer to change. If the information is located in all nodes along the current path of the flow, this will require the update of several nodes, and offense information may still be globally inaccurate because of routing changes. We choose the location of this long term information to be the penalizer, because this minimizes the overall memory that must be updated, and the penalizer is more stable than the identifier in the presence of changing congestion conditions.

4.2.6 Pushback Receiver

Pushback Nodes also contain a Pushback Receiver. We use explicit acknowledgment of pushback messages. The Receiver's most fundamental purpose is to create and send these acknowledgment packets with the appropriate information, and to contact the Pushback Sender at the same node to initiate a further pushback.

4.2.7 Trust

As we have for other components of this system, we choose a moderate stance on trust. We divide the network into two logical parts: that which lies within the same administrative domain (AD), and that which lies outside.
4.2.6 Pushback Receiver

Pushback nodes also contain a Pushback Receiver. We use explicit acknowledgment of pushback messages; the Receiver's most fundamental purpose is to create and send these acknowledgment packets with the appropriate information, and to contact the Pushback Sender at the same node to initiate a further pushback.

4.2.7 Trust

As we have for other components of this system, we choose a moderate stance on trust. We divide the network into two logical parts: that which lies within the same administrative domain (AD), and that which lies outside.

Within the borders of an AD, we assume that nodes generally trust each other's judgments on misbehavior. Therefore, within a region of trust, for efficiency's sake, we propagate judgments about misbehaving network sources with little verification. Across borders, however, we may choose to verify these opinions, using local means to observe and test the behavior of suspect flows over time before propagating them through one's own network and incurring the bandwidth and processing cost. This allows quick, low-overhead propagation within an administrative domain while protecting one's internal network from the opinions and malfunctions of the outside world.

In the backward direction, border gateways may check that claims of misbehaving flows are real by initiating their own tests of a particular flow. If they cannot sense the misbehavior, or are unwilling to run such a test, they may choose not to acknowledge the peer border router which sent them the pushback message, to send a negative acknowledgment to a downstream gateway, or to continue the pushback's propagation. In the other direction, if a border gateway senses that the upstream path has not sufficiently penalized the misbehaving flow despite the acknowledgments it receives, it may assume this responsibility itself.

Chapter 5

Simulations/Evaluation

We have run numerous simulations of our integrated pushback system to help guide our design decisions, to gather evidence that the system operates as expected, and to get an idea of what it is capable of and where its limits lie under various network conditions.

5.1 Simple Pushback

In this section, we attempt to show that our system is able to accurately identify high-bandwidth misbehaving flows. We show evidence that it produces network behavior at least as efficient as a system which locally identifies and penalizes nonadaptive network flows, and more efficient behavior in the common case. We also show that the protocol we have developed operates correctly in the presence of routers which do not run a nonadaptive test, as well as routers which do not understand the protocol at all. Most importantly, we show that our system is effective in pushing congestion back towards a flow's source, that it helps to constrain the effects of existing nonadaptive network flows, and that it penalizes severely enough to encourage congestion control.

The network we use in this section is shown in Figure 5-1. In the upper network are a TCP flow running the New Reno TCP algorithm (TCPnewreno) and a constant bit rate (CBR) flow sending at 800Kbps. In the lower network are two TCPnewreno flows which share a 1Mbps link. The upper and lower networks converge on what may quickly become a highly congested 1Mbps link.

Figure 5-1: Simple Pushback Network Topology

Figure 5-2 shows simulations of the network traffic on these links without any regulating mechanism for nonadaptive flows. As one might expect in this environment, a TCP which shares any congested link with the high-bandwidth constant bit rate source can claim only the bandwidth that the CBR does not occupy; in our network, this is very little. The TCP which shares the upper network with the CBR flow manages to gain 200Kbps, while the TCPs from the lower network get almost none of their fair share of the bandwidth.
Figure 5-3 shows the network traffic with the FRED algorithm and the nonadaptive test running on the congested link between routers R3 and R4, with R3 as a pushback-aware node. Note that no pushback occurs in this case, only local penalization. Even so, we can already see some benefit to the adaptive flows: the two TCPs on the lower network benefit greatly, taking advantage of the bandwidth that the penalized CBR no longer consumes. The TCP which shares another congested link with the CBR, however, is still only able to achieve the remaining 200Kbps that the CBR leaves free.

Figure 5-4 shows simulations with the FRED algorithm and the nonadaptive test running on the congested link between R3 and R4 and two pushback-aware nodes, at R3 and R1. In this network, penalization is pushed back once. Now the CBR leaves the TCP of the upper network room to find an operating point which is nearly an equal share with the other two TCPs on the congested link; with the CBR interfering minimally with the adaptive flows, each of the TCPs gains approximately one third of the bandwidth on the congested link. One may note that, with the FRED mechanism running on the congested link, the CBR does not regain the full 800Kbps even when its penalization times out, because FRED constrains the queue space it can acquire.

Figures 5-3 and 5-4 show the soft-state penalization and timeout cycle in the bandwidth measurements of the CBR flow. Initially, the CBR flow is sensed as high-bandwidth nonadaptive by the FRED link. This information is pushed back to a penalization node, and the flow is penalized for a period at a certain severity. After a period of time, the penalization times out and the oscillation begins again, with increasing penalization severity and timeouts for the repetitively sensed nonadaptive flow.

Figure 5-2: No nonadaptive constraint, on links L1 and L2
Figure 5-3: FRED nonadaptive constraint + local penalization, on links L1 and L2
Figure 5-4: FRED nonadaptive constraint + 1 pushback, on links L1 and L2

5.2 Robustness in Adverse Environments

In this section, we show that our pushback system can tolerate a considerable amount of network failure in the form of lossy links, and that it can adapt over time to topology changes caused by routing dynamics.

5.2.1 Lossy Environments

Figure 5-5: Lossy Network Topology

Our lossy network is shown in Figure 5-5. On the lower network, a CBR flow is attached to a chain of links ending in a congested link where the CBR flow and two TCP flows, arriving from the upper network, merge. FRED with a pushback test process runs at the merging link, and all of the nodes on the lower network are pushback-aware. The links in the CBR's upstream direction have been given increasing amounts of lossiness, in the form of a uniform-probability packet drop. Figures 5-6 through 5-8 show the effects of increasing packet loss on the ability of the pushback protocol to operate.
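Before examining the measurements, a rough calculation suggests where loss should begin to matter. In this simplified model of ours, a pushback and its acknowledgment each independently survive a lossy link with probability 1 - p, and a sender makes four attempts in total (the original transmission plus the three retransmissions noted below):

```latex
P[\text{one attempt is confirmed}] = (1-p)^{2},
\qquad
P[\text{all four attempts fail}] = \left(1 - (1-p)^{2}\right)^{4}.
```

For p = 0.2 the failure probability is 0.36^4, roughly 0.017, so nearly all pushbacks are still confirmed; at p = 0.5 it rises to 0.75^4, roughly 0.32, so about a third of pushbacks are lost despite retransmission.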
The effects of this lossiness on the pushback messages become clearly evident only beyond 20% loss of these messages on each lossy link; below that point, retransmissions deal with the packet loss sufficiently. Our decision to limit the protocol to three retransmissions, however, causes some of the pushbacks to be lost at the higher loss rates. We can also see the effect of our decision to penalize after one pushback timeout: following the sensing of a nonadaptive flow, the penalized flow's bandwidth dips slightly before settling to an extended penalization level. The dip marks the point at which multiple nodes are penalizing the flow.

Figure 5-6: Loss rates 0-20%, measured on links L1, L2, and L3
Figure 5-7: Loss rates 30-50%, measured on links L1, L2, and L3
Figure 5-8: Loss rates 60-80%, measured on links L1, L2, and L3

5.2.2 Routing Dynamics

Figure 5-9: Routing Dynamics Topology

Figure 5-9 shows the network we used to test the protocol's responsiveness to routing changes. A questionable link in the network, Lq, toggles between an up state and a down state, with "uptimes" and "downtimes" exponentially distributed about a mean value. When the link is down, the constant bit rate traffic is sent over the upper network; when the link is operational, it takes the shorter path through the lower network. FRED runs on the first link after the upper, lower, and TCP links converge. We experiment with the questionable link's up and down times in order to understand the system's behavior under different rates of routing dynamics.
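The toggling of Lq can be sketched in a few lines. The schedule callback and link interface below are placeholders standing in for the simulator's own mechanisms, not ns code.

```python
# Illustrative sketch of driving the questionable link Lq: alternating
# up and down periods, each drawn from an exponential distribution
# about its configured mean.

import random

def schedule_link_toggles(schedule, link, up_mean, down_mean, duration):
    t, is_up = 0.0, True            # the link starts in the up state
    while t < duration:
        mean = up_mean if is_up else down_mean
        t += random.expovariate(1.0 / mean)  # length of current period
        is_up = not is_up
        # Schedule the transition into the new state at time t.
        schedule(t, link.set_up if is_up else link.set_down)
```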
As a control case, we push back to the upper and lower networks' common node, R5, where the CBR's path does not affect its penalization. We use this control for comparison against the next, more interesting, set of measurements, in which we push back further, to routers R3 and R6. We take these measurements on links L1, L2, and L3.

The measurements on links L1 and L2 tell us that our protocol is able to correctly find the current path of the nonadaptive traffic and push back in that direction: whichever path the CBR flow is taking when the nonadaptive test senses it is the path that receives the pushback and the penalization responsibility. The measurements on link L3 tell us the net effect that pushing back has for the two competing TCPs which share a congested link with the constant bit rate traffic.

If the protocol reacted instantaneously to routing changes, the measurements for the multiple pushbacks and for our control case would be the same. Because we have chosen a timeout-and-refresh soft-state approach, which carries a lighter overhead, we cannot guarantee instantaneous reactions, but our approach operates quite well in all cases but one. This case, as one might expect, is the one in which the CBR traffic alternates routes at a rate much faster than our timeout/refresh interval; it may be likened to route flutter in a real network. Although the system is not as effective in penalizing the CBR traffic at so dynamic a rate, penalizing half of the paths the nonadaptive traffic travels still maintains a significant level of constraint on it.

5.3 Cost

Certainly a system such as ours is not without operating cost. In this section, we evaluate a few of the costs that it incurs.

5.3.1 Overhead/Processing

In our goals, we proposed that the system should require minimal network and processing resources. Here we begin to reason about the resources that the proposed pushback system would require.

Each of the lightweight identification mechanisms covered in this paper requires a constant amount of memory to identify a nonadaptive flow; by using such a mechanism, we inherit this memory overhead. For penalization, our system requires memory proportional to the number of identified nonadaptive flows, as does the local identification and penalization system; in addition, however, our system keeps longer-term information about misbehaving flows.

The major source of overhead in our system, though, is exacted from the intermediate nodes which participate in the pushback process. Each intermediate node must provide processing power to acknowledge pushbacks, determine previous hops, form packets, and forward them to the appropriate location, and the intermediate links must provide bandwidth for forwarding pushback messages and acknowledgments. All of these resources are proportional to the number of nonadaptive flows identified in the network.

Throughout the design, we have attempted to minimize both the bandwidth and the processing time that the protocol requires. Our simulations show that repetitively sensed nonadaptive flows are given progressively smaller amounts of network resources. The effect of this design appears when we compare a scenario in which N different nonadaptive flows are identified over a short period of time against one in which a single nonadaptive flow is identified N times. In the first case, the amount of long-term information installed at the penalization nodes is linearly proportional to N; in the second, it should be constant. The intermediate nodes in the first case also pay a much higher price per unit time, because a repetitively identified nonadaptive flow requires exponentially decreasing resources per unit time, up to the point where its penalization timeout reaches the maximum.
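To make the single-flow case concrete, consider a back-of-the-envelope model of ours: a base penalty timeout T, the doubling rule of Section 4.2.5, and a persistent flow that is re-sensed as soon as each penalty expires. The i-th pushback then occurs only after all earlier penalties have run their course:

```latex
t_{i} \;\approx\; T + 2T + \cdots + 2^{\,i-1}T \;=\; \left(2^{i} - 1\right) T,
\qquad\text{so}\qquad
\#\{\text{pushbacks in } [0,\tau]\} \;\approx\; \log_{2}\!\left(\frac{\tau}{T} + 1\right),
```

until the penalty timeout reaches its cap, after which pushbacks recur only once per maximum penalty period. A single flow identified N times therefore costs the intermediate nodes far less than N distinct flows each identified once.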
5.3.2 Effective Time Scale of Protocol

A portion of the cost of this protocol is the opportunity cost of not penalizing shorter-term nonadaptive flows. In this section, we address this concern briefly.

The latencies of processing, queuing, and transmitting the pushback message define a control difference between a pushed-back penalization system and a system which identifies and penalizes locally; they also constrain the time scale over which we can be effective against nonadaptive flows. The original local penalization mechanism can sense and constrain flows whose lifetimes exceed the combined time to identify a nonadaptive flow and to initiate a penalization mechanism. Because we have chosen to begin penalizing after one pushback timeout, our system affects a smaller set of flows: those whose lifetimes exceed the time taken by our integrated identification, propagation, and penalization setup.

The difference in time-to-penalization between a locally penalizing system and ours lies in the number of hops the protocol pushes back, the network conditions under which it operates, and the timeouts we choose. The more hops the protocol propagates a pushback message, the greater the latency between identification and penalization. In the ideal case, each hop of propagation adds only the time to process, acknowledge, and propagate, plus the pushback timeout used to decide whether penalization responsibility must be assumed locally. In the worst case, retransmission timeouts must be added for lost pushback packets and acknowledgments. The actual pushback timeout values govern the latencies one can expect between identification and penalization, and the true opportunity cost depends on the time scales at which nonadaptive flows operate in the network. In our system, under ideal conditions, the nonadaptive test operates on the order of seconds, and the propagation times and timeout values are small in comparison, so pushing back adds only a small fraction of latency to the system as a whole.

Figure 5-10: Short term route oscillations: uptime mean=1s, downtime mean=0.1s
Figure 5-11: Medium term route oscillations: uptime mean=100s, downtime mean=1s
Figure 5-12: Longer term route oscillations: uptime mean=20s, downtime mean=20s
Figure 5-13: Longer term route oscillations: uptime mean=50s, downtime mean=5s

Chapter 6

Conclusion

6.1 Evaluation

The expansion and commercialization of the network, new protocol implementations and services, and the intense pressure for increased network performance have led to an increase in the amount of network traffic which is non-congestion-controlled, and this trend will certainly not abate. With such traffic come the dangers of congestion collapse and severe unfairness to adaptive network flows. In the past, several attempts have been made to enforce compliance with network feedback by isolating and regulating individual flows, but only at great cost.
More recently, several designs have emerged which provide less costly means to identify high-bandwidth traffic flows which could be considered nonadaptive, and to penalize these flows locally. This thesis has attempted to take that work one step further: it proposes a protocol to push information about high-bandwidth nonadaptive network flows back towards the source of the traffic, reducing the adverse effects on traffic competing for the same network resources and reducing the chance of congestion collapse.

In this thesis, we have investigated the integrated identification, propagation, and penalization components of this pushback system. We have explored the design space for such a system, simulated the system to help reason about the parameters of its design, and, through these simulations, shown evidence of the protocol's effectiveness and appropriateness in reaching our stated goals. We have shown evidence of the protocol's ability to use neighboring nodes' resources to reduce the adverse effects of high-bandwidth nonadaptive flows on adaptive traffic across several network topologies and conditions. We have also shown the protocol's ability to operate at least as effectively as a local identification and penalization mechanism, even in the presence of high packet loss. We have further shown that, although our system is not as responsive as one which reacts immediately to routing changes, the protocol is quite effective in networks whose routing dynamics change at a rate slower than our timeout/refresh intervals, and moderately effective in networks whose routing dynamics change faster than those intervals.

6.2 Remaining Issues/Future Work

We have by no means fully explored the design space that we originally mapped out. Several issues remain for further exploration:

* Setting penalization policy at different locations and in the presence of different network conditions.

* Path determination, particularly the problem of propagation to upstream or entry nodes for a network flow.

* The use of different granularities for identification, propagation, and penalization.

* Possibilities for deployment which do not rely upon contiguous propagation of messages.

* Generalizing to the case where high-bandwidth nonadaptive flows are not unusual, but numerous in the network.

* Finding the optimal values, or approaches to finding the values, of the pushback protocol's various parameters.

* Investigating the following effect: releasing a nonadaptive flow from penalization may cause adaptive flows to greatly reduce their sending rates, in turn reducing the congestion level at a node. Since our identification methods all depend upon a certain level of congestion in order to operate, this may leave a nonadaptive flow unsensed and/or unpenalized for long periods, until a sufficient congestion level is again reached.

Perhaps the most important future work, however, is to implement the protocol we propose, or a mechanism which accomplishes the same goals, in a real network, and to monitor its effectiveness in the presence of actual high-bandwidth nonadaptive network flows, failures, and other adversities.

Bibliography

[1] Bob Braden et al. Requirements for internet hosts - application and support. RFC-1123, IETF, 1989.

[2] Bob Braden et al. Requirements for internet hosts - communication layers. RFC-1122, IETF, 1989.

[3] Bob Braden et al. Recommendations on queue management and congestion avoidance.
Internet Draft, September 1997.

[4] D. Clark, Karen R. Sollins, and John T. Wroclawski. Robust, multi-scalable networks. DARPA Proposal, MIT Laboratory for Computer Science, 1997.

[5] D. D. Clark. The design philosophy of the DARPA Internet protocols. In SIGCOMM '88 Symposium on Communications Architectures and Protocols, Computer Communication Review, volume 18, pages 106-114, Stanford, CA, USA, August 1988.

[6] David D. Clark and Wenjia Fang. Explicit allocation of best effort packet delivery service. Unpublished manuscript, 1997.

[7] Alan Demers, Srinivasan Keshav, and Scott Shenker. Analysis and simulation of a fair queueing algorithm. In SIGCOMM Symposium on Communications Architectures and Protocols, ACM, pages 1-12, Austin, Texas, September 1989.

[8] S. Floyd and K. Fall. Router mechanisms to support end-to-end congestion control. Technical report, Information and Computing Sciences Division, Lawrence Berkeley National Laboratory, February 1997.

[9] S. Floyd and Van Jacobson. Random early detection gateways for congestion avoidance. IEEE/ACM Transactions on Networking, 1:397-413, August 1993.

[10] S. Floyd and Van Jacobson. Link-sharing and resource management models for packet networks. IEEE/ACM Transactions on Networking, 3(4):365-386, August 1995.

[11] Van Jacobson. Congestion avoidance and control. ACM Computer Communication Review, 18:314-329, August 1988.

[12] Raj Jain. Congestion control in computer networks: Issues and trends. IEEE Network, 4:24-30, May 1990.

[13] Yannis A. Korilis and Aurel A. Lazar. Why is flow control hard: Optimality, fairness, partial and delayed information. Technical report, Center for Telecommunications Research, Columbia University, New York, NY, 1992.

[14] H. T. Kung and R. Morris. Credit-based flow control for ATM networks. IEEE Network Magazine, 9(2):40-48, March 1995.

[15] D. Lin and R. Morris. Dynamics of random early detection. In SIGCOMM '97 Symposium on Communications Architectures and Protocols, Computer Communication Review, Cannes, France, September 1997.

[16] S. McCanne and S. Floyd. ns (network simulator). URL http://www-nrg.ee.lbl.gov/ns, 1995.

[17] P. E. McKenney. Stochastic fairness queuing. In Proceedings of the Conference on Computer Communications (IEEE Infocom), San Francisco, California, June 1990.

[18] John Nagle. Congestion control in IP/TCP internetworks. RFC-896, Ford Aerospace and Communications Corporation, 1984.

[19] V. Paxson. End-to-end routing behavior in the Internet. IEEE/ACM Transactions on Networking, 5(5):601-615, October 1997.

[20] Larry L. Peterson and Bruce S. Davie. Computer Networks: A Systems Approach. Morgan Kaufmann, San Francisco, CA, 1996.

[21] J. H. Saltzer, D. P. Reed, and D. D. Clark. End-to-end arguments in system design. ACM Transactions on Computer Systems, 2:277-288, November 1984.

[22] S. Shenker. Making greed work in networks: A game-theoretic analysis of switch service disciplines. In SIGCOMM Symposium on Communications Architectures and Protocols, pages 47-57, London, UK, September 1994.

[23] Lixia Zhang et al. RSVP: A new resource reservation protocol. IEEE Network, 7:8-18, September 1993.