Achieving Per-Flow Weighted Rate Fairness in a Core Stateless Network

Raghupathy Sivakumar, Tae-Eun Kim, Narayanan Venkitaraman, Jia-Ru Li, Vaduvur Bharghavan
University of Illinois at Urbana-Champaign
Email: {sivakumr,tkim,murali,juru,bharghav}@timely.crhc.uiuc.edu

Abstract

Corelite is a Quality of Service architecture that provides weighted max-min fairness for rate among flows in a network without maintaining any per-flow state in the core routers. Three key mechanisms work in concert to achieve the service model of Corelite: (a) the introduction of markers in a packet flow by the edge routers to reflect the normalized rate of the flow, (b) weighted fair marker feedback at the core routers upon incipient congestion detection, and (c) linear increase/multiplicative decrease based rate adaptation of packet flows at the edge routers in response to marker feedback.

1. Introduction

The current Internet supports a single service model: simple best-effort service. However, the increasing diversity of applications using the Internet has made Quality of Service (QoS) a critical issue for the future Internet. Two broad paradigms that have been proposed in the last few years for supporting quality of service in the Internet are Integrated services (Intserv) and Differentiated services (Diffserv). The Intserv approach supports absolute per-flow quality of service measures but requires a substantial amount of per-flow state to be maintained in the routers of the network [10]. Since high speed routers in the core of backbone networks typically serve hundreds of thousands of flows simultaneously, it has been argued that Intserv is not a scalable solution for providing QoS support in the Internet. Diffserv, on the other hand, proposes a scalable service discrimination model without requiring any per-flow state management at the routers in the network core [8, 13, 14].
Diffserv is gaining some popularity as the QoS paradigm of the future Internet, in large part because it moves the complexity of providing quality of service out of the core and into the edges of the network, where it is feasible to maintain a restricted amount of per-flow state¹. Although Diffserv scores over the Intserv approach in terms of scalability, the key question is: what kind of service model can the Diffserv approach support? Existing instantiations of the Diffserv model support coarse service differentiation, focusing primarily on aggregates of flows and differentiating between service classes rather than providing per-flow QoS measures. However, recent works on core-stateless networks [4, 5, 11] have proposed approaches to achieve finer grained service differentiation, in an attempt to emulate the richer Intserv service model within the framework of the scalable Diffserv network model. In this paper, we present the Corelite QoS architecture, with a focus on the set of edge router and core router mechanisms for achieving per-flow weighted rate fairness in a core-stateless network. Weighted rate fairness achieves fine grained service discrimination on the end-to-end rate allocated to flows, and has previously been used in state-intensive Intserv-like networks as a sophisticated service model for providing QoS. However, we believe that Corelite is among the first approaches to achieve weighted rate fairness in a core stateless network [5, 6]. Corelite is broadly based on the Diffserv approach, and the key tenets that form the basis of the Corelite design are: (i) no per-flow state in the core of the network, (ii) a simple forwarding behavior for the core routers, (iii) a low overhead (weighted) fair feedback scheme at the core routers to provide early congestion feedback to the edge routers, and (iv) rate adaptation (at the edges) without any packet loss in the network.
Guided by the above principles, Corelite supports weighted rate fairness among flows in the network. Briefly, each flow in the network can choose to belong to one of many rate classes (each rate class has an associated rate weight), and the rate allotted to a flow is in accordance with a weighted version of the max-min fairness algorithm [1], given the flows in the network and their respective rate weights. We define b(i)/w(i) to be the normalized rate of flow i, where b(i) is the rate allocated to flow i and w(i) is its rate weight (both are defined formally in Section 2.1). The goal of weighted rate fairness is thus to achieve max-min fairness of the normalized rates rather than the actual rates of flows. While Corelite does not place any bounds on the number or range of the distinct rate weights that can be supported, we expect that a network administrator will typically provide a small number of rate classes for a network and associate a rate weight with each class; each flow then selects a rate class. Max-min fairness is a well known concept [1], and weighted max-min fairness is a fairly straightforward extension. Unfortunately, achieving weighted max-min fairness without maintaining per-flow state at routers is no easy task. In a Diffserv-like network, no core router knows which flows are traversing through it, let alone their rate weights; no edge router knows which other flows are sharing the path traversed by a flow that it controls; there is no centralized knowledge; and decisions for rate adaptation are made for each flow based purely on the feedback received for that flow.

The rest of the paper is organized as follows. Section 2 describes the weighted rate fairness service model and presents the high-level Corelite approach for achieving weighted rate fairness. Section 3 explores the key core router functionality in Corelite in greater detail.

¹ In this paper, we borrow the terminology used in [5] and refer to networks conforming to the network model with no state maintenance in the core as core-stateless networks.
Section 4 evaluates the performance of Corelite and compares it quantitatively with related work. Section 5 discusses some related work. Section 6 concludes the paper.

2. The Corelite Architecture

The Internet can be viewed as an agglomeration of autonomous heterogeneous network clouds. Each network cloud consists of core routers at the center and edge routers at the fringes. An end host to end host connection can potentially flow through multiple network clouds. However, the mechanisms proposed in Corelite are for a single network cloud and hence can be deployed in a network cloud independently of other network clouds. Further, since Corelite proposes mechanisms for a single network cloud as opposed to an inter-network in general, its mechanisms are edge-to-edge mechanisms and not end-to-end mechanisms. Thus, any reference to a flow in the rest of this paper signifies an edge-to-edge flow that can potentially comprise several end-to-end micro-flows. There are two key components to be addressed in such a setup: (a) edge router-core router interaction within a network cloud, and (b) edge router-edge router interaction across neighboring network clouds. In this paper we focus only on the first component: the interaction between edge routers and core routers in a network cloud. Towards this end, we consider a single network cloud, and show how we can achieve weighted rate fairness among the flows that traverse the cloud without maintaining any per-flow state in the core routers. We define weighted rate fairness formally in Section 2.1 below.

2.2 Achieving Weighted Rate Fairness in Corelite

Let us now look at the basic operation of Corelite. There are three main steps, as illustrated in Figure 1:

1. Shaping and Marking at the Edge Router: Each (ingress) edge router maintains the allowed transmission rate bg(f) for every flow f passing through it (into the network cloud), and shapes the flow's traffic according to its current bg(f).
In addition to shaping, the edge router periodically introduces marker packets into the flow, transparently to the sender and receiver of the packet flow, such that the rate of transmission of the marker packets reflects the normalized rate of the packet flow. Specifically, an edge router introduces a marker packet after every Nw data packets (or bytes) of the packet flow, where Nw = K1·w, K1 is a constant, and w is the rate weight of the flow. Thus, a flow f that transmits at the rate of bg(f) has a marker packet rate of bg(f)/(K1·w(f)). Recall that bg(f)/w(f) is the normalized rate of the flow. In Figure 1.(1), K1 = 1. Thus flow A has a marker packet inserted after each data packet, and flow B has a marker packet inserted after every alternate data packet. Note that the marker rates reflect the normalized flow rates. The marker packet is logically distinct, though it may be physically piggybacked on a data packet. The source address of the marker is the edge router that generated it, and the contents of the marker identify the packet flow to which it corresponds, uniquely within the edge router.

2.1 Weighted Rate Fairness

In Corelite, each flow is assigned a rate weight, and the network bandwidth is distributed among competing flows in accordance with their rate weights in order to achieve weighted rate fairness. We define weighted rate fairness as a weighted version of max-min fairness, where two flows that share the same bottleneck link are allocated the link bandwidth in the ratio of their rate weights. Let b = <b(i) | i ∈ F> denote a rate allocation vector, where b(i) is the rate allocated to flow i, and let w(i) denote the rate weight of flow i. A weighted fair rate allocation vector b thus satisfies the following condition: for any other distinct feasible rate allocation vector b′,

    ∀i: b′(i) > b(i) ⇒ ∃j: b′(j)/w(j) ≤ b(i)/w(i) and b′(j) < b(j)
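As a concrete reference point, the weighted max-min allocation defined above can be computed centrally by progressive filling ("water-filling"): raise every flow's normalized rate b(i)/w(i) in lockstep until some link saturates, freeze the flows crossing that link, and repeat. The sketch below is ours, not part of Corelite (which never computes the allocation centrally); the function name and data layout are assumptions.

```python
def weighted_max_min(links, flows, eps=1e-9):
    """Progressive filling for weighted max-min fairness.
    links: {link: capacity}; flows: {flow: (weight, set_of_links_on_path)}.
    Returns {flow: allocated rate b(i)} with b(i)/w(i) max-min fair."""
    rate = {f: 0.0 for f in flows}
    frozen = set()                 # flows pinned by a saturated link
    cap = dict(links)              # remaining capacity per link

    def active(l):
        return [f for f in flows if f not in frozen and l in flows[f][1]]

    while len(frozen) < len(flows):
        # Smallest per-unit-weight increment that saturates some link.
        inc = min(cap[l] / sum(flows[f][0] for f in active(l))
                  for l in cap if active(l))
        for f in flows:
            if f not in frozen:
                rate[f] += inc * flows[f][0]
        newly = set()
        for l in cap:
            fl = active(l)
            if fl:
                cap[l] -= inc * sum(flows[f][0] for f in fl)
                if cap[l] <= eps:          # link saturated: pin its flows
                    newly.update(fl)
        frozen |= newly
    return rate

# One 500-packet/s link shared by flows of weight 3, 2 and 1:
# allocations come out in the ratio 3:2:1 (250, 166.67, 83.33).
r = weighted_max_min({"L": 500},
                     {"A": (3, {"L"}), "B": (2, {"L"}), "C": (1, {"L"})})
print(r)
```

Corelite approximates this allocation in a distributed fashion through marker feedback; the sketch only shows what the converged rates should look like.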
2. Marker Caching and Feedback at the Core Router: When a core router receives a data packet, it forwards the packet according to its standard forwarding behavior. When it receives a marker packet, it forwards the packet likewise, but also copies the marker packet into a local marker cache (which is a circular queue). The marker cache thus contains the recent history of packet transmissions, and the number of markers of a packet flow in the marker cache is proportional to the normalized rate of the flow. The marker cache in Figure 1 illustrates this fact. It is important to note that the core router does not inspect the contents of the marker packets and performs no per-flow processing or state management at all.²

At the edge router, the allowed rate of a flow f is updated once per period as

    bg(f) ← bg(f) + α                      if m(f) = 0
    bg(f) ← max{0, bg(f) − β·m(f)}         if m(f) > 0

where α and β are increase and decrease constants respectively, and m(f) is the number of markers received in the last period. The edge router reacts to the maximum of the markers received from any core router for each flow, rather than the sum of the markers received for the flow, because the goal is to throttle the rate in response to the bottleneck link. We know from the core router behavior that m(f) is proportional to bg(f)/w(f). Thus, the decrease function in the rate adaptation algorithm is effectively a weighted variant of the well known linear-increase/multiplicative-decrease (LIMD) rate adaptation algorithm that is known to converge to fairness [3]. Figure 1.(3) shows the updated rates after flows A and B receive feedback, with β = 1. The rates for the flows evolve as shown in Figure 1.(4) and asymptotically oscillate around the intersection of the fairness and efficiency lines. Periodically, each core router detects incipient congestion by checking for queue buildup in the packet queue(s). Upon detecting incipient congestion, the core router does not drop queued packets.
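One epoch of this rate adaptation for a single flow can be sketched as follows (the function name is ours; α = β = 1 as in the paper's examples):

```python
ALPHA = 1.0  # linear increase constant (alpha)
BETA = 1.0   # decrease constant (beta); beta = 1 in the paper's example

def adapt_rate(bg, m):
    """One epoch of edge-router rate adaptation for one flow.
    bg: current allowed rate; m: markers received in the last period
    (the maximum over core routers, i.e. the bottleneck's feedback)."""
    if m == 0:
        return bg + ALPHA              # no congestion: probe linearly
    # m is proportional to bg/w, so this cut is proportional to the
    # flow's normalized rate: a weighted multiplicative decrease.
    return max(0.0, bg - BETA * m)

rate = 10.0
rate = adapt_rate(rate, 0)    # -> 11.0 (linear increase)
rate = adapt_rate(rate, 4)    # -> 7.0 (throttled by 4 markers)
```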
Instead, it computes how many marker notifications it must send back, randomly selects markers from the marker cache, and sends each selected marker back to the edge router that generated the marker (based on the source address of the marker). The expected number of markers selected for a flow is proportional to its normalized transmission rate. In Figure 1.(2), flows A and B transmit at the same absolute rate; thus the normalized rate of flow A is twice that of flow B, and A receives twice as many marker feedbacks as B does. Of course, the core router does not know or care which flows it generates markers for. Note that the feedback is generated on the basis of markers in the cache rather than packets currently in the queue or incoming packets in the next epoch. Thus the feedback mechanism in Corelite is designed to be independent of the scheduling discipline at the core router and fairly insensitive to bursty flows.

3. Rate Adaptation at the Edge Router: Periodically (once every fixed-size "epoch"), each edge router checks for marker feedback from core routers. For each flow that traverses through it, if the edge router received markers for the flow in the last period, it throttles back the rate for the flow proportionally to the number of received markers; otherwise it increases the rate for the flow by a constant (to probe for additional rate along the flow path).

Figure 1. Illustration of the Operation of Corelite.

What makes Corelite achieve weighted rate fairness without maintaining any per-flow state? There are two features that work in concert: first, the core router generates feedback in proportion to the normalized rate of a flow (i.e.
m(f) = k·bg(f)/w(f)) because edge routers insert markers to reflect the normalized rate of the flow, and second, the edge router throttles the rate in proportion to the number of received markers (i.e. bg(f) ← bg(f)·(1 − β·k/w(f))). In effect, we execute an enhanced LIMD algorithm at the edge routers in which the feedback is known to be fair. This leads to weighted rate fairness, as we show through both simulations and analysis.

² It can be argued that the marker cache implicitly maintains some per-flow state in the form of markers, even if the core router does not explicitly maintain per-flow state. In fact, we have only used the marker cache-based mechanism as a simple way of introducing the Corelite approach. In Section 3, we will describe an equivalent mechanism for generating weighted fair marker feedback that does not require the maintenance of marker caches, and is thus truly flow stateless.

Thus far, we have presented a high level overview of Corelite. Several important details remain to be addressed: (a) how is incipient congestion detected, (b) how many markers are selected upon detection of congestion, (c) how big does the marker cache need to be, and (d) can we get rid of the marker cache altogether and come up with a much simpler scheme for fair marker selection? In the next section, we address these issues. Specifically, we explore an alternative approach for marker feedback selection at the core router that does not require the maintenance of marker caches, thereby reducing the memory overhead of Corelite and making it truly flow stateless.

Each marker sent back causes the edge router to throttle the corresponding flow's rate by at least β. The number of markers sent back, Fn, is then computed by the following equation:

    Fn = μ·( qavg/(1 + qavg) − qthresh/(1 + qthresh) ) + k·(qavg − qthresh)³

where μ is the service rate of the outgoing link in packets per congestion epoch.
The first term represents the estimated amount by which the input rate must be throttled under the assumption of M/M/1 queues, and the second term represents a self-correcting factor for when the M/M/1 assumption is wrong. If k = 0, the second term vanishes. Then Fn represents the difference between the estimated packet arrival rate μ·qavg/(qavg + 1) that corresponds to an average queue size of qavg, and the desired expected arrival rate μ·qthresh/(qthresh + 1) that corresponds to an average queue size of qthresh, for the case of a Poisson arrival process with exponentially distributed packet service times. Thus, Fn is the amount by which the aggregate traffic rate must be throttled. Since each marker packet causes the edge router to throttle the rate of the corresponding flow by at least β, the core router needs to send back Fn markers to ensure the required drop in the aggregate input traffic rate. Of course, the traffic assumptions of Poisson arrivals and exponential service times are not always true. In particular, it may turn out that fewer markers are selected according to this formula than required. In this case, if k = 0, the marker selection rate grows only as dFn/dqavg ∝ 1/(qavg + 1)², which decreases with qavg and may lead to progressively larger queue buildups and subsequent packet drops. This is where the second term comes in. With a small but non-zero k, after the queue size becomes sufficiently large, d²Fn/dq²avg starts to be dominated by the contribution of the second term, and thus a larger average qavg causes the core router to send back sufficient markers that the input traffic will be throttled effectively. This keeps the queues from overflowing. Additionally, since k is small, for small queue sizes the second term has no significant impact in terms of generating markers too conservatively.
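As a sketch, the marker-count computation might be implemented as follows. The function name, the rounding to a whole number of markers, and the sample value of k are our assumptions; μ, qavg and qthresh are as defined above.

```python
def markers_to_send(q_avg, q_thresh, mu, k=0.001):
    """Number of feedback markers on incipient congestion (q_avg > q_thresh).
    mu: service rate of the outgoing link, in packets per congestion epoch."""
    # M/M/1 term: arrival rate sustaining q_avg minus the rate that would
    # hold the queue at q_thresh (lambda = mu * q / (1 + q) at queue size q).
    mm1_term = mu * (q_avg / (1 + q_avg) - q_thresh / (1 + q_thresh))
    # Self-correcting cubic: negligible near the threshold, dominant when
    # the queue grows well past it despite earlier feedback.
    correction = k * (q_avg - q_thresh) ** 3
    return max(0, round(mm1_term + correction))

print(markers_to_send(q_avg=12, q_thresh=8, mu=50))  # 2: M/M/1 term dominates
print(markers_to_send(q_avg=35, q_thresh=8, mu=50))  # 24: cubic term dominates
```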
Our initial simulations measuring the sensitivity of Corelite to input traffic patterns indicate that the computation of Fn works reasonably well even when the Poisson traffic assumptions do not hold. However, a more detailed sensitivity analysis of this function is ongoing work. Further, the congestion estimation module can be replaced with no impact on the rest of the Corelite mechanisms.

3. Marker Management in Corelite

In Corelite, the edge router is responsible for two functions: (a) shaping and marker injection, and (b) rate adaptation; the core router is responsible for two functions: (a) incipient congestion detection, and (b) marker management and weighted fair marker feedback. Since the uniqueness of Corelite lies in the ability of core routers to provide weighted fair feedback without maintaining per-flow state, we now focus on the two core router mechanisms.

3.1 Incipient Congestion Detection

Each core router monitors its packet queues and, upon detecting incipient congestion, sends back a sufficient number of markers to the edge routers, so that the consequent rate throttling performed by the edge routers will reduce the aggregate input traffic into the core router in the future, and thus alleviate congestion before queues become full and packets are dropped. In Corelite, the core router detects incipient congestion by monitoring the length of its packet queues. A core router may have multiple packet queues depending on its forwarding behavior. For purposes of congestion detection, we only care about the aggregate queue size over all the queues corresponding to a link. The congestion detection function is performed periodically, once every congestion epoch. The core router maintains the average of the aggregate queue size, qavg, during the epoch. At the end of every epoch, the core router checks whether qavg exceeds a predefined congestion threshold queue size qthresh.
If qavg > qthresh, then the core router concludes that there is incipient congestion and that it must send back markers to initiate rate throttling at the edge routers. The remaining question is how many markers to send back; recall that the sending of each marker causes a rate throttling of at least β at the edge router.

3.2 Marker Selection

In this section we present a marker maintenance and selection mechanism to generate weighted fair feedback. "Weighted fair feedback" means that when a link detects incipient congestion and decides to send back Fn markers, it sends Fn · (bg(f)/w(f)) / Σi (bg(i)/w(i)) markers for flow f. The scheme requires no marker caches, and additionally sends feedback selectively, only for those flows whose input traffic is larger than their weighted fair share (i.e. flow f receives no feedback if bg(f) ≤ weighted fair share, and receives Fn · (bg(f)/w(f)) / Σ{i | bg(i) > weighted fair share} (bg(i)/w(i)) markers otherwise). Further, the scheme does not require per-flow processing or maintaining per-flow state at the core routers. The selective marker feedback approach presented in this section is motivated by CSFQ [5], in which the goal is to select markers corresponding to only those flows that are transmitting at more than their weighted fair share. When the edge router sends a marker packet, it also puts the normalized packet transmission rate, rn = bg/w, for the flow in the marker packet. The core router maintains a running average, rav, of the labelled rate rn over all the markers that traverse through it. This is the only additional state variable that is required at the core router to achieve a weighted fair marker selection. The core router determines the number, Fn, of markers to send as feedback to throttle flow rates.
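The selection logic, detailed next, can be sketched as follows. The class name and the EWMA weights used for the running average rav are our assumptions; the paper specifies only that a running average is kept. Here p_w is the per-marker selection probability Fn/wav described below.

```python
import random

class MarkerSelector:
    """Weighted fair marker selection at a core router: one running average
    (r_av) and one deficit counter -- no per-flow state."""
    def __init__(self, p_w):
        self.p_w = p_w          # Fn / w_av: per-marker selection probability
        self.r_av = 0.0         # running average of labelled normalized rates
        self.deficit = 0        # feedback owed to above-average flows

    def on_marker(self, r_n):
        """r_n: normalized rate labelled in the marker by the edge router.
        Returns True if the marker should be sent back as feedback."""
        # Update the running average (EWMA weights are an assumption).
        self.r_av = r_n if self.r_av == 0.0 else 0.9 * self.r_av + 0.1 * r_n
        if random.random() < self.p_w:          # selected with probability p_w
            if r_n >= self.r_av:
                return True                     # (a) at/above average: send back
            self.deficit += 1                   # (b) under average: defer, owe one
            return False
        if self.deficit > 0 and r_n >= self.r_av:
            self.deficit -= 1                   # (c) repay the deficit with an
            return True                         #     above-average marker
        return False

sel = MarkerSelector(p_w=1.0)     # p_w = 1 makes the demo deterministic
print(sel.on_marker(4.0))         # True: r_av = 4.0, marker is at the average
print(sel.on_marker(1.0))         # False: below r_av = 3.7; deficit becomes 1
print(sel.on_marker(5.0))         # True: above r_av = 3.83
```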
Further, the core router selects only markers whose normalized rate rn is greater than or equal to the running average rav maintained at the core. Note that rav is the computed average of the normalized rates and overestimates the true average, since more markers are encountered for flows with larger normalized rates. In other words, selecting flows with rn ≥ rav isolates only those flows that are over-utilizing the link unfairly. What remains is to describe the precise marker selection mechanism without requiring per-flow state at the core router. The probability of selecting a marker is computed as pw = Fn/wav, where wav is the running average of the number of markers observed in each epoch. Additionally, a deficit variable is maintained, reset to 0 at the start of each epoch. When the core router sees a marker, it first selects the marker for feedback with probability pw: (a) if the marker is selected and its labelled rate rn ≥ rav, the marker is sent back to the edge router that generated it; (b) otherwise, if the marker is selected but its labelled rate rn < rav, the marker is not sent back as feedback, but the deficit variable is incremented; (c) otherwise, if the marker is not selected, but the deficit variable is positive and the labelled rate rn ≥ rav, then the marker is sent as feedback and the deficit variable is decremented. In summary, the deficit variable ensures that if a marker corresponding to a flow with normalized rate lower than the running average happens to be selected, it is swapped with a future marker corresponding to a flow whose normalized rate is at or above the average. The above approach has the advantage of selectively throttling misbehaving flows without maintaining per-flow state, which is a very powerful feature. However, there are some issues associated with the approach. First, we only consider the incoming markers in the current epoch when selecting markers for feedback.

Figure 2. Network Topology
This makes the approach susceptible to bursts in packet/marker arrivals. Second, there is no guarantee that the required number of markers will in fact be selected in the current epoch. In summary, the algorithm is similar to CSFQ, but improves on it: unlike CSFQ, it does not depend on the accuracy of an explicit fair share estimate. In the performance evaluation section, we show how and why this approach performs better than CSFQ. Over the last two sections, we have described the key mechanisms that enable Corelite to achieve weighted rate fairness in a core-stateless network. These include the traffic shaping, marking and rate adaptation mechanisms of edge routers (discussed in Section 2) and the congestion estimation and marker selection mechanisms of core routers (discussed in Sections 2 and 3). In the next section, we evaluate the performance of Corelite.

4. Performance Evaluation

In this section, we use simulations to evaluate our model and compare the weighted rate fairness obtained with Corelite against the weighted version of CSFQ [5]. The simulations³, performed using the ns-2.1b4a simulator [12], serve to validate the mechanisms used in Corelite. We present two sets of results. In the first set, we illustrate the efficacy of the mechanisms used to provide minimum rate contracts and weighted rate fairness in Corelite by computing the expected values and comparing them with the actual rates allocated (bg) to the flows. In the second set, we compare the weighted rate fairness obtained with Corelite against the weighted version of CSFQ⁴, both in steady state and when flows dynamically enter and exit the network.
The topology used for the simulations is shown in Figure 2. It consists of three congested links and has flows that traverse different numbers of congested links. Flows in topology 1 also have high and varying round trip times, ranging from 240 ms to 400 ms. Flows 1, 9, 10, 11 and 16 start at time t=250 seconds and stop at time t=500 seconds; all other flows start at time t=0 seconds and stop at time t=750 seconds. The source agents that we have used to obtain the results for Corelite and CSFQ use similar rate adaptation schemes, viz. decrease the sending rate proportionally to the number of congestion indication messages received (losses in the case of CSFQ), or increase the sending rate by one every epoch. After startup, the agents remain in the slow-start phase (doubling the sending rate every second) until they receive the first congestion notification, or until the out-of-profile rate exceeds ss-thresh (set to 32 packets per second), at which point they reduce their rate by half and switch to the linear increase phase.

Figure 3. Instantaneous Rate (allotted rate of flows 1-20 over time)
Figure 4. Cumulative Service (packets successfully sent by flows 1-20 over time)

³ The ns implementation of the new objects and the simulation scripts used to obtain the results in this section can be obtained from http://timely.crhc.uiuc.edu/Projects/Corelite/
⁴ The CSFQ implementation for ns was obtained from http://www.cs.cmu.edu/˜istoica/csfq/software.html
All the simulations presented in this section use a fixed packet size of 1 KB, K1 (used for generating markers) of 1, increase and decrease constants α and β (used for rate adaptation) of 1, a queue size of 40 packets, a congestion detection threshold of 8 packets, and an epoch size of 100 ms at the core router. All the links have a bandwidth of 4 Mbps (500 packets per second) and a latency of 2 ms. For CSFQ, K (used in estimating the flow rate) and Klink (the averaging interval for computing rateTotal) were set to 100 ms. In all the cases, we assume that the flows always have packets to send. The results from this scenario are shown in Figures 3 and 4. We will first calculate the expected rates for the flows and then compare them with the rates obtained by the flows. To calculate the expected rates for flows 1 to 20, observe that all the links have a bandwidth of 500 packets per second. Initially, when flows 1, 9, 10, 11 and 16 are not in the network, each flow should get a rate of 33.33 packets per second per unit weight. In Figure 3, flows 5 and 15 have an allotted rate of 99.99 packets per second (33.33 × 3) since they have a rate weight of 3. The other flows in the network have rate weights of 2 and hence have an allotted rate of 66.66 packets per second. At time t=250 seconds, when flows 1, 9, 10, 11 and 16 are introduced, the fair share per unit weight drops to 25 packets per second. Consequently, flows 5 and 15 have an allotted rate of 75 packets per second, flows 1, 11 and 16 have an allotted rate of 25 packets per second, and all other flows have an allotted rate of 50 packets per second. Finally, when flows 1, 9, 10, 11 and 16 stop, the other flows climb back to their original rate allocations. Figures 3 and 4 show the results of the simulations, and they conform to the expected values. In Figure 3 we observe that all flows except 1, 9, 10, 11 and 16 start at time 0 and converge rapidly to their fair share. When flows 1, 9, 10, 11 and 16 start at time 250 seconds, the other flows fall back almost instantaneously.
The new flows receive no congestion notifications until they reach a point close to their fair share. The three flows at the bottom of Figure 3 are flows 1, 11 and 16, which have a weight of 1. Although these flows traverse different paths, they all get approximately their fair share of 25 packets per second. The large bunch of flows right above these three are the flows with a weight of 2; they receive approximately twice the amount of excess bandwidth compared to the flows with weight 1. This set again has flows traversing different numbers of congested links, and hence with different round trip times.

4.1 Weighted Rate Fairness with Network Dynamics

In this scenario, we illustrate how Corelite can effectively support weighted rate fairness in a core stateless network. We consider a total of 20 flows, with flows 1 to 5, 11 to 12 and 16 to 20 passing through only a single congested link (C1-C2, C2-C3 and C3-C4 respectively) and having a round trip time of 240 ms. Flows 6 to 8 and 13 to 15 traverse two congested links and have a round trip time of 320 ms, while flows 9 and 10 traverse three congested links and have a round trip time of 400 ms. Flows 5 and 15 have a rate weight of 3, and flows 1, 11 and 16 have a weight of 1 each. All other flows have their weights set to 2.

Figure 5. Corelite Instantaneous Rate
Figure 6. CSFQ Instantaneous Rate

In Figure 6, flows 7 and 8 reach rates of only around 30 packets per second before being throttled, though their weighted fair share rate is around 70 packets per second. Also, note that these flows experience more packet drops along the way, between times 30 and 50 seconds, before they reach their fair rate. The selective marker feedback mechanism used with Corelite does not try to estimate the fair rate.
Instead, it only computes an average of the normalized rates observed in the marked packets, and throttles only those flows that send at more than this computed average. In Figure 5, flows 7 to 10, with weights 4 and 5, complete their slow-start phase and move into the linear increase phase at time 7 seconds. They receive congestion notifications only after they are close to their respective fair share rates. This results in Corelite converging more than 30 seconds faster than CSFQ. The closely spaced parallel lines in the cumulative service graph shown in Figure 4 show that the total service obtained by flows having the same weight is the same irrespective of their round trip times and the number of congested links they traverse (recall that our fairness model is max-min rather than proportional or minimum-potential-delay fairness).

4.2 Weighted Fair Rate Allocation (Corelite vs CSFQ)

In the remaining sections, we compare the performance of Corelite against the weighted version of CSFQ. In this section we start multiple flows with different weights at the same time, and compare the startup and steady state behavior of Corelite and CSFQ. We use topology 1, with 10 flows having five different weights such that flow i has a weight of ⌈i/2⌉. The results for this scenario obtained with Corelite and CSFQ are shown in Figures 5 and 6 respectively. Both mechanisms achieve results that closely approximate the ideal values in steady state. However, their startup behaviors differ, with Corelite converging faster than CSFQ. In this scenario, with Corelite, none of the flows experienced packet drops, and flows sending at a rate lower than their fair share never received congestion notifications. However, with CSFQ, when many flows start up simultaneously, the estimated fair rate deviates from its correct value because it does not track the rapidly changing fair share correctly.
If the fair share is underestimated, then packets from flows that are sending below the actual fair share can be dropped. On the other hand, if the fair share is overestimated then more packets will be accepted than the router can transmit, resulting in queue buildups and potential overflows. This results in flows observing losses even before they reach their fair share. Thus in CSFQ the drop behavior degenerates into a tail-drop behavior when the buffer overflows. This occurs when the estimated fair-share at the core router is higher than the correct value. In Figure 6, flows 7 and 8, move into linear decrease phase when their rates are only 4.3 Weighted Fairness with Network Dynamics (Corelite vs CSFQ) In this section we compare the behavior of the two schemes, when flows with different weights enter the network one after another in rapid succession. We use the topology in Figure 2 with 20 flows, flows 1, 11 and 16 having a weight of 1 and flows 5, 10 and 15 having a weight of 3. All other flows have a weight of 2. Figures 7 and 8 correspond to Corelite and CSFQ respectively when flows start 1 seond apart in ascending order of flow number. Clearly, convergence is faster in Corelite than in CSFQ. This is because unlike Corelite, where all flows move to the linear increase phase only after reaching a point close to their final rate, in CSFQ, flows observe losses early in their life time resulting in slower convergence. As we mentioned in the previous section, when flows enter in the network in rapid succession, the estimated fair share in CSFQ will not converge to the correct value instantaneously and the core router can degenerate into tail dropping. However, in Corelite, even if there are packet drops, the feedback generation is still fair. As edges react only to congestion indications, the rates allocated to flows remains fair. 
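The two mechanisms discussed above — selective marker feedback at the core and linear increase/multiplicative decrease adaptation at the edge — can be sketched as follows. This is a minimal illustration, not the authors' implementation: the names `flows_to_throttle` and `adapt`, the increase step `ALPHA`, and the decrease factor `BETA` are ours, and the paper does not specify these constants.

```python
# Sketch of Corelite-style selective feedback and edge rate adaptation.
# ALPHA and BETA are illustrative parameters, not values from the paper;
# normalized rate = current rate / flow weight.

ALPHA = 1.0   # linear increase step per epoch (assumed)
BETA = 0.5    # multiplicative decrease factor (assumed)

def flows_to_throttle(marked):
    """Core router: given the normalized rates carried in the last w
    marked packets, notify only flows above the average normalized rate.
    No fair-share estimation and no per-flow state is needed."""
    avg = sum(r for _, r in marked) / len(marked)
    return {fid for fid, r in marked if r > avg}

def adapt(rate, weight, congested):
    """Edge router: multiplicative decrease on a congestion notification
    for this flow, weighted linear increase otherwise."""
    return rate * BETA if congested else rate + ALPHA * weight

# Usage: three flows with weights 1, 2, 3, all sending 60 packets/sec.
marked = [(1, 60 / 1), (2, 60 / 2), (3, 60 / 3)]  # (flow id, normalized rate)
throttled = flows_to_throttle(marked)             # only flow 1 exceeds the average
rates = {fid: adapt(60.0, w, fid in throttled)
         for fid, w in [(1, 1), (2, 2), (3, 3)]}
```

Note how the core never computes a fair share: it only compares each flow's normalized rate against the average over the marked window, which is why flows below their fair share never receive notifications.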
Figure 7. Corelite instantaneous rate (allotted rate vs. time in seconds, flows 1-20)
Figure 8. CSFQ instantaneous rate (allotted rate vs. time in seconds, flows 1-20)

Figures 9 and 10 show the results obtained with Corelite and CSFQ respectively, when flows 1 to 20 start 1 second apart in ascending order and, after a lifetime of 60 seconds, stop one second apart in the same order. The flows then restart 5 seconds after they had stopped. Thus there are flows simultaneously entering and leaving the system between times 65 and 80 seconds.

Figure 9. Corelite instantaneous rate (allotted rate vs. time in seconds, flows 1-20)
Figure 10. CSFQ instantaneous rate (allotted rate vs. time in seconds, flows 1-20)

The figures clearly show that Corelite adapts gracefully to the dynamics of the network, whereas with CSFQ the degradation in performance, especially for flows that have higher weights and are short-lived, is significant, because flows have a greater chance of exiting their slow-start prematurely. Corelite avoids this and provides improved fairness even for short-lived flows.

4.4 Summary of Performance Evaluation

The results that we have presented here serve as a proof of concept for the Corelite mechanisms and justify to some extent the claims made in this paper. Our initial results show that Corelite provides per-flow rate contracts and weighted fair allocation of bandwidth without any per-flow state in the core, and that the system is stable and adapts gracefully to network dynamics. Comparisons with CSFQ indicate that though both mechanisms perform well in steady state, Corelite performs significantly better than CSFQ when the fair share at the core router varies rapidly. In the multiple-hop case, flows traversing multiple congested links in CSFQ experience more losses and hence get a lower cumulative throughput than the ones traversing a single congested link. This is not the case in Corelite, because the edge router can distinguish the congestion indications generated by different core routers. Our simulations with different core router epoch sizes, different marking thresholds, and channels with large latencies indicate that Corelite is not very sensitive to these parameters. The simulations are, however, by no means comprehensive. Simulations using different adaptation schemes at the edge router and different congestion estimation schemes at the core router, and using agents like TCP that involve interaction between the edge router and the end-host, are part of ongoing work.

5. Related Work

Although Corelite is based on the Diffserv philosophy of maintaining no per-flow state in the core, it differs significantly from existing approaches in terms of the semantics of marking, the specific functionalities of the core and edge routers, and the service profiles it offers to users. A service in Diffserv is typically for traffic aggregates, not individual flows [8]. Most existing Diffserv approaches [13, 14] use a mechanism of marking where packets are marked based on whether they are in-profile or out-of-profile. In particular, packets belonging to in-profile traffic are marked while the others are left unmarked. Marked packets receive preferential treatment as they are forwarded in the network, in terms of a lower drop priority and/or a higher scheduling priority. Core routers drop best effort traffic before dropping marked packets. However, there is no explicit support for fairness between flows.

There has been a lot of work on incipient congestion detection mechanisms [7, 9]. In [7], when a packet arrives, the router calculates the average queue length over the last busy+idle period and the current busy period; when the average queue length exceeds one, it sets the congestion indication bit in arriving packets. In RED [9], the router maintains an exponentially weighted moving average of the queue length, which is used to detect congestion, along with two thresholds. If the average queue length is less than min_thresh, no packet is dropped, and when it is greater than max_thresh all packets are dropped. When the average queue length is between these two values, packets are dropped with a probability that is a function of the average queue length. However, RED provides no fairness guarantees. FRED [2] extends RED to provide some degree of fair bandwidth allocation, but it maintains state for all flows that have at least one packet in the buffer, and it deviates from the ideal case in a number of scenarios, as pointed out in [5]. In CSFQ [5], the core router dynamically calculates the fair share for flows, and on congestion probabilistically drops packets only from flows whose current utilization of bandwidth is greater than the estimated fair share. CSFQ thereby achieves bandwidth allocations that closely approximate the fair share without maintaining per-flow state. We have compared Corelite and CSFQ in the previous sections to discuss some of the trade-offs.

6. Conclusion

A key design challenge for scalable QoS architectures has been whether it is possible to provide per-flow metrics for rate without maintaining per-flow state in the core of the network. In this paper, we have described the fundamental mechanisms in Corelite that enable us to support weighted rate fairness, which provides per-flow end-to-end relative service classes for rate, with routers that have a simple forwarding behavior and maintain no per-flow state in the core of the network. In Corelite, markers are used to (a) normalize the rate of the flow according to the rate weight of the flow, and (b) enable the core router to directly generate a congestion notification to the edge router, thereby enabling it to maintain the allowed transmission rate of individual flows and drop packets from ill-behaved flows at the edges of the network. We are still investigating the interactions required between the edge routers of different autonomous domains, the interactions between the end-host and the edge router, and the aggregation of flows at the edge router.

References

[1] D. Bertsekas and R. Gallager. Data Networks. Prentice-Hall, second edition.
[2] D. Lin and R. Morris. Dynamics of random early detection. Proceedings of ACM SIGCOMM, September 1997.
[3] D.-M. Chiu and R. Jain. Analysis of the increase and decrease algorithms for congestion avoidance in computer networks. Journal of Computer Networks and ISDN Systems, 17(1), June 1989.
[4] I. Stoica and H. Zhang. Providing guaranteed services without per-flow management. Proceedings of ACM SIGCOMM, September 1999.
[5] I. Stoica, S. Shenker, and H. Zhang. Core-stateless fair queueing: Achieving approximately fair bandwidth allocations in high speed networks. Proceedings of ACM SIGCOMM, September 1998.
[6] N. Venkitaraman, R. Sivakumar, T. Kim, S. Lu, and V. Bharghavan. The Corelite QoS architecture: Providing a flexible service model with a stateless core. TIMELY Research Report, February 1999.
[7] R. Jain and K. K. Ramakrishnan. Congestion avoidance in computer networks with a connectionless network layer: Concepts, goals and methodology. Proceedings of IEEE Computer Networking Symposium, April 1988.
[8] S. Blake, et al. A framework for differentiated services. Internet Draft, October 1998.
[9] S. Floyd and V. Jacobson. Random early detection gateways for congestion avoidance. IEEE/ACM Transactions on Networking, 1(4), August 1993.
[10] S. Shenker and C. Partridge. Specification of guaranteed quality of service. RFC 2212, September 1997.
[11] T. Kim, R. Sivakumar, K.-W. Lee, and V. Bharghavan. Multicast service differentiation in core-stateless networks. Proceedings of International Workshop on Networked Group Communication, November 1999.
[12] UCB/LBNL/VINT Network Simulator - ns (version 2). http://www-mash.cs.berkeley.edu/ns/.
[13] V. Jacobson. Differentiated services architecture. Talk in the Int-serv WG at the Munich IETF, August 1997.
[14] W. Feng, D. Kandlur, D. Saha, and K. Shin. Adaptive packet marking for providing differentiated services in the Internet. Proceedings of International Conference on Network Protocols, October 1998.