Usage Profiles: Allocation of Network Capacity to Internet Users

by

Pierre Arthur Elysee

Bachelor in Computer Systems Engineering, University of Massachusetts at Amherst, 1997

Submitted to the Department of Electrical Engineering and Computer Science in partial fulfillment of the requirements for the degree of Master of Science in Electrical Engineering and Computer Science at the Massachusetts Institute of Technology, January 2001.

©2001 Massachusetts Institute of Technology. All rights reserved.

Author: Department of Electrical Engineering and Computer Science, January 2001
Certified by: David D. Clark, Senior Research Scientist, Thesis Supervisor
Accepted by: Arthur C. Smith, Chairman, Department Committee on Graduate Students

Abstract

In the Internet of today, there are only very crude controls placed on the amount of network capacity that any one user can consume. All users are expected to slow down when they encounter congestion, but there is little verification that they actually do, and there are no controls that permit a relative allocation of capacity to one user over another. The research in this thesis describes a method to impose a usage limit, or "usage profile", on the behavior of individual users. In particular, this thesis explores the design of usage profiles that allow bursty traffic patterns, as opposed to continuous rate limits. This work describes an effective usage profile algorithm for web traffic, which has a very bursty character.
The following approach studies the characteristics of web traffic and introduces the fundamental concepts needed to establish the necessary framework. Through simulations, it analyzes an existing usage profile, the leaky-bucket scheme, for different token rates and different data sets, and points out its limitations in the context of web traffic. It then proposes a new usage profile, the Average Rate Control Usage Profile (ARCUP) algorithm, that better regulates web traffic. Several variants of this algorithm are presented, and the characteristics of a good profile are discussed in order to facilitate the choice of a specific variant. The selected variant of the ARCUP algorithm is simulated for different target rates and different data sets. The results show that this algorithm works for any heavy-tailed distributed data set and for different target rates, which represent different usage profiles. This thesis concludes with a summary of findings and suggests possible applications.

Thesis Supervisor: David D. Clark
Title: Senior Research Scientist

Acknowledgements

I'd like to express my sincere gratitude to Professor Dave Clark for his insights, motivation, and encouragement throughout this thesis. I would like to extend my gratitude as well to Professor Al Drake, who guided me through my dark days here. I would like to thank my friends who have been instrumental to my success at MIT, in particular Amit Sinha and Eric Brittain, and all my professors and advisors who have contributed to my education and success. Finally, my thanks go to those who are dearest to me: my mother, Odilia Nazaire, and my father, Wilner Elysee, for their everlasting love and support.

Table of Contents

1 Introduction .......... 8
  1.1 Introduction .......... 8
  1.2 Web overview .......... 9
  1.3 Characteristics of web traffic .......... 10
  1.4 Genesis of data used in simulations .......... 11
  1.5 Methodology .......... 12

2 Theoretical background .......... 14
  2.1 Definition of self-similarity .......... 14
  2.2 Definition of heavy-tailed .......... 15
  2.3 Definition of ON/OFF sources .......... 16
  2.4 Examining ON-times or file transfer sizes .......... 16
  2.5 Examining OFF-times .......... 17
  2.6 Fractional Brownian Motion .......... 17
  2.7 Feedback Control System .......... 19

3 Leaky bucket algorithm .......... 22
  3.1 Definition of a leaky bucket .......... 22
  3.2 Analyzing the leaky bucket algorithm for different token rates .......... 23
  3.3 Token rate equal to 10,000 bits per second .......... 24
  3.4 Token rate equal to 20,000 bits per second .......... 26
  3.5 Token rate equal to 25,000 bits per second .......... 27
  3.6 Token rate equal to 8,000 bits per second .......... 28
  3.7 Performance of the leaky bucket algorithm on different data sets .......... 28
  3.8 Summary .......... 30

4 Average Rate Control Usage Profile Algorithm .......... 32
  4.1 Introduction .......... 32
  4.2 Uncontrolled average rate .......... 33
  4.3 Maximum data control .......... 35
  4.4 Decrease and increase the peak rate by a fixed factor .......... 36
  4.5 Varying the peak rate and maximum data control .......... 38
  4.6 Performance of the ARCUP algorithm for different target rates .......... 41
  4.7 Effect of maximum data size on obtained average rate .......... 45
  4.8 Effect of maximum data on transmission duration .......... 46
  4.9 Algorithm performance for different data sets .......... 47
  4.10 Running the algorithm at the peak, target, and average rates .......... 55

5 Conclusion, Applications, and Future Work .......... 58
  5.1 Conclusion .......... 58
  5.2 Application .......... 59
  5.3 Future work .......... 60
  5.4 References .......... 61

A Source codes and Data sample .......... 62
  A.1 Generating OFF times .......... 68
  A.2 This module represents the leaky bucket's scheme source codes .......... 73
  A.3 This module represents the uncontrolled ARCUP algorithm .......... 78
  A.4 This module represents the uncontrolled ARCUP algorithm .......... 84
  A.5 Sample of data used in simulations ..........

Chapter 1

Introduction

The goal of this chapter is to introduce our work and to establish a basis for its usability. It contains the following information: an introduction, a brief overview of the Web, a description of the characteristics of web traffic, and an analysis of the data used in simulations.

1.1 Introduction

The Internet today uses a service model called "best effort". In this service model, the network allocates bandwidth amongst all the instantaneous users as best it can, and attempts to serve all of them without making any explicit commitment as to the quality of service (QoS) offered [9]. For years, there have been heated debates in the Internet community regarding the types of services that should be provided in the future. Some researchers argue that the existing service has been working fairly well so far; others argue that users who are willing to pay more money in order to have a better service should be given the option. Previous work in [9] has explored the issue of extending the Internet by adding features that permit allocating different service levels to different users. A number of schemes have been proposed to accomplish this goal: fair allocation service, priority scheduling, expected capacity allocation, and guaranteed minimum capacity. Closely related to the work presented in this paper is the guaranteed minimum capacity scheme, which provides a guaranteed worst-case rate along any path from source to destination (for more details see [9]). A drawback of this scheme is that it assumes that the traffic offered by the user is a steady flow. This is not the case for Internet traffic.
The majority of Web users are not interested in a capacity profile that allows them to go at a continuous steady rate. While cruising the web, the normal usage pattern is short ON-periods separated by long OFF-periods. The ideal profile from the user's perspective would allow bursts to occur at a high speed. It should reward a user that is not sending continuously at the high rate, and constrain a user that is doing the contrary. Appropriate profiles are needed to match bursty usage patterns. The problem of defining such a profile is made harder by the fact that the sizes of web transfers are "heavy-tailed", which means that while almost all Web transfers are very small, most of the bytes transferred are in a few very large transfers. In a previous study [8], a leaky-bucket scheme was proposed to regulate web traffic. This paper confirms that the leaky-bucket scheme penalizes web traffic by introducing excessive delays, and then proposes a scheme that better regulates this type of traffic.

The rest of this paper is organized as follows: the remainder of Chapter 1 emphasizes the need for usage profiles, defines the characteristics of web traffic, and analyzes the genesis of the data used in simulations; Chapter 2 contains theoretical background information; Chapter 3 presents and analyzes the "leaky bucket" algorithm; Chapter 4 presents and analyzes the "average rate control usage profile" (ARCUP) algorithm; Chapter 5 proposes possible applications of the usage profile algorithms and a summary of findings; references, source code, and a sample of the data sets used in simulations follow.

1.2 Web overview

The eminent role of the World Wide Web as a medium for information dissemination has made it important to understand its properties.
In recent years, the Web has been used, among other applications, to trade stocks on-line, to conduct electronic commerce, and to publish and deliver information in a variety of ways such as raw data, formatted text, graphics, audio, video, and software [5]. An easy way to think of the Web is as a set of cooperating clients and servers. We interact with the Web through web browsers; Netscape and Internet Explorer currently dominate the market. A browser is a graphical client program that allows users to access remotely located files or objects [11]. In order to access a file, we need to know its URL (uniform resource locator), which resembles the following:

http://www.haitiglobalvillage.com/
http://www.soccer.com/
http://www.amazon.com/

Since the Web is organized as a client-server model, each file is stored on a specific host machine. Upon a request from the user, the requested file is transferred to the user's local machine. These exchanges generate traffic over the Internet. Today, the Web generates more data traffic on the Internet than any other application. In an attempt to control the amount of traffic generated by each user on the Internet, one proposal is to impose a usage profile on each user to regulate their traffic pattern. A usage profile is a mechanism that shapes traffic and limits the length of bursts; further, it defines a limit on the maximum traffic generated by each user and discourages users from abusing the network by introducing additional delays on their transfers should they violate their profile.

1.3 Characteristics of web traffic

The design and implementation of an attractive "usage profile" algorithm for Web browsing requires a thorough understanding of Web traffic (in this paper, we use the terms web traffic and Internet traffic interchangeably, since the Web is currently the main contributor of network traffic). For years, researchers assumed that traffic on the Internet followed the model of a Poisson process.
Poisson arrival processes have a bursty character over short time scales which tends to be smoothed by averaging over long periods of time [1]. To the contrary, recent studies of LAN and wide-area networks have provided ample evidence that Internet traffic is self-similar [1]. As a result, such traffic can be modeled using the notion of self-similarity. Self-similarity is the trait we attribute to any object whose appearance is scale-invariant; that is, an object whose appearance does not alter regardless of the scale at which it is viewed [13]. [2] shows that Internet traffic exhibits long-range dependence as well. Long-range dependence involves the tail behavior of the autocorrelation function of a stationary time series, while self-similarity refers to the scaling behavior of the finite-dimensional distributions of a discrete- or continuous-time process [2]. Further, a process is considered to be long-range dependent or heavy-tailed if its autocorrelation function decays hyperbolically rather than exponentially [2]. Since self-similar processes display similar features, they are often referred to as heavy-tailed [8]. The difference between self-similarity and heavy-tailed behavior will be addressed in the next chapter.

1.4 Genesis of data used in simulations

To understand the nature of web traffic, we made use of real users' traces. These traces were used to evaluate our proposed control algorithms. They were collected at Boston University's Computer Science Department. Researchers at BU added a measurement apparatus to the Web browser NCSA Mosaic, the preferred browser at the time (November 1994 through February 1995). These researchers were able to monitor the transactions between each individual user on their LAN and the Internet. These traces contain records of the HTTP requests and user behavior occurring between November 1994 and May 1995.
During that period a total of 9,633 Mosaic sessions were traced, corresponding to a population of 762 users, and resulting in 1,143,839 requests for data transfer. Here is a sample trace:

gonzo "http://www.careermosaic.com/cm/cml.html" 4776 0.806263
gonzo "http://www.careermosaic.com/cm/images/cmHome.gif" 90539 7.832752
gonzo "http://www.careermosaic.com/cm/cml.html" 0 0.0

Each line corresponds to a single URL request by the user and can be read as follows:

User name: gonzo
URL: http://www.careermosaic.com/cm/
File name: cml.html
Size of the document in bytes: 4776 bytes
Object retrieval time in seconds: 0.806263

These data were collected at the application level. They were thoroughly examined in [1], which showed that the heavy-tailed nature of transmission and idle times is not primarily due to network protocols or user preference; the heavy-tailed property can rather be attributed to information storage and processing. The results of these studies show that files transmitted through the Internet and files stored on servers obey heavy-tailed distributions.

[Figure 1-1: Distribution of data set gonzo (histogram of file sizes in bytes)]

Figure 1-1 shows that data set gonzo is indeed heavy-tailed. Most of the files have sizes less than 20,000 bytes, while some are as big as 160,000 bytes. With confidence, we use these data sets to simulate our algorithms.

1.5 Methodology

In this paper, ON-times are represented by data collected by researchers at BU from the monitoring of transactions between each individual user on their LAN and the Internet. When not specified, data set gonzo will be the one considered in our analyses to represent ON-times. OFF-times are generated by a model based on a fractional Brownian motion algorithm. The leaky-bucket and ARCUP algorithms are the two schemes evaluated and contrasted in this paper; they are both written in C.
The leaky-bucket scheme is simulated for different token rates in order to determine the most appropriate rate for a given user. Moreover, it is simulated for different data sets, which illustrate its performance and its limitations in each case. More precisely, these simulations show that the leaky bucket is not the preferred scheme to regulate heavy-tailed distributed traffic. To cope with the shortcomings of the leaky-bucket scheme, the ARCUP algorithm is proposed.

I construct a hypothetical user model by combining the ON times derived from BU user data and the OFF times derived from our fBm model. I picked a target value of average usage for each user of 10,000 bits/second, which is not an unreasonable overall usage rate for a user exploring the Web today. To adjust each data set to achieve this long-term average rate, I scale the OFF times produced by the model appropriately. The result of this scaling operation is that for each user trace, if no control is applied (that is, if there is no usage profile), the average rate over the whole trace will be 10,000 bits/sec.

In general terms, the method used to evaluate each proposed usage profile is as fair as possible. The profile is initially set so that it also has a long-term average permitted rate of 10,000 b/s. We observe the extent to which the bursty behavior of the user is passed through the profile unchanged, and we then adjust the average rate. Several variants of the ARCUP algorithm were developed; the best variant is simulated for different target rates and data sets. It embodies the most effective usage profile algorithm for Web traffic that emerges from this study. On the one hand, this algorithm permits bursty traffic if the user's average rate is less than the contracted target rate. On the other hand, it prevents the user from abusing his/her profile.

Chapter 2

Theoretical background

The background information presented in this chapter is aimed at putting our work into context.
It is designed to ease the reader's understanding and will help the reader to appreciate the merit of our results. This chapter is organized as follows: sections 2.1 and 2.2 define self-similarity and heavy-tailed behavior; section 2.3 defines and examines the nature of ON/OFF sources; sections 2.4 and 2.5 examine ON-times and OFF-times; section 2.6 presents fractional Brownian motion; and finally, section 2.7 introduces the notion of feedback control, which is essential for the understanding of the ARCUP algorithm.

2.1 Definition of self-similarity

Self-similarity (this definition closely follows the one given in [1]): let $X(t)$ be a stationary time series with zero mean (i.e., $\mu = 0$). The m-aggregated series $X^{(m)}$ is defined by averaging the original series $X$ over non-overlapping blocks of size $m$:

$$X^{(m)}_t = \frac{1}{m} \sum_{i=(t-1)m+1}^{tm} X_i \qquad (2\text{-}1)$$

for all natural $m$. $X$ is said to be H-self-similar if, for all $m \ge 1$, $X^{(m)}$ has the same distribution as $X$ rescaled. When $X$ is H-self-similar, its autocorrelation function is given by:

$$r(k) = \frac{E[(X_t - \mu)(X_{t+k} - \mu)]}{E[(X_t - \mu)^2]} \qquad (2\text{-}2)$$

which is the same for the series $X^{(m)}$ for all $m$. Therefore, the distribution of the aggregated series is the same as that of the original, except for a change in scale. In general, a process with long-range dependence has an autocorrelation function

$$r(k) \sim k^{-b} \quad \text{as } k \to \infty, \qquad 0 < b < 1 \qquad (2\text{-}3)$$

The series $X(t)$ is said to be asymptotically second-order self-similar with Hurst parameter $H = 1 - b/2$ [3]. The Hurst parameter gives the degree of long-range dependence present in a self-similar time series.

2.2 Definition of heavy-tailed

The understanding of heavy-tailed distributions is very important in network engineering due to their relationship to traffic self-similarity. Unlike the exponential, Poisson, or normal distributions, heavy-tailed distributions exhibit uncommon properties. To date, the simplest heavy-tailed distribution known is the Pareto distribution.
Its probability density function and cumulative distribution function are given by:

$$p(x) = \alpha k^{\alpha} x^{-(\alpha+1)} \qquad (2\text{-}4)$$

$$F(x) = P[X \le x] = 1 - \left(\frac{k}{x}\right)^{\alpha} \qquad (2\text{-}5)$$

for $\alpha, k > 0$ and $x \ge k$. In this distribution, if $\alpha \le 1$ the distribution is unstable, with infinite mean and infinite variance. If $1 < \alpha \le 2$, on the other hand, the distribution is stable, with finite mean but infinite variance [1]. Consequently, the degree of self-similarity largely depends on the parameter $\alpha$. Regardless of the behavior of the distribution, the process is heavy-tailed if the asymptotic shape of the distribution is hyperbolic. The Pareto distribution has an attractive trait: its distribution is hyperbolic over its entire range. It is widely used to generate self-similar traffic. In fact, it is shown in [2] that superposing a number of Pareto-distributed ON/OFF sources produces a time series that is asymptotically self-similar.

2.3 Definition of ON/OFF sources

Using Web traffic, user preference, and file size data, [5] explains why the transmission times and quiet times for a given Web session are heavy-tailed. Web traffic can be modeled using heavy-tailed distributions; in the literature, these are referred to as the ON-time and OFF-time distributions. From a user's point of view, ON times correspond to the transmission durations of Web files, and OFF times represent times when the browser is not actively transferring data [5]. It is worth mentioning that transmission times depend on network conditions at the time as well. Technically, ON times represent periods during which packets (or cells, depending on the architecture in use) arrive at regular intervals, while OFF times are periods with no packet arrivals. Some systems have an ON/OFF characteristic where an ON-period can be followed by other ON-periods and OFF-periods by other OFF-periods [2]. In contrast, the models used to mimic ON/OFF sources in Internet traffic are said to be strictly alternating [5].
That is, an ON-period is always followed by an OFF-period, and vice versa. At the level of individual source-destination pairs, the lengths of the ON- and OFF-periods are independent and identically distributed; the ON and OFF periods are also independent of one another, and their distributions need not be the same. In our simulations, we use actual data from Web traces to represent ON-times, and generate synthetic OFF-times using fractional Brownian motion.

2.4 Examining ON-times or file transfer sizes

Researchers in [1] showed that the distribution of transfer sizes for files greater than 10,000 bytes can be modeled using a heavy-tailed distribution. It follows that the self-similarity, or heavy-tailed property, exhibited by Internet traffic is mainly determined by the set of files available on the Web. Today, most of the data transferred over the Internet represents multimedia files (i.e., images, text, video, and audio). Researchers in [5] report that although multimedia may not be the primary factor in determining the heavy-tailed nature of files transferred, it does increase the distribution's tail weight. The tail weight for the distribution of file sizes between 1,000 and 30,000 bytes is primarily due to images; for file sizes between 30,000 and 300,000 bytes, the tail weight is caused mainly by audio files. Finally, from 300,000 bytes onwards, the weight is attributed to video files. To show that multimedia is not solely responsible for the heavy-tailed nature of Internet traffic, the authors in [5] compare the distribution of files available on the Web with the distribution of Unix files. They conclude that the distribution of Unix files is much heavier-tailed than the distribution of Web files. This remark suggests that even
with added multimedia content, Web files do not dominate the heaviness in the tail distribution of transferred files: file sizes in general follow a heavy-tailed distribution [5].

2.5 Examining OFF-times

As mentioned in section 2.3, ON-times are the result of the transmission durations of individual Web files; OFF-times, on the other hand, represent periods when a browser is not actively transferring data. While ON-times are mainly due to the transmission duration of Web data, OFF-times can be the result of a number of different causes. The client's machine (or workstation) may be idle because it has just finished receiving the content of a Web page; before requesting the next component, it will interpret, format, and display the first one. In other cases, the client machine may be idle because the user is processing the information last received, or the user may not be using his/her machine at all. These two phenomena are called "active OFF" and "inactive OFF" times by the authors in [1]. The understanding of the OFF-times distribution lies in their differences. Active OFF-times represent the time required by the client machine to format, interpret, and display the content of a Web page. As a result, machine processing time tends to be in the range of 1 ms to 1 second [1]; it is unlikely for an embedded Web document to require more than 30 seconds of processing time. The researchers in [1] assumed that inactive OFF-times are in general due to user inactivity, and concluded that inactive OFF-times resulting from user inactivity are mainly responsible for the heavy-tailed nature of OFF-times.

2.6 Fractional Brownian Motion

Fractional Brownian motion (fBm) is a natural extension of ordinary Brownian motion. It is a Gaussian zero-mean stochastic process indexed by a single scalar parameter H ranging between zero and one [4]. Fractional Brownian motion has very useful properties: fractal dimension, scale-invariance, and self-similarity [3]. As such, fBm is a good model for nonstationary stochastic processes with long-range dependence [4].
As a nonstationary process, fBm does not admit a spectrum in the usual sense; however, it is possible to attach to it an average spectrum [4]. Although nonstationary, fBm does have stationary increments, which means that the probabilistic properties of the increment process

$$B_H(t + s) - B_H(t) \qquad (2\text{-}6)$$

depend only on the lag $s$. Moreover, this increment process is self-similar, because for any $a > 0$ the following is true:

$$B_H(at) \stackrel{d}{=} a^H B_H(t) \qquad (2\text{-}7)$$

where $\stackrel{d}{=}$ represents equality in distribution. A standard fractional Brownian motion has the integral representation:

$$B_H(t_2) - B_H(t_1) = \frac{1}{\Gamma(H + 0.5)} \left[ \int_{-\infty}^{t_2} (t_2 - s)^{H - 0.5}\, dB(s) - \int_{-\infty}^{t_1} (t_1 - s)^{H - 0.5}\, dB(s) \right] \qquad (2\text{-}8)$$

Ordinary Brownian motion is obtained from the standard fractional Brownian motion when the Hurst parameter $H = 0.5$. The nonstationary property of fBm is manifested in its covariance structure [4]:

$$E[B_H(t)B_H(s)] = \frac{\sigma^2}{2}\left(|t|^{2H} + |s|^{2H} - |t - s|^{2H}\right) \qquad (2\text{-}9)$$

Setting $s = t$ in the previous equation, the variance is

$$\mathrm{Var}(B_H(t)) = \sigma^2 |t|^{2H} \qquad (2\text{-}10)$$

The self-similarity of fBm can be deduced from the previous equation with a scale transformation [3]:

$$\sigma_H^2(r\tau) = r^{2H} \sigma_H^2(\tau) \qquad (2\text{-}11)$$

This result shows that fractional Brownian motion is scale-invariant and statistically indistinguishable under a scale transformation [3].

2.7 Feedback Control System

The reader is encouraged to revisit this section for a better understanding of the ARCUP algorithm; it gives a brief definition of a control system. A control system can be viewed as a system in which the manipulation of the input element(s) results in a desired output(s). Two main features define a control system: first, a mathematical model that expresses the characteristics of the original system; second, the design stage, in which an appropriate control mechanism is selected and implemented in order to achieve a desired system performance [6]. The control of a system can be performed either in open loop or in closed loop.
An open-loop system is one in which the control input to the system is independent of its output; the system itself is not influenced by the system output. Open-loop systems are often simple and inexpensive to design. In some applications, however, it is important to feed back the output in order to better control the system. In this case, the loop is said to be closed. In a closed-loop system, therefore, the control input is influenced by the system output: the system output value is compared with a reference input value, and the result is used to modify the control input [6]. The mathematical modeling of feedback systems was introduced by Nyquist in 1932: he observed the behavior of open-loop systems under sinusoidal inputs to deduce the behavior of the associated closed-loop system. His work was later improved and extended by Bode in 1938 and Evans in 1948. Their work constitutes the basis of classical control theory [6]. A basic closed-loop system contains an input device, an output measuring device, an error measuring device, and an amplifier and control system; the latter manipulates the measured error in order to positively modify the output [6]. It is worth noting that there are two types of closed-loop or feedback systems:

(a) A regulator is a closed-loop system that maintains an output equal to a pre-determined value regardless of changes in system parameters; and

(b) A servomechanism is a closed-loop system that produces an output equal to some reference input position without change in parameter values.

The former is better suited to model the system presented in this paper.

Chapter 3

Leaky bucket algorithm

Having established the framework that allows the reader to understand our work, we are now in a position to introduce the first usage profile algorithm.
The study of the leaky bucket assumes that the traffic source behaves as a two-state ON/OFF arrival process with an arbitrary distribution for the time spent in each state [9]. A source is considered ON (or busy) when it is transmitting/receiving and OFF (or idle) when it is not. Technically, ON-times represent periods during which packets (or cells, depending on the architecture in use) arrive at regular intervals, while OFF-times represent periods with no packet arrivals. In this chapter we introduce the concept of a leaky bucket and analyze the results of the algorithm for different token rates and on different data sets. The chapter is organized as follows: section 3.1 gives a definition of the leaky bucket concept; sections 3.2 through 3.6 analyze the algorithm for different token rates, ranging from 8,000 to 25,000 bits per second; section 3.7 evaluates the performance of the leaky bucket algorithm for different data sets; and finally, section 3.8 summarizes our findings.

3.1 Definition of a leaky bucket

This mechanism has been utilized as a congestion control strategy in high-speed networks and is often used to control traffic flows; ATM is a prime example. ATM (Asynchronous Transfer Mode) technology is based on the transmission of fixed-size data units. When an ATM connection is established, the traffic characteristics of the source and its quality of service are guaranteed by the network. The network enforces its admission control policies by using usage parameter control [13], and the leaky bucket algorithm serves this purpose. The leaky bucket algorithm is used to control bursty traffic as well; an Internet Service Provider (ISP), for instance, can use a leaky bucket profile to shape its incoming traffic [8]. A simple leaky bucket is characterized by two components: the size of the bucket and the token replenishing rate. Tokens are generated at a constant rate and are stored in the bucket, which has a finite capacity.
A token which arrives when the bucket is full is automatically discarded. If the bucket contains enough tokens when packets arrive, the mechanism allows them to pass through; that is, the user can burst traffic into the network at his/her allowed peak rate, which is usually equivalent to the physical link capacity. After each transfer, the bucket is decremented accordingly. When the number of tokens is insufficient, only a portion of the arriving traffic is immediately sent while the rest is queued. If the bucket is empty, all arriving traffic is queued or discarded according to the policy in place. The queued packets are serviced upon token arrival in the bucket; if the token replenishing rate is constant, the service rate of the queued packets is constant as well. Note that with a large bucket, a user can send bursty traffic in a short time period; the token replenishing rate, on the other hand, allows a user to send data at a constant bit rate for any period of time.

3.2 Analyzing the leaky bucket algorithm for different token rates

The leaky-bucket scheme has been utilized to monitor and enforce usage parameter control (UPC). Researchers in [8] show that a leaky-bucket scheme can be imposed on each on-off source as a usage profile. A profile is, therefore, defined by an initial number of tokens in the bucket and a token replenishing rate. The profile determines when the on-off sources (or users) can burst traffic into the network and when they must send at the token rate. In this thesis, the bucket has an infinite capacity (i.e., no token is discarded). The amount of tokens accumulated by the bucket, however, is finite; it depends on the OFF-periods. Indeed, the lengths of OFF-periods are bounded, so the amount of tokens in a bucket is bounded as well. In general, the transfer time for each file has a peak time and a token time component.
The larger the file (i.e., the longer the ON-period), the bigger the token component is likely to be, and the slower the transfer. The longer the OFF-period, the more tokens the bucket accumulates; a file arriving after such a period can be sent at the peak rate. The bigger the peak time component, the faster the transfer. After a long ON-time, representing the transfer of a large file, the bucket contains few tokens. More generally, the total transfer time required to send an arriving file depends on the history of previous ON-OFF times. If the arriving file is small, there is a high probability that it will be sent at the peak rate. Conversely, if the arriving file is large, there is a low probability that the bucket will have enough tokens: it will be processed at a speed near the token rate [8]. A drawback of this scheme is that the more heavy-tailed the distribution of the ON-times, the more traffic has to be sent at the token rate. [8] shows that increasing the peak rate is not an effective way to solve this problem. In the following subsections, we analyze the leaky bucket algorithm for different token rate values. We will consider the normalized average rate (i.e., 10,000 bits/sec) and two of its multiples: token rates of 20,000 and 25,000 bits per second. This procedure will allow us to determine the token rate required to provide the least total transfer time for the user under consideration.

3.3 Token rate equal to 10,000 bits per second

The initial value of 10,000 bits per second is chosen since this is the actual long-term average rate of the traffic to be controlled. When applying this token rate, most transfer times have two components: a peak time and a token time. The first fraction of the data is received at the peak rate, and the rest, if any, at the token rate. The smaller the transmission duration, the more dominant the peak time.
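The peak-time/token-time decomposition described above can be sketched in code. This is a simplified model, not the thesis's simulator: token replenishment during the peak-rate burst itself is ignored, and all names and parameter values are illustrative assumptions.

```python
# Simplified leaky-bucket transfer-time model (illustrative; ignores token
# replenishment while the peak-rate burst is in progress).

def transfer_time(file_bits, tokens, peak_rate, token_rate):
    """Split one file's transfer into a peak-time and a token-time component."""
    burst = min(file_bits, tokens)                   # bits sent at the peak rate
    peak_time = burst / peak_rate
    token_time = (file_bits - burst) / token_rate    # remainder at the token rate
    return peak_time, token_time

def replenish(tokens, off_time, token_rate, bucket_size=float("inf")):
    """Accumulate tokens during an OFF period, capped at the bucket size."""
    return min(bucket_size, tokens + off_time * token_rate)

# Example: after a 4-second OFF period at a 10,000 bits/sec token rate,
# a 100,000-bit file arrives; the peak rate is 1 Mbit/sec.
tokens = replenish(0.0, 4.0, 10_000)       # 40,000 tokens accumulated
pt, tt = transfer_time(100_000, tokens, 1_000_000, 10_000)
```

With a heavy-tailed file-size distribution, most bits belong to large files, so the token-time term dominates the total delay, exactly as the section argues.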
As mentioned earlier, the goal of the usage profile algorithm is to regulate users to their long-term contracted rate (token rate or target rate), allow bursts of traffic to be sent rapidly, and prevent abuse. In the scatter plot of Figure 3-1, the points lying near the X-axis represent small files that are sent at the peak rate.

[Figure 3-1: Leaky bucket with token rate 10,000 bits/sec (transmission duration in seconds vs. data transferred in bytes)]

On the other hand, the transfers that required thirty-seven, forty-two, and forty-nine seconds are three examples of very large transfers; the token time dominates in each case. For example, for the thirty-seven second transfer, the file size is 58,852 bytes with a peak time of 0.0928 second and a token time of 37.7 seconds. For the forty-two second transfer, the file has a size of 90,539 bytes with a resulting peak time of 0.3 second and a resulting token time of 42.12 seconds. Finally, for the forty-nine second event, the file has a size of 163,711 bytes, resulting in a peak time of 0.81 second and a token time of 48.32 seconds. In all three cases the results are much better than if the user were receiving strictly at the token rate. The reader can verify that for the largest transfer (i.e., file size = 163,711 bytes), the total transmission time would have been 131 seconds had the whole file been sent at the token rate; this implies that over one half of the data was received at the peak rate, and the rest at the token rate. Even so, the transmission duration is dominated by the token time.

3.4 Token rate equal to 20,000 bits per second

As mentioned earlier, the worst-case transfer time is the file size in bits divided by the token rate.
In this case, we decrease the transfer time for two reasons: a faster token rate, and a larger number of tokens after a given OFF-time, which increases the number of bytes sent at the peak rate. By increasing the token rate, we decrease the transfer time: they are inversely proportional. However, increasing the token rate has the disadvantage that it permits a user to "abuse" the profile by sending at a long-term steady-state rate greater than the intended target rate. The duration of the transfer still has a peak time component and a token time component. In the scatter plot of Figure 3-2, the peak time dominates in most of the transactions.

[Figure 3-2: Leaky bucket with token rate 20,000 bits/sec (transmission duration in seconds vs. data transferred in bytes)]

As an illustration, the two outliers occurring at 0.5 and 4.5 seconds have greater token times (see Figure 3-2). The first represents a file transferred in 0.5 second, with a size of 16,031 bytes, a peak time of 0.117 second, and a token time of 0.4 second. The second represents a file transferred in 4.5 seconds, with a size of 15,429 bytes, a peak time of 0.04 second, and a token time of 4.17 seconds.

3.5 Token rate equal to 25,000 bits per second

A token rate of 25,000 bits per second is chosen here based on simulation results. This rate is sufficient to allow this user to receive all files at the peak rate (i.e., it achieves the lowest possible transmission duration). That is, the user always accumulates enough tokens to receive data at the peak rate: the token time is null, and the transmission duration depends only on the peak rate. As an illustration, if we consider the 80,000-byte file, we find the duration to be 0.64 second. This result is consistent with the results in Figure 3-3. Further increasing the token rate for this data set will not improve the delay.

[Figure 3-3: Leaky bucket with token rate 25,000 bits/sec (transmission duration in seconds vs. data transferred in bytes)]

3.6 Token rate equal to 8,000 bits per second

In case of overload, that is, when the normalized average rate of a user is greater than the token rate, the replenishing rate of the bucket is slower than the arriving data rate. The bucket is unable to accumulate enough tokens; therefore, most of the files are received at the token rate. The transmission duration of the largest file goes from fifty seconds at 10,000 bits per second to 160 seconds at 8,000 bits per second. The leaky bucket algorithm severely penalizes overloaded traffic with heavy-tailed distributions. In the next chapter, we present a different algorithm that yields better results in a similar situation.

[Figure 3-4: Leaky bucket with target rate 8,000 bits/sec (transmission duration in seconds vs. data transferred in bytes)]

3.7 Performance of the leaky bucket algorithm on different data sets

Studying the algorithm for one data set is insufficient to draw a conclusion about performance: data set distribution varies from user to user. This section evaluates the performance of the leaky bucket algorithm for different users (or data sets). The objective in the following analysis is to determine the minimum token rate required by each data set in order to achieve the lowest possible transfer time. For simplicity, we conducted this study on only four sets of data: goofy, daffy, gonzo, and pooh. From the scatter plots of Figure 3-5, we can infer the minimum token rate required by each data set.
[Figure 3-5: Maximum token rate required for each data set (leaky bucket transmission durations for data sets goofy at 25,000 bits/sec, daffy at 35,000 bits/sec, gonzo at 25,000 bits/sec, and pooh at 55,000 bits/sec)]

Data set goofy attains its lowest transfer duration of ten seconds at a token rate of 25,000 bits per second; data set daffy reaches its lowest transfer duration of five seconds at a corresponding token rate of 35,000 bits per second; data set gonzo reaches its lowest transfer duration of 1.4 seconds at a corresponding token rate of 25,000 bits per second. Finally, data set pooh reaches its lowest transfer duration of 0.7 second at a corresponding token rate of 55,000 bits per second. If we consider the peak rate (1 Mbit per second), our calculations yield transfer durations corresponding to the results exhibited by the scatter plots of Figure 3-5. The discrepancy in the required token rates, in turn, can be attributed to the difference in these data sets' distributions. Data set goofy reaches its maximum token rate sooner because the tail of its distribution is lighter (see Figure 3-6). As a result, data set goofy accumulates enough tokens to receive data at the peak rate with a smaller token rate.

[Figure 3-6: Data set distributions (file-size histograms for data sets goofy, daffy, gonzo, and pooh)]
The opposite is true for data set pooh, which has a very heavy-tailed distribution; consequently, its required token rate is much larger.

3.8 Summary

To summarize, the transmission duration decreases as we increase the token rate. Though increasing the token rate (i.e., the usage profile) results in lower transfer delays, such a scheme is neither efficient nor economically appealing. To achieve a desirable delay, a user will eventually need a higher profile, which will be more expensive; after all, this higher usage profile may be needed just because of a few large transfers. Running this scheme for different data sets shows that the heavier the tail distribution of the data set, the greater the token rate necessary to achieve the desired minimum delay. As pointed out in [1], the leaky-bucket scheme performs well with Poisson-like distributions but penalizes heavy-tailed traffic (i.e., web traffic). The need for a suitable algorithm to regulate web traffic is evident: the next chapter proposes and evaluates such an algorithm.

Chapter 4 Average Rate Control Usage Profile Algorithm

4.1 Introduction

This algorithm builds on the Time Sliding Window Tagging algorithm presented in [12]. There are two parts to this profile scheme. The first is a rate estimator, based on the Time Sliding Window estimator, which measures the average usage over some period. The second is a control or limit that is applied to the traffic so long as the actual sending rate, as measured by the estimator, exceeds the target rate. In contrast to the leaky bucket scheme, the controls look at each ON period in isolation; in other words, what happens in each ON period does not depend on the length of the immediately preceding OFF period, or any other short-term history. The control is related only to the output of the estimator. The estimator allows the algorithm to determine when a user has exceeded its target rate.
In this case, the algorithm will enforce some pre-defined policy to discourage such behavior. I propose a number of different controls, which involve reducing the allowed peak rate and limiting the number of bytes in any one ON period that can be sent at the peak rate. Further, instead of using packets to calculate the total number of bytes sent, I use file sizes. In this simulation, all control adjustments are made at the beginning of each file (each ON period) since the size of the file is known in advance; a practical scheme must make the decision incrementally since the duration of the ON period is not known. This algorithm maintains three local variables: Win_length, which determines how much the past "history" of a user affects its current rate; average_rate, which estimates the current rate of the user upon each ON period; and T_front, which records the time of the current file arrival. While T_front and average_rate are updated throughout the simulation, Win_length must be configured; [12] shows that this algorithm performs well for Win_length between 0.6 and 1 second. The core of the ARCUP algorithm is presented below:

    Initially:
        Win_length = constant;
        average_rate = 0;
        T_front = 0;

    Upon each file arrival, TSW updates its variables:
        Files_in_TSW = average_rate * Win_length;
        New_files = Files_in_TSW + workload;
        average_rate = New_files / (total_time - T_front + Win_length);
        T_front = total_time;    // time of the last packet arrival

The goal of this algorithm is to hold the user to his/her long-term average rate and to impose limitations on bursts. Our challenge is to let bursts through with little distortion while keeping the achieved average rate close to the nominal average rate. We accomplish these goals by keeping the current average rate as close as possible to the target rate; both the average and target rates are expressed in bits per second. For each arriving file, the algorithm compares the current average rate with the target rate.
If the average rate is smaller than the target rate, no action is taken; if the average rate is greater than the target rate, then some control is imposed to reduce the average rate. A typical average rate function is depicted in Figure 4-2. Since we want to achieve this goal regardless of changes in system parameters, we use a closed feedback loop to control the system. In other words, the average rate is continuously measured and compared to the target rate; the difference (or error value) between them is used to vary the peak rate, which for this simulation has a maximum value of 1 Mbit per second.

4.2 Uncontrolled average rate

The uncontrolled average rate represents the average rate of each user if no control mechanism is imposed. For each user, the corresponding average rate is determined by allowing the user to receive data strictly at the peak rate: a simple division of the total bits transferred by the total transmission time yields the uncontrolled average rate. The OFF-times for each data set are scaled to adjust the uncontrolled average rate to the desired value. Under these conditions, the user obtains its highest average rate (the highest value that the average rate can reach) and its lowest transmission delay. In the following subsections, we present several schemes that aim at controlling the average rate; by doing so, we increase the transmission delay. This chapter defines and analyzes the same estimator scheme with different controls.

[Figure 4-1: Uncontrolled average rate for data set gonzo (average rate in bits/sec vs. time in minutes)]

Data set gonzo is utilized as the benchmark to evaluate the merit of each applied control technique.
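The TSW rate estimator at the heart of these controls (section 4.1) can be transcribed directly into Python. Variable names follow the pseudocode; the units (workload in bits, times in seconds) and the driver lines at the end are illustrative assumptions.

```python
# Direct transcription of the TSW rate-estimator pseudocode of section 4.1.

class TSWEstimator:
    def __init__(self, win_length):
        self.win_length = win_length     # how far back "history" matters (sec)
        self.average_rate = 0.0          # current estimate, bits/sec
        self.t_front = 0.0               # time of the previous file arrival

    def update(self, workload, total_time):
        """Fold one arriving file of `workload` bits into the estimate."""
        files_in_tsw = self.average_rate * self.win_length
        new_files = files_in_tsw + workload
        self.average_rate = new_files / (total_time - self.t_front + self.win_length)
        self.t_front = total_time
        return self.average_rate

est = TSWEstimator(win_length=1.0)
rate = est.update(workload=10_000, total_time=1.0)   # one 10,000-bit file at t = 1 s
```

Note how the window length smooths the estimate: a single burst raises the average rate, but the contribution decays as later arrivals are folded in, which is exactly what lets short bursts through while still exposing a sustained overload.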
We will present the following methods: in section I, we analyze an uncontrolled average rate; in section II, we show a maximum data control method; in section III, we study a decrease and increase of the peak rate by a fixed factor; in section IV, we combine the two techniques of sections II and III; in section V, we study the performance of the algorithm for different target rates; in section VI, we evaluate the performance of the algorithm for different data sets; and in section VII, we compare the results of running the algorithm at the peak, average, and target rates. As an example, we consider a user with a target rate of 10,000 bits/sec, which is not unreasonable for a user surfing the Web today. In this case, no control algorithm is applied to the user, who completes all transfers at the peak rate. The resulting average rate function (Figure 4-1) stays above the target rate from six to seven minutes into the simulation onward, and reaches a maximum value that is about twenty-five times the target rate. To reiterate, our objective is to keep the average rate as close as possible to the target rate. This control will become apparent to the user in terms of elapsed time when downloading information: the smaller the target rate and the tighter the control, the more time it will take to receive Web data. The straight lines in Figure 4-1 represent very long OFF-periods, while the spikes represent large file transfers.

4.3 Maximum data control

The motivation behind this scheme is to prevent the user from receiving large files at the peak rate regardless of his/her current average rate. This variant does not use the rate estimator: data is received at the peak rate unless the size of the Web transfer is greater than a pre-defined value. Files as big as 900,000 bits are suitable for evaluating the performance of different data sets in the context of this algorithm. The bigger the maximum data, the smaller the overall transmission time for a given data set.
Here, we evaluate the algorithm for two rather small maximum data values: 300,000 bits and 150,000 bits. With a maximum data of 300,000 bits, the user is less restricted; therefore, the total transmission time to complete his/her transfers is less than in the other case. However, the long-term average rate is better controlled with a maximum data of 150,000 bits, while the increase in the total transmission time is tolerable. In both cases, any file greater than the pre-defined maximum data size is processed as follows: the pre-defined maximum data is serviced at the peak rate, and the rest at the target rate. If the current file size is less than the pre-defined maximum data, the user can always receive at the peak rate. In Figure 4-2, the long-term average rate in both cases is well below the value obtained in the uncontrolled case (Figure 4-1). The improvement in controlling the long-term average rate comes at the expense of additional delay: the time required by the same user to complete his/her entire set of transactions increases by about four minutes.

[Figure 4-2: Maximum data effect (average rate in bits/sec vs. time in minutes, for maximum data of 300,000 and 150,000 bits)]

4.4 Decrease and increase the peak rate by a fixed factor

For this version of the ARCUP algorithm, the user is allowed to receive data at the peak rate until his/her average rate surpasses the target rate. At the occurrence of such an event, the peak rate is decreased by a pre-defined factor (in this case, we set peak_rate = 0.8 * peak_rate). This decrease continues until the average rate drops below the target rate. At each file arrival, the ARCUP algorithm determines how to adjust the peak rate; we have chosen this approach instead of a per-time-interval criterion since the ON periods represent when the user is active. Figure 4-3 shows all the cases where the peak rate has to be dropped.
Figure 4-4 depicts the changes occurring in the peak rate; it also expresses the interplay among the average, target, and peak rates. A spike in Figure 4-3 corresponds to a drop in the peak rate in Figure 4-4. In this instance, the longest period of decrease occurs between twelve and seventeen minutes into the simulation; between approximately twelve and twenty-two minutes, the peak rate is kept below its maximum value.

[Figure 4-3: Data set gonzo with target rate 10,000 bits/sec (average rate in bits/sec vs. time in minutes)]

The peak rate is increased by a pre-defined factor as well (in this example, we set peak_rate = 1.2 * peak_rate). After a drop around five minutes into the simulation (Figure 4-4), the peak rate increases and remains constant until about twelve minutes into the simulation. The behavior of the peak rate within that interval is due to the fact that the average rate remains below the target rate over that same interval (Figure 4-3).

[Figure 4-4: Change in the peak rate (peak rate in bits/sec vs. time in minutes)]

4.5 Varying the peak rate and maximum data control

This section is a hybrid of sections II and III. This variant of the ARCUP algorithm provides the best results; it is therefore utilized for the remainder of this thesis to evaluate the different case studies. In addition to the maximum data control, the peak rate is dropped whenever the average rate is greater than the target rate. This control occurs regardless of file sizes. Analyzing this variant of the ARCUP algorithm for the same maximum data values used above, we observe better control of the average rate with a slight increase in the total delay. The maximum value reached by the average rate in this case is about 80,000 bits/sec, compared to 200,000 bits/sec in the previous case (Figure 4-3). In return, the total delay is about three minutes longer.
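The combined variant can be sketched as follows. The 0.8 and 1.2 adjustment factors and the 1 Mbit/sec ceiling come from the text; the inline estimator, the floor at the target rate, and all names and default values are illustrative assumptions, not the thesis's simulator.

```python
# Sketch of the combined ARCUP variant: TSW estimation, multiplicative
# peak-rate adjustment, and the maximum-data cap.

MAX_PEAK = 1_000_000      # 1 Mbit/sec physical peak rate

def tsw_update(avg_rate, t_front, workload, now, win_length=1.0):
    """One TSW estimator step; returns (new_avg_rate, new_t_front)."""
    avg_rate = (avg_rate * win_length + workload) / (now - t_front + win_length)
    return avg_rate, now

def arcup_step(avg_rate, t_front, peak_rate, file_bits, now,
               target_rate=10_000, max_data=300_000):
    """Handle one file arrival; returns (duration, avg_rate, t_front, peak_rate)."""
    avg_rate, t_front = tsw_update(avg_rate, t_front, file_bits, now)
    if avg_rate > target_rate:
        peak_rate = max(target_rate, 0.8 * peak_rate)   # punish overuse
    else:
        peak_rate = min(MAX_PEAK, 1.2 * peak_rate)      # recover toward the peak
    fast = min(file_bits, max_data)      # maximum-data control: cap the burst
    slow = file_bits - fast              # excess drains at the target rate
    duration = fast / peak_rate + slow / target_rate
    return duration, avg_rate, t_front, peak_rate

# One 100,000-bit file arriving 1 s into the simulation:
d, avg, tf, pk = arcup_step(0.0, 0.0, MAX_PEAK, 100_000, now=1.0)
```

Note that the peak rate drops here even though the burst itself fits under the cap: as the text says, the peak-rate control applies regardless of file size.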
We present the results for two maximum data values: namely, 300,000 and 150,000 bits. In both cases, the algorithm restricts the user from receiving data at the peak rate once the file size exceeds those values; this restriction takes effect even when the current average rate is less than the target rate. With the maximum data of 300,000 bits, the long-term average rate is slightly bigger, with a total delay of 40 minutes. The long-term average rate is more tightly restricted with the smaller maximum data, while the incurred delay is about the same. Figure 4-5 shows the results for these two cases, while Figure 4-6 displays their relationship with the peak rate.

[Figure 4-5: Maximum data and peak rate control (average rate in bits/sec vs. time in minutes, for maximum data of 300,000 and 150,000 bits)]

The reader can observe that the peak rate drops whenever the average rate is greater than the target rate; it increases, or remains constant, when the average rate is smaller. In Figure 4-5, for instance, when the maximum data size allowed equals 150,000 bits, the peak rate momentarily drops from 1 Mbit per second to 10,000 bits per second after seven minutes into the simulation. This drop occurs because the average rate is greater than the target rate within that interval; consequently, it forces the average rate to go below the target rate. The peak rate remains below its maximum value during much of the simulation. It momentarily climbs back to its nominal value after thirty minutes into the simulation, due to the re-adjustment in the average rate, and remains at its maximum value (i.e., 1 Mbit/sec) for about two minutes thereafter. This observation can be explained by the fact that the average rate remains below the target rate within that interval.

[Figure 4-6: Variation in the peak rates (peak rate in bits/sec vs. time in minutes, for maximum data of 150,000 and 300,000 bits)]
On the other hand, when the maximum data size allowed equals 300,000 bits, the result is a little different. The reader should notice that the drop in the peak rate is more acute when the maximum data equals 300,000 bits. By relaxing the constraint on the maximum data size, the user obtains a greater average rate (see Figure 4-5). Since the peak rate is dynamically controlled, a greater increase in the average rate results in a greater decrease in the peak rate (see Figure 4-6).

4.6 Performance of the ARCUP algorithm for different target rates

Is the ARCUP algorithm sensitive to the target rate? Obviously, different users are bound to have different profile requests. In a quest to determine the best profile for a user given a sample of his/her Web transactions over a period of time, I evaluate the ARCUP algorithm for different target rate values. For simplicity, we restrict ourselves to just three values: 12,000 bits/sec, 10,000 bits/sec, and 8,000 bits/sec. For this analysis, I concentrate on the transfer time required for two files when different target rates are applied. As an illustration, I focus on the scatter plots of Figures 4-7 through 4-9 to analyze the transmission times for a file of 20,000 bytes and a file of 60,000 bytes. First, I analyze the results for the case when the target rate equals 12,000 bits/sec. For the first file, the resulting transmission duration is approximately two seconds using the ARCUP algorithm; if it were processed at the nominal average rate (10,000 bits/sec), the resulting transmission time would have been sixteen seconds. Further, this file's size is less than the pre-defined maximum data of 300,000 bits; as a result, most of the transfer is done at a rate that is close to the peak rate. For the second file, the nominal average rate (i.e., 10,000 bits/sec) would yield a transmission duration of forty-eight seconds, while the ARCUP algorithm yields approximately eight seconds.
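The fixed-rate baselines quoted in this comparison follow from simple arithmetic: the transfer time at a constant rate is the file size in bits divided by the rate.

```python
# Transfer time of a file sent at a single fixed rate (bits/sec).

def fixed_rate_time(file_bytes, rate_bps):
    return file_bytes * 8 / rate_bps

t_small = fixed_rate_time(20_000, 10_000)   # the 20,000-byte file: 16 seconds
t_large = fixed_rate_time(60_000, 10_000)   # the 60,000-byte file: 48 seconds
```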
Since the second file's size is greater than the pre-defined maximum data, its transfer time contains a peak rate component and a target rate component. In the scatter plot of Figure 4-7, the points near the x-axis represent the files smaller than 300,000 bits; all those files are processed at a rate that is near the peak rate. The remaining points represent the larger files, which are processed at a slower rate.

[Figure 4-7: Data set gonzo at 12,000 bits/sec (transmission time in seconds vs. transfer size in bytes)]

Second, I analyze the results for the case when the target rate equals 10,000 bits per second. By lowering the target rate, we obtain an increase in the maximum delay per transaction. In this analysis, I consider the same two files mentioned in the previous case. The same amount of time is required for the smaller file (see Figure 4-7) since it is less than the pre-defined maximum data allowed. However, there is an increase of approximately three minutes for the second file.

[Figure 4-8: Data set gonzo at 10,000 bits/sec (varying peak rate with max data = 300,000 bits; transmission time in seconds vs. transfer size in bits)]

Third, by lowering the target rate to 8,000 bits/sec, the ARCUP algorithm becomes more restrictive (Figure 4-8). The individual transfer time for each file increases, as does the overall transmission time. We notice that the 20,000-byte file still requires approximately the same amount of time, while the 60,000-byte file requires twenty seconds. In addition, there is a slight increase in the transmission time of the files that are smaller than the maximum data; these are the files with an average and a peak rate component.
In this case, the average rate reaches the target rate much sooner, which has the effect of reducing the obtained average rate, though it increases the transmission times.

[Figure 4-9: Data set gonzo at 8,000 bits/sec (transmission time in seconds vs. data transferred in bytes)]

To summarize, variations in the target rate affect the obtained average rate (total data transferred over total transmission duration) as well as the transmission duration. If the target rate is set greater than the nominal average rate, the user obtains a lower response time per transfer. On the other hand, if the target rate is set lower than the nominal average rate (which is 10,000 bits/sec for data set gonzo), the user obtains a greater response time. Below is a table of obtained average rates for different target rates and different maximum data values.

4.7 Effect of maximum data size on obtained average rate

Imposing a maximum data size can be viewed as limiting the type of files that a user can download with no additional delay. A typical case can be to allow the user to download, for instance, hypertext, images, and videos, but not MP3 files.

Table 4-1: Effect of maximum data size on obtained average rate

    Maximum data    Obtained average    Nominal average    Target rate
                    rate (bits/sec)     rate (bits/sec)    (bits/sec)
    100,000 bits    8,780               10,000             8,000
    300,000 bits    9,084               10,000             8,000
    600,000 bits    9,248               10,000             8,000
    900,000 bits    9,460               10,000             8,000
    100,000 bits    8,933               10,000             10,000
    300,000 bits    9,220               10,000             10,000
    600,000 bits    9,470.8             10,000             10,000
    900,000 bits    9,652               10,000             10,000
    100,000 bits    9,045               10,000             12,000
    300,000 bits    9,304               10,000             12,000
    600,000 bits    9,525.5             10,000             12,000
    900,000 bits    9,675.6             10,000             12,000

Table 4-1 and Figure 4-10 illustrate the case of a fixed target rate but different maximum data values. The total transmission duration improves with an increase in the maximum data. Figure 4-10 shows that by increasing the maximum data allowed, more files are received near or at the peak rate.
It illustrates the ARCUP algorithm's characteristic of restricting large files.

[Figure 4-10: Transfers for data set gonzo with different maximum data values of 100,000; 300,000; 600,000; and 900,000 bits (transmission time in seconds vs. transfer size in bytes)]

4.8 Effect of maximum data on transmission duration

Table 4-2 shows the results of running the ARCUP algorithm at different maximum data values. These results do not take into consideration OFF periods; in other words, they solely represent the activity periods (i.e., ON periods) of data set gonzo at a target rate of 10,000 bits/second.

Table 4-2: Effect of maximum data on transmission duration

    Maximum data (bits)    Total transmission    Uncontrolled
                           duration (secs)       duration (secs)
    100,000                290                   20
    300,000                221                   20
    600,000                150                   20
    900,000                115                   20

As observed, the total transmission duration decreases as we increase the maximum data value. This feature will provide ISPs (Internet Service Providers) and/or network administrators the flexibility to adjust a user's profile as desired.

4.9 Algorithm performance for different data sets

Studying the algorithm for one data set is insufficient to draw a performance conclusion: data set distribution varies from user to user. This section evaluates the performance of the ARCUP algorithm for different users (or data sets). The objective of the following analysis is to determine the algorithm's performance for different data sets.
This study involves four sets of data: goofy, taz, daffy, and pooh (the names of the actual users were removed for privacy). For each data set, we first consider the uncontrolled average rate, followed by the controlled average rate. The target rate and the nominal average rate in all cases are 10,000 bits/sec, and the maximum data is 300,000 bits. The ARCUP algorithm performs relatively well in each case. Table 4-3 shows the achieved average rate for each data set; in each case, we give the user's uncontrolled (long-term) average rate followed by his/her controlled (achieved) average rate.

Table 4-3: Achieved average rate

  User name   Long-term average    Achieved average
              rate (bits/sec)      rate (bits/sec)
  Daffy       9,993                9,155
  Gonzo       10,122               9,048
  Goofy       10,406               9,467
  Pooh        10,430               9,260
  Taz         9,883                8,864

In the uncontrolled case, the average rate has the highest spikes and the smallest total transmission delays. In the controlled case, the long-term average rate is much lower, but there is an increase in total transmission delay. For instance, data set pooh's average rate swings in the vicinity of the target rate while the transmission delay increases considerably in the controlled case (Figure 4-16). In the other cases, the algorithm controls the average rates with far less additional transmission delay (Figure 4-18). As an illustration, the achieved average rate for data set daffy is reduced considerably in the controlled case with only a small increase in the total transmission delay (Figure 4-12).

[Figure 4-11: Uncontrolled daffy]
[Figure 4-12: Controlled daffy]
[Figure 4-13: Uncontrolled goofy]
[Figure 4-14: Controlled goofy]
[Figure 4-15: Uncontrolled pooh]
[Figure 4-16: Controlled pooh]
[Figure 4-17: Uncontrolled taz]
[Figure 4-18: Controlled taz]

4.10 Running the algorithm at the peak, target, and average rates

The transmission time is the ratio between the size of the file being received and the actual transfer rate. The transmission time of each received file therefore has two boundaries: as the scatter plot of Figure 4-19 shows, it is bounded above by the target rate and below by the peak rate. This section investigates the duration of transferred files when the algorithm runs (1) at the peak rate, (2) at the target rate, and (3) at the average rate. An illustration should facilitate the understanding. When allowed to continuously receive files at the peak rate, the maximum delay observed by the user is 1.4 seconds per transaction; this result checks out because the user's largest transaction is 180,000 bytes at a speed of 1 megabit per second. When constrained to continuously receive at the target rate, the maximum delay is 144 seconds per transaction; again, we can corroborate this result because the largest transaction is still 180,000 bytes at a target rate of 10,000 bits per second.

[Figure 4-19: Average, peak, and target rates (scatter plot of transmission time vs. transfer size)]
For a user receiving data at the average rate, the observed transmission delay lies between the two previous cases. Each transmission time then has both a peak-time and a target-time component; as a result, files are not processed any faster than at the peak rate nor any slower than at the target rate.

Chapter 5
Conclusion, Applications, and Future Work

This chapter contains a summary of our findings, suggests a possible application of the ARCUP algorithm, and proposes ideas for future work. It is followed by the references, the source code, and a sample of the data set used in the simulations.

Since the two schemes presented in the preceding chapters are different, we presented the graphical results for each of them in the most comprehensive way. However, one is mostly interested in how much the flows are delayed by each scheme. We answer this question by evaluating both algorithms against a common metric: we first look at the amount of time required for a user to complete his/her transactions when no control is applied, and we then compute the amount of time required for the same set of data when either scheme is applied. The results appear in the following section.

5.1 Conclusion

This thesis analyzed the leaky-bucket algorithm and showed that it is inefficient in controlling heavy-tailed traffic; this analysis is supported by [8] as well. Table 5-1 shows the results of running the leaky bucket versus the proposed algorithm for different data sets at the standard usage profile (i.e., target rate = 10,000 bits/sec). The target rate in the ARCUP algorithm is equivalent to the token rate in the leaky-bucket algorithm. From this table, one case deserves some elaboration: data set goofy. Data set goofy (Figure 1-1) is roughly Poisson distributed; therefore the leaky-bucket scheme provides a transmission delay similar to that of the ARCUP algorithm.
The leaky-bucket profile works well for short-range-dependent, predictable traffic such as the Poisson model. In all the other cases, the ARCUP algorithm outperforms its counterpart.

Table 5-1: Leaky bucket vs. ARCUP algorithm (time in mn and data in Megabits)

  User name   Total data    Uncontrolled   Leaky-bucket   ARCUP algorithm
              transferred
  Taz         67            113            140.5          127
  Gonzo       19.7          32.4           40             34.6
  Goofy       22.7          36             40.3           40
  Pooh        22.6          20.8           24             21.7
  Daffy       13            37.7           43             42

These results are obtained by running the ARCUP algorithm with a maximum data value equal to 600,000 bits and a target rate of 10,000 bits/sec; they can be further improved by increasing the maximum data value. In closing, while the simple leaky-bucket profile suffices for such predictable traffic, the ARCUP algorithm is more suitable for long-range-dependent or self-similar traffic. It outperforms the leaky-bucket algorithm especially when the total amount of data transferred is considerable. More studies could confirm whether or not the ARCUP algorithm penalizes Markovian traffic.

5.2 Application

In the current Internet model, ISPs (Internet Service Providers) lease their bandwidth from the backbone suppliers. The latter own high-capacity links like OC-1, OC-2, and OC-3 and are considered the source of bandwidth capacity. Corporations, institutions, and individual users, in turn, lease their bandwidth from the ISPs. To deter end users from using more than their contracted bandwidth, Internet Service Providers can build a usage profile for each source. They will have to provision enough bandwidth to carry traffic for all users; for Web (bursty) traffic, they assume that not all users send at the peak rate at the same time. In a setting like a university, an ISP can provide a profile to the LAN administrator, who will then repartition profiles locally according to users' needs.
In a computer laboratory, like the Athena clusters here at MIT, all workstations would be assigned the same profile. Businesses, for example, will be able to discourage their employees from making excessive use of the Internet by assigning them lower usage profiles. Say an employee's profile allows him/her to download five files every ten minutes; if the employee tries to download seven files in ten minutes, the algorithm can be calibrated to complete the transfer in twenty minutes. Such a policy will encourage the user to stay within his/her assigned profile. In addition, the usage-profile algorithm can be built with added features that warn the user when he/she exceeds the profile; as an incentive, the user is not penalized if he/she voluntarily slows down. The usage-profile mechanism will help avoid congestion over the Internet.

5.3 Future work

The two algorithms presented in this thesis can work in tandem with mechanisms that differentiate the type of traffic beforehand. In future work, one can explore the possibility of designing a hybrid algorithm that properly regulates all types of traffic. Further, for both algorithms we assumed that the lengths of the ON periods are known a priori, which is contrary to reality. One can explore the merit of a scheme that determines the proper adjustment to the peak and average rates based on the instantaneous length of the ON period. An inquisitive mind can even look into a different algorithm that produces better results than the one presented in this thesis.

5.4 References

[1] M. E. Crovella and A. Bestavros, "Self-Similarity in World Wide Web Traffic: Evidence and Possible Causes", IEEE/ACM Transactions on Networking, Vol. 5, No. 6, December 1997.

[2] Walter Willinger, Murad S. Taqqu, Robert Sherman, and Daniel V.
Wilson, "Self-Similarity Through High-Variability: Statistical Analysis of Ethernet LAN Traffic at the Source Level", IEEE/ACM Transactions on Networking, Vol. 5, No. 1, February 1997.

[3] "Modeling Mammographic Images Using Fractional Brownian Motion", Signal Processing Research Centre, Queensland University of Technology.

[4] P. Flandrin, "Wavelet Analysis and Synthesis of Fractional Brownian Motion", IEEE Transactions on Information Theory, Vol. 38, No. 2, March 1992.

[5] W. Willinger, V. Paxson, and M. S. Taqqu, "Self-Similarity and Heavy Tails: Structural Modeling of Network Traffic", in A Practical Guide to Heavy Tails: Statistical Techniques and Applications, R. Adler, R. Feldman, and M. S. Taqqu, editors, Birkhauser, 1998.

[6] S. A. Marshall, "Introduction to Control Theory", 1978.

[7] G. Samorodnitsky and M. S. Taqqu, "Stable Non-Gaussian Random Processes: Stochastic Models with Infinite Variance", pp. 318-320.

[8] I. J. H. Liu, "Bandwidth Provisioning for an IP Network Using User Profiles", S.M. Thesis, Technology and Policy, and Electrical Engineering and Computer Science, Massachusetts Institute of Technology, Cambridge, MA, 1999.

[9] D. D. Clark, "An Internet Model for Cost Allocation and Pricing", in Internet Economics, L. McKnight and J. Bailey, editors, MIT Press, 1997.

[10] K. Warwick, "An Introduction to Control Systems", Advanced Series in Electrical and Computer Engineering, Vol. 8, 2nd edition.

[11] L. L. Peterson and B. S. Davie, "Computer Networks: A Systems Approach", 2nd edition.

[12] W. Fang, "Differentiated Services: Architecture, Mechanisms and an Evaluation", Ph.D. Dissertation, Princeton University, November 2000.

[13] F. Nicola, G. A. Hagesteijn, and B. G. Kim, "Fast Simulation of the Leaky Bucket Algorithm".
Appendix A
Source Codes and Data Sample

A.1 Generating OFF times

/* Author: Pierre Arthur Elysee */
/* Date: 6/4/99 */
/* Description: This module generates the OFF times used in our simulations. */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
#include <time.h>

#define SEPS " "                /* separator symbols for parsing */
#define MAX_NUMBER_CHARS 200    /* number of characters expected per line */

/* Generate a uniformly distributed random number from [0,1]. */
double rand_gen(unsigned long *num)
{
    *num = (*num * 16807) % 4294967295UL;
    return ((double) *num) / 4.294967295e9;
}

int main(void)
{
    FILE *fp_in, *fp_out;
    char input_name[20], output_name[25];
    char buff[MAX_NUMBER_CHARS];    /* holds one line of data at a time */
    char *w;
    float word[30], duration;
    int nw, on_sources, off_sources, fbm = 1;
    double y, time_on = 0.0, time_off = 0.0, save_time_off = 0.0;
    double start_time, end_time;
    unsigned long seed = 2345677;

    printf("Enter input file name:\n");
    scanf("%s", input_name);
    strcpy(output_name, input_name);
    strcat(output_name, "_out");
    printf("input file name: %s\n", input_name);
    printf("output file name: %s\n\n", output_name);

    /* duration value is entered here */
    printf("Enter duration value:\n");
    scanf("%f", &duration);

    start_time = clock();   /* beginning of simulation */

    /* test input and output files for existence */
    if ((fp_in = fopen(input_name, "r")) == NULL) {
        printf("Couldn't open %s for reading.\n", input_name);
        exit(1);
    }
    if ((fp_out = fopen(output_name, "w")) == NULL) {
        printf("Couldn't open %s for writing.\n", output_name);
        exit(1);
    }

    /* output header */
    fprintf(fp_out, "Time_on Time_off Source_index Fbm_index\n");

    /* read the input component list file stream, one FBM per line */
    while (fgets(buff, MAX_NUMBER_CHARS, fp_in) != NULL) {
        nw = 0;
        w = strtok(buff, SEPS);           /* find first word */
        while (w) {                        /* parse input line */
            word[nw++] = atof(w);
            w = strtok(NULL, SEPS);        /* find next word */
        }

        /* number of ON and OFF sources for each FBM */
        on_sources = (int)(word[7] / word[1]);
        off_sources = (int)(word[3] - on_sources);

        /* test the Beta and alpha values */
        if (word[21] <= 0.0 || word[23] <= 0.0 || word[13] <= 0.0) {
            fprintf(fp_out, "Invalid Beta_high1 or Beta_high2 or Alpha1 for FBM: %d\n",
                    fbm++);
            continue;
        }

        /* time_on and time_off for each ON source within an FBM */
        while (on_sources) {
            while (duration > time_on) {
                do {                       /* y uniform in (0,1), never 0 or 1 */
                    y = rand_gen(&seed);
                } while (y == 0.0 || y == 1.0);

                /* heavy-tailed ON/OFF times, in seconds */
                time_on = word[21] * pow(1.0 / y - 1.0, 1.0 / word[13]);
                time_off = word[21] * pow(1.0 / y - 1.0, 1.0 / word[13]);
                time_on += save_time_off;
                time_off += time_on;
                save_time_off = time_off;
                if (time_on > duration)    /* time_on exceeded simulation time */
                    continue;

                fprintf(fp_out, "%f %f %d A%dZ\n",
                        time_on, time_off, on_sources, fbm);
            }
            time_on = time_off = save_time_off = 0.0;
            on_sources--;                  /* get next source */
        }

        /* time_on and time_off for each OFF source within an FBM */
        while (off_sources) {
            while (duration > time_on) {
                do {
                    y = rand_gen(&seed);
                } while (y == 0.0 || y == 1.0);

                time_on = word[23] * pow(1.0 / y - 1.0, 1.0 / word[13]);
                time_off = word[23] * pow(1.0 / y - 1.0, 1.0 / word[13]);
                time_on += save_time_off;
                time_off = time_on + time_off;
                save_time_off = time_off;
                if (time_on > duration)
                    continue;

                fprintf(fp_out, "%f %f %d A%dZ\n",
                        time_on, time_off, off_sources, fbm);
            }
            off_sources--;                 /* get next source */
            time_on = time_off = save_time_off = 0.0;
        }
        fbm++;                             /* get a new FBM from the file */
    }

    fclose(fp_in);
    fclose(fp_out);
    end_time = clock();
    printf("-------------------------------------\n");
    printf("Simulation has been completed successfully.\n");
    printf("The total running time is: %e seconds\n",
           (end_time - start_time) / CLOCKS_PER_SEC);
    return 0;
}

A.2 Leaky-bucket scheme source code

/* Author: Pierre Arthur Elysee */
/* Date: 2/4/2000 */
/* Description: This module represents the leaky-bucket scheme. It calculates
   the total time required by each source to complete its data transfer.
*/

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SEPS " "                /* separator symbols for parsing */
#define MAX_NUMBER_CHARS 200    /* number of characters expected per line */

int main(void)
{
    FILE *fp_in, *fp_out;
    char input_name[20], output_name[25];
    char buff[MAX_NUMBER_CHARS];    /* holds one line of data from the input file */
    float pk_time = 0.0, tkn_time = 0.0;   /* peak time and token time */
    float total_pk_time = 0.0, total_tkn_time = 0.0;
    float tkn_rate = 10000.0;       /* token replenishing rate, bits/sec */
    float pk_rate = 1000000.0;      /* peak rate, bits/sec */
    float total_time = 0.0;         /* time required to complete all transfers */
    float bucket = 0.0;             /* current bucket content, bits */
    float total_data = 0.0;
    float data_in_bits;
    int work_load, typ_off_time;

    printf("Enter input file name:\n");
    scanf("%s", input_name);
    strcpy(output_name, input_name);
    strcat(output_name, "_out");
    printf("input file name: %s\n", input_name);
    printf("output file name: %s\n\n", output_name);
    printf("please enter token rate:");
    scanf("%f", &tkn_rate);

    /* test input and output files for existence */
    if ((fp_in = fopen(input_name, "r")) == NULL) {
        printf("Couldn't open %s for reading.\n", input_name);
        exit(1);
    }
    if ((fp_out = fopen(output_name, "w")) == NULL) {
        printf("Couldn't open %s for writing.\n", output_name);
        exit(1);
    }

    fprintf(fp_out, "typ_off_time work_load pk_time tkn_time total_time\n");

    while (fgets(buff, MAX_NUMBER_CHARS, fp_in) != NULL) {
        typ_off_time = atoi(strtok(buff, SEPS));
        work_load = atoi(strtok(NULL, SEPS));
        data_in_bits = 1.0f * work_load;

        if (typ_off_time == 0)
            typ_off_time = 4;              /* the OFF time cannot be zero */
        bucket += tkn_rate * typ_off_time; /* tokens accumulated while OFF */
        total_data += data_in_bits;

        /* calculate peak time and token time for this transfer */
        if (data_in_bits > bucket) {
            pk_time = bucket / pk_rate;    /* burst out the bucket content */
            data_in_bits -= bucket;
            bucket = pk_time * tkn_rate;   /* tokens arriving during the burst */
            tkn_time = data_in_bits / tkn_rate - bucket / tkn_rate;
            bucket = 0.0;
        } else {
            pk_time = data_in_bits / pk_rate;  /* the whole file fits in the bucket */
            tkn_time = 0.0;
            bucket -= data_in_bits;
        }

        /* calculate total transfer time */
        total_pk_time += pk_time;
        total_tkn_time += tkn_time;
        total_time += pk_time + tkn_time + typ_off_time;

        fprintf(fp_out, "%d %d %f %f %f\n",
                typ_off_time, work_load, pk_time, tkn_time, total_time);
    }

    fclose(fp_in);
    fclose(fp_out);
    return 0;
}

A.3 Uncontrolled ARCUP algorithm

/* Author: Pierre Arthur Elysee */
/* Date: 9/24/00 */
/* Description: This module calculates the total time required by each source
   to complete its data transfer when no control is applied. It uses a
   time-sliding-window average estimator along with real web traces.
   Every file is received at the peak rate; the transfer time and running
   average rate are reported for each transaction. */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SEPS " "                /* separator symbols for parsing */
#define MAX_NUMBER_CHARS 200    /* number of characters expected per line */

int main(void)
{
    FILE *fp_in2, *fp_out;
    char input_name2[20], output_name[25];
    char buff[MAX_NUMBER_CHARS];    /* holds one line of data from the input file */
    char *w;
    float word[30];                 /* fields parsed from the input line */
    float pk_time = 0.0, total_pk_time = 0.0;
    float pk_rate = 1000000.0;      /* peak rate, bits/sec */
    float target_rate = 10000.0;    /* target rate, bits/sec */
    float av_rate = 0.0;            /* TSW average-rate estimate */
    float total_time = 0.0;         /* running completion time, sec */
    float total_data = 0.0;
    float data_in_bits;
    float win_length = 0.6;         /* TSW window, sec */
    float t_front = 0.0;            /* finish time of the previous transfer */
    float files_in_tsw, new_files;
    int work_load, typ_off_time, nw;

    printf("Enter input file name of data:\n");
    scanf("%s", input_name2);
    strcpy(output_name, input_name2);
    strcat(output_name, "_out");
    printf("input file name: %s\n", input_name2);
    printf("output file name: %s\n\n", output_name);

    if ((fp_in2 = fopen(input_name2, "r")) == NULL) {
        printf("Couldn't open %s for reading.\n", input_name2);
        exit(1);
    }
    if ((fp_out = fopen(output_name, "w")) == NULL) {
        printf("Couldn't open %s for writing.\n", output_name);
        exit(1);
    }

    /* read the data-transfer records */
    while (fgets(buff, MAX_NUMBER_CHARS, fp_in2) != NULL) {
        nw = 0;
        w = strtok(buff, SEPS);        /* find first word */
        while (w) {                     /* parse input line */
            word[nw++] = atof(w);
            w = strtok(NULL, SEPS);     /* find next word */
        }
        typ_off_time = (int)word[0];
        work_load = (int)word[1];
        data_in_bits = 8.0f * work_load;   /* data to transfer, bits */
        total_data += data_in_bits;

        pk_time = data_in_bits / pk_rate;  /* uncontrolled: always at the peak rate */
        total_time = t_front + typ_off_time + pk_time;

        /* time-sliding-window estimate of the average transfer rate */
        files_in_tsw = av_rate * win_length;
        new_files = files_in_tsw + work_load * 8.0f;
        av_rate = new_files / (total_time - t_front + win_length);
        t_front = total_time;              /* time of the last arrival */

        total_pk_time += pk_time;
        fprintf(fp_out, "%d %d %d %d %d %f %f\n",
                typ_off_time, work_load, (int)pk_rate, (int)av_rate,
                (int)target_rate, total_time / 60.0, total_data);
    }

    fclose(fp_in2);
    fclose(fp_out);
    return 0;
}

A.4 Controlled ARCUP algorithm

/* Author: Pierre Arthur Elysee */
/* Date: 9/24/00 */
/* Description: This module calculates the total time required by each source
   to complete its data transfer. It uses a time-sliding-window average
   estimator along with real web traces. Files larger than max_data send part
   of their data at the target rate and the rest at the peak rate; when the
   average rate exceeds the target, the peak rate is decreased, and otherwise
   it is increased.
   The transfer time for each transaction is reported. */

#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define SEPS " "                /* separator symbols for parsing */
#define MAX_NUMBER_CHARS 200    /* number of characters expected per line */

int main(void)
{
    FILE *fp_in2, *fp_out;
    char input_name2[20], output_name[25];
    char buff[MAX_NUMBER_CHARS];    /* holds one line of data from the input file */
    char *w;
    float word[30];                 /* fields parsed from the input line */
    float pk_time = 0.0, total_pk_time = 0.0;
    float pk_rate = 1000000.0;      /* current sending rate, bits/sec */
    float target_rate = 10000.0;    /* target rate, bits/sec */
    float av_rate = 0.0;            /* TSW average-rate estimate */
    float total_time = 0.0;         /* running completion time, sec */
    float total_data = 0.0;
    float data_in_bits;
    float win_length = 0.6;         /* TSW window, sec */
    float t_front = 0.0;            /* finish time of the previous transfer */
    float files_in_tsw, new_files;
    int max_data = 300000;          /* maximum file size with no added delay, bits */
    int work_load, typ_off_time, nw;

    printf("Enter input file name of data:\n");
    scanf("%s", input_name2);
    strcpy(output_name, input_name2);
    strcat(output_name, "_out");
    printf("input file name: %s\n", input_name2);
    printf("output file name: %s\n\n", output_name);

    if ((fp_in2 = fopen(input_name2, "r")) == NULL) {
        printf("Couldn't open %s for reading.\n", input_name2);
        exit(1);
    }
    if ((fp_out = fopen(output_name, "w")) == NULL) {
        printf("Couldn't open %s for writing.\n", output_name);
        exit(1);
    } else {
        /* read the data-transfer records */
        while (fgets(buff, MAX_NUMBER_CHARS, fp_in2) != NULL) {
            nw = 0;
            w = strtok(buff, SEPS);        /* find first word */
            while (w) {                     /* parse input line */
                word[nw++] = atof(w);
                w = strtok(NULL, SEPS);     /* find next word */
            }
            typ_off_time = (int)word[0];
            work_load = (int)word[1];
            data_in_bits = 8.0f * work_load;   /* data to transfer, bits */
            total_data += data_in_bits;
            if (typ_off_time == 0)
                typ_off_time = 3;

            /* files larger than max_data: 20% at the target rate, 80% at peak */
            if (data_in_bits > max_data) {
                pk_time = 0.2f * data_in_bits / target_rate
                        + 0.8f * data_in_bits / pk_rate;
                data_in_bits = 0;
            }

            /* multiplicative rate adjustment around the target */
            if (av_rate > target_rate && data_in_bits != 0) {
                pk_rate = 0.8f * pk_rate;      /* back off */
                pk_time = data_in_bits / pk_rate;
            } else if (data_in_bits != 0) {
                pk_rate = 1.2f * pk_rate;      /* ramp up, capped at the peak */
                if (pk_rate > 1000000.0f)
                    pk_rate = 1000000.0f;
                pk_time = data_in_bits / pk_rate;
            }

            total_time = t_front + typ_off_time + pk_time;

            /* time-sliding-window estimate of the average transfer rate */
            files_in_tsw = av_rate * win_length;
            new_files = files_in_tsw + work_load * 8.0f;
            av_rate = new_files / (total_time - t_front + win_length);
            t_front = total_time;              /* time of the last arrival */

            total_pk_time += pk_time;
            fprintf(fp_out, "%d %d %d %d %d %f %f\n",
                    typ_off_time, work_load, (int)pk_rate, (int)av_rate,
                    (int)target_rate, total_time / 60.0, total_data);
        }   /* end while */
        fclose(fp_in2);
        fclose(fp_out);
    }   /* end else */
    return 0;
}       /* end main */

A.5 Sample of data used in simulations

taz 797447407 13352 "http://cs-www.bu.edu/" 2299 0.969024
taz 797447408 940773 "http://cs-www.bu.edu/lib/pics/bu-logo.gif" 1803 0.629453
taz 797447409 941884 "http://cs-www.bu.edu/lib/pics/bu-label.gif" 715 0.326586
taz 797447476 527498 "http://cs-www.bu.edu/students/grads/Home.html" 4734 0.494357
taz 797447611 924579 "http://cs-www.bu.edu/lib/icons/rball.gif" 0 0.0
taz 797447639 206997 "http://www.cts.com/cts/market/" 9103 1.751035
taz 797447641 152490 "http://www.cts.com/cts/market/marketplace.gif" 20886 1.143875
taz 797447643 428176 "http://www.cts.com/cts/market/dirsite-icon.gif" 2511 0.590501
taz 797447644 349043 "http://www.cts.com/art/cts.gif" 1826 0.613857
taz 797447650 507322 "http://www.cts.com/~flowers" 318 0.599902
taz 797447651 164916 "http://www.cts.com:80/~flowers/" 3044 1.556657
taz 797447652 850779 "http://www.cts.com/~flowers/thumb.gif" 9256 2.654269
taz 797447679 477564 "http://www.cts.com/~flowers/order.html" 2865 0.743227
taz 797447680 342368 "http://www.cts.com/~flowers/thumb.gif" 0 0.0
taz 797447723 595449 "http://www.cts.com:80/~flowers/" 0 0.0
taz 797447727 341012 "http://www.cts.com/cts/market/dirsite-icon.gif" 0 0.0
taz 797447727 348313 "http://www.cts.com/art/cts.gif" 0 0.0
taz 797447735 567056 "http://www.cts.com/~vacation" 320 1.312128
taz 797447736 930528 "http://www.cts.com:80/~vacation/" 1168 1.020013
taz 797447738 92121 "http://www.cts.com/~vacation/logo2.gif" 5485 2.669898
taz 797447741 441969 "http://www.cts.com/~vacation/boxindx1.gif" 3828 0.682333
taz 797447742 561155 "http://www.cts.com/~vacation/boxindx2.gif" 3786 0.701705
taz 797447743 760975 "http://www.cts.com/~vacation/boxindx4.gif" 3936 0.757089
taz 797448806 36925 "http://worldweb.net/~stoneji/tattoo/mytats.html" 2898 0.181871
taz 797448806 314167 "http://worldweb.net/~stoneji/tattoo/my-tats.gif" 7806 0.254429
taz 797448807 108445 "http://worldweb.net/~stoneji/tattoo/dragon.gif" 10878 0.313795
taz 797448808 675031 "http://worldweb.net/~stoneji/tattoo/cavedraw.gif" 3710 0.231060
taz 797448809 849560 "http://worldweb.net/~stoneji/tattoo/sm-arm.gif" 3198 0.297034
taz 797448823 624434 "http://worldweb.net/~stoneji/tattoo.html" 0 0.0
taz 797454701 532335 "http://www.hollywood.com/rocknroll/" 1837 0.880347
taz 797454705 926469 "http://www.hollywood.com/rocknroll/buzz.gif" 3841 0.400525
taz 797454706 750284 "http://www.hollywood.com/rocknroll/quote.gif" 3888 0.410293
taz 797454707 571245 "http://www.hollywood.com/rocknroll/sound.gif" 4181 0.425392
taz 797454708 420418 "http://www.hollywood.com/rocknroll/video.gif" 4000 0.437737
taz 797454709 257421 "http://www.hollywood.com/rocknroll/sight.gif" 3512 0.419571