Selfish Routing on the Internet

Quan Chu, Brent Wasilow

1. Abstract

With the ever-increasing growth of networks, specifically the Internet, it has become invaluable to be able to understand, analyze, and model networks appropriately. Networks are inherently large, fast growing, complex, and unregulated, and, perhaps most importantly, they involve human interaction. To analyze networks properly we must apply probabilistic approaches, namely game-theoretic tools. The underlying idea is that because humans interact with other components and humans on a network, they are inherently selfish and act in whatever way provides the largest benefit to themselves. This can be compared to a set of people playing poker: each player follows a strategy that benefits themselves while taking the other players in the game into account. What we are most interested in is the cost of being selfish in network routing. We define this cost through probabilistic upper and lower bounds on the performance of selfish routing on the network. Specifically, we define two overall models: the task allocation model and the routing flow model. The task allocation model deals with the processing of tasks, in this case packets, on machines. The routing flow model deals with routing packets on the network. It is important to note that while these models seem different, they tend to blur together, as the processing of packets in turn affects the routing of the packets and vice versa. We ultimately show that selfish routing on a network, specifically the Internet, has clearly defined upper and lower bounds as dictated by the specific scenario in question.

2. Introduction

Networks are an ingrained product of the world's socio-economic structure. It is therefore important to understand proper network analysis techniques as tools to predict, optimize, and better provide security. There are two approaches to network analysis: the classical model and the game-theoretic model.
The classical model is concerned with simplifying the network in order to better analyze its characteristics. Generally, this simplification takes two forms: regulation and/or cooperation. Regulation can be illustrated by the typical assumption that users employ specific protocols in predefined and exact ways on controlled and/or known components. Cooperation refers to the idea that users on the network coordinate their efforts in order to reach the most optimal solution. What is important to note is that neither of these assumptions is realistic. Protocol definitions are not exact, and it is not uncommon for packets to be flagged by intrusion detection components, since protocols can be used in whatever manner a user defines. Cooperation is also extremely unlikely. Users on a network, especially one as vast as the Internet, are generally not aware of the other users, hardware, or general components they will be interacting with. Moreover, users are only concerned with their own optimal cost of operation: speed, price, etc. Therefore, it is wholly unrealistic to analyze current networks in this fashion; note, however, that classical analysis remains well suited to simplistic and heavily restricted networks. The more current trend in network analysis avoids such assumptions and instead looks at network communication as realistically as possible. This approach relies on probabilistic methods, since we cannot account for every possible scenario or observable situation; the hidden Markov model is a perfect example of this. Specifically, researchers have focused on applying game-theoretic models to better understand routing and communication on a network. The connection is evident when we look at typical user behavior on the Internet. For example, a user may connect to a web server in order to shop online or download information for personal reasons.
We notice that in order for the user to access this web server, a router must appropriately route the packets along a specific route. The router is aware of other routers along the way, but it is not aware of the traffic on those routers or the traffic currently in transit. Therefore, the router will act selfishly in whatever manner it can. This is the method of analysis we will focus on, as it provides a modeling process for network characteristics that reflects realistic scenarios. To better understand this survey, we introduce the following background material in Section 3: the Nash equilibrium and the coordination ratio. These two concepts are critical for understanding the cost of selfish routing on a network. Section 4 provides a brief discussion of the two types of models discussed throughout this survey: the task allocation model and the routing flow model. Section 5 explicitly defines the cost of selfish routing for the task allocation model. Similarly, Section 6 performs the same analysis for the routing flow model. Finally, Section 7 provides a comprehensive conclusion.

3. Background

This survey on selfish routing on the Internet makes use of two important concepts: the Nash equilibrium and the coordination ratio. A Nash equilibrium is defined as a state in which no user sees a benefit from changing their strategy. This can be illustrated with the following simple example. Suppose that two hunters, A and B, take a trip into the woods to catch some food for the evening. There are two options for food: a deer or a rabbit. The payout for a deer is considerably higher than that of a rabbit; assume a payout of 10 for the deer and 1 for the rabbit. However, the difficulty of successfully catching the deer is also much higher in comparison to the rabbit.
We can consider the following four scenarios: (1) hunter A hunts the rabbit while hunter B hunts the deer, (2) hunter A hunts the deer while hunter B hunts the rabbit, (3) both hunters hunt the deer, and (4) both hunters hunt the rabbit. What we notice is that both scenario (3) and scenario (4) are Nash equilibria: in each, the hunters are on equal footing, and neither wants to change strategy because doing so would give the other an advantage. One might notice that the hunters could coordinate their efforts to move between scenarios and possibly benefit. In reality, however, players in a game do not have full awareness of every aspect, and coordination is generally not feasible. This is important because in a network system we may be aware of other users on the network, but we may not be aware of their strategies, nor of other components and their specifications (e.g., processing time, bandwidth, algorithms, etc.). The connection is that a user will play their most optimal strategy while trying to keep the other users in mind. This is the definition of selfish operation and is the aspect we examine most in the following sections. The reason for the focus on the Nash equilibria of a system is that these are the states of operation toward which systems generally coalesce. Naturally, a set of users acting selfishly in this scenario will tend toward scenario (4). It is important to note that this is not always the case, however, and a more optimal Nash equilibrium can still be reached probabilistically. Nevertheless, the general model of non-cooperation states that the users will converge toward the suboptimal Nash equilibrium. This is the sole focus of this paper and is partially captured by the coordination ratio, which we define as the ratio between the cost of the suboptimal Nash equilibrium and the social optimum.
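The hunters' game can be checked mechanically. The Python sketch below enumerates all four strategy profiles and tests the Nash condition (no hunter can gain by unilaterally switching). The text only fixes the payouts 10 and 1; the payoff of 0 for hunting the deer alone is an assumption added for illustration.

```python
from itertools import product

# Payoffs from the hunters example: a deer pays 10 but requires both hunters,
# a rabbit pays 1 regardless. The payoff of 0 for hunting the deer alone is
# an illustrative assumption, not stated in the text.
def payoff(me, other):
    if me == "deer":
        return 10 if other == "deer" else 0
    return 1  # rabbit

def is_nash(a, b):
    # Neither hunter can improve by unilaterally switching strategies.
    ok_a = all(payoff(a, b) >= payoff(alt, b) for alt in ("deer", "rabbit"))
    ok_b = all(payoff(b, a) >= payoff(alt, a) for alt in ("deer", "rabbit"))
    return ok_a and ok_b

equilibria = [(a, b) for a, b in product(("deer", "rabbit"), repeat=2)
              if is_nash(a, b)]
print(equilibria)  # scenarios (3) and (4): both hunt the deer, or both the rabbit
```

Running this recovers exactly the two pure Nash equilibria described above, scenarios (3) and (4).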
The social optimum is defined as the most optimal scenario achievable when the users in the system coordinate. Ultimately, the coordination ratio gives us the ratio between non-cooperative and cooperative behavior, and we therefore define it as the cost of being selfish. We are concerned with this value because it defines the upper and lower bounds on the cost of a given network scenario. A key point is that in certain situations the cost of moving from selfish behavior to cooperative behavior may not be worthwhile, and selfish routing can therefore ultimately provide a near-optimal cost.

4. Traffic Models

The purpose of this paper is to analyze the coordination ratio of traffic in networks. There are two general approaches: (1) the task allocation model and (2) the routing flow model. The task allocation model considers the problem of allocating tasks, or the processing of packets, to nodes on the network. It is important to note that in this model traffic cannot be split: each task must be completed as a whole on a single node. More specifically, this model focuses on the congestion created throughout the network by users routing and processing their packets selfishly. The second traffic model we will focus on is the routing flow model. This model focuses on the problem of physically routing packets through a network consisting of nodes and links. In this model we drop the constraint of the previous model and allow traffic to be split among nodes and links. It should also be noted that although these two traffic models appear to be distinct concepts, they in fact tend to blur together: the coordination ratio, or the cost of being selfish, depends both on the processing done at the nodes and on the routes chosen. Therefore, both models will show similarities in their analysis.
Section 5 discusses the task allocation model, and Section 6 continues with the routing flow model.

5. The Task Allocation Model

We begin by defining the load of machine j:

L_j = \frac{\sum_{k \in B_j} w_k}{s_j}

Here w_k refers to the weight of packet k, B_j is the set of packets assigned to machine j, and s_j refers to the processing speed of machine j. The weights are arbitrary and can be anything. This equation is intuitive: we sum the weights of all of the packets on our node and divide by the processing speed of machine j. If we have five packets of equal weight one and a processing speed of one, then it will take five units of time to process the packets on machine j. Next, we can define the social optimum, as previously discussed in Section 3:

\mathrm{opt} = \min_{\text{assignments}} \max_j L_j

Intuitively, this equation says that within a given assignment we take the maximum processing time over our machines: if two machines process in parallel, we cannot complete the job until the slower one has finished. We then select the minimum of these processing times over all possible assignments, which gives us the smallest cost of operation. We call this the social optimum because we are using a deterministic model, which allows us to coordinate over all of the different scenarios and choose the least costly one. It is important to note that computing this value is NP-hard, as it is a variant of the partition problem, which asks whether a set of numbers can be split into two subsets whose sums are equal. We can illustrate this equation more concretely with an example. Imagine two machines with equal processing speeds of one, and assume that we have 10 identical packets that need to be processed on these two machines in parallel.
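Although finding the social optimum is NP-hard in general, small instances can be solved by exhaustive search. The sketch below is a brute-force illustration (not an efficient algorithm) of the definition opt = min over assignments of max_j L_j, applied to the 10-packet instance just described:

```python
from itertools import product

def social_optimum(weights, speeds):
    """Brute-force the social optimum: try every assignment of tasks to
    machines and keep the one minimizing the maximum machine load."""
    best = float("inf")
    for assignment in product(range(len(speeds)), repeat=len(weights)):
        loads = [0.0] * len(speeds)
        for w, machine in zip(weights, assignment):
            loads[machine] += w / speeds[machine]
        best = min(best, max(loads))
    return best

# Ten identical unit-weight packets on two unit-speed machines:
# the optimum splits them 5 and 5, for a completion time of 5.
print(social_optimum([1] * 10, [1, 1]))  # 5.0
```

The exhaustive loop visits 2^10 = 1024 assignments here; the exponential blow-up in the number of tasks is exactly why the problem is intractable at scale.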
Out of all of the possible assignments (e.g., placing 2 packets on one machine and 8 on the other, etc.), the most optimal solution is to place 5 packets on each machine. The discussion so far has outlined deterministic modeling: we know the potential outcomes and are able to coordinate to find the lowest cost, namely the social optimum. But what happens when we take non-cooperative and hidden behavior into account? We know that users do not care about others and instead perform their own operations. Each user knows their own machine and what they want to do, but they are usually unaware of the vastness of the rest of the network and, more importantly, of the specifics of the hardware. Therefore, we focus on analyzing the probabilistic aspect of network traffic. We will use this to define our Nash equilibrium and specifically look at the worst-case scenario. We can now define the first probabilistic quantity in our task allocation model, the expected load on machine j:

E[L_j] = \frac{\sum_i p_i^j w_i}{s_j}

Here p_i^j is the probability that packet i is placed on machine j. Each probability is multiplied by the weight of the corresponding packet, the products are summed, and the sum is divided by the speed of the machine. We can illustrate this more concretely with an example. Given two packets to be processed on two machines, each packet has a ½ probability of appearing on a specific machine. In a deterministic model we would have full access to the possible configurations of our network. In this example, if we want to place a packet on our machine, there are only two possibilities: the other packet is already on that machine, or it is not. Because we do not know which, we treat the situation probabilistically, as we would a coin toss. We do not know for certain.
In fact, we are only 50% confident. Therefore, we say that the expected load on machine j in this scenario is 0.5: if the other packet were being processed on the machine the load would be 1.0, and if not the load would be 0.0. Now that we have defined the expected load on machine j, we can define the expected cost of placing a task on that machine. Given the previous scenario, we know that the expected load on machine j is 0.5. If we actually add our packet to the machine, then we must add the cost of processing our packet, so the total expected cost is 0.5 + 1.0 = 1.5. This is captured by the equation below:

c_i^j = E[L_j] + \frac{w_i}{s_j}

That is, the expected cost of adding task i to machine j is the expected load on machine j plus the processing cost of task i itself. We are now able to define the worst-case Nash equilibrium given our previous definitions. We define the worst case as the maximum over all of the expected costs possible in our situation:

\mathrm{cost} = \max_{i,j \,:\, p_i^j > 0} c_i^j

Therefore, in the previous scenario, our worst-case Nash equilibrium cost is 1.5. This is the cost of being selfish in a network in which we must probabilistically define what is happening. Of the potential scenarios, we have characterized the expected outcome, which is also the worst: if both packets are placed on the same machine, the processing time is 2; if one is placed on each machine, the processing time is 1, since they process in parallel. The probabilistic outcome is therefore 1.5. If we expect the load on machine j to be empty, that is 0.0, then adding task i to machine j would give a processing time of 1.0. This would be the most optimal scenario, which we previously defined as the social optimum.
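The two-packet example can be reproduced numerically. This sketch assumes unit weights and unit speeds as in the text; from machine j's point of view only the other packet is random, present with probability 1/2:

```python
def expected_load(probs, weights, speed):
    """E[L_j]: each packet's probability of landing on machine j times its
    weight, summed and divided by the machine's speed."""
    return sum(p * w for p, w in zip(probs, weights)) / speed

# The other unit-weight packet lands on machine j with probability 1/2.
load = expected_load([0.5], [1.0], 1.0)
cost = load + 1.0 / 1.0  # expected cost of also placing our own unit packet
print(load, cost)  # 0.5 1.5
```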
That scenario is also the most optimal Nash equilibrium, as both users would see no incentive to change their strategy. However, it is unlikely to be the case. Now that we have defined the worst-case Nash equilibrium, we can define the coordination ratio, or the cost of being selfish:

R = \frac{\max_{\text{Nash equilibria}} \mathrm{cost}}{\mathrm{opt}}

The coordination ratio is the maximum cost over all Nash equilibria of a system divided by the social optimum. This gives us the worst-case, and generally most likely, cost of operation when users are being selfish. It is important to note that the worst-case coordination ratio can be determined explicitly depending on the organization of the network. Specifically, two identical machines processing a given number of packets have a coordination ratio of 3/2, and two machines that are not necessarily identical have a coordination ratio of at least 1.618 (i.e., the golden ratio). Moreover, for an arbitrary number m of machines processing an arbitrary number of packets, the coordination ratio has an upper bound on the order of log m / log log m. This is a very common bound, illustrated most famously by the balls-in-bins problem, which bounds the maximum load when placing n balls into m bins.

6. The Routing Flow Model

In this second part of the paper, we discuss another model of the cost function. We begin by defining the model used for this flow function: we assume that each network user controls a negligible fraction of the traffic, which is modeled as a flow. Each user in the network can be, for example, a packet in a small LAN travelling through the network interfaces. Let N = (V, E) be a directed network with node set V and edge set E, and with source-destination pairs {s_i, t_i}. The network may contain parallel edges but no self-loops. The set of paths from s_i to t_i is denoted by P_i, and P denotes the union of the P_i.
The flow function is f : P → R_{≥0}. For a fixed flow f and any edge e in E, we define

f_e = \sum_{\pi \in P \,:\, e \in \pi} f_\pi

The traffic rate r_i denotes the amount of flow to be sent from node s_i to t_i. A flow f is feasible if, for all i in {1, …, k},

\sum_{\pi \in P_i} f_\pi = r_i

We also assume that each edge e in E has a load-dependent latency function l_e, which is nonnegative, differentiable, and nondecreasing. The latency of a path π with respect to flow f is defined by:

l_\pi(f) = \sum_{e \in \pi} l_e(f_e)

Thus, the cost C(f) of a flow f in N is the total latency incurred by f:

C(f) = \sum_{\pi \in P} l_\pi(f)\, f_\pi

Next, we consider flows at Nash equilibrium. Because we have assumed that each agent in the network carries a negligible fraction of the overall traffic, we can also assume that each agent chooses a single path through the network; the strategies used in finding a Nash equilibrium are pure strategies. A flow f that is feasible for an instance (N, r, l) is at Nash equilibrium if it satisfies the following condition: for every i and every pair of paths π_1, π_2 ∈ P_i with f_{π_1} > 0,

l_{\pi_1}(f) \le l_{\pi_2}(f)

At Nash equilibrium, no user has an incentive to switch paths. Switching from an existing path to another path brings no benefit to that user, even though the current path might not be the best one in absolute terms; a user who switches will suffer more loss than gain. When flow f is at Nash equilibrium, all paths used by commodity i share a common latency L_i(f), and the cost is defined as follows:

C(f) = \sum_{i} L_i(f)\, r_i

In other words, the cost of a flow f at Nash equilibrium is the sum, over commodities, of the common path latency times the traffic rate. Now we can look at the similarities as well as the differences between optimal flows and flows at Nash equilibrium. First we consider the optimality condition: a feasible flow f is an optimal solution when the cost of f is minimal. We also need to assume that, for each edge e in E, the function x · l_e(x) is convex, so that the cost of f is convex.
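The flow-cost definitions above can be written out directly. In this sketch, paths are tuples of edge names and latency functions are Python callables; the two-link instance at the end (one constant-latency link, one linear link) is an illustrative assumption:

```python
def edge_flows(paths, path_flow):
    """Aggregate path flows into edge flows: f_e = sum of f_pi over paths
    containing edge e."""
    fe = {}
    for pi, f in zip(paths, path_flow):
        for e in pi:
            fe[e] = fe.get(e, 0.0) + f
    return fe

def cost(paths, path_flow, latency):
    """C(f) = sum over paths of l_pi(f) * f_pi, with
    l_pi(f) = sum over edges e in pi of l_e(f_e)."""
    fe = edge_flows(paths, path_flow)
    return sum(f * sum(latency[e](fe[e]) for e in pi)
               for pi, f in zip(paths, path_flow))

# Two parallel links from one source to one sink, total rate 1.
latency = {"up": lambda x: 1.0, "down": lambda x: x}
paths = [("up",), ("down",)]
print(cost(paths, [0.5, 0.5], latency))  # 0.5*1 + 0.5*0.5 = 0.75
print(cost(paths, [0.0, 1.0], latency))  # 1.0: all flow on the linear link
```

The second evaluation puts all flow on the load-dependent link, which is the equilibrium behavior examined next.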
Theorem 42.9. A latency function l(x) is standard if it is differentiable and x · l(x) is convex on [0, ∞). Many, but not all, latency functions are standard. The optimal flow can be found in polynomial time using convex programming. Any instance (N, r, l) with continuous and nondecreasing latency functions admits a feasible flow at Nash equilibrium, and the costs of all flows at Nash equilibrium are equal to one another; this explains why no network user has an incentive to switch to another path. Next, we discuss the unbounded coordination ratio. Without making any assumptions about the latency functions, the coordination ratio can be arbitrarily large. Consider Pigou's example, in which there is a source node s, a sink node t, and latency functions l1(x) = 1 and l2(x) = x^p. This is a simple network with only two nodes and two parallel links from the source to the destination, each with its own latency function. For the instance (N, 1, l), we can allocate flow q to the first link and 1 − q to the second link, resulting in a cost of q·1 + (1 − q)·(1 − q)^p = q + (1 − q)^(p+1). The flow at Nash equilibrium allocates the entire flow to the second link, for a cost of 1: if any positive amount of flow were allocated to the first link, the second link would carry flow less than 1 and thus have latency l2(f) < 1 = l1(f), giving users on the first link an incentive to switch, which contradicts the Nash condition. Applying Theorem 42.9, we can find the optimal flow using the marginal cost functions l*(x) = (x · l(x))′: the marginal cost l1*(x) is still 1, while l2*(x) = (p + 1)·x^p. The optimal flow is a flow at Nash equilibrium with respect to these marginal cost functions: positive flow must be allocated to both links, and the marginal costs on both links must be equal. Therefore, for the optimal flow we allocate 1 − (1 + p)^(−1/p) to the first link and the rest to the second.
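The costs in Pigou's example are easy to evaluate numerically. The sketch below uses the allocation 1 − (1 + p)^(−1/p) derived above for the optimal flow and compares it against the Nash flow, which puts everything on the second link; the ratio grows without bound as p increases.

```python
def pigou_cost(q, p):
    """Cost of sending flow q on the constant link (l1(x) = 1) and 1 - q on
    the l2(x) = x**p link, at total traffic rate 1:
    q * 1 + (1 - q) * (1 - q)**p."""
    return q + (1 - q) ** (p + 1)

for p in (1, 2, 10, 100):
    nash = pigou_cost(0.0, p)                      # all flow on the x**p link
    opt = pigou_cost(1 - (1 + p) ** (-1 / p), p)   # equal marginal costs
    print(p, nash / opt)                           # ratio grows with p
```

For p = 1 the ratio is 4/3, matching the linear case discussed below; by p = 100 it is already far larger.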
The minimal cost of the optimal flow for the instance (N, 1, l) is

1 - p\,(p + 1)^{-(p+1)/p}

and from the cost function we obtain the coordination ratio

\left[ 1 - p\,(p + 1)^{-(p+1)/p} \right]^{-1}

This coordination ratio tends to ∞ with increasing p. Therefore, the coordination ratio can be very large even for polynomial latency functions. Next we discuss routing in arbitrary networks. Consider the simple network in Pigou's example with p = 1. At Nash equilibrium, the entire flow is allocated to the second link, while the optimal flow allocates ½ to the first link and ½ to the second. Therefore, the flow at Nash equilibrium has cost 1 and the optimal flow has cost ¾, for a coordination ratio of 4/3. Among all networks, the worst coordination ratio is achieved by networks consisting of two nodes connected by parallel links, sometimes two or more. Let a class L of latency functions be standard if it contains a nonzero function and each function l in L is standard. In many cases the coordination ratio can reach the worst possible value. The anarchy value that yields the worst coordination ratio can be defined as

\alpha(l) = \sup_{r > 0 \,:\, l(r) > 0} \left[ \lambda \mu + (1 - \lambda) \right]^{-1}

where λ is a solution of the marginal cost equation l*(λr) = l(r) and μ = l(λr)/l(r). Theorem 42.11. A class L is diverse if for each positive constant c > 0 there is a latency function l in L with l(0) = c, and L is nondegenerate if l(0) ≠ 0 for some l in L. A network is called a union of paths if it can be obtained from a network of parallel links. Next we can look at the coordination ratios for some basic classes of latency functions. For linear functions of the form ax + b, as in Pigou's example, the worst-case coordination ratio is 4/3 ≈ 1.333. For quadratic functions ax^2 + bx + c, the worst-case coordination ratio is approximately 1.626. For polynomials of degree at most p, as in the unbounded case, the worst-case coordination ratio is Θ(p / ln p), which tends to infinity as p grows.
Applications of Theorem 42.11 include the analysis of the coordination ratio for large classes of latency functions. In the previous section, we made the unrealistic assumption that there are an infinite number of users, each controlling a negligible fraction of the overall traffic. In reality, this number is finite and each user must select a single path for routing in the network. As a result, in the model of unsplittable flows not only the coordination ratio but also the bicriteria bound may be unbounded: a flow at Nash equilibrium can be made arbitrarily more costly than the optimal one by increasing the amount of flow each agent must route. With the coordination ratio and the cost function as the basis of our analysis, we can study how loosely we can allow traffic to be allocated in a particular network. We can also use these functions to analyze how fair or unfair optimal routing can be compared to routing at Nash equilibrium. The overall cost of optimal routing will be better than that of the Nash equilibrium; however, some network users must sacrifice some of their own performance to improve this overall cost. In his paper, Roughgarden defined the unfairness u(N, r, l) as the maximum ratio between the latency of a flow path of an optimal flow for (N, r, l) and that of a flow path at Nash equilibrium for (N, r, l). For standard latency functions a better bound is achievable. The prior discussion shows that the coordination ratio can be arbitrarily large, and there is much ongoing research on how to design networks that keep the coordination ratio relatively small. The question is whether it is possible to modify a given network so that we obtain the best possible flow at Nash equilibrium. It is well known that removing edges from a network may improve its performance; Braess's paradox explains this phenomenon. We are given a network with n nodes, edge latency functions, a single source, and a rate of traffic.
For networks with continuous, nonnegative, nondecreasing latency functions, Roughgarden showed that there is no (n/2 − ε)-approximation polynomial-time algorithm for this network design problem unless P = NP. On the other hand, there is an n/2-approximation polynomial-time algorithm for the problem; for linear latency functions, the bound is 4/3.

7. Conclusion

Given the complexity of current networks, specifically the Internet, it is important to understand how to analyze user behavior on a network. Specifically, we discussed the probabilistic approach of analyzing the worst-case scenario of selfish behavior. We defined this as the cost of not worrying about other users in the system and acting only to receive the best benefit for yourself. This is in contrast to the most optimal solution, which is brought about through coordination. We are, however, more interested in realistic approaches, for which selfish behavior is at the core. We defined two separate models for analyzing traffic on a network: the task allocation model and the routing flow model. Both provided insight into the bounds on the cost of being selfish. That is to say, depending on the situation, the price of being selfish is not necessarily expensive in comparison to trying to coordinate behavior, which comes with its own costs.