>> Nikhil Devanur Rangarajan: Hi everyone. It's my pleasure to welcome Jelena Marasevic from Columbia University, who has worked on fair packing problems and fast distributed algorithms. How are you, Jelena? >> Jelena Marasevic: Okay. >> Nikhil Devanur Rangarajan: She's here until Friday, so in case you haven't met her yet, send an e-mail. >> Jelena Marasevic: Thank you for the introduction. I'm a PhD student at Columbia University. This is joint work with Cliff Stein and Gil Zussman. The work is on fair resource allocation, and I will explain very soon what it is. Let's start first with some applications. When it comes to fair resource allocation, the most well-known problems are those coming from network congestion control, or network rate control. Indeed, this is where fairness problems have been studied the most. But there are many other applications where we care about fairness. More recently there are problems in resource management and resource allocation in data centers. You can pose almost any operations research problem as a fair resource allocation problem. I'm showing healthcare scheduling, but there are also other instances of problems, like air traffic control or allocation [indiscernible] organs, et cetera. One instance where fairness problems arise, and where it is not immediately clear that there exists a connection, is market equilibrium problems, and I will make this relationship clearer as we get towards the end of the talk. Let me start with an example where we care about fairness to different extents. This is a classical example that is very often used in classes on communication networks. The setting is that we have one very long path that uses n-1 capacitated links, and we have n-1 short paths, shown here in orange, that each use only one capacitated link, and each link has unit capacity. The question we want to ask is how do we allocate flows in this network? How do we allocate rates to these routes?
One thing is that if we are doing anything reasonable, we would give the same rate to each of the short routes because they see the same constraints. They see the same conditions. One objective we may want to have is just to maximize efficiency. In this case, efficiency means maximizing the sum throughput: take the most out of the network. If we were to do that, we would give zero units of flow to the long route and one unit of flow to each of the short routes. This gives us n-1 efficiency, but it is very unfair, especially if the long-route user was actually paying something to get something sent through this network. On the other hand, if we really wanted to be as fair as possible, one thing that is not too difficult to see is that in that case we would just give everyone half a unit of flow. But what happens in this case to our efficiency is that we have lowered it by a factor of about 2. >>: When you say maximum fairness, what is that? >> Jelena Marasevic: I know it's a little bit vague right now, but you can see here that you're actually using all of the links to the maximum extent and you're giving everyone an equal amount of flow. The third type of objective that we can think of is this: if we are very fair, efficiency goes down; if we are very unfair, our efficiency is better, so maybe we should look at some trade-offs. Intuitively, the long route is using too many links. It uses too many resources, so maybe it should get something proportional to what it uses. In that case we could assign one over n units of flow to the long route and (n-1)/n units of flow to each of the short routes, and our efficiency would, at least asymptotically, be close to the maximum efficiency. If we were to somehow parameterize fairness on a scale from zero to infinity, then, talking about the allocations that I've shown you, the first one should really get zero, because it has no fairness guarantees.
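The three allocations in the line-network example can be compared numerically. The following is a minimal sketch, not taken from the talk's slides; the function name and labels are illustrative assumptions, and the proportional allocation uses 1/n for the long route and (n-1)/n for each short route.

```python
def linear_network_allocations(n):
    """Compare three allocations on the line network with n-1 unit links.

    Returns a dict: label -> (long_rate, short_rate, sum_throughput).
    The labels are illustrative names, not terminology from the talk.
    """
    links = n - 1  # one short route per link

    allocations = {
        "max_throughput": (0.0, 1.0),              # long route gets nothing
        "equal_share": (0.5, 0.5),                 # everyone gets half a unit
        "proportional": (1.0 / n, (n - 1.0) / n),  # long route gets 1/n
    }

    result = {}
    for name, (long_rate, short_rate) in allocations.items():
        # Every link carries the long route plus exactly one short route.
        assert long_rate + short_rate <= 1.0 + 1e-12
        throughput = long_rate + links * short_rate
        result[name] = (long_rate, short_rate, throughput)
    return result


if __name__ == "__main__":
    for name, values in linear_network_allocations(10).items():
        print(name, values)
```

For large n the equal-share throughput is about half of n-1, while the proportional one stays close to n-1, which matches the trade-off described above.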
The second allocation should be somewhere in this vicinity because it was as fair as possible, and the third one should be somewhere in the middle. This is just intuitively speaking. The questions of measuring fairness or measuring inequality are actually not very new. Back in 1970 there were questions asked about measuring inequality, but in terms of inequalities of income distributions. The problem there was to rank different income distributions. The parameter was called the inequality aversion parameter. The functions that appear in this kind of work are a little bit curious, and I'll tell you why on the next slide. Fast-forward 30 years, and in the domain of network congestion control there is a definition of alpha fairness. What is nice about this definition is the lemma that says there exists a family of concave objectives that we can maximize to actually reach this alpha-fair resource allocation. These functions are the same functions as I have shown you on the last slide. They may not be very easy to remember. The way to think about them is as functions whose derivative is 1 over x to the alpha. A very high-level intuition is that as x goes to 0 the derivative goes to infinity, so you really want to push all the allocations away from zero. As alpha gets larger you push the allocations away from zero to a larger extent. For the three examples that I mentioned at the beginning, the first example is the case of alpha equals zero. It is called utilitarian. In this case we only have linear objectives. The second case is known as the max-min fair allocation, which is the most egalitarian way of allocating resources and corresponds to alpha going to infinity, and the third example is known as proportional fairness, and it happens when alpha equals one. The intuition about trading off efficiency and fairness is not only an intuition. There has actually been work in the last couple of years that quantifies this trade-off under different metrics.
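The family of functions whose derivative is 1 over x to the alpha can be written down directly. This is a small sketch with function names chosen here for illustration; it uses the standard closed form x^(1-alpha)/(1-alpha) for alpha different from one and log x for alpha equal to one.

```python
import math


def alpha_fair_utility(x, alpha):
    """Concave utility whose derivative is x ** (-alpha), for x > 0."""
    if x <= 0:
        raise ValueError("x must be positive")
    if alpha == 1:
        return math.log(x)  # proportional fairness
    return x ** (1.0 - alpha) / (1.0 - alpha)  # alpha = 0 gives the linear case


def alpha_fair_derivative(x, alpha):
    """Derivative of alpha_fair_utility with respect to x."""
    return x ** (-alpha)


if __name__ == "__main__":
    for alpha in (0, 0.5, 1, 2):
        print(alpha, alpha_fair_utility(0.5, alpha), alpha_fair_derivative(0.5, alpha))
```

The derivative blows up as x approaches zero whenever alpha is positive, which is the "push allocations away from zero" effect described above.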
The problem that this talk is about is alpha-fair packing. It is a class of problems where the feasible region is a polytope determined by positive linear constraints. Problems with such a feasible region have been very extensively studied for linear objectives, but not as much, at least with convergence guarantees, for this more general objective. These objective functions are concave, so in a centralized manner we know how to solve this in polynomial time. The focus of this talk is on distributed algorithms that have asynchronous updates. This is something that arises very often in practice, and even if we didn't have a truly distributed setting, we could parallelize computations and get really fast algorithms. Yes? >>: If alpha is zero, this would imply maximizing the sum of the logs, not the sum, which was on the last slide. >> Jelena Marasevic: If alpha is zero… >>: [indiscernible] >> Jelena Marasevic: When alpha equals one you are maximizing the sum of the logs. Alpha equals zero is the linear objective, yes. The main result that I will talk about is an epsilon-approximation algorithm that is very robust because it has many nice properties. It can run in a distributed fashion. It allows asynchronous updates. It only reacts to the current state of the network. It makes local updates. It can start from any initial state, which also means that it is fault tolerant, and it can allow for a constant number of variable and constraint insertions or deletions. The convergence time of the algorithm is poly-logarithmic in the input size and polynomial in the inverse of the accuracy parameter epsilon. I will tell you more exactly what the convergence time is later, after I introduce some notation. What I should point out here is that this is probably, asymptotically, the dependence you should expect for this type of algorithm, because at least for linear programming in this setting there are lower bounds.
Looking at related work, when it comes to maximum fairness there has been a lot of work. One thing to note here is that when we have maximum fairness, these problems are no longer convex optimization problems. What we get is a multi-objective problem where we're looking at the whole spectrum. These problems typically have more combinatorial structure. At the other end of our line there is work on packing linear programs. Most relevant to this talk is the work by Awerbuch and Khandekar. But of course if we have linear programming we can only support linear objectives. There is no straightforward extension of these results to the alpha-fair setting. In terms of work on network congestion control, there has been a lot of work in the last almost 20 years. Most relevant to this talk is work by Kelly. What is interesting about this line of work is that there is no guaranteed convergence time as a function of the input. What is shown for these algorithms, which are usually continuous-time algorithms, is that you reach the optimal solution after some finite time, but there is no guarantee that this happens in polynomial time as a function of the input. More recently, there has been work on network utility maximization. This work can solve the problem I talked about after some scaling, but in general it leads to a convergence time that is at least linear in the input, if not polynomial. Talking to Nikhil, we have actually observed that one of the cases, alpha equals one, is equivalent to the problem of market equilibrium in Eisenberg-Gale markets with utilities, and there has been work from Stalk [phonetic] 2013. What we observed there is that the dependence on epsilon of this type of algorithm is better than in this work, but the dependence on the input is worse, so it is at least linear, whereas here it is only logarithmic. I will move now to talking about the model and some of the preliminaries. First of all, this just makes the analysis easier.
It's a very standard thing to do in linear programming, just to scale all of the constraints. How do you scale them? First, you divide both sides of each constraint by the right-hand side to get one on the right-hand side. Then you divide all of the constraint matrix elements by the minimum nonzero element and you scale all of the variables by the same amount. What you get when you do that is that in the scaled problem any nonzero entry of the matrix is at least one. What this gives you is that for any feasible solution your variables must be between zero and one. Does everyone see this? Why can we do this? We actually showed that this preserves the approximation guarantees. If you want to scale back you will have the same approximation guarantees, and the scaling is really just for the purpose of the analysis. The algorithm can be run on the original instance. >>: When you say approximation, do you mean the objective as well as the constraints? >> Jelena Marasevic: What I will show later is that when the algorithm runs, the constraints never get violated unless they were initially violated. But after a polylog number of rounds they are always satisfied. What I mean is that the guarantee on the objective function is the same, so if it is multiplicative you get the same multiplicative guarantee. If it is additive you get the same additive guarantee. The model of distributed computation of… I will talk here in terms of sellers and buyers, just because for people who have worked on market problems it will be easier to capture the main idea. For every variable we say that there is a node associated with that variable, and we can call it a buyer. For each constraint there is also a node, and we can call it a seller. Between a variable and a constraint there is an edge if the variable participates in that constraint, that is, if it has a nonzero coefficient. The edge carries a weight equal to the coefficient with which this variable appears in the corresponding constraint.
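The scaling step described above can be sketched in a few lines. This is an illustration under stated assumptions: the constraints are given as a nonnegative matrix A with a strictly positive right-hand side b, and the function name is hypothetical. The sketch only rescales the constraints; how the objective weights transform is not shown.

```python
def scale_packing_constraints(A, b):
    """Scale A x <= b into A_hat x_hat <= 1 with every nonzero entry >= 1.

    Assumes A has nonnegative entries (at least one nonzero) and b > 0.
    Returns (A_hat, m), where the scaled variables are x_hat = m * x.
    """
    # Step 1: divide each row by its right-hand side, so the bound becomes 1.
    normalized = [[a / b_i for a in row] for row, b_i in zip(A, b)]

    # Step 2: divide all entries by the minimum nonzero entry m and
    # multiply the variables by m, which leaves A x unchanged.
    m = min(a for row in normalized for a in row if a > 0)
    A_hat = [[a / m for a in row] for row in normalized]
    return A_hat, m


if __name__ == "__main__":
    A_hat, m = scale_packing_constraints([[2.0, 0.0], [1.0, 4.0]], [4.0, 2.0])
    print(A_hat, m)
```

Since every nonzero entry of the scaled matrix is at least one, any feasible scaled variable that appears in some constraint is at most one.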
As for the type of information that nodes have: for every buyer, on the variable side, we need upper bounds on the global information. The information that is collected in each round is the price of the constraints. It would actually be enough just to collect the relative congestion. The variables collect information only from those constraints in which they participate. They don't need global information. The constraint, or seller, sets a price that is a function of the global problem parameters and of the relative slack or relative congestion. I'm calling it relative because in the non-scaled problem it would be relative. Here it is just [indiscernible] slack. Is this model clear? If we were in network congestion control problems, what this would mean is that in each round each node would need to collect the relative congestion on each of the links it uses on its path. Important for the analysis are certain KKT conditions. Just to be clear about the notation: with each constraint we will associate a Lagrange multiplier, which I will just refer to as a dual variable. If you write the KKT conditions, they are just a standard thing. You have primal feasibility, dual feasibility, complementary slackness, and the fourth condition, which you get when you maximize the Lagrangian. This fourth KKT condition will actually be the most important one for the algorithm. Let me tell you what the algorithm is, but before I get there I want to give you some intuition. I'm writing at the top the KKT condition that I said would be the most important one. There are two algorithms that at first don't look so similar. On the left is the algorithm by Kelly et al., from 1998. It is for a particular instance of alpha fairness. It is for alpha equals one, known as proportional fairness. The algorithm is a continuous-time algorithm. How it works is that you describe all of the updates in the network by a system of differential equations.
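For reference, the four KKT conditions mentioned above can be written out for the scaled problem. The notation here is an assumption reconstructed from the spoken description (weights w_j, matrix entries A_ij, dual variables y_i, and utilities f_alpha with derivative x^(-alpha)); the slides may use different symbols.

```latex
% Scaled alpha-fair packing problem (notation assumed, not copied from the slides):
%   maximize  \sum_j w_j f_\alpha(x_j)
%   subject to \sum_j A_{ij} x_j \le 1 for all i,  x \ge 0.
\begin{align*}
  &\text{(primal feasibility)}       && \sum_j A_{ij} x_j \le 1 \ \ \forall i, \qquad x_j \ge 0 \ \ \forall j,\\
  &\text{(dual feasibility)}         && y_i \ge 0 \ \ \forall i,\\
  &\text{(complementary slackness)}  && y_i \Big( \sum_j A_{ij} x_j - 1 \Big) = 0 \ \ \forall i,\\
  &\text{(Lagrangian stationarity)}  && x_j^{\alpha} \sum_i y_i A_{ij} = w_j \ \ \forall j.
\end{align*}
```

The last line follows from setting the derivative of the Lagrangian with respect to x_j to zero, using that the derivative of f_alpha is x_j to the minus alpha.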
One thing to notice here is that the updates of the variables are actually guided by the slack in this KKT condition. The dual variables are chosen as some unspecified monotonically increasing function, so dual variable i is a function of the left-hand side of constraint i. In linear programming we need some proper initialization. We start with some reasonable solution. This algorithm actually has discrete updates. What happens in each round is that the dual variable is set as an exponential function of the relative slack of the corresponding constraint. What the algorithm does is that whenever the KKT condition at the top is not satisfied, it makes multiplicative updates to get closer to making it satisfied. This is not completely multiplicative, because there is a step increase, delta j, if the x's [indiscernible] are very small, because if they were zero we wouldn't be making any progress. When there is a decrease, it is a multiplicative decrease. The convergence of these two algorithms was shown, and for the first one there was really no dependence on the input size. It's just finite-time convergence. A certain potential function was used, called the [indiscernible] function. It's just a bounded, monotonically increasing function. If you look at the potential function for linear programming, the first term looks similar and the second term not so much. In linear programming it is quite standard to choose the dual variables as an exponential function of the constraint slack. If you use a similar idea for alpha equals one here, that is, if you choose an exponential function here and you plug it into the potential function, you get a function of the same form. This was one of the first observations that we made when we started working on this problem, and it gave us the intuition that we can get good convergence with discrete updates using something that looks similar to the linear programming algorithm. The algorithm is indeed very similar to the linear programming algorithm.
We don't need a real initialization. We only need to restrict each variable to some domain between delta j and one. If the variable for any reason goes outside of this domain, we just put it back. The choice of the duals is only slightly different. We have some C in front of the exponent. One difference that does not seem so important, really, is that when we make a decrease it is not always multiplicative. Because we are setting this lower threshold, we may be making a decrease that is smaller than multiplicative, whereas in linear programming it is always at least as large as multiplicative. This actually raises one challenge in the analysis. One thing to notice here is that for linear programming the variables are between zero and one. In the algorithm here they will be between some delta j and one. >>: So the difference between putting the max here versus there is whether to bump it up to delta in the current step or the next step. [indiscernible] logarithm down a little bit, but in the next step it will be back [indiscernible]. >> Jelena Marasevic: Yeah, but there is a reason why you cannot really do a step increase here. Because your KKT condition looks different, what you can show there with the step increase is that the value of the left-hand side does not change significantly. It changes by some small multiplicative factor regardless of how the variables update. In this case you would lose that. >>: [indiscernible] >> Jelena Marasevic: You will make a step increase, but your entries here are allowed to become as small as possible. It doesn't mean they will go below delta. They will go back. >>: [indiscernible] because in the next step they will go back? >> Jelena Marasevic: Not necessarily. They can keep going down. >>: [indiscernible] this is a standard one [indiscernible] right? I don't understand. I just want to [indiscernible]. I don't know this paper by [indiscernible]. Did they give a [indiscernible] analysis for it?
>>: [indiscernible] >> Jelena Marasevic: They get a very robust algorithm. They get all of these properties of self-stabilization. They don't really get self-stabilization, but they get statelessness. They get that the solution is always feasible as long as the algorithm runs. >>: If you give up they will give you something about the average. Really you don't want [indiscernible] >>: [indiscernible] [multiple speakers] [indiscernible] >>: But still, that's what I'm saying. That's the basic [indiscernible] for the last 20 years. [multiple speakers] [indiscernible] >>: Maybe you will tell me later on why, I guess, but this… >>: That is, I think, one of the things that maybe was [indiscernible]. >>: [indiscernible] maybe the constraint… >>: No. >> Jelena Marasevic: It gave somehow the right intuition about what is going on in terms of the potential function. I don't remember… >>: [indiscernible] >> Jelena Marasevic: It is not. In this case you have the scaling here for linear programming, and for a more general alpha it is not really as well kept. >>: [indiscernible] case of [indiscernible] comes from max-min, right? Can you rewrite… So everything here can be rewritten in terms of derivatives of the soft max of the [indiscernible] in some sense? I mean the exponential potential subsets. Maybe we can talk about this later. >>: I was wondering, maybe it was being [indiscernible] [multiple speakers] [indiscernible] >> Jelena Marasevic: It was just an intuition. The algorithm, once again: the way to think about it is that we are trying to satisfy this KKT condition, with the duals as exponential functions of the constraints. We look at the value of the left-hand side of the KKT condition. If it is somewhere around the right-hand side, within a fraction of epsilon, that is, if it is close enough, we don't do anything. If it is far enough, if it is much smaller, then we increase xj multiplicatively to get closer.
If it is larger, we decrease xj multiplicatively unless it goes below this threshold we have set. I will quickly tell you what the algorithm parameters are. They are a little bit complicated. I don't think that anyone should think about them too much for the rest of the talk, but just so you get a sense of what they look like. Some notation: this is really what you would expect, so w max is the maximum weight, w min is the minimum weight, and A max is the maximum element of the matrix. The parameters delta j are really complicated things. There is a motivation for the choice of delta j on the next slide. We actually proved a lower bound on the value that each component of this allocation vector takes. But we end up choosing something that is much looser, or at least [indiscernible] looser, for technical reasons later. >>: I just wanted, for [indiscernible] also was one, right? >> Jelena Marasevic: It was zero. It was zero, so you didn't even have this xj. >>: So in that case what are you setting your delta j to be? >>: Is it a polynomial? What is it? >> Jelena Marasevic: You cannot go all the way down to linear programming with this. You need to be a little bit bounded away from zero, because the lower bound goes to zero. >>: [indiscernible] in the parameters of your input, are they exponentially like… >> Jelena Marasevic: They are polynomially bounded, but there is a dependence on one over alpha, so alpha cannot go all the way down to zero. That's one catch. So the C that multiplies the exponent is conveniently chosen. If you look at delta j, the only difference in terms of j is just this wj. When you raise the whole thing to the alpha and you divide by delta j you get the same thing. Kappa is just one over epsilon times the log of the input.
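The update rule described in words above can be sketched as one synchronous round. This is a minimal illustration, not the exact algorithm from the slides: the symbol names (`C`, `kappa`, `gamma`, `theta`, `delta`) follow the spoken description, the comparison thresholds (1 - gamma) and (1 + gamma) are an assumption about what "close enough" means, and the parameter values in the usage example are arbitrary rather than the carefully chosen ones from the talk.

```python
import math


def dual_prices(A, x, C, kappa):
    """Price of each constraint: C * exp(kappa * (row load - 1))."""
    return [
        C * math.exp(kappa * (sum(a * xj for a, xj in zip(row, x)) - 1.0))
        for row in A
    ]


def update_round(A, w, x, alpha, C, kappa, gamma, theta, delta):
    """One round: every variable reacts to the prices of its own constraints."""
    y = dual_prices(A, x, C, kappa)
    new_x = []
    for j, xj in enumerate(x):
        # Left-hand side of the fourth KKT condition: x_j^alpha * sum_i y_i A_ij.
        lhs = xj ** alpha * sum(y[i] * A[i][j] for i in range(len(A)))
        if lhs <= (1.0 - gamma) * w[j]:
            # Far below w_j: multiplicative increase, kept inside the domain.
            xj = min(xj * (1.0 + theta), 1.0)
        elif lhs >= (1.0 + gamma) * w[j]:
            # Far above w_j: multiplicative decrease, but never below delta_j.
            xj = max(xj * (1.0 - theta), delta[j])
        new_x.append(xj)
    return new_x


if __name__ == "__main__":
    A = [[1.0, 1.0]]
    x = [0.1, 0.1]
    for _ in range(5):
        x = update_round(A, [1.0, 1.0], x, alpha=1, C=1.0, kappa=10.0,
                         gamma=0.1, theta=0.05, delta=[0.01, 0.01])
    print(x)
```

Each variable only needs the prices of the constraints it participates in, which is what makes the round local and stateless.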
Gamma is epsilon over 4, and theta, which determines the multiplicative updates, that is, one plus theta and one minus theta, is conveniently chosen so that the left-hand side of the KKT condition does not change too much. It changes by a factor of one plus or minus gamma over four. We actually prove that if some x* optimally [indiscernible] alpha-fair packing, then each element of x* is bounded from below as a function of the input. Of course, you don't need to grasp everything in this equation. If you tried plotting this as a function of alpha, it is actually a continuous function. I don't know if the bound is tight, but the bound changes quite dramatically between zero and one. When alpha is greater than one there is much less change, and as alpha goes to infinity you get something that is roughly 1 over n A max squared. Here n i is the number of nonzero elements in the ith constraint. I'm just talking about the order of A max here. Let me move to the more fun part, the convergence analysis. I'll give you a very high-level overview of what happens. The first thing we show is that if we started from an infeasible solution, we get a feasible solution fast. If we were at a feasible solution already, the algorithm will not make the solution infeasible in any round. The second condition we get for free, just by the choice of the duals. For the third condition, complementary slackness, we show that it holds in an approximate and aggregate sense after some polylog number of rounds, and that is actually sufficient. These first three KKT conditions are in some sense preliminaries. Most of the work goes into showing some things about the fourth one. The way the proof of convergence works is that we choose a bounded nondecreasing potential function, and you won't be surprised what it is. Then we define some stationary intervals and show that if we are in a nonstationary interval, then the potential increases significantly. If we are in a stationary interval, the solution is epsilon-approximate.
The first lemma says that if we started with a feasible solution, we remain feasible, and I want to go quickly over the proof because it is relatively simple. It appears in a similar form in Awerbuch and Khandekar, both this one and something that will appear two slides from now. What I want to point out is that if I gave you the parameters, you could go line by line and get the same proof. But one of the challenges is really finding these parameters and making them work. The proof works as follows. You select the first round in which the solution becomes infeasible. Some notation: we denote by x0 the x right before the update that made it infeasible, and by x1 the one right after the update. Now, by the way the algorithm works, the only constraints that we could violate are the packing constraints. This became greater than one. For this to happen, at least one variable that appears in the constraint has got to increase. How would it otherwise have gotten larger than one? For the variable to increase, by the way the algorithm works, we had to have this for the fourth KKT condition. From one round to another, by the way the multiplicative updates are chosen, this term can increase by a factor of at most one plus gamma over four. Combined with the previous slide, this gives you something that is actually strictly less than wj. On the other hand, if you want to bound this from below, you just select one term, and you choose the term in which you have the dual that corresponds to the constraint that got violated. Since we did the scaling, Aij, since it's nonzero, is greater than or equal to one, so we can take it out. We have that xj must be greater than or equal to delta j, again by the way the algorithm works, and we just write out the yi's the way we chose them. This is where we need delta j not to be zero. This thing is at least wj.
The constraint got violated, so this is greater than zero, so the whole thing is greater than wj and we get the contradiction. >>: [indiscernible] delta j. That's why it's not a problem in linear programming, because alpha is [indiscernible]? >> Jelena Marasevic: Yeah, because it doesn't show up there. The next lemma shows that if we started with an infeasible solution, we reach a feasible solution relatively fast. I don't want to get into all the details of the proof, but how it works is that if this x is what violated feasibility, it becomes positive after at most one round. So the only things that could remain violated are the packing constraints. What we show here is that none of the variables that appear in such a constraint increase, so they cannot push it further up, and all of the variables that appear in there and are greater than one over n A max decrease. Since they are large enough to decrease multiplicatively, after just log base one minus theta of one over n A max rounds they will bring this down below one. So, combined with the previous lemma, we get that after some point the solution is always feasible. Another thing to point out is that you don't really have this in linear programming, and it is not too difficult to show that there you need to start from a feasible solution to remain feasible. For complementary slackness, our third KKT condition, we show that after some polylog number of rounds this holds in an approximate and aggregate sense. The actual KKT conditions are written for each yi, so just notice that this is written over the sum of the yi's and it is approximate. So what is left to deal with is just the famous fourth KKT condition. Let me tell you what the potential function is.
This is just a reminder of what the KKT condition that I was mentioning was, what the algorithm was and what [indiscernible] are. The potential function is just a more general version of what we had for the algorithms that gave us the intuition. The intuition about why this potential function makes sense is that if you look at the partial derivatives with respect to the xj's and you group things conveniently, what you get here is just the slack of your fourth KKT condition. That's what guides the updates. If some xj increases, it must be because this term is positive, so the potential function increases. If some xj decreases, it must be because this term is negative, and the potential function increases once again. The idea is that whatever you do, whatever updates you make throughout the algorithm execution, the potential function never decreases. The main idea for the rest of the proof is that since each xj is bounded in some interval, you can show that there are bounds also for the potential function. This gap may be polynomially large, even exponential in alpha, but it is bounded. The algorithm makes updates as long as at least one KKT condition is not approximately satisfied. If you want to analyze this for as long as the algorithm makes updates, it may take a very long time before the algorithm stops making updates. Actually, the algorithm may have converged before it stopped making updates. The type of convergence that we get is that after at most a polylog number of rounds, at least one round holds an epsilon-approximate solution, and the total number of rounds where we don't have an epsilon-approximate solution is bounded by the same term. >>: [indiscernible] forever, so [indiscernible] you ran for this long and then after this [indiscernible]? >>: So you cannot say anything about any one particular round? >> Jelena Marasevic: Yes.
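The claim that the partial derivatives of the potential equal the slack of the fourth KKT condition can be checked numerically. The sketch below assumes a specific form of the potential, namely the weighted alpha-fair objective minus (1/kappa) times the sum of the exponential duals; this form is an assumption consistent with the spoken description, since the formula itself is only on the slides. All function names and the test instance are illustrative.

```python
import math


def utility(x, alpha):
    """Alpha-fair utility with derivative x ** (-alpha)."""
    return math.log(x) if alpha == 1 else x ** (1.0 - alpha) / (1.0 - alpha)


def prices(A, x, C, kappa):
    """Exponential duals y_i = C * exp(kappa * (sum_j A_ij x_j - 1))."""
    return [C * math.exp(kappa * (sum(a * v for a, v in zip(row, x)) - 1.0))
            for row in A]


def potential(A, w, x, alpha, C, kappa):
    """Assumed potential: sum_j w_j f(x_j) - (1 / kappa) * sum_i y_i(x)."""
    y = prices(A, x, C, kappa)
    return sum(wj * utility(xj, alpha) for wj, xj in zip(w, x)) - sum(y) / kappa


def kkt_slack_term(A, w, x, alpha, C, kappa, j):
    """(w_j - x_j^alpha * sum_i y_i A_ij) / x_j^alpha, the grouped derivative."""
    y = prices(A, x, C, kappa)
    lhs = x[j] ** alpha * sum(y[i] * A[i][j] for i in range(len(A)))
    return (w[j] - lhs) / x[j] ** alpha


def numeric_partial(A, w, x, alpha, C, kappa, j, h=1e-6):
    """Central finite difference of the potential in coordinate j."""
    up = list(x)
    down = list(x)
    up[j] += h
    down[j] -= h
    return (potential(A, w, up, alpha, C, kappa)
            - potential(A, w, down, alpha, C, kappa)) / (2 * h)


if __name__ == "__main__":
    A, w, x = [[1.0, 2.0], [3.0, 1.0]], [1.0, 2.0], [0.2, 0.3]
    for j in range(2):
        print(numeric_partial(A, w, x, 2, 1.5, 3.0, j),
              kkt_slack_term(A, w, x, 2, 1.5, 3.0, j))
```

Under this assumed form, the sign of the partial derivative matches the direction of the update, which is why every update can only increase the potential.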
I mean, you can also ask the question of why we don't stop after we reach this state. If you are running the algorithm in parallel you could, but here you don't have global coordination. >>: What do you mean by [indiscernible] >> Jelena Marasevic: You'll see on the next slide. For alpha less than one it's one plus epsilon multiplicative in this many rounds. We just need to make epsilon small enough. For alpha equals one it is W, where W is the sum of weights, times epsilon, in this many rounds. Here, for alpha greater than one, why is it one minus epsilon alpha? In this case the objective is always negative. I intentionally didn't put alpha in the convergence time bound. You will see on the next slide that you don't really want alpha to be very large, or at least you shouldn't expect to have a very fast algorithm for a very large alpha. If you look at what these functions look like for different values of alpha, this is why there are three proofs for these three cases of alpha. When alpha is zero it is just a linear function. As alpha increases to one, this function becomes a little bit more curved. The gradient becomes larger and larger closer to zero. As you get really close to one, it goes all the way up to infinity. So it has the same shape as alpha equals one, but it is translated all the way up to infinity. For alpha greater than one, again, you have the same shape of the function as for alpha equals one, but as you approach one from above this function goes all the way down to minus infinity. The video will show alpha increasing from something close to 1 to 100 in steps of one. What happens is that this function becomes really, really steep really, really fast. When alpha equals 100 it almost looks like a step function.
This is the reason why, at least using some of the conventional methods, you wouldn't really expect to have a very fast algorithm, because at least for first-order methods you need either that the gradient is bounded or that the gradient doesn't change too much. I'll do a quick proof sketch for alpha equals one. I should have mentioned that some of the preliminary results are on arXiv. They don't contain this part. This is in part the reason I am talking about this. But they will be posted relatively soon. A heads-up: I will assume I start from a properly initialized solution. I don't really need this. It is possible to extend the proof to start from any initial solution, but I just don't want to complicate things too much. Since the potential never decreases, where we start from is actually the minimum potential. Delta j is really small, so you have to have a huge slack in each of the dual variables. You will get something that is on the order of the sum of these weights times log. The maximum potential is bounded from above by its value when x is one. What you get in that case is just zero. The total increase in the potential is the sum of the weights times log. To get the convergence bound we want, we need to show that between stationary intervals (here we will actually have stationary rounds) the increase is this W, possibly times some polynomial in epsilon and possibly over some polylog of the input. This is what we are shooting for. One thing that we need to show, and I won't go over the proof, relates to what I mentioned at the beginning: there is a problem when we are not making multiplicative updates. Take the variables that are close to delta j, so that if we were to decrease them we wouldn't do it by a multiplicative one minus theta. I will call those variables small. The other variables I will call large.
What the next lemma says is that the increase in the potential due to the decrease of small variables is dominated by the increase in the potential due to the decrease of large variables. So whatever the small variables do is dominated by what the large variables do. Is this clear? I'm stating this as one lemma, although it is at least two, just to give you the actual idea. For the increase in the potential we show the following results. x plus is just those xj's that increase, and we have the same notation for before and after an update. We showed this result, so let me just tell you that if we have a large gap in our KKT condition, if the gap is at least one plus 2 gamma, we would have gamma times this increase. We will see it soon. Let me remind you that the sum of weights is capital W. We define a stationary round in this way. Why? One thing we show is that the nonstationary rounds are going to give a large potential increase. A round is nonstationary if either part of the definition is violated, since we need both to hold for the round to be stationary. In case the first part of the definition is violated, what happens is that we just get that the potential increase is greater than or equal to W over tau. I will remind you what tau is in a bit. If the second condition doesn't hold, then just by combining these two things we get that the increase is actually gamma times W. I will remind you that tau is just of the order of log squared over epsilon squared. What we get in a nonstationary round is that we have indeed a large increase in the potential, and if you recall what our total increase in the potential was, combining these two things you will actually get polylog in the input over polynomial in epsilon. This is just the main idea. Is it clear? Another thing that I'll tell you is how we show that the solution is actually epsilon-approximate. We show that by looking at the duality gap: we show that in each stationary round it is bounded by some constant times epsilon times capital W.
The right-hand side term is bounded by using approximate complementary slackness, which was one of the preliminary lemmas, and the second part of the stationary round definition. The left part is bounded by using part one of the stationary round definition and a lower bound on this term that I won't go into, but this is just the main idea. In the rest of the time, I guess I have only a few more minutes, I just want to get to the relation between these sorts of problems and some of the markets. I expect most of you know what Fisher markets are, but I will nevertheless go over it. In Fisher markets we have buyers on one side and goods on the other. The buyers are indexed by j and the goods by i. The xij is the amount of good i allocated to buyer j. Every buyer has some money and gets some utility from the bundle that gets allocated to them, and there are some prices of goods. The market equilibrium of this problem is captured by the Eisenberg-Gale convex program, which looks kind of similar to our alpha equals one case. Eisenberg-Gale markets were introduced in 2007 by Vijay Vazirani. They are just a generalization of Fisher markets where a buyer may be interested only in a subset of goods, and for some subset of goods may want the goods in some specific ratios. These are actually just the markets that are solved by an Eisenberg-Gale type convex program, which is just a more general convex program than the previous one. If you want to interpret the alpha equals one case as a market you could do it by choosing linear utilities. We have that each buyer wants only a specific subset of goods and in specific ratios. I will borrow an interpretation from Vazirani, which is building a product. Here there is a number of goods and one buyer maybe wants to make a cake. The buyer needs the goods in specific ratios: maybe one third should be flour, one fourth eggs, one fourth sugar, and whatever is left is cherries, and the buyer doesn't really want eggplant.
They want to make as many cakes as possible but they need the goods in specific ratios. Of course there are other buyers more interested in other subsets of goods. For me the easiest way to look at the connections between these three problems is by looking at network flow problems, because this is actually where the alpha equals one case came from originally. If you want to interpret these problems as network flow problems, the problem is as follows. For each variable you have a source and sink pair and a fixed number of paths, and you fix the flow over the paths in specific ratios. Your allocation is the total flow between the source and sink pair. Eisenberg-Gale markets are a more general version of this where, again, you have source and sink pairs but you can split the flow over the paths any way you like. For linear utilities you would be fair in terms of the total flows, but you can have some other utilities. The Fisher markets are a special case of Eisenberg-Gale markets, but not of the problem above, where once again you can split the flows in any way you like over the paths but only one edge per path is actually capacitated, whereas in these two cases there can be arbitrarily many edges that are capacitated. To summarize, I have talked about a fast, distributed and very robust algorithm for the class of alpha-fair packing problems. This problem arises in many applications. Some open problems are whether these techniques could apply for ongoing Eisenberg-Gale markets. This could be important, for example, for the development of automated online markets. Another question is whether some of these techniques apply to or extend to other types of convex problems. So that's it. [applause]. Are there any more questions? I think I'm exactly on time even though we started 3 minutes late. No? Okay. Thank you.
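For reference, the similarity between the alpha equals one case and the Eisenberg-Gale program mentioned in the Fisher market discussion can be written out as follows. This is a sketch in standard textbook notation, not the formulation from the slides: the symbols w_j (weights), A (packing constraint matrix), m_j (buyer j's money) and u_{ij} (buyer j's utility per unit of good i) are assumptions, each good is assumed to have unit supply, and utilities are assumed linear.

```latex
% Alpha = 1 fair packing (weighted proportional fairness), assumed notation:
\begin{align*}
\max_{x} \quad & \sum_{j} w_j \log x_j \\
\text{s.t.} \quad & A x \le \mathbf{1}, \qquad x \ge 0 .
\end{align*}

% Eisenberg-Gale program for a Fisher market with linear utilities,
% with buyers indexed by j, goods by i, and unit supply of each good:
\begin{align*}
\max_{x} \quad & \sum_{j} m_j \log\Bigl( \sum_{i} u_{ij}\, x_{ij} \Bigr) \\
\text{s.t.} \quad & \sum_{j} x_{ij} \le 1 \quad \text{for every good } i, \\
& x_{ij} \ge 0 \quad \text{for every } i, j .
\end{align*}
```

In both programs the objective is a weighted sum of logarithms over a packing polytope. The difference is that in the first program each variable is a single scalar allocation, while in the second each buyer's utility aggregates several allocation variables.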