
>> Yuval Peres: Good afternoon. A lot of us study various objects on graphs [indiscernible], with
particles, random walkers, spins. But this time we’ll hear about bandits on graphs from the
expert, Nicolo Cesa-Bianchi. Please.
>> Nicolo Cesa-Bianchi: Thank you, I’m mostly going to talk about experts as well.
[laughter]
This is some work I’ve done recently. I’ll start from scratch, so you will not need a lot of
background knowledge to understand this. We will be talking about sequences of decisions, and we
will only deal with the nonstochastic, or adversarial, model. This is the traditional prediction with
expert advice or nonstochastic bandit model.
The model is very easy to specify; it’s very bare bones. You have just K actions, and each one of
these actions will give you a loss when you select it. You can’t avoid it: at every time step in your
sequence of decisions you have to decide which action to pick, and you will incur the loss
associated with your decision. We’ll assume that everything is nonstochastic, so there is some
assignment of losses to actions over time, some numbers ℓ_{i,t}, and we’ll assume for simplicity
that these numbers are in the unit interval. This is the loss for playing action i at time t.
Okay, the idea is that some unknown but deterministic process has laid down this sequence of losses,
for each time step and for each action. The action index i ranges from one to K, and t is the time,
which is discrete. You have no previous knowledge about this, so you don’t have any prior about this
matrix of losses assigned to the actions.
Basically the goal of the player playing this game is, at every time step t, to pick some action I_t
in the set of actions, and the player will incur the loss associated with that action, okay. The
player will typically use randomization to perform this choice. At time t the choice of the index I_t
of the action will be based on some previous observations, which refer to losses of actions that have
been observed.
Okay, so to make this precise we get to the so-called observation models. There are two very well
known, well studied observation models, which are the experts and the bandit model. At every time
step I pick I_t and observe something, and I have two options here. In the experts model I observe
the entire vector ℓ_{1,t}, …, ℓ_{K,t} of losses associated to all the actions at that time step.
Okay, so I pick action I_t and incur the associated loss, but I get to see the losses of all actions.
This is called the experts model. Think of investing in a number of assets: you can see the
performance of all the assets, not only the assets that you have invested money in.
Okay, in the bandit model I only get to observe the actual loss that I incur. So, depending on the
observation model: if it’s an experts game, whenever I play I_t I know everything about the past, all
past losses of all actions. If I play a bandit game, whenever I play I_t I will only know the past
losses that I incurred, and I don’t know anything about the losses of actions that I didn’t pick in
the past.
Okay, in this kind of nonstochastic setting, where everything is deterministic apart from the
possible internal randomization of the player, the measure of interest is regret. This is defined as
the sum over a certain number of time steps, let’s say capital T, of the losses incurred by the
player, minus the loss of the best action over the same time steps. Since we are assuming that the
player might be randomized we will put an expectation over here.
Okay, so, including the possible randomization of the player, we are interested in minimizing the
difference between the cumulative loss incurred by the player, which selects a certain sequence of
actions, and the smallest possible loss that a player could have incurred by playing consistently
the same action. Okay, is that clear?
Actions have a bounded loss, so clearly this quantity can grow at most linearly with time, because
actions may have a constant loss at every step. Anything interesting can be said only when I can
control this difference so that it grows sublinearly with time. Let’s call this quantity R_T, for
regret. We know precisely, up to constant factors, what happens in the experts and the bandit model.
What are the best possible regrets against any possible assignment of losses to actions over time?
We know that in the experts case the regret will grow like the square root of T times the log of the
number of experts, call it K. The best possible regret in the bandit case will also grow at the same
rate with respect to time, but it will have a worse dependence on the number of actions. This is just
because I’m observing one K-th of the information I observe in the experts model.
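[Editor’s note: restating the spoken definitions in symbols, with constants omitted:

R_T \;=\; \mathbb{E}\Big[\sum_{t=1}^{T} \ell_{I_t,t}\Big] \;-\; \min_{i=1,\dots,K} \sum_{t=1}^{T} \ell_{i,t},
\qquad
R_T = \Theta\big(\sqrt{T \ln K}\big)\ \text{(experts)},
\qquad
R_T = \Theta\big(\sqrt{T K}\big)\ \text{(bandits)}. ]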
Okay, so this is a very nice and clear picture; we know that these rates are tight up to constants.
Now I want to show you a model which somehow interpolates between these two extremes. I would like
to do so by introducing a generic algorithm that is able to achieve both of these regret rates up to
logarithmic factors.
I’m willing to pay a little bit more here: let me weaken the experts bound by including an additional
logarithmic factor. This is still good up to logarithmic factors, just a little bit worse. But now we
have a single algorithm that, with a slight modification, is able to achieve these two regrets
according to whether it is run in the experts or in the bandit observation model. This algorithm is a
variant of Exp3. It’s pretty easy to explain: we just have to specify the probability. It is going to
be a randomized player, so we have to specify the probability of picking an action at time t given
the past observations. Okay, so we denote by this the sigma-algebra generated by the past
observations.
It’s something trivial in the experts model, because the past observations are just all the past
vectors of losses. In the bandit case what I observe really depends on the outcome, on the
realization of my random selections here. Okay, so we denote this probability by p_{i,t}. It is going
to be proportional to e to the minus eta times L hat_{i,t-1}. This L hat here is an estimate of the
past cumulative loss of action i. So I’m going to pick action i with a probability which is
exponentially small in an estimate of the loss that this action suffered in all past steps. We get an
overwhelming probability of picking the best action according to our loss estimates, but we also give
some non-vanishing probability to picking an action that didn’t perform the best in the past.
Okay, so now what is this? L hat_{i,t-1} is simply the sum over s from one to t minus one of
ℓ hat_{i,s}. These are instantaneous estimates of losses, defined as ℓ hat_{i,s} equals ℓ_{i,s}
divided by q_{i,s}, times the indicator function of the event that ℓ_{i,s} is observed. If, according
to the observation model, at time s I do observe the loss of action i, then I estimate the loss of
the action using this ratio. And what is q_{i,s}? q_{i,s} is simply the probability of observing the
loss of that action: the probability that ℓ_{i,s} is observed given the past.
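[Editor’s note: a minimal runnable sketch of the exponentially weighted player just described, with
importance-weighted loss estimates and a generic observation set per action; the function and
variable names are the editor’s, not from the talk.]

```python
import numpy as np

def exp_weights_with_feedback(losses, observe, eta, rng=None):
    """Sketch of the exponentially weighted player described in the talk.

    losses  : (T, K) array of losses in [0, 1], fixed in advance (oblivious adversary)
    observe : dict mapping action j to the set of actions whose losses are revealed
              when j is played (j itself included)
    eta     : learning rate
    """
    rng = np.random.default_rng() if rng is None else rng
    T, K = losses.shape
    L_hat = np.zeros(K)                          # cumulative loss estimates, L hat_{i,t-1}
    total_loss = 0.0
    for t in range(T):
        # p_{i,t} proportional to exp(-eta * L hat_{i,t-1})
        w = np.exp(-eta * (L_hat - L_hat.min()))  # shift for numerical stability
        p = w / w.sum()
        i_t = rng.choice(K, p=p)
        total_loss += losses[t, i_t]
        for i in observe[i_t]:                   # losses revealed this round
            # q_{i,t} = P(loss of i is observed | past) = sum of p_j over j that reveal i
            q_i = sum(p[j] for j in range(K) if i in observe[j])
            L_hat[i] += losses[t, i] / q_i       # importance-weighted, unbiased estimate
    return total_loss, L_hat
```

With observe[j] = set(range(K)) for every j this is the experts setting; with observe[j] = {j} it is
the plain bandit; graph feedback corresponds to observe[j] being j plus its neighborhood.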
>>: I have a question.
>> Nicolo Cesa-Bianchi: Yes.
>>: Are the ℓ_{i,t}, so for fixed i, are the ℓ_{i,t} completely unrelated across different values of t?
>> Nicolo Cesa-Bianchi: Yes, they can be completely unrelated. The idea here is that if you give me a
completely random assignment of losses, then no matter what action I play it won’t make any
difference, because everything is random, and the regret is going to be really small.
>>: But it seemed to me like in the bandit model, if you only know what you played, it seems like you
can’t. Maybe I’m missing something. It seems like you can’t get any information…
>> Nicolo Cesa-Bianchi: Yeah, right, right, it’s quite surprising that you actually can. I will give
you a little bit of an explanation here.
>>: Like what would your strategy be if, for example, all of the losses are one except for one hidden
one that’s zero?
>> Nicolo Cesa-Bianchi: Okay.
>>: [indiscernible] every round all of them are one except for one which is zero and…
>> Nicolo Cesa-Bianchi: Yes, yes.
>>: I just, I guess I couldn’t imagine, so the…
>> Nicolo Cesa-Bianchi: Yes, okay, if…
>>: No, no, no the optimum you only compare, you compare to your fixed action on the…
>> Nicolo Cesa-Bianchi: [indiscernible]
>>: Okay, okay, yeah…
>> Nicolo Cesa-Bianchi: You know, if there’s one non-random, one consistent trend then it’s easy
somehow, because I will…
>>: You’re only comparing somehow to the term you’re trying to…
>> Nicolo Cesa-Bianchi: I’m comparing, I would compare with a fixed column of this matrix.
>>: Sorry, yeah…
>> Nicolo Cesa-Bianchi: Right, so if there’s just one action which consistently has zero loss then
sooner or later it will be identified. If it’s random it doesn’t matter. If there is some structure I
should be able to pick it up, even though I don’t observe everything, even in the bandit model. Okay,
yes, thanks for asking this question. This is the probability, right.
>>: Another question.
>> Nicolo Cesa-Bianchi: Yes.
>>: Nicolo, what is the difference between the q_{i,s} and the p_{i,t}?
>> Nicolo Cesa-Bianchi: Okay, I’ll tell you in a moment; this is the next thing. Okay, so first of
all, let’s see.
>>: Is there a…
>> Nicolo Cesa-Bianchi: Okay, so q_{i,s}, for instance we can say q_{i,s} is going to be…
>>: It’s also current it’s also observed…
>> Nicolo Cesa-Bianchi: One…
>>: Oh, I see, so in the bandit problem it could be the same one, and otherwise it would be all one…
>> Nicolo Cesa-Bianchi: Sorry, okay, so what is the probability of observing the loss of a certain
action? It’s going to be one in the experts case, because I observe everything by definition. It’s
going to be the same as the probability of picking that action in the bandit model, because there I
only see what I pick, okay. And now you can see that this definition clearly gives you, in
expectation for any fixed i and s, conditioned on the past up to s minus one, exactly the correct
loss. Because I’m putting here this indicator function, and when I take the expectation for a fixed i
this becomes the probability of observing the loss, which is exactly q_{i,s}, and it cancels. I get
an unbiased estimate of the loss.
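[Editor’s note: the unbiasedness being described, written out:

\mathbb{E}\big[\hat\ell_{i,s} \,\big|\, \text{past}\big]
= \frac{\ell_{i,s}}{q_{i,s}} \cdot \Pr\big(\ell_{i,s}\ \text{observed} \,\big|\, \text{past}\big)
= \ell_{i,s}. ]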
Alright, so now you see, these are two specific observation models, right: I observe everything, or I
observe just what I pick. In general you might be willing to run this algorithm with different
observation models. For instance you might get observation models from graph information associated
with the actions; I’ll come to that in a minute. But let me just say a few words about the analysis
of this algorithm.
Okay, so how do I go about proving something like that for this algorithm? The proof is actually
quite short, but I don’t want to spend most of my time on it, so I’ll just sketch the analysis. The
key to the analysis is to look first of all at these weights here, which are the weights assigned to
the actions. These weights, when I normalize them, give me the probabilities with which I pick
actions, okay.
Then I look at the sum of the weights at a certain time step, and at the ratio between consecutive
normalization factors of these weights. I look at the evolution over time of this quantity. This is
again a potential function that allows me to analyze the evolution and effectively gives me a way of
controlling this notion of regret.
I will give you just a little hint about the analysis. First one proves something deterministically,
and by deterministically I mean for any realization of the random choices of the algorithm as given
by these probabilities: the algorithm is playing according to these probabilities and it will have a
certain realization of actions selected up to time T, okay.
Now I want to tell you something about this sequence of actions. It’s not really hard to prove
something like this: the sum over time and over actions of p_{i,t} times ℓ hat_{i,t} is smaller than
or equal to the cumulative estimated loss of any fixed action, plus some remaining terms. I will be
interested in the index of the best action for the horizon I’m looking at, so let’s call it j: j will
be the index of the action achieving this minimum over here, okay.
This is basically a very simple algebraic manipulation of this quantity here, summed over time. I’m
just using a very easy second-order Taylor expansion in order to linearize the exponential function
over there, and very little else; it’s basically an algebraic manipulation. I get, okay, to a basic
inequality very easily, starting just from the analysis of this quantity here. This holds
deterministically for any sequence of actions by the player, and it’s at the basis of all analyses of
these exponentially weighted algorithms.
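[Editor’s note: the deterministic inequality being referred to is, in the editor’s reconstruction
from the spoken description, the standard exponential-weights bound: for any fixed action j,

\sum_{t=1}^{T}\sum_{i=1}^{K} p_{i,t}\,\hat\ell_{i,t}
\;\le\; \sum_{t=1}^{T} \hat\ell_{j,t}
\;+\; \frac{\ln K}{\eta}
\;+\; \frac{\eta}{2}\sum_{t=1}^{T}\sum_{i=1}^{K} p_{i,t}\,\hat\ell_{i,t}^{\,2}. ]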
Now what I can do is take the expectation with respect to the distribution of these random variables.
I know the distribution, and I already know that these are unbiased estimates of the losses. I can
also very easily see that the second moments of these estimates are easily controlled by one over
q_{i,s}. Can you still read over there, okay?
This follows basically just from the boundedness of the losses and from the definition of the loss
estimates; sorry, this is going to be an inequality, and I can prove it. Okay, now if I take the
expectation here, using the unbiasedness and that inequality, what I get is the following.
Okay, so I still have an expectation here, because these are random variables. The hats on the
p_{i,t} ℓ_{i,t} terms go away because the estimates are unbiased, and the hats on the cumulative loss
of action j also go away, just by linearity; this index is j and this is, sorry, capital T. This is a
constant, log K over eta, and then I have something here. I made a mistake here, I had forgotten a
p_{i,t}: what I get here is p_{i,t} divided by q_{i,t}. And this is also a random quantity, because
in general these are random functions of the past observations.
Okay, now good. This here is just the cumulative performance of the player: the expected loss of the
player playing according to this probability distribution, summed over time, so the cumulative loss
of the player. And this here, we can adjust j to be the best action in the time horizon: I take the
minimum over j of the cumulative loss, I just pick the best one. Then I have log K over eta, and then
it’s very easy to see what happens here.
In the experts model q is one; here I have a sum of probabilities, which is one, summed over time, so
this quantity here becomes of order eta times T in the experts case, where T is the horizon I’m
summing up to. In the bandit case q is p, p over p is one, so I get of order eta times T times K,
because I am summing a one K times, which is the number of actions. Okay, now by picking eta in order
to trade off these two terms, in the experts case and in the bandit case, I exactly get these two
bounds over here, okay, good.
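[Editor’s note: in the editor’s reconstruction, after taking expectations the bound reads

R_T \;\le\; \frac{\ln K}{\eta} \;+\; \frac{\eta}{2}\,\mathbb{E}\Big[\sum_{t=1}^{T}\sum_{i=1}^{K} \frac{p_{i,t}}{q_{i,t}}\Big],

so with q \equiv 1 (experts) the inner sum is T and \eta = \sqrt{2\ln K / T} gives
\sqrt{2 T \ln K}, while with q_{i,t} = p_{i,t} (bandits) the inner sum is KT and
\eta = \sqrt{2\ln K / (KT)} gives \sqrt{2 K T \ln K}. ]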
Now this is the first part of the story. Again, I would like to play a little bit with this
observation model. Suppose now that the actions have similarities: there is some graph over my K
actions, and the edges of the graph indicate similarities between actions. Maybe the actions are ads
that I display on some web page. Whenever I display an ad I get some information about the revenue
that that ad got, or its click-through probability. But I will also know that similar ads would have
gotten a similar loss, or a similar gain, okay.
Now I can assume that whenever I play a certain action, suppose now that I play this action over
here, okay, this is I_t, I don’t see everything, but I get to see a little bit more than what I
actually played: I also see the losses of the actions that are in the neighborhood of the action I
picked, okay.
>>: In your example, in your other [indiscernible], I get maybe my loss exactly, but Jason’s loss
only in some, with some…
>> Nicolo Cesa-Bianchi: I could have a noisy signal there, yes, that’s correct. But this is a
statistics-free talk, so I won’t have any randomization in the model, okay. But it’s definitely true,
indeed we have examples of that. That’s just, you know, the philosophy of this talk.
Okay, so of course this generalizes both models. In the experts case we have a clique: whatever I
pick lets me see everything, so the neighborhood of every node is the entire graph. In the bandit
case I have an empty graph: I don’t have any edges, so whenever I pick something I only get to see
that. In general I can have anything in between. Now the question is how the regret should scale; I
expect to see some scaling here that interpolates between the two. Okay, so what kind of scaling
should we expect here, yes?
>>: [indiscernible] is the graph fixed throughout the process…
>> Nicolo Cesa-Bianchi: Not necessarily, not necessarily, I will comment on that later. For the sake
of concreteness we can assume for now that the graph is fixed, and maybe unknown; you don’t know it
at the beginning, okay.
How would you go about playing this game? You just don’t change the algorithm; you use the same
algorithm. What is q_{i,t} now? q_{i,t} is the probability of observing ℓ_{i,t} given the past. This
is going to be equal to the probability of picking that particular action, plus the sum over all j in
the neighborhood of that action of the probability of picking j. I will observe the loss of this
action if either I pick it or I pick any action in its neighborhood.
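[Editor’s note: in symbols, with N(i) denoting the neighborhood of action i in the feedback graph,

q_{i,t} \;=\; p_{i,t} \;+\; \sum_{j \in N(i)} p_{j,t}. ]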
Okay, so excellent, we are basically done, in the sense that all that’s left to do is to study this
quantity here, because this quantity will determine the final regret. Okay, so how does this quantity
behave? We had two easy cases: in the experts and bandit cases there was no work to do, you just put
one and sum, or you put p and simplify; nothing to do.
But in general, if you have a general observation model, then you’ll have a little work to do. Let’s
see how it looks. Let’s look at one of these terms, for a specific t, so I can drop the time index. I
have the sum over i of p_i divided by p_i plus the sum over j in the neighborhood of i of p_j, where
p_1, …, p_K is a probability assignment over the vertices of the graph.
Okay, so now how big can this be? One way to look at this is to take the counting measure, just for
the sake of clarity. If I put the uniform measure there, this is one over K in each term and I just
cancel the K throughout. I get just the sum of one over one plus the size of the neighborhood of each
vertex.
Okay, so give me any graph, an undirected graph: what is the sum of the reciprocals of the degrees
plus one? This is a well known result, and it tells you that this sum is upper bounded by the
independence number of the graph: the size of the largest subset of the vertices of the graph such
that no two vertices in this subset are joined by an edge, okay.
>>: This corresponds to a particular algorithm where you label…
>> Nicolo Cesa-Bianchi: Yeah, so how do you prove these things? This is actually easy and fun to
prove. Okay, give me any graph and let’s prove this upper bound over here. Let’s call Q_0 this
quantity here, the sum over i of one over one plus the size of N(i), okay. Now let i_0 be the vertex
which has the smallest neighborhood.
Okay, now I’m splitting the sum Q_0. Let me do it like this: I want to consider the vertex i_0 and I
want to cut a hole in the graph. I want to take out i_0 and the neighborhood of i_0. I cut this out
from the graph: i_0, its neighborhood, and all the dangling edges, okay.
Now the sum is split in two parts. What is left I call Q_1: Q_1 is what is left of Q_0 in the sum
when I take away i_0 and all the vertices in its neighborhood. Plus what I took away, which is, let
me get this right, the sum over j in i_0 union the neighborhood of i_0, of one over one plus the size
of N(j). Okay, so, can you see if I write down here?
>>: No.
>> Nicolo Cesa-Bianchi: No, okay, so this is a forbidden area, it is a no-fly zone; maybe I’ll write
down here, okay. What happens now? Let’s look at this second quantity here, sorry, I wasn’t planning
this. This quantity here: i_0 is the vertex with the smallest neighborhood, and it appears in the
sum. I can replace every term of the sum with the corresponding term for i_0, because it has the
smallest denominator, so it’s the largest term. I have the sum over j in i_0 union the neighborhood
of i_0, and then I have one divided by one plus the size of the neighborhood of i_0.
Okay, so now you see that I have a constant summand here. The sum is just equal to one, because I am
summing exactly size-of-the-neighborhood-of-i_0 plus one terms, each equal to one over one plus the
size of the neighborhood of i_0. Okay, so now I know that Q_0 is at most Q_1 plus one.
And now I recurse on the remaining graph. I take again the vertex with the smallest degree in the
remaining graph, the graph with the hole, I take it out, and again I can write that this is at most
what is left plus two. Okay, how many times can I repeat this process of taking out a vertex and all
its neighbors? At most independence-number-of-the-graph many times, because each vertex I take out is
not adjacent to any of the previously taken-out vertices, so the taken-out vertices form an
independent set, okay. Can you see this?
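[Editor’s note: a small sketch of the greedy argument just given, for the counting measure; the
brute-force independence_number helper is the editor’s addition for the comparison, not part of the
talk.]

```python
import itertools

def independence_number(K, adj):
    """Brute-force independence number (fine for small K)."""
    return max(r for r in range(1, K + 1)
               for S in itertools.combinations(range(K), r)
               if all(v not in adj[u] for u, v in itertools.combinations(S, 2)))

def greedy_bound(K, edges):
    """Check Q0 = sum_i 1/(1+|N(i)|) <= (number of greedy removal steps) <= alpha."""
    adj = {i: set() for i in range(K)}
    for u, v in edges:
        adj[u].add(v); adj[v].add(u)
    Q0 = sum(1.0 / (1 + len(adj[i])) for i in range(K))
    remaining, steps = set(range(K)), 0
    while remaining:
        # vertex with the smallest neighbourhood inside the remaining graph
        i0 = min(remaining, key=lambda i: len(adj[i] & remaining))
        remaining -= {i0} | adj[i0]    # cut out i0 and its whole neighbourhood
        steps += 1                     # each removed group contributes at most 1 to Q0
    alpha = independence_number(K, adj)
    assert Q0 <= steps <= alpha        # the removed centers form an independent set
    return Q0, steps, alpha

print(greedy_bound(6, [(0, 1), (1, 2), (2, 3), (3, 4), (4, 5), (5, 0)]))  # 6-cycle: Q0 = 2.0, alpha = 3
```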
>>: [inaudible]
>> Nicolo Cesa-Bianchi: Yes.
>>: I’m sorry, are you finished yet?
>> Nicolo Cesa-Bianchi: Yes.
>>: One alternative, once you want to put in the independence number: just label the graph with
independent uniform variables. Take those vertices that are a local max, that are [indiscernible] all
their neighbors. When you do that the expected size is exactly…
>> Nicolo Cesa-Bianchi: Yeah, there are many ways…
>>: That gives you…
>> Nicolo Cesa-Bianchi: There are many ways of doing it, yeah, many…
>>: But the randomization gives it to you immediately, as an equality, because the expected size of
that labeled set…
>> Nicolo Cesa-Bianchi: Okay, I was planning to use this proof a couple of other times; that’s why
I’m using this specific one. I’m sure there are, yes, yes, definitely different proofs.
>>: Yes [indiscernible].
>> Nicolo Cesa-Bianchi: Okay, so this proof is for the counting measure, but you can generalize it to
any probability measure on the graph, okay. Now this quantity will be bounded by the independence
number, and in the regret the proof will immediately give you that the regret is of order the square
root of T times alpha times log K. You can immediately see that in the case of the clique the
independence number is one, this is one, and I recover the experts bound.
In the case of the empty graph the independence number is the number of vertices, because everybody
is independent of everybody else. I get a K here, which is the upper bound for the bandit. Okay, so
this nicely interpolates between the two.
Okay, now in the remaining time I want to take a look at a more general situation, which is what
happens when I have directions on the graph. This can very well happen. The example I usually make
is: say you’re going to buy a game console, and you get the recommendation for buying a high
definition cable, okay. If you’re interested in the game console it’s likely that you’ll need a high
definition cable in order to view the games nicely.
The other way around is less likely: if you buy a high definition cable maybe you don’t have a game
console, you have something else underneath, okay. So there are directions now, and the directions
[indiscernible] of course are reducing further the information that you get, okay.
Now, in what sense, what is the observation model here? The observation model is that whenever I pick
some action I only observe the losses of the actions in its out-neighborhood. I am observing the loss
of the action I pick and of all the actions that are pointed to by edges from the action I pick.
Okay, so I won’t see this one anymore, because the edge points in the wrong direction.
Okay, so now I can just revise this definition here. The probability of observing the loss of an
action is the probability of picking that action plus the sum of the probabilities of picking the
actions in its in-neighborhood. What is the probability of observing this loss? Either I pick this
action or I pick any action that has an edge pointing to it, okay.
Now I have reduced the information, and I would like to know what the correct regret rate is. By the
way, in the previous case, where we saw that the regret scales with the independence number of the
graph, that’s tight for any graph: for any graph you can prove a matching lower bound for the game
played on that graph that corresponds to the…
>>: Independence number.
>> Nicolo Cesa-Bianchi: To the independence number, okay, and that’s a variant of the standard bandit
lower bound proof. Okay, so let’s see, how do I do this? First of all I am hoping, okay, that maybe I
can still prove something like that, where here I just put something smaller, which is the
in-neighborhood.
However, there’s a counterexample that rules out the possibility of getting exactly the same kind of
behavior. Okay, so let’s see: I have a directed graph like this, a total order over K actions. Now I
have a probability assignment which is exponentially small in the index: let’s say we number the
actions from one up to K, and the probability of action i is two to the minus i. So I have a very
small probability of picking the action that gives me total visibility, okay.
If I pick that action I observe the loss of everybody else, like in the experts case; if I pick this
other action I observe only the loss of that specific action, like in the bandit case. Okay, so with
this bad probability assignment it’s easy to see that this sum over here, the sum over i of p_i over
q_i, is about K plus one divided by two, I believe.
But if I drop the orientation of the edges I get a clique, so the independence number of the graph
without orientations is one, while this quantity here grows linearly with the number of actions. So I
cannot hope in general, unless I make some assumption, to bound the same quantity I had here, now
restricted to the in-neighborhoods, by the independence number of the graph obtained by dropping the
orientation of the edges.
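[Editor’s note: a quick numerical check of this counterexample; the indexing convention is the
editor’s. On a total order where playing action i reveals the losses of all actions before it, with
p_i proportional to 2^{-i}, the quantity sum_i p_i/q_i grows linearly in K even though the
independence number of the graph with orientations dropped is 1.]

```python
K = 20
p = [2.0 ** -(i + 1) for i in range(K)]           # p_i proportional to 2^{-i}
s = sum(p); p = [x / s for x in p]

# Playing action i reveals the losses of actions 0..i, so the loss of action i
# is observed exactly when some action j >= i is played.
q = [sum(p[j] for j in range(i, K)) for i in range(K)]

print(sum(p[i] / q[i] for i in range(K)))          # roughly (K + 1) / 2 = 10.5
```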
Okay, so the problem here is that one could blame several components, several ingredients of the
system. One might decide to blame the fact that I have too high a variance in my loss estimates. One
way to reduce the variance of the loss estimates is to introduce bias. An easy fix to this problem is
to alter my loss estimates a little bit. My loss estimates will now be tailored to the fact that my
graph has orientations, and that I can expect situations like this one where my standard estimates
won’t work.
Now I use biased loss estimates. Let me just use the same notation here: the estimate will be just
the same as before, ℓ_{i,t} divided by q_{i,t}, times the indicator function that ℓ_{i,t} is
observed, but now I will add a little bit of bias, gamma, down there in the denominator, okay, in
order to keep those terms down. Now I’m underestimating the true losses; this is a negative bias. But
I can control the variance in a good way.
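[Editor’s note: the biased estimate being described, in the editor’s reconstruction:

\hat\ell_{i,t} \;=\; \frac{\ell_{i,t}}{q_{i,t} + \gamma}\,\mathbf{1}\{\ell_{i,t}\ \text{observed}\},

so the quantity to control in the analysis becomes \sum_{i} p_{i,t} / (q_{i,t} + \gamma). ]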
Essentially, if I redo the proof I did before for the original estimates, I get something very
similar. Once I take expectations, the regret, that quantity over here, if I use these estimates, is
bounded by something like this, plus a term coming from gamma. Then I have the same expectation, I’m
almost done, and then I have the sum of p_{i,t} times ℓ hat_{i,t} squared.
Okay, so the only difference here, let me write this as it should be, is that I get p_{i,t} divided
by q_{i,t} plus gamma. Okay, so I get a very similar relationship, with a very similar quantity
controlling the regret as in the previous case, and I have to deal with this gamma here. You see that
gamma is playing essentially the same role as eta, the parameter I had in the exponent for my
probabilities, although here I should find a way of dealing with this quantity here.
Okay, so now I can prove, for any choice of gamma, an upper bound on this which is again twice the
independence number of the graph times a logarithmic factor, which depends on several things,
including gamma of course and alpha. Okay, so you see the price I pay: I can still control this
quantity here in terms of the independence number as I did before, but with an additional logarithmic
factor, which will depend on this gamma term, this bias term I have introduced.
The way this can be proven is also interesting; I would like to show it to you, and I think I have
the time, it’s not going to be long. Again, I will do the proof for the counting measure. The proof
for the counting measure is kind of silly, because if I know I have a uniform measure then
assignments like the one above are ruled out; but the essence of the proof is already there. Then, by
introducing this bias term, I am able to generalize the proof to arbitrary measures over the graph,
okay, because the bias gives me enough control on the denominator.
Okay, so let’s see this proof. It is again a completely combinatorial question. I have an oriented
graph, a directed graph, and I want to control the sum over all the vertices of one over one plus the
size of the in-neighborhood of each vertex, okay. This is the gamma-equals-zero, uniform-measure
case; then there are some technicalities to generalize it in order to get this upper bound here with
gamma. How do I prove this?
>>: What’s the inequality you are aiming at…
>> Nicolo Cesa-Bianchi: Yes, you’re right, I should write the inequality. The inequality will look
like this: if I drop the orientation and use the full neighborhoods I just know that the sum is at
most alpha. If I keep the orientation and consider only the in-neighborhoods, then I have twice alpha
times the log factor here.
Okay, so now let’s see how the proof goes. Again I’m going to pick a sequence of vertices as before.
This time I’m going to take out the vertex i_0 with the largest in-neighborhood. That’s correct, yes,
I take i_0 out and I recurse on what’s left.
Okay, so I’m just taking one vertex out, with all its edges; not the neighborhood, just the vertex.
It’s again as before, but now I just take out this one vertex and I leave the neighborhood intact.
Alright, so let’s reason a little bit about this.
Now I want to relate this to the independence number of the graph without orientations on the edges,
without directions on the edges. This is the maximum in-degree, so it is definitely bigger than the
average, just because I picked the maximum. And the average equals the number of edges divided by K:
the sum of the in-degrees equals the number of edges, because with directions I am only counting the
incoming side of each edge, so I am not counting anything twice. If I sum over all vertices I just
get the number of edges, divided by K.
Now drop the orientations. If you drop orientations then, instead of counting some pair twice, you
might count it just once, but this can only reduce the number of edges, so I am still going in the
right direction in this chain of inequalities. So pretend I am dropping all orientations at this
stage. Then you use Turán’s theorem. Turán’s theorem relates the density of an undirected graph to
its independence number. This gets you exactly K divided by twice the independence number of the
graph, minus a half. It’s usually not written this way, but I wrote it like this for convenience.
Okay, so now I can write a recurrence as I wrote before. I am looking at my quantity of interest
here, and I can split it in two parts: it is at most one over one plus the size of the
in-neighborhood of i_0, plus the sum over all i different from i_0 of one over one plus the size of
the in-neighborhood of i. Now I just plug in this lower bound on the size of the in-neighborhood of
i_0; that gives me an upper bound, because it sits in the denominator, and the first term becomes two
alpha over alpha plus K. Then I have the same kind of thing over here for the rest.
Okay, now I recurse. I just took out one vertex from the graph, so I have a new graph with a smaller
number of edges and a smaller number of vertices. I keep going: I pick again the vertex with the
largest in-neighborhood, I take it out, and I keep on going like this. At the end what I have is that
the quantity over here, the sum over i of one over one plus the size of the in-neighborhood of i, is
less than or equal to the sum over k from one to K of two alpha over alpha plus k: the first time I
have K vertices, the second time I have K minus one, and I go down to one. Then I sum this
harmonic-like sum over here and I get at most two alpha times the log of one plus K over alpha.
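[Editor’s note: a compact reconstruction of the chain just described; Turán’s theorem is applied to
the graph with orientations dropped, whose independence number is \alpha:

\big|N^{\mathrm{in}}(i_0)\big| \;\ge\; \frac{\text{number of edges}}{K} \;\ge\; \frac{K}{2\alpha} - \frac{1}{2},
\qquad\text{hence, recursing,}\qquad
\sum_{i=1}^{K} \frac{1}{1+|N^{\mathrm{in}}(i)|}
\;\le\; \sum_{k=1}^{K} \frac{2\alpha}{\alpha + k}
\;\le\; 2\alpha \,\ln\!\Big(1 + \frac{K}{\alpha}\Big). ]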
Okay, so essentially I can take this proof and generalize it a bit in order to get control of this
term in general; it will be a little more loose, I have an extra term and everything, but I can
handle arbitrary probability assignments. Now, if I tune eta and gamma properly, and the tunings of
eta and gamma will be of the same order, I can get a bound which has exactly the same form as the
regret bound I had before.
Again it will depend on the independence number, but it will have additional log terms, log factors,
both log K and also log T, because gamma will be tuned as something like one over the square root of
T. So essentially, even in the directed case, I can still get control of the regret for arbitrary
directed graphs by using essentially just one simple trick, which is adding a little bias to the
estimates, and then with the proper control on this quantity, which is really the key quantity that
rules the regret here, I essentially get the same result with just different log factors.
This is basically the message. I found this interesting because it gives a nice way of blending
between the experts and the bandit model. Yes?
>>: Can I ask a question?
>>: Yeah.
>> Nicolo Cesa-Bianchi: Sure.
>>: In the directed graph model we could draw the self-loops, the little edges pointing to myself,
explicitly, right, just in the picture.
>> Nicolo Cesa-Bianchi: Okay.
>>: I could also omit some of them. Would that break the proof if I don’t point to myself?
>>: Yes.
>>: So where can you point…
>>: No, I assume it would be…
>> Nicolo Cesa-Bianchi: If it’s disconnected you mean?
>>: I just don’t have a, I don’t observe my own loss…
>>: [indiscernible]…
>>: Everybody has at least one incoming edge, but it’s not from myself.
>> Nicolo Cesa-Bianchi: No, if you don’t see your own loss it’s a problem.
>>: Yeah, show me where it breaks then.
[laughter]
>> Nicolo Cesa-Bianchi: Well, there’s a lower bound in which you have a worse dependence on time, T
to the two thirds. It’s sort of a revealing game: you play a good action but you don’t see it, and in
order to see something you have to play a bad action.
>>: No, I knew this, but I thought your result proved that what I knew was wrong. Can you show me
where the proof breaks down?
>> Nicolo Cesa-Bianchi: I, where the proof breaks down…
>>: These Q’s are always bigger than P’s.
>>: Right.
>> Nicolo Cesa-Bianchi: The Q will always include the P. This is a…
>>: [inaudible]…
>> Nicolo Cesa-Bianchi: This is…
>>: If I promise that Q is bigger…
>> Nicolo Cesa-Bianchi: You don’t have a one here, this is what you mean; you don’t always have a one
down there, which means that you don’t always observe your own loss. Yes, that’s pretty crucial to
it. I mean I…
>>: If I promise that Q is always bigger than P, is it okay?
>>: There’s a minus one half there that seems like it might be where…
>> Nicolo Cesa-Bianchi: Oh, if Q is always bigger than P, yes, because then I fail safe to the bandit
situation, that’s correct.
>>: Okay, so I don’t need an edge to myself…
>> Nicolo Cesa-Bianchi: If you can always ensure that the probability of observing a loss is at least
as big as the probability of picking the action that corresponds to that loss, then it should be
okay.
>>: I see.
>> Nicolo Cesa-Bianchi: Yeah.
>>: But then it would be hard to guarantee; specifically, there could be one really great guy whose
probability shoots up, and that’s it, you’re done, and he is not…
>> Nicolo Cesa-Bianchi: Yeah, I mean all these arguments don’t assume anything about the
probabilities. All the combinatorial stuff holds for any probability assignment; it doesn’t matter
what algorithm I run.
>>: Okay.
>> Nicolo Cesa-Bianchi: Okay, so if you start assuming something about the behavior of the
probabilities, so about the algorithm, then it gets really, really tricky. You may also imagine
alternative observation models in which what you observe really depends on the realized losses: if
you pick an action and that action has zero loss you don’t observe anything, but if your action has a
big loss then you get to see something else.
There is some of that, and these graphs don’t have to be fixed, as I said in the beginning. You
actually don’t need to know the graph in advance. Suppose the graphs are varying over time: then,
instead of having a dependence on T times alpha under the square root, we will have a dependence on
the sum over t of alpha_t, the sum of the independence numbers of the sequence of graphs.
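[Editor’s note: in symbols, with \alpha_t the independence number of the feedback graph at time t,
the bound becomes, up to log factors,

R_T \;\lesssim\; \sqrt{\ln K \,\sum_{t=1}^{T} \alpha_t}. ]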
I can have the observation model changing over time; I don’t need to see it beforehand. Beforehand I
just need my probabilities not to depend on the observation model: I pick my action blindly and then
someone tells me, okay, this is what you observe. I don’t even need to observe the entire graph in
order to update my probabilities. In order to run the algorithm I just need to observe the graph up
to the second neighborhood of the action I picked, because I need to update the probabilities for all
the actions whose loss I know.
If I pick this guy I will observe the loss of this other guy, and I need to compute the probability
of observing that loss, which is a function of the neighborhood of that guy. So I need to know some
vicinity of the graph around the action I picked, but not the entire graph.
Okay, I’m done. Thanks for your attention and patience.
>>: [indiscernible]
[applause]