>> Eric Horvitz: It's a pleasure to have Adish Singla with us today. Adish is a long-term Microsoft person. He actually was at Bing Search as a senior development lead for over three years before going off to get his Ph.D. with Andreas Krause in the Learning and Adaptive Systems group, which I think in part borrowed its name from the adaptive systems group here, given that Andreas, his advisor, was an intern in our group. He was actually co-located in time and space with A.J. Kamar when she was an intern here. And Andreas went off to ETH. And so we consider Adish a grandson intern. That's what we call interns of interns of the past. And maybe we'll call them great-grandchildren soon. What do you think? Anyway, Adish is here to speak about learning and incentives in crowd-powered systems. He's done incredibly insightful work in human computation and crowdsourcing, incentive mechanism design, and submodular optimization, among many other interesting projects, with applications in education, fairness in how bicycles are returned in bike sharing settings in cities, and photo sharing applications. So he's taken insights from quite a rich set of application areas, and with each one we find new insights and ideas. So, Adish.

>> Adish Singla: Thanks a lot, Eric, for the introduction. And thanks, everyone, for joining the talk. So, I'm Adish from ETH Zurich, and in this talk I'm going to talk about crowd-powered systems and specifically the role of learning and incentives. So let's begin by looking at the spectrum of crowd-powered systems that we have. For instance, Stack Overflow, which is a social Q&A site for questions in computer programming. It has over four million users and about 5,000 questions answered every day. So broadly speaking, this is an informational system which is driven by the content of the participants. Then we have citizen science projects, where volunteers around the world help conduct scientific research. On Galaxy Zoo alone, millions of images have been annotated so far by these volunteers. On the other side, we have marketplaces such as Mechanical Turk, where workers perform micro-tasks in return for monetary incentives, and platforms like Upwork, which enable outsourcing of jobs to a huge talent pool of over ten million freelancers around the world. An emerging paradigm is location-based services, especially enabled by the growth of smartphone usage. For instance, on Waze, users can share realtime traffic information with each other. And Uber and Airbnb are perhaps the two most recent examples of crowd-powered systems that have entered our daily lives. So this is a snapshot of only the success stories. Of course, many more systems were developed and built but were unable to attract participants. So one of the key ingredients for the success of these crowd-powered systems is really the active and effective participation of the users. The key research question that we're tackling in our work is how to incentivize the users to improve the overall effectiveness of these systems. Okay? So let's begin by looking at some of the challenges in learning about the incentives for the participants in these systems. For instance, if you take the very simple example of a marketplace like Mechanical Turk, a simple question is how to price the tasks. Right?
The requester has some kind of budget and time constraints. Overpricing the tasks would lead to inefficient use of the requester's budget, whereas underpricing would lead to task starvation, because nobody would be willing to perform the tasks. Right? Similar challenges arise in learning the skills and expertise of the participants and then matching them to tasks which are engaging for them. And generally, we can think about these challenges as modeling and learning the user preferences and understanding the kind of tradeoffs that they make when they contribute to the system. So I'll focus on one specific project for the rest of the talk, around 30 minutes, so that you can better appreciate some of the challenges we have been dealing with. Specifically, we're going to look at bike sharing systems and how the operators can incentivize the users to balance the bikes across the system. Okay? So what are bike sharing systems? Well, it's an emerging paradigm of sustainable urban transportation. What you can do is pick up a bike from one part of the city very conveniently and efficiently, go to another part of the city, and drop off the bike. You can have these stations at parts of the city which are difficult to reach by car, for instance, and it's a cheap and green alternative to other modes of transport, especially car sharing. Right? So there are a bunch of nice advantages, and there is increasing worldwide adoption. Right now, there are over 700 cities around the world serving close to one million bikes. And in Seattle, there's the Pronto bike sharing system, which started operation in October 2014 and is currently serving 50 stations and over 500 bikes, and the plan is to grow this to over 2,000 bikes in the next few years. So, any guesses where the largest bike sharing system in the world is? Would you like to make a guess?

>> Zurich?

>> Adish Singla: Actually, Zurich doesn't have a bike sharing system. The largest one is actually in east China. It has an amazingly large number of 3,000 stations and 60,000 bikes, and they have dedicated lanes throughout the city for these bikes. The second largest is in Paris, where we have a bike sharing system with 2,000 stations. And the largest ones in North America, roughly tied between Montreal, Chicago, New York, and Mexico City, have around 450 stations and around 6,000 bikes. Okay. So, how do people use these bike sharing systems? As you would imagine, as a user, you decide to go somewhere. So you pick up a bike from a nearby station, ride to a station close to your destination, and drop off your bike. Right? And what could happen is that a bunch of users are picking up bikes from the same station, and as the stations have a limited capacity, some stations may get empty. And similarly, of course, a bunch of users may be riding to the same destination station, and given the limited docks at the station, some stations may get full. So empty and full stations look like this. Basically, if you are at an empty station and you want to pick up a bike, you won't find any, and at a full station, it will be hard for you to drop off a bike. As a user, you would get disappointed, you would have to go to another station, and you may not use the system again. So as a system operator, this is a situation you would really like to avoid. Okay? So what could be a way to solve this problem? Well, one very obvious one is to move the bikes from the full stations and put them back at the empty stations. Right?
And this is one way it is done. The truck in this picture is from our collaborators in Mainz, Germany. It basically moves from station to station and takes bikes from the full stations to the empty stations. Okay? So this is a solution. However, it is not so much in the spirit of the green concept of bike sharing systems, right? It's an expensive operation, it may be difficult to do truck-based repositioning, especially during peak hours and in the inner parts of the city, and of course, it leads to traffic congestion and pollution. Right? So what could be an alternative? Well, thinking of this in terms of crowdsourcing, and thinking of this as a crowd-powered system, one alternative could be to get some of the users of the system itself to move these bikes. So the key research question that we ask here is how we can engage its users in this repositioning process to help solve this imbalance problem. Okay? So let's briefly look at what this engagement would look like. The idea is very simple. Let's say a user is going to drop off a bike at some station, and there's a station close by which is actually empty. Now, if we could somehow convince the user to go to this empty station and do a little bit of extra walking, then it's good for the system, because that station is not empty anymore. Right? And similarly, of course, if a user wants to pick up a bike and there is a full station nearby, we would like that user to go to that full station to pick up the bike. Okay? So that's our idea of engagement, and we want to incentivize users for this extra walking, the extra effort that they have to put in. Okay. So there has been a bunch of recent work on trying to figure out these incentives in bike sharing systems. One recent example is what is used by Paris's Vélib' bike sharing system. They have what is a non-monetary incentive: they have pre-marked problematic stations, for instance stations which are at higher altitudes, and users get 15 extra minutes when they are going towards such a station. The idea is to bring more bikes to these problematic stations. Okay. What we have been doing in our work is a little more adaptive and more dynamic. First of all, we offer monetary incentives. So when users actually execute our tasks, the money goes into their accounts directly, and that acts as a stronger incentive than just having extra minutes. Then we have a more dynamic system, where we decide, based on the current demand in the system as well as the future forecast, which stations we should offer these incentives for. Right? And we also have learning policies which can interact with the users, and based on user feedback, we can update what kind of price offers to make. And finally, our goal is to maximize the overall quality of service. Okay? One accepted notion of this overall quality of service is to minimize no-service events. What is a no-service event? Basically, a user wants to pick up a bike or drop off a bike but is unable to do so because the station was empty or full. That's a no-service event. Okay. And our goal is to minimize these to improve the overall user satisfaction.
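As a rough formalization of this metric over a given time window -- the notation here is an editorial reconstruction, not from the slides -- one can write:

$$\mathrm{QoS} \;=\; 1 \;-\; \frac{\#\{\text{no-service events}\}}{\#\{\text{pickup and drop-off attempts}\}},$$

where a no-service event is a pickup attempt at an empty station or a drop-off attempt at a full station. A perfectly balanced system would have $\mathrm{QoS} = 1$.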
So let me tell you a little bit about the overall architecture we have been developing in this context. It looks like this. First of all, we have the hardware components, that is, the rental stations and the bikes, which are managed by the bike sharing operator. In this context, we are collaborating with the company MVGmeinRad, which is located in Mainz, Germany. Then we have a virtual infrastructure which keeps track of user accounts, makes forecasts about the future traffic, and keeps track of stations and bikes, and these forecasts are then also used by the repositioning policies. This virtual infrastructure is managed by a spin-off, ElectricFeel, based in Zurich. What we have been developing on top of this is, first of all, an incentives component, which decides or reasons about which stations we should offer incentives for, based on the current load in the system, along with a pricing mechanism that learns from interactions with the users; it also has APIs to communicate with the rest of the infrastructure to keep track of the latest system state. Then we have a user modeling component, which essentially reasons about how a user would respond to the incentives we offer, so we can make informed decisions based on this component. And on top of this, we have developed a smartphone app which allows us to interact with the users: a user would open up this app to look for nearby stations, and it would also show specific stations where users can get these offers. So let me say a little bit more about each of these components now. Let's start with users. Users are associated with a location, which is basically their current location where they want to pick up or drop off a bike. We have a quite simple model which assumes that there's a maximum distance beyond which users are not willing to walk. This is what we call D_max, some kind of psychological variable. Let's say users are never going to accept walking more than 500 meters; then we can only make our offers within this radius of 500 meters. And users have some private cost. That's the sort of effort they have to put in to go to an alternate station, given by c_u. This cost is sampled from some underlying distribution with CDF F, and this is the cost curve that we will try to learn through interactions with the users.

>> [Indiscernible] be fair and not too personalized costs, correct?

>> Adish Singla: Okay. Yes.

>> That would be useful [indiscernible].

>> Adish Singla: Yes.

>> The probability of [indiscernible].

>> Adish Singla: I will just comment on this in 30 seconds.

>> And with D_max, that's a hard -- [indiscernible].

>> Adish Singla: Yes. So here, yes, these are [indiscernible] assumptions; let me move on to how we model it. Getting back to this -- there are some more realistic models, actually. So here, what is happening is, yeah, we infer the location from the app. D_max is a fixed threshold; this is something that we got from the bike sharing operators, and we take one fixed D_max for the population. And F is something that is learned by the mechanism. So there are a few simplifying assumptions, and also, as a kind of first experiment to deal with fairness, we don't have any personalization aspect in this, so there's one unified F that we assume for the whole population. And some simplifications that we make, which I'll come back to later, are that we are not considering context, for instance time, weather, or the actual walking distance required.
And we can extend some of these results to that setting, which I'll mention later. But in this --

>> [Indiscernible] is the standard [indiscernible] -- standard for all [indiscernible], or just --

>> Adish Singla: Yes.

>> [Indiscernible] -- so this is [indiscernible] for the whole population, [indiscernible]?

>> Adish Singla: Yes.

>> Also, the D_max question. Is the idea of D_max that it's the distance between stations? Or is it a distance -- so I could imagine that if my true destination is someplace you don't know about, then the distance that I'm willing to walk is unknown, right?

>> Adish Singla: Yeah, so the distance is how much you would be willing to walk generally. So let's say we never want to make an offer where you would have to walk more than 1,000 meters. This is independent of the locations of the stations.

>> But the distance that I'm willing to -- the distance that I will have to walk is a function of where I'm going, which you don't know.

>> Adish Singla: Yeah. So, given you have picked some station where you are going anyway, this is the alternate station or recommendation that I'm making. Basically, I know your current station, and I want to suggest an alternate station to you. That station I would never suggest if you would have to walk more than 500 meters. This is only for the part about these incentives or recommendations.

>> I see. So if my destination is here, the station is within 500 meters of there, and then you recommend that I walk 400 meters the wrong way, and I would actually have to walk 800 meters to get where I want to go.

>> Adish Singla: So we don't care about the actual -- we don't know the true destination. This is only about the preferred station and the recommended station.

>> So what is the main motivation for users to use bike sharing? Is it kind of going places where public transportation is not available? Or to exercise, or they just like this way of going?

>> Adish Singla: A few things. First of all, the first 30 minutes of bike sharing are free. So it provides you a very fast and flexible mode where you can just pick up a bike and go somewhere, and it's free for 30 minutes. And that essentially covers most of the span of the city that it serves.

>> [Indiscernible] which of the -- which of the bike sharing schemes?

>> Adish Singla: For most of the bike sharing schemes, all over the world, the first 30 minutes are free. There's one annual membership that you pay.

>> So the model's the same throughout the world: the first distance is free.

>> Adish Singla: Yes. And then you have an hourly rate after that. So that's actually a strong incentive to just pick up a bike and go somewhere, and I mean, if you have around 200 stations located in the city, you can always find some bike. And I think it's also kind of a lifestyle question: people prefer to go this way compared to cars or public transport.

>> I wonder if there's a way to combine this method with public transportation to expand the D_max radius in some cases. I know that people prefer biking, but imagine there are two stations where it's not a walking distance, and people would actually take a bus to go there. Would you consider that, to increase your radius?

>> Adish Singla: It could be, but I think in most settings, the stations are actually quite close, in the sense that you always have a few stations within this radius.
So the stations are quite dense, actually. There would definitely be possibilities to extend this to more scenarios, but this is the simple setting, where users don't have a lot of burden: they basically say, this is where I'm going, there's an alternate location just ten minutes' walk away, so I can just go there. Okay? So, basically, that's about the users. I'll get back to some of these questions again later as well. Now let's look at the stations. For a station S, what we have is the total capacity of that station, that is, the number of bikes that the station can hold. There's some current load, that is, how many bikes are at that station at this time. And then we have some near-future forecast of how the load is going to change over, let's say, the next two hours. Based on this, we classify stations as near-empty or near-full, meaning there is some high chance that these stations will get empty or full in the near future, and these are the problematic stations that we want to deal with in this time window. Okay. So the way it looks is something like this, right? Let's consider the pickup scenario, and let's say a user appears at this location l_u. By default, we assume that the user is going to go to the nearest station. Okay. That's again a model assumption: the user picks the nearest station. What we would do now is look at the radius within which we think the user would walk if we provided some incentive, and we look at the problematic stations in that radius. Okay? Now, we will give the user some incentive if we find some problematic station in that radius. Okay? So this is what the user would see in the app when he opens up the screen. Okay. So that essentially answers the question of how we pick which stations to offer incentives for, and when.
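As a rough sketch of how these pieces fit together, the following is a minimal rendering of the model as described so far; the class names, the forecast interface, and the two-dock margin are illustrative assumptions, not the deployed system.

```python
import math

D_MAX = 500.0  # population-wide walking threshold in meters (assumed value)

def distance(a, b):
    """Straight-line distance between two (x, y) positions in meters."""
    return math.hypot(a[0] - b[0], a[1] - b[1])

class Station:
    def __init__(self, location, capacity, load):
        self.location = location  # (x, y)
        self.capacity = capacity  # number of docks
        self.load = load          # bikes currently docked

    def is_problematic(self, forecast_delta, margin=2):
        """Near-empty or near-full, given the forecast load change for this window."""
        projected = self.load + forecast_delta
        return projected <= margin or projected >= self.capacity - margin

class User:
    def __init__(self, location, private_cost):
        self.location = location
        self.private_cost = private_cost  # drawn from the unknown cost curve F

    def accepts(self, offer):
        """Strategic user: accepts iff the offered payment covers the private cost."""
        return offer >= self.private_cost

def candidate_station(user, stations, forecast):
    """Pickup scenario: a problematic station within walking range, if any exists."""
    nearby = [s for s in stations
              if distance(user.location, s.location) <= D_MAX
              and s.is_problematic(forecast[s])]  # forecast: Station -> load delta
    return min(nearby, key=lambda s: distance(user.location, s.location),
               default=None)
```

The only remaining decision is what price to attach to the offer at the returned station, which is the next part of the talk.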
So now, the next question is how we decide these price offers, so I'll discuss that. Essentially, given that we have fixed which stations we are picking, the question now is how we make these incentive offers -- what price should we offer -- and of course, we want to do this under some kind of budget constraint. The way we view this is that we have some kind of continuous budget allocated to the system. Let's say every two hours, the system allocates us €200. We also have some notion of the number of users we would interact with in this time window; based on the traffic forecast, we know how many users we're going to make these offers to. And our goal is to make these offers so that we maximize the number of acceptances. Okay?

>> Can you raise the price, or are you just -- you're only --

>> Adish Singla: It's going to fluctuate until it learns a good price to offer. It's a kind of learning mechanism, where we offer a price, the user is going to accept or reject, and based on that -- if it's always getting rejections, then we end up raising the price.

>> But in principle, you can make the price [indiscernible] station?

>> Adish Singla: Yes. Yes. So that's where the budget constraint comes in. You have some kind of constraint, so you cannot just change the prices as you would like.

>> You have some partner. Is the partner allowing you to increase prices beyond what they're already charging?

>> Adish Singla: These are just offers that the users are getting; they're getting this money.

>> So [indiscernible] negative.

>> Adish Singla: So we are not doing that. Yes. So we are basically --

>> You're reducing user prices, right? You're not allowed to raise prices.

>> Adish Singla: So we are just giving them money.

>> You were discounting, potentially, the negative --

>> Adish Singla: Yes. [Indiscernible]. Right.

>> With your original question, I was wondering -- you can't actually raise prices.

>> Adish Singla: Yeah. So you could actually raise prices as well. That's something that Uber does, for instance: sometimes, if there's huge traffic, it will charge you 1.5 times the price. But the thing in bike sharing systems is that users are used to this free pricing -- the first 30 minutes are free anyway. So basically --

>> So your mechanism is just lowering prices.

>> Adish Singla: We are offering them -- so there's no price. We are just giving them an incentive.

>> Okay.

>> It's interesting. I mean, clearly there are a large number of possible designs that you could overlay on these [indiscernible]. You could actually offer people free trips between stations: just take a bike and go, bike now available, free bike to go to this station.

>> [Indiscernible] for this trip. You're a platinum rider and your bike is free for this trip.

>> For this trip. And get bikes back. So [indiscernible] designs and go beyond. And my guess is you came up with some really good assumptions and you have some really interesting mechanisms to go along with them.

>> Adish Singla: Yeah. So basically, what we chose to assume is that there's no price. I mean, the riding is free, because the first 30 minutes are free throughout. What we are saying is, if you ride a particular route, we are giving you some extra money. Those are the offers. Okay. Now, the specific pricing mechanism that we're using is one we proposed a little while ago, in the context of micro-task platforms, and I'll tell you a little bit more about it. To be explicitly clear, this mechanism offers incentives to users; it does not change the pricing of the bike rental. Okay? The way it works is something like this: when a user appears and we have to make a price offer, we make some offer based on what we have learned so far. The user has some private cost -- these are strategic users -- and if our offer is more than their private cost, they accept it and then do the assigned task of picking up the bike from the specific location; otherwise, they reject the offer and just execute their default choice. Okay. And our goal is to interact with the users and learn about this underlying cost distribution -- what is a good price to offer. Okay? So I'll tell you a little bit about the execution of this mechanism. The main thing we are trying to learn is this cost curve F. Okay? For now, let's say this cost curve is known to us, and let's see what we could do if it were given. First, consider a scenario where we have unlimited budget -- we don't care about the budget constraint and can make offers as large as we want, but we have a constraint on the number of users we interact with. In that case, the number of acceptances is essentially given by this blue curve.
What it means is that if you offer a particular price p, the probability of acceptance is F(p), and you multiply this by N, so that's the number of acceptances. Let's consider the other extreme, where we have a budget constraint but an unlimited number of people we can interact with. In this case, the number of acceptances is given by this curve B over p. Putting these two together, we have a budget constraint and a constraint on the number of people we interact with, and together they dictate the underlying price in this two-hour time window in which we are interacting.

>> I don't understand the B over p curve and how it matches with your X and Y axes.

>> Adish Singla: Basically, you have an unlimited number of people you can interact with. If you offer a particular price p, what is the number of acceptances you can get? That's essentially this B over p.

>> I see.

>> Adish Singla: Okay? And if we know the cost curve, the optimal fixed price to offer is given by this intersection. So for a fixed B and N, this is the price that the system would like to offer, in terms of a truthful mechanism. Okay?
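In symbols -- an editorial reconstruction of the slide, with $F(p)$ denoting the probability that a random user's private cost is at most $p$ -- the two curves and the optimal posted price are:

$$\text{acceptances at price } p \;=\; \min\!\Bigl(N\,F(p),\ \frac{B}{p}\Bigr), \qquad p^{*} \;=\; \arg\max_{p}\ \min\!\Bigl(N\,F(p),\ \frac{B}{p}\Bigr).$$

Since $N F(p)$ is non-decreasing in $p$ and $B/p$ is decreasing, the maximum sits where the two curves cross.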
And what we want to do is learn this cost curve from the interactions. What would learning mean here? We can offer a particular price point, and based on the number of acceptances or rejections we get, we obtain some kind of mean estimate of this value F(p), along with some kind of confidence intervals.

>> So I'm confused by the use of the word truthful. It didn't sound like they're posting their own bids.

>> Adish Singla: Yeah. So it's a posted-price mechanism: we make them an offer, and they say yes or no based on it.

>> They're not revealing anything. Truthfulness is about revealing their own price.

>> Adish Singla: So here, basically -- yes. It's a posted-price mechanism, so we don't ask them their true price; the non-strategic setting would be where we actually know their true price.

>> So this is -- so optimal means you're using all of your budget?

>> Adish Singla: Yes. Under this constraint on the number of users.

>> Okay. So a solution is suboptimal if you either don't spend your budget, or you spend all your budget and don't get as many people as you could have?

>> Adish Singla: Mm-hmm. So you could either end up overpaying and recruiting fewer people, or make offers that are too low, so everybody rejects and you can't recruit. Optimal is where you're making the right price offers so that you can recruit just enough people under this budget and the constraint on N.

>> Well, okay, so what's confusing me is, suppose there are no empty or full stations anywhere, right? We're saying it's optimal still if we're trying to incent people to go places they didn't want to go.

>> Adish Singla: N is zero anyway there. N is the number of people that we hope to interact with, that we are trying to recruit.

>> I see.

>> Adish Singla: So we have two constraints, budget and number of people, and together they define what the optimal strategy is.

>> [Indiscernible] the number of operations you want to do to balance it out. So it's not really the number of people; it's the number of tasks you want to solve.

>> I see. But the problem is, N is an aggregation of a bunch of tasks. So one of the tasks may be going through a bad neighborhood, right? So the cost on this particular one is going to be really high.

>> Adish Singla: Yes. So that's a simplifying assumption that we have. We consider this almost uniform across all the neighborhoods; we are not making that distinction. That would be something like taking context into account: if you are in this neighborhood, maybe the prices should be higher. I'll get back to that, actually, and to how we can extend this to vary the prices across different neighborhoods.

>> Is P a function of the distance, though?

>> Adish Singla: [Indiscernible].

>> Is it, or is it not?

>> Adish Singla: It's not.

>> So that's an aggregate too.

>> Adish Singla: Yes.

>> So, coming back to Chris's question about truthfulness: does truthfulness mean that whenever you offer a person a price that is higher than their personal cost, they would accept it?

>> Adish Singla: Yes.

>> So they don't have any incentive to strategize -- that, okay, I'll wait for [indiscernible]?

>> Adish Singla: No. No. But basically, since we don't know their private cost, we may end up paying more. We make an offer, they say yes or no, and then we pay the price we offered them. Whereas in the non-strategic setting, we would really know their underlying cost and just give them that value. Okay? So what does learning mean here? It's essentially just experimenting with different prices, right? If we offer a particular price more times, we can get some kind of mean estimate of the curve at that point, as well as some kind of confidence intervals. And the algorithm, as part of the learning process, would essentially experiment with different prices to get these mean estimates along with confidence intervals.

>> You can do this across all your stations right now?

>> Adish Singla: Yes.

>> [Indiscernible] station, and intrinsically getting the distribution of where people go and so on.

>> Adish Singla: Yes. So this is across all stations and all the users.

>> It would be interesting to look at the separate stations --

>> Adish Singla: Yes. More of a cross-stream kind of thing as well.

>> [Indiscernible] top of a mountain, for example; the difference that occurs.

>> Right. It's really going to be a function of the [indiscernible], right?

>> Yeah. But the comment is that the distribution would tend to differ from bike station to bike station -- where the current, the intentions are, for example. That could be characterized by station location, could be time of day and location, [indiscernible] their patterns and so on, but something across everything is fine to start.

>> Adish Singla: And I will actually briefly mention taking context into account. That's the more realistic setting where we can really optimize. So here, what is happening has a nice connection to multi-armed bandits, where we want to experiment with different prices and learn their utility. The key challenge here is the additional notion of a budget constraint, and this different notion of prices of the actions, which is not usually present there. So the question, from a learning point of view, is that we want to do some kind of exploration-exploitation, but under this budget constraint dictated by this red curve here. Now, what does exploration mean here? It's that we would like to offer a price for which we have huge uncertainty at this point.
So we can reduce the uncertainty and learn more about this underlying curve. And what does exploitation mean here? It's to offer the price where our current predicted mean intersects this red curve; that's what you would do as the greedy action, the exploitation. Now, how do we trade off this explore-exploit dynamic? We use ideas from online learning. What we do is use the optimistic estimate of the current predicted mean, given by this dotted upper-bound line; we intersect it with the red curve, and that's the price our mechanism proposes. Okay? This is essentially what is also done in multi-armed bandits: use the optimistic estimate to decide which action to take. The key challenge is the budget constraint coming into the picture, which makes the analysis quite tricky. Okay.
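A minimal sketch of this optimistic posted-price loop follows. This is an editorial simplification of the mechanism described in the talk, not the authors' exact implementation: the price grid, the confidence width, and the use of the full budget B in the B/p curve are all assumptions.

```python
import math
import random

def choose_price(prices, counts, accepts, t, N, B):
    """Offer the price where the optimistic (UCB) estimate of the acceptance
    curve N * F(p) meets the budget curve B / p."""
    def ucb(p):
        n = counts[p]
        if n == 0:
            return 1.0  # optimism for prices we have never tried
        mean = accepts[p] / n
        return min(1.0, mean + math.sqrt(2.0 * math.log(t) / n))
    return max(prices, key=lambda p: min(N * ucb(p), B / p))

def run_mechanism(user_costs, prices, B):
    """One time window: users arrive with private costs drawn from F."""
    N = len(user_costs)
    counts = {p: 0 for p in prices}
    accepts = {p: 0 for p in prices}
    spent, recruited = 0.0, 0
    for t, cost in enumerate(user_costs, start=1):
        p = choose_price(prices, counts, accepts, t, N, B)
        if spent + p > B:      # cannot afford another acceptance at this price
            break
        counts[p] += 1
        if cost <= p:          # strategic user: accepts iff offer covers cost
            accepts[p] += 1
            spent += p
            recruited += 1
    return recruited, spent

# Toy run: costs uniform on [0.1, 2.0] euros, budget 200 euros per window.
random.seed(0)
costs = [random.uniform(0.1, 2.0) for _ in range(1000)]
grid = [round(0.1 * k, 1) for k in range(1, 21)]
print(run_mechanism(costs, grid, B=200.0))
```

With enough interactions, the posted price concentrates around the intersection price p*, which is what the regret bound discussed next formalizes.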
Okay. So I'll quickly go through some of the theoretical results and then move on to the experimental results. When designing this kind of mechanism, our goal is essentially to have this property of no regret. What is the regret? It's how well our mechanism M does compared to the optimal fixed-price mechanism using p*. So we are trying to get as close as possible to that optimal fixed price, and our goal is of course to minimize this regret. What we can show is that, using this simple, intuitive idea of which price to offer, we can bound the regret as follows, in very nice, intuitive terms. The first part of the regret comes from the prices where we offered too little: here we failed to recruit, because we had a bound on the number of people we would interact with, so we failed to recruit people. The second part of the regret comes from prices where we essentially overpaid -- prices higher than p* -- so this comes from inefficient use of the budget. More importantly, the regret is only logarithmic in the budget, which means that as you increase B, as you learn more and more, you can essentially perform as well as this p*-based mechanism.

>> Sorry. What is it?

>> Adish Singla: Some [indiscernible] notion of utility of the mechanism. So here, it would be the number of acceptances. It's basically what we were trying to optimize on the Y axis.

>> So, quick question. When your budget is infinite, is this the UCB algorithm?

>> Adish Singla: It would be the UCB algorithm. For a fixed budget, you get this bound.

>> Only when [indiscernible], right? Is this anytime? [Indiscernible].

>> Adish Singla: Yes. So as the budget goes to infinity, it reduces to UCB, but basically, this is the fastest regret you can get anyway for a bounded budget as well.

>> So you [indiscernible].

>> Adish Singla: Yes. Yes.

>> I have a question. Is there any assumption [indiscernible]? You have to assume that the higher the price you offer, the higher the chance of acceptance. But if you increase the price indefinitely, do you assume that eventually somebody is going to accept?

>> Adish Singla: Yes. So [indiscernible] essentially we know the highest price and the lowest price, so we know the support of this distribution.

>> But in reality, even for a very high price, nobody may accept.

>> Adish Singla: Yes. Yes, yes.

>> This is the realistic --

>> Adish Singla: Yes.

>> Does that break the analysis somehow?

>> Adish Singla: It doesn't break it, actually. In this analysis, we do assume that F actually goes from zero to 1. But the analysis would still hold if F caps at .8. In fact, when we did some surveys on this, you actually see a cap at around .8, because after that, it doesn't matter how much you offer; people will just not accept.

>> So in the story you were telling, it's the number of acceptances. Now, you have acceptances at each one of these time slices, and what was not clear to me is how many potential acceptances there are between each of the successive time slices. Can you say --

>> Adish Singla: This result is for running this mechanism for a fixed time slice. That could be two hours, or one day, or one week. So this is a fixed time slice where you are given a budget, you are given some estimate of N, and then you run this mechanism.

>> And so it does not address the kind of multiple-time-slice analysis.

>> Adish Singla: It doesn't.

>> Okay. It assumes that everything about the world stays the same.

>> Adish Singla: Yes. So [indiscernible].

>> Wouldn't it be nice to know how sensitive the results would be to those kinds of considerations, like time --

>> Or how to formulate them into scenarios [indiscernible].

>> Yeah.

>> Adish Singla: I mean, one thing that definitely happens is accounting for weather, for instance. Then the whole dynamics change. If you are using the same learning curve across two days, and one day is sunny and one is rainy, it's not going to work. But if we have some kind of homogeneous time slices, then it should perform reasonably okay. Okay? So I will continue with some of the evaluations that we did. I'll focus on simulations and then discuss some results from the deployment. Okay? Basically, for the simulations, we did very exhaustive experiments using a data set that we obtained from [indiscernible], and we also did some surveys to learn the underlying parameters of the users. Then we did the real deployment in Mainz, Germany, with our collaborators [indiscernible]. So let's start with the simulation experiments. To perform these experiments, first of all, we need to simulate the dynamics of a bike sharing system. What we did for this was obtain a nice data set released by Boston Hubway. It's a data set over a one-year time frame, with complete rental information for 95 stations and 650 bikes, about half a million rental records, as well as the filling level of every station recorded every minute. So it's a very detailed data set, and it allows us to simulate demand and supply in the bike sharing system. On top of this, we need some truck repositioning policy, because that's the baseline repositioning that is happening. So we implemented truck repositioning policies which are kind of the state of the art and which also reflect what is being used in the real world. And finally, for the simulations, we also need some parameters for the user models, so that we can simulate how users would react to the incentives. And what we did --

>> [Indiscernible].

>> Adish Singla: So truck repositioning is the default repositioning process: trucks move the bikes from the full stations to the empty stations. If you have, let's say, €500 allocated to that policy for a day, then it would run for less than five hours, and it would keep on moving bikes. And this is essentially what is happening right now.
So that's what we want to compare against: instead of putting more budget into trucks, if we put some into incentives, how do the dynamics change? And for the user model, we did survey studies in Mainz, Germany. The idea was to get some realistic distributions of the users' cost curves as well as their walking distances, so that we can simulate them. Okay? So, starting with the simulation results: what this curve shows on the Y axis is the quality-of-service metric, the final metric that we are interested in maximizing. What you see on the X axis is a tradeoff between budget allocated to trucks only, on the left side, and budget allocated to incentives only, on the right side. There are two different curves, one for €300 and one for €600, and then you trade off between the two. The most interesting fact we see here is that if you use just trucks, or just incentives, the system doesn't perform well. However, mixing them in some ratio -- the exact ratio doesn't really matter much, because the curve is mostly flat in the middle -- brings in this nice boost in performance. So that shows this nice complementary effect, and it is expected as well, because trucks do more macro-level repositioning and incentives do more micro-level repositioning depending on the current dynamics of the system. There's also a temporal complementary effect, because trucks are mostly moving bikes in the early morning and late afternoon, whereas incentives are more focused during the peak hours. Okay?

>> What is quality of service?

>> Adish Singla: Quality of service is based on the proportion of no-service events: as a user, you went to pick up a bike, but there was no bike there, and similarly for the drop-off scenario.

>> So what do you do for that here? You're -- a distribution --

>> Adish Singla: That's what we get from the survey studies. For the simulation, we need some underlying distribution for the users, and we got it from that part. We did survey studies with real users to get distributions of their walking distances and the prices they're willing to accept.

>> Do you have a sense of how sensitive those results are to the --

>> Adish Singla: Actually, not much. We did a lot of studies where we just used some distribution that we could simulate, without actually reflecting the real-world distributions, and then we did this with the user studies. So it's not sensitive, in a sense, if you are learning it; it's very sensitive if you are just using some heuristic, let's say the min price or the mean price.

>> [Indiscernible].

>> Adish Singla: Yes. So we sample from there and assign the costs to the users. It will become clear in a moment that the learning mechanisms are quite robust, but if you use some heuristic, it's not. So what this plot shows is, again, quality of service on the Y axis, and on the X axis we are now increasing the daily budget. Okay? There is an existing policy which currently runs for three hours, and on top of that, we allocate more budget to trucks or to different incentive schemes. First of all, for the trucks, what you see is that if I allocate more budget, going from zero to €600, the quality of service increases from .79 to .9. Okay? Now we have two heuristic-based pricing schemes.
One is min price, which is basically just offering everybody 10 cents -- the bottom of the support of this distribution. What you see here is that, since a lot of people just reject the offers, we aren't able to use our budget effectively. That's the min price policy. The mean price policy just offers the mean of the distribution, and that already does slightly better than allocating more budget to trucks. And what you see in the green line is the learning mechanism that I showed before; it does substantially better than the other incentive policies.

>> So with the min price mechanism, are you getting a budget surplus, because you --

>> Adish Singla: Yes.

>> Okay.

>> Adish Singla: So most of the budget is not being used. And the mean price could be quite sensitive to the underlying distribution, whereas this learning policy is quite robust, actually. We experimented with a lot of different policies. Okay?
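To get intuition for why the learning policy is more robust than the two heuristics, here is a small standalone toy comparison, using the same UCB pricing rule as in the earlier sketch; the cost distribution, budget, and price grid are made-up values, and this is not the paper's simulator.

```python
import math
import random

def simulate(policy, costs, B, prices=None):
    """Count recruited users for a pricing policy under budget B."""
    counts, accepts = {}, {}
    spent = recruited = 0
    for t, cost in enumerate(costs, start=1):
        p = policy(t, counts, accepts, len(costs), B, prices)
        if spent + p > B:
            break
        counts[p] = counts.get(p, 0) + 1
        if cost <= p:
            accepts[p] = accepts.get(p, 0) + 1
            spent += p
            recruited += 1
    return recruited

def fixed(price):
    """Heuristic: post the same price to everyone."""
    return lambda t, counts, accepts, N, B, prices: price

def ucb_policy(t, counts, accepts, N, B, prices):
    """Learning: post the price where the optimistic curve meets B / p."""
    def ucb(p):
        n = counts.get(p, 0)
        if n == 0:
            return 1.0
        return min(1.0, accepts.get(p, 0) / n + math.sqrt(2 * math.log(t) / n))
    return max(prices, key=lambda p: min(N * ucb(p), B / p))

random.seed(1)
costs = [random.uniform(0.1, 2.0) for _ in range(2000)]  # private costs ~ F
grid = [round(0.1 * k, 1) for k in range(1, 21)]
B = 300.0
print("min price :", simulate(fixed(0.1), costs, B))   # almost all offers rejected
print("mean price:", simulate(fixed(1.05), costs, B))  # overpays per recruit
print("learning  :", simulate(ucb_policy, costs, B, prices=grid))
```

The min-price policy should leave most of the budget unspent, the mean-price policy should exhaust it on expensive acceptances, and the learner should converge near the intersection price and recruit the most users; shifting the cost distribution hurts the two heuristics but not the learner.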
Finally, on the simulation side, this plot again shows quality of service, but on the X axis is the participation rate. Why is this important? Because when we deploy such a system, maybe in the beginning only five percent of users are using the incentive scheme. So what you would like to see is, depending on the adoption rate -- how many users are actually part of this new system -- how the system would behave. On the left, you have zero percent participation, which means there are no incentives; nobody is using the new system. At one is the hundred percent participation rate, and those are the results I showed on the previous slides. And what this shows is that with a participation rate of around 30 percent of the users in the city, we are already able to outperform the trucks.

>> So I'm sorry. We changed from N to quality of service. Are they equivalent?

>> Adish Singla: Quality of service is the final metric, and acceptances were more of an intermediate metric for the mechanism. Quality of service depends on two decisions: which stations you are offering incentives for, and when you are offering them; the mechanism is the other aspect, where you decide how much you're offering. So we decouple these at that point, but what we are finally interested in is the overall quality of service.

>> So I can pick and choose my N and get different quality --

>> Adish Singla: N is something that comes from the forecast. N says that in the next two hours, you have this many people who are going to be coming to this problematic station, and these are the people you can interact with. So on a rainy day, maybe you have fewer people to interact with, and on a busy day, you have more. Okay? So I will present some of the deployment results -- that was the summary of the simulations. This is how the deployment looks. It was done in collaboration with ElectricFeel and [indiscernible]: a 30-day pilot study on a subset of users who agreed to participate, and it was for the pickup scenario only. What it looks like is what you see in this app: a user would pull up the app, it would show the current location, it would show the stations where you can pick up bikes, and it would also show these red balloons, which indicate that there is some extra incentive there. So what a user can do is click on that red balloon and reserve the reward, and if he actually executes the action in the next 30 minutes, he gets that reward in his account. Okay. So I will show some of the qualitative results from this deployment now. What this shows is a kind of heat map, the spatial distribution of accepted offers. Okay? A few things to note here. First of all, the distribution is quite inhomogeneous, as one would imagine. Nobody is willing to accept these kinds of offers outside the city, for example, so a lot more offers are being accepted in the city center. The size of the [indiscernible] essentially shows how many offers were accepted there. But it's also interesting that the accepted offers are spread out, at least in the main core city area, which means there's a lot of dynamics going on in terms of where people accept these offers. Okay?

>> How many accepts are there?

>> Adish Singla: So this is for 30 days with around ten percent of the users.

>> How many incentives is that?

>> Adish Singla: Roughly 60 percent of offers were accepted.

>> [Indiscernible] know how many [indiscernible]?

>> Adish Singla: So it would be of --

>> A thousand?

>> Adish Singla: [Indiscernible] thousand.

>> Thousand.

>> Adish Singla: Yeah. What this shows is a different angle, and it's quite interesting: it shows the distribution over users of how many accepted one offer, two offers, and so on. Interestingly, the mode is at one: there's a curiosity to download the app and try it at least once. More importantly, there are many more users who tried it more than once. And surprisingly, there are a very few active users who really used the system a lot; they were essentially moving the bikes around to use it as a marketplace to earn money, so that was quite interesting, and it leads to some new insights about how this could become a marketplace. One comment I would make here as well: in the New York bike sharing system, there is now something called bike valet parking, because the bike stations are so full that there are actually people standing in front. If you want to drop off a bike and there's no dock available, you just give some money for this valet parking, and they will figure out how to drop off the bike. So there are some interesting marketplaces that could arise.

>> Are you at all worried that people will game the system and start creating congestion at the places where the chance exists to make money?

>> Adish Singla: There are a lot of strategic aspects. We don't think people played strategically here, at least, because they were more kind of volunteers who wanted to do this pilot study, and we were also tracking how people behaved behind the scenes. But if you think about it as a large marketplace, there are a lot of strategic aspects to how people could game the system. Okay? This is something more of a qualitative result as well. This shows, when we ask users to walk, how the probability of acceptance changes. Here you see that after 700 meters, the probability of acceptance decreases quite drastically, and at around 100 meters, the probability of acceptance is close to .85. And finally, this is the temporal distribution of the accepted offers. A few interesting things here: there is a peak in the late morning and early afternoon.
That's actually quite useful, because it acts as a kind of complement to the trucks: that's the time when it's difficult to deploy trucks, which are mostly out in the early morning, late afternoon, or late evening. And the offers are spread throughout the day, even late at night, so this was quite interesting for us to look at. So that's the summary of our deployment results. Now, just coming back to the challenges that I mentioned: what we looked at is this bike sharing system, and with this one concrete example, you can now connect back to the challenges I mentioned earlier. Right? And as we can see, there are a lot of opportunities for user modeling and machine learning techniques to contribute to the success of these systems. Now, before moving on to the next theme of the talk, I'd quickly like to comment on two things. First of all, there is some work on extending the mechanisms to handle context. Earlier, all the mechanisms were context-free. What happens here is that the mechanism can receive some context -- let's say time, weather, or user features -- and make a more informed decision based on this context. Another thing I would like to comment on is that earlier, we modeled the learning as a multi-armed bandit problem. Sometimes it may not be possible to do so, because you may not be able to actually compute the loss associated with the action that you took. So what we have been exploring in this work is formulating these problems as partial monitoring games, which is a much more generic and powerful framework for online learning; I'll be happy to discuss more details about this offline. And another thing I would like to mention, which is very recent work, is learning incentives in more complex scenarios. Here, the application setting is inspired by review aggregation on sites such as Airbnb or Yelp. As you can imagine, a user seeks some recommendation and decides to go to some restaurant of a type i -- let's say a Mexican restaurant located in Redmond with 50 reviews. The system, say Yelp, may want to maximize the system utility, for instance to gather more reviews for a newly opened restaurant. So Yelp would like to make some discount offer to the user to instead go to another restaurant of type j, say a Mediterranean restaurant in Bellevue center. Now, the price of this switch from i to j would depend on how similar or dissimilar these items are, and this leads to a learning problem of the kind we looked at before. If you think about it, there are n different types of restaurants, and the goal is to learn some kind of switching cost to switch from i to j. So it looks something like this: we are trying to learn these cost distributions for every pair. The key idea we have been exploring in this work is to exploit the structure of a hemi-metric, which is a relaxed notion of a metric without symmetry, and to exploit the triangle inequalities this imposes on the costs defined by these probability distributions. This is very recent work; we have some very nice results and are working on a number of follow-ups, and I'll be happy to provide more details offline.
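For reference, a hemi-metric over item types $\{1,\dots,n\}$ satisfies the following (this is the standard definition; its role here is as described above):

$$d(i,j) \;\ge\; 0, \qquad d(i,i) \;=\; 0, \qquad d(i,k) \;\le\; d(i,j) + d(j,k) \quad \text{for all } i, j, k,$$

that is, a metric without the symmetry requirement $d(i,j) = d(j,i)$. Because the triangle inequalities couple all $n(n-1)$ pairwise switching costs, an observation about one pair can tighten the estimates for many other pairs, which is what makes exploiting this structure attractive for learning.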
Okay. So far, we have focused on learning about incentives and how machine learning techniques could be useful there. Now, one could flip the question around and ask how incentives could be useful to boost machine learning systems and to improve information gathering. That's the second theme of our research, and I will quickly give you a small peek at what we have been doing in this context. Specifically, what we have been working on is inspired by the application of community sensing. We are part of the OpenSense project, which is a Swiss nationwide project with various collaborators and hospitals, and the goal is to build community-sensing solutions to produce air quality maps at very high temporal and spatial resolution, and then understand their impact on public health. What we have been doing here is, first of all, developing truthful privacy-aware mechanisms so that users can participate in the system without fear of loss of privacy, and we can compensate them for the data that they share. Additionally, we have been looking at a new modality of data collection, which is bike sensing: we equip the bikes with small sensors which can communicate with the smartphone apps of the users and transmit, in realtime, concentrations of CO and CO2, humidity, and temperature. And briefly, I would also mention a few things I've been working on at MSR during my internships, generally under the theme of privacy-aware information gathering. The first project we did was on stochastic privacy, which is a new approach to privacy for data collection in online services. The key idea of stochastic privacy is that we want to provide guarantees to the users on their privacy risk, that is, a bound on the probability of their data being accessed and used by the system. Users can choose this probability as desired, and they can opt for an increased privacy risk in exchange for higher incentives. Another project I worked on last year during an internship was about information gathering in social networks with visibility constraints. What this means is that you have a social network that you want to query or get some information from. For instance, I'm here in Redmond and I want to acquire some information about Redmond -- I want to find a set of people with some desired skills. The challenge is how I should query: which nodes should I recruit and pass this query to, so that I can get the required information under the visibility constraint that everybody can only see their local friends. Okay? So that's the second theme. And finally, a small peek at the third theme of our research, which is inspired by the fact that learning itself acts as a powerful incentive and essentially drives human curiosity and participation in a number of crowd-powered systems. For instance, on Duolingo, people are using the system to learn new languages. In citizen science projects, volunteers have some intrinsic motivation to learn more about the system. And from a recruiter's point of view, you may want to teach low-skill workers so that they improve their productivity in the long run. What we have been doing in this context is developing models of human learning as well as designing teaching policies, especially inspired by image annotation and classification tasks. We have also looked into personalized and adaptive teaching policies, where we can adapt the teaching based on the underlying skills of the users and the responses that we get from them.
I'm also happy to discuss more details of this work offline. And finally, to conclude: we looked at the spectrum of crowd-powered systems in our work, with the goal of improving their effectiveness. We focused on one specific application, a bike sharing system, and we looked at different challenges in learning and incentives, specifically focusing on three dimensions of this interplay. There are a lot of new crowd-powered systems and applications coming up, and I think there's a huge opportunity for machine learning techniques to play a key role in their success; machine learning systems can also benefit in turn from the power of the crowd. With this, I conclude my talk. Thanks a lot.

[Applause]

>> I'll ask one question.

>> Adish Singla: Sure.

>> So the idea of using the number of accepts seems like it's problematic in practice, because it doesn't model return policies. It's related to the question about gaming the system. Clearly, in the deployment you had, it's possible for somebody to do something counterproductive to make more money.

>> Adish Singla: Yes.

>> And I think that's tightly coupled to the idea of using the number of accepts as the criterion.

>> Adish Singla: Right.

>> Doing something counterproductive so you increase the number of accepts, possible accepts, in the future.

>> Adish Singla: Yeah. So what we have here is kind of an approximation, because we decoupled the two steps: one part decides when and where to offer these incentives, and we decoupled that from maximizing the number of acceptances in the second phase, with the hope that this would somehow lead to quality of service. But this is especially problematic in real-world scenarios where the users can actually game the system. The right way to set it up would be to think of the whole system as one big box and then reason about the whole system. We haven't figured out a nice way to reason about the whole system as one component, but I think there are a lot of interesting directions one could think about here.

>> One other question. What are your thoughts about including a kind of return policy? You mainly focused on the pickup policy. So what are your thoughts about a return policy?

>> Adish Singla: In the simulation experiments that we did, those were for pickup and return both. It's just --

>> What I mean is, the incentives were for pickup.

>> Adish Singla: Pickup and drop-off both.

>> Oh, there was an incentive for the return?

>> Adish Singla: Yeah. In the simulations, both. But in the deployment, we just focused on pickup, because drop-off is a little more tricky than pickup, in the sense that some users want to decide the drop-off station beforehand. It's easier to game the drop-off system, because for pickup, we know the true location of the user, assuming the user is not gaming the GPS coordinates. So pickup is easier to handle in the real world in that sense. For drop-off, we may need some kind of user tracking to figure out: if the user says this is my drop-off location, is it really the right one, or did the user just say it? But in the simulations, we had both, and they showed a complementary effect -- the performance is improved roughly equally by both scenarios separately.
>> [Indiscernible] some mixed-initiative exploration in this space, where there's a lot of information you don't know about the user. You're making decisions on an average basis for the general population, but the user knows where the user is headed or where the user is coming from. So if you could, for example, propose three different options to the user instead of proposing [indiscernible] -- if you take this bike to this location, I'll give you this much money -- it could be, like, here are three offers for you. You know? You can imagine that one of the three works for you, because you just want to distribute [indiscernible].

>> Adish Singla: Yes, yes, yes.

>> And then the user can say, okay, this is close to where I'm going. So -- have you thought about those --

>> Adish Singla: Yeah.

>> -- alternatives?

>> Adish Singla: We did quite a bit of brainstorming when we started to think about how to set up this system, because you could imagine just giving a set of initial points and destination points, and the user can pick any initial and destination point. Then we did a lot of survey studies with the Mainz users. In the end, we don't really decide the destination point in advance; we just make --

>> The users don't --

>> Adish Singla: The users don't. So we did a lot of this -- because the other initial plan was actually more like giving initial and destination points and picking a bike that way. But what we got from the survey studies was that, mostly, during the pickup scenario, users just have a current location and want to pick up some nearby bike, without thinking about the destination at that point. And similarly, during the drop-off scenario, they just ride close to the destination and then open up the app to look for a destination station. But I think those are very interesting scenarios where you could reason about --

>> [Indiscernible] this plan [indiscernible] yet.

>> Adish Singla: Yes.

>> Like, when they're actually doing the biking, they don't really think, okay, I'm going to get some offers, so I should check for that before I --

>> Adish Singla: Yeah. So this is what was said without the incentive part, right? This is how we plan our trips right now. But this could change if we actually put the system with incentives in place; the planning strategy might change at that point. But we did very exhaustive survey studies in Mainz, because we also wanted to be very careful about how this new system would affect the user experience. So we were quite conservative, keeping to something that users usually do. But I think there are a lot of interesting questions in reasoning about more advanced policies.

>> Thank you, Adish.

>> Adish Singla: Thanks a lot.