>> Eric Horvitz: It's a pleasure to have Adish Singla with us today. Adish is a long-term Microsoft person. He actually was at Bing Search as a senior development lead for over three years before going off to get his Ph.D. with Andreas Krause in the Learning and Adaptive Systems group, which I think in part borrowed its name from the Adaptive Systems group here, given that Andreas, his advisor, was an intern in our group, actually co-located in time and space with Ece Kamar when she was an intern here. And Andreas went off to ETH. And so we consider Adish a grandson intern. That's what we call interns of interns of the past. And maybe we'll have great-grandson interns soon. What do you think? Anyway, Adish is here to speak about learning and incentives in crowd-powered systems. He's done incredibly insightful work in human computation and crowdsourcing, incentive mechanism design, and submodular optimization, among many other interesting projects, with applications in education, fairness in how bicycles are rebalanced in bike sharing settings in cities, and photo sharing applications. So he's taken insights from quite a rich set of application areas, and with each one, we find new insights and ideas. So, Adish.
>> Adish Singla: Thanks a lot, Eric, for the introduction. And thanks, everyone, for joining the talk. So, I'm Adish from ETH Zurich. And in this talk, I'm going to talk about crowd-powered systems and specifically the role of learning and incentives. So let's begin by looking at the spectrum of crowd-powered systems that we have. For instance, Stack Overflow, which is a social Q&A platform for questions in computer programming. It has over four million users and about 5,000 questions that are answered every day. So broadly speaking, this is an informational system which is driven by the content of the participants. And then we have citizen science projects where volunteers around the world help conduct scientific research. On Galaxy Zoo alone, millions of images have been annotated so far by these volunteers. And on the other side, we have marketplaces such as Mechanical Turk where workers perform micro-tasks in return for monetary incentives. And platforms like Upwork enable outsourcing of jobs to a huge talent pool of over ten million freelancers around the world. And an emerging paradigm is location-based services, especially enabled by the growth of smartphone usage. For instance, on Waze, users can share real-time traffic information with each other. And Uber and Airbnb are perhaps the two most recent examples of crowd-powered systems that have entered our daily lives. So this is a snapshot of only the success stories. Of course, many more systems were developed and built but were unable to attract participants. So one of the key ingredients for the success of these crowd-powered systems is really the active and effective participation of the users. So the key research question that we're tackling in our work is how to incentivize the users to improve the overall effectiveness of these systems. Okay? So let's begin by
looking at some of the challenges in learning about the incentives for the
participants in these systems. For instance, if you take a very simple
example of a marketplace like Mechanical Turk, a simple question is how to price the tasks. Right? The requester has some kind of budget and time constraints, and overpricing the tasks would lead to ineffective use of the budget, whereas underpricing would lead to task starvation because nobody would be willing to perform the tasks. Right? And similar challenges arise in learning the skills and expertise of the participants and then matching them to tasks which are engaging for them. Right? And generally, we can think about these challenges as modeling and learning the user preferences and understanding the kind of tradeoffs that they make when they contribute to the system. So I'll focus on one specific project for the remaining roughly 30 minutes so that you can better appreciate some of the challenges that we have been dealing with. Specifically, we're going to look at bike sharing systems and how the operators can incentivize the users to balance the bikes across the system. Okay? So what are bike sharing systems? Well,
it's an emerging paradigm of sustainable urban transportation. You can pick up a bike from one part of the city very conveniently and efficiently, go to another part of the city, and drop off the bike. And you can have these stations at parts of the city which are difficult to reach by cars, for instance, and it is a cheap and green alternative to other modes of transport, especially car sharing. Right? So there are a bunch of nice advantages, and there is increasing worldwide adoption. Right now, there are over 700 cities around the world serving close to one million bikes. And in Seattle, there's the Pronto bike sharing system, which started operation in October 2014 and currently serves 50 stations and over 500 bikes, and the plan is to grow this to over 2,000 bikes in the next few years. So any guesses where the largest bike share system in the world is? Would you like to make a guess?
>> Zurich?
>> Adish Singla: Actually, Zurich doesn't have a bike sharing system. So the largest one is actually in east China. It has an amazingly large number of 3,000 stations and 60,000 bikes, and they have dedicated lanes throughout the city for these bikes. The second largest is in Paris, where we have basically a bike sharing system with 2,000 stations. And the largest one in North America is tied between Montreal, Chicago, New York, and Mexico City, which have around 450 stations and around 6,000 bikes. Okay. So, how do
people use these bike sharing systems? As you would imagine, as a user, you decide to go somewhere. So what you would do is pick up a bike from a nearby station, ride to a station close to your destination, and drop off your bike. Right? And what could happen is that a bunch of users are picking up bikes from the same station, and as stations have a limited capacity, some stations may get empty. And similarly, of course, what could happen is that a bunch of users are riding to the same destination station, and given the limited docks at the station, some stations may get full. So empty and full stations look like this. And basically, if you are at an empty station and you want to pick up a bike, you won't find any. And at a full station, it will be hard for you to drop off a bike. And as a user, you would get disappointed. You would have to go to another station and may not use the system again. So as a system operator, this is a situation you would really like to avoid. Okay? So what could be a way to
solve this problem? Well, one very obvious one is to move the bikes from
the full stations and put them back at the empty stations. Right? And this is one way it is done. The truck in this picture is from our collaborators in Mainz, Germany. It basically moves from station to station and shifts bikes from the full stations to the empty stations. Okay? So this is a solution. However, this is not so much in the spirit of the green concept of bike sharing systems, right? It's an expensive operation. It may be difficult to do truck-based repositioning, especially during peak hours and in the inner parts of the city. And of course, it leads to traffic congestion and pollution. Right? So what could be an alternative? Well, thinking of this in terms of crowdsourcing and thinking of this as a crowd-powered system, one alternative could be if we could get some of the users of the system itself to move these bikes. So the key research question that we ask here is how we can engage the system's users in this repositioning process to help solve this imbalance problem. Okay? So let's briefly look at what this engagement would look like. So the idea is very
simple. So let's say a user is going to drop off a bike at some station and
there's a station close by which is actually empty. So now, if we could
somehow convince the user to go to this empty station and do a little bit of
extra walking, then it's useful, it's good for the system because this
station is not empty anymore. Right? And similarly of course, a user wants
to pick up a bike and there is a full station nearby, so instead, we would
like that user to go to that full station to pick up the bike. Okay? So
that's our idea of engagement, and the question is how we want to incentivize users for this extra walking, the extra effort they have to put in. Okay. So there has been a bunch of recent work trying to figure out these incentives in bike sharing systems. One recent example is what is used by the Paris
Vélib bike sharing system. They have what is a non-monetary incentive. They have premarked problematic stations, for instance stations at higher altitudes, and users get 15 extra minutes when they're going towards such a station. And the idea is to bring more bikes to these problematic stations. Okay. So what we have been doing in our work is a little bit more adaptive and more dynamic. First of all, we offer monetary incentives. So when users actually execute our tasks, the money goes into their accounts directly. So that acts as a stronger incentive than just having extra
minutes. Then we have a more dynamic system where we decide, based on the current demand on the system as well as a future forecast, which stations we should offer these incentives for. Right? And we also have learning policies which can interact with the users, and based on user feedback, we can update what kind of price offers to make. And finally, our goal is to maximize the overall quality of service. Okay? So one accepted notion of this overall quality of service is to minimize no-service events. So what is a no-service event? Basically, a user wants to pick up a bike or drop off a bike but is unable to do so because the station was empty or full. So that's a no-service event. Okay. And our goal is to minimize this to improve the overall user satisfaction.
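One plausible formalization of this metric, reading it as the fraction of rental attempts that are actually served (this is my reading of the talk, not necessarily the paper's exact definition):

```latex
\mathrm{QoS} \;=\; 1 - \frac{\#\,\text{no-service events}}{\#\,\text{rental attempts}}
```

Under this reading, the quality-of-service values quoted later in the talk, such as .79 or .9, are fractions of served attempts.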
So let me tell you a little bit about the overall architecture that we have been developing in this context. So it looks like
this. So first of all, we have the hardware components, that is, the rental stations and the bikes, which are managed by the bike sharing operator. In this context, we are collaborating with the company MVGmeinRad, which is located in Mainz, Germany. And then we have a virtual infrastructure which basically keeps track of user accounts, makes forecasts about the future traffic, and keeps track of stations and bikes, and this forecast is then also used by the repositioning policies. This virtual infrastructure is managed by a spin-off, ElectricFeel, based in Zurich. And what we have been developing on top of this is, first of all, an incentives component, which basically decides or reasons about which stations we should offer incentives for based on the current load in the system, as well as a pricing mechanism to learn from interactions with the users; it also has APIs to communicate with the rest of the infrastructure to keep track of the latest system state. And then we have a user modeling component, which essentially reasons about how a user would respond to the incentives we offer, and then we can make informed decisions based on this component. And on top of this, what we have developed is a smartphone app which basically allows us to interact with the users. A user would open up this app to look for nearby stations, and it would also show some specific stations where users can get these offers. So let me say a
little bit more about each of those components now. So let's start with
users. So users are associated with a location, which is basically their current location where they want to pick up or drop off a bike. And we have a quite simple model which basically assumes that there's a maximum distance beyond which users are not willing to walk. This is what we call D max, some kind of psychological variable. Let's say users are never going to accept to walk more than 500 meters; then we can only make offers within this radius of 500 meters. And users have some private cost, which is the sort of effort they have to put in to go to an alternate station. That's given by c_u. And this cost is sampled from some underlying distribution [indiscernible], and this is the cost curve F that we will try to learn through interactions with the users.
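To make this user model concrete, here is a minimal sketch in Python. The class name, the `accepts_offer` method, and the exponential cost distribution are my own illustrative assumptions, not the actual implementation from the talk:

```python
import random

D_MAX = 500.0  # maximum walking distance in meters, one fixed threshold for the whole population

class User:
    """A user with a private cost c_u for walking to an alternate station."""

    def __init__(self, sample_cost=lambda: random.expovariate(2.0)):
        # c_u is drawn from an underlying distribution F that the system
        # does not know; F is exactly what the mechanism tries to learn.
        self.cost = sample_cost()

    def accepts_offer(self, price, walking_distance):
        # The user never walks beyond D_MAX; within the radius, a strategic
        # user accepts whenever the offered price covers the private cost.
        return walking_distance <= D_MAX and price >= self.cost
```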
>> [Indiscernible] be fair and not too personalized costs, correct?
>> Adish Singla: Yes.
>> Okay. That would be useful [indiscernible].
>> Adish Singla: Yes.
>> The probability of [indiscernible]?
>> Adish Singla: I will just comment on this in 30 seconds.
>> And with D max, that's a hard -- [indiscernible]?
>> Adish Singla: Yes. So here, yes.
>> [Indiscernible] assumptions [indiscernible].
>> Adish Singla: I'll get back to this -- to some more realistic models, actually. So here, what is happening is, yeah, we infer the location from the app. D max is a fixed threshold, something that we inferred from the bike sharing operators, and we take one fixed D max for the whole population. And F is something that is learned by the mechanism. So there are a few simplifying assumptions, and also, as a kind of first experiment, to deal with fairness we don't have any personalization aspect in this; there's one unified F that we assume for the whole population. And some simplifications that we make, which I'll come back to later, are that we are not considering context, for instance time, weather, or the actual walking distance required. And we can extend some of these results to that setting,
which I'll mention later. But in this --
>> [Indiscernible] is the standard [indiscernible] the standard for all [indiscernible] or just --
>> Adish Singla: So this is [indiscernible] and for the whole population, yes.
>> [Indiscernible].
>> Adish Singla: Yes.
>> Also the D max question. Is the idea of D max that it's the distance between stations? Or is it a distance -- so I could imagine that if my true destination is someplace you don't know about, then the distance that I'm willing to walk is unknown, right?
>> Adish Singla: Yeah, so the distance is how much you would be willing to walk generally. So let's say we never want to make an offer where you would have to walk more than 1,000 meters. This is independent of the locations of the stations.
>> But the distance that I'm willing to -- the distance that I will have to walk is a function of where I'm going, which you don't know.
>> Adish Singla: Yeah, so this is -- so given you have picked some station where you are going anyway, this is the alternate station or recommendation that I'm making. So basically, I know your current station, and I want to suggest to you an alternate station. And I would never suggest a station to you if you would have to walk more than 500 meters. This is only for the part of these incentives or recommendations.
>> I see. So if my destination is here, the station is within 500 meters of
there and then I have -- you recommend that I walk 400 meters the wrong way
and I would actually have to walk 800 meters to get where I want to go.
>> Adish Singla: So we don't care about the actual -- we don't know the true destination. This is only about the preferred station and the recommended station.
>> So what is the main motivation for users to use bike sharing? Is it kind of going places where public transportation is not available? Or to exercise, or they just like this way of going?
>> Adish Singla: So, a few things. First of all, the first 30 minutes of bike sharing are free. So it basically provides you a very fast and flexible option where you can just pick up a bike and go somewhere, and it's free for 30 minutes. And that essentially covers most of the span of the city.
>> [Indiscernible] which of the -- which of the bike sharing schemes?
>> Adish Singla: For most of the bike sharing schemes. All over the world, the first 30 minutes are free. There's one annual membership that you pay.
>> So it [indiscernible] the model's the same throughout the world, the first stretch is free.
>> Adish Singla: Yes. And then you have basically an hourly rate after that. So that's actually a strong incentive to just pick up a bike and go somewhere, and I mean, if you have around 200 stations located in the city, you can always find some bike. And I think it's also kind of a lifestyle question, that people prefer to go this way compared to cars or public transport.
>> I wonder if there's a way to combine this method with public transportation to expand the D max radius in some cases. I know that people prefer biking, but imagine there are two stations and, you know, it's not a walking distance -- people would actually take a bus to go there.
>> Adish Singla: Yes.
>> Would you consider that to increase your radius?
>> Adish Singla: It could be, but I think in most settings, the stations are actually quite close, in the sense that you always have a few stations within this radius. So stations are quite dense, actually. And basically, there would definitely be possibilities to extend this to more scenarios, but this is the simple setting where users don't have a lot of burden: they basically say, this is where I'm going, there's an alternate location just ten minutes' walk away, so I can just go there. Okay? So, basically, that's about the users. I'll get back to some of these questions again later as well. Now let's look at the stations. So for a station S, what we have is the
total capacity of that station. That's the number of bikes that the station
can hold. And there's some current load, that is, how many bikes are at that station at this time. And then we have some near-future forecast of how the load is going to change in, let's say, the next two hours. Based on this, what we do is classify stations as near-empty or near-full, that is, there is some high chance that these stations will get empty or full in the near future, and these are the problematic stations that we want to deal with in this time window. Okay. So it looks something like this, right? Let's consider the pickup scenario and let's say a user appears at this location L_u. Okay. By default, we assume that the user is going to go to the nearest station. That's again a model assumption, that the user goes to the nearest station. And what we would do now is look at this radius that we think the user would walk within if we provided some incentives, and we will look at the problematic stations in that radius. Okay? Now, we will give the user some incentives if we find some problematic station in that radius. Okay? So this is what the user would see in the app when opening the screen. Okay. So that essentially answers the question of how we pick which stations to offer incentives for and when. Okay. So now the next question is how do we
decide these price offers. So I'll discuss that now. Essentially, given that we have fixed which stations we are picking, the question is how we offer these incentives, what price we should offer, and of course, we want to do this under some kind of budget constraint. The way we view this is that we have some kind of continuous budget allocated to the system. Let's say every two hours, the system allocates us €200. And then we also have some notion of the number of users we would interact with in this time window. So let's say, based on the traffic forecast, we know how many users we're going to make these offers to. And our goal is to make these offers so that we can maximize the number of acceptances. Okay?
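As a rough summary of the targeting step just described, here is a sketch of the two decisions: classifying stations as near-empty or near-full from the forecast, and selecting the problematic stations within the user's walking radius. All names, the tuple layout, and the fixed margin are my illustrative assumptions; the real system reasons over a demand forecast for the next time window:

```python
import math

def walking_distance(a, b):
    # Placeholder: straight-line distance between (x, y) points; a real
    # system would use actual walking distance.
    return math.hypot(a[0] - b[0], a[1] - b[1])

def is_problematic(capacity, forecast_load, margin=2):
    """Near-empty or near-full: high chance of hitting 0 or capacity soon."""
    return forecast_load <= margin or forecast_load >= capacity - margin

def stations_to_incentivize(user_location, stations, d_max=500.0):
    """Problematic stations within radius d_max of the user.

    `stations` is a list of (location, capacity, forecast_load) tuples.
    """
    return [s for s in stations
            if walking_distance(user_location, s[0]) <= d_max
            and is_problematic(s[1], s[2])]
```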
>> Can you raise the price or are you just -- you're only --
>> Adish Singla: It's going to fluctuate until it kind of learns what is a good price to offer. This is kind of a learning mechanism where it's going to offer a price, and then the user is going to accept or reject, and then based on that, we -- if it's always getting rejections, then we probably end up raising the price.
>> But in principle, you can make the price [indiscernible] station?
>> Adish Singla: Yes. Yes. So that's where the budget constraint comes in.
Basically, you have some kind of constraint that you cannot just change the
prices as you would like.
>> You have some partner. Is the partner allowing you to increase prices
beyond what they're already charging?
>> Adish Singla: So this is just the offers that they're getting. So they're getting this money.
>> [Indiscernible] negative.
>> Adish Singla: Yes. So we are not doing that. So we are basically --
>> [Indiscernible] reducing user prices, right? You're not allowed to raise prices.
>> Adish Singla: So we are just giving them money.
>> You were discounting potentially the negative --
>> Adish Singla: Yes.
>> [Indiscernible].
>> Adish Singla: Right.
>> I thought your original question, I was wondering, you can't actually raise prices.
>> Adish Singla: Yeah. So you can actually raise prices as well. That's something that Uber does, for instance. Sometimes if there's huge traffic, it will charge you 1.5 times. But the thing in bike sharing systems is that users are kind of used to this free pricing. The first 30 minutes are free anyway. So basically --
>> So your mechanism is just lowering prices.
>> Adish Singla: We are offering them -- so there's no price. We are just giving them an incentive.
>> Okay.
>> It's interesting. I mean, clearly there are a large number of possible designs that you could overlay on these [indiscernible]. You could actually offer people free trips between stations: just take a bike and go, bike now available, free bike to go to this station.
>> [Indiscernible]. You're a platinum rider and your better bike is free for this trip.
>> For this trip. And get bikes back. So [indiscernible] designs and go beyond. And my guess is you came up with some really good assumptions and you have some really interesting mechanisms to go along with it.
>> Adish Singla: Yeah. So basically, I think, what we chose to assume is that there's no price. I mean, the riding is free, because the first 30 minutes are free throughout. And so what we are saying is, if you would
ride a particular route, we are giving you some extra money. So that's the
offers. Okay. So now, the specific pricing mechanism that we're using is one we proposed a little while ago, in the context of micro-task platforms, and I'll tell you a little bit more about it. To be explicitly clear, this pricing mechanism is offering incentives to users; it's not changing the pricing of the bike rental. Okay? The way it works is: when a user appears and we have to make a price offer, we make some offer based on what we have learned so far. The users are strategic and have some private cost, and if our offer is more than their private cost, they accept it and then essentially do the assigned task of picking up the bike from the specified location; otherwise, they reject the offer and just execute their default choice. Okay. And our goal is to interact with the users and be able to learn about this underlying cost distribution -- what is a good price to offer. Okay? So I'll tell you a
little bit about the execution of this mechanism. So the main thing that we
are trying to learn here is this cost curve F. Okay? But for now, let's say this cost curve is known to us and let's see what we could do if it were given to us. First, let's consider a scenario where we have unlimited budget, where we don't care about the budget constraint and can make offers as large as we want, but we have a constraint on the number of users we would interact with. In that case, the number of acceptances is essentially given by this blue curve. What it means is: if you offer a particular price p, the probability of acceptance is F(p), and then you multiply this by N, so that's the number of acceptances. Now let's consider the other extreme, where we have a budget constraint but an unlimited number of people we can interact with. In this case, the number of acceptances is actually given by this curve, B over p. Putting these two together, we have a budget constraint and a constraint on the number of people we interact with, and together they dictate the underlying price for this time window of two hours in which we are interacting.
>> I don't understand the B over p curve and how it matches with your X and Y axes.
>> Adish Singla: Basically, you have an unlimited number of people you can interact with. If you offer a particular price p, what is the number of acceptances you can get? That's essentially this B over p.
>> I see.
>> Adish Singla: Okay? And if we know the cost curve, the optimal fixed price to offer is given by this [indiscernible], so this is for a fixed B and N. This would be the price that the system would like to offer in terms of a truthful mechanism. Okay? And then what we want to do is learn this cost curve based on the interactions. What learning means here is that we can essentially offer a particular price point p, and then based on the number of acceptances or rejections we get, we get some kind of mean estimate of this value F(p) along with some kind of confidence intervals.
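With a known cost curve F, the optimal fixed price balances the two curves just described: the acceptance curve N·F(p) and the budget curve B/p. A minimal sketch of that computation as a grid search (the discretization and the example uniform F are my assumptions, not the paper's closed form):

```python
def optimal_fixed_price(F, budget, n_users, p_min=0.1, p_max=5.0, steps=1000):
    """Return the fixed price p* maximizing expected acceptances min(N*F(p), B/p)."""
    best_p, best_value = p_min, -1.0
    for i in range(steps + 1):
        p = p_min + (p_max - p_min) * i / steps
        value = min(n_users * F(p), budget / p)  # binding constraint at price p
        if value > best_value:
            best_p, best_value = p, value
    return best_p

# Example: costs uniform on [0.1, 5.0] euros, a 200-euro budget, 300 users.
F = lambda p: min(1.0, max(0.0, (p - 0.1) / (5.0 - 0.1)))
p_star = optimal_fixed_price(F, budget=200.0, n_users=300)
```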
>> So I'm confused by the use of the word truthful. It didn't sound like they're posting their own bids.
>> Adish Singla: Yeah. So it's a posted-price mechanism: we make them an offer, and they say yes or no based on this.
>> They're not revealing anything. Truthfulness is about revealing their own price.
>> Adish Singla: So here, basically -- yes. So it's a [indiscernible] price mechanism, so we don't ask them their true price, but the non-truthful, non-strategic setting would be one where we know their true price.
>> So this is -- so optimal means you're using all of your budget?
>> Adish Singla: Yes. Under this constraint on the number of users.
>> Okay. So a solution is suboptimal if you either don't spend your budget, or you spend all your budget and don't get as many people as you could have?
>> Adish Singla: Mm-hmm. So you could either end up overpaying and recruiting fewer people, or offering too little, and everybody would reject and you won't be able to recruit. So optimal is where you're making the right price offers so that you can recruit enough people under this budget and the constraint on N.
>> Well, okay, so what's confusing me is, suppose there are no empty or full stations anywhere. Right? We're saying it's optimal still if we're trying to incent people to go places they didn't want to go.
>> Adish Singla: N is zero anyway there. N is the number of people that we hope to interact with, that we are trying to recruit.
>> I see.
>> Adish Singla: So we have two constraints: budget and number of people. Together they define what the optimal strategy is.
>> [Indiscernible] the number of operations you want to do to balance it out. So it's not really the number of people. It's the number of tasks you want to solve.
>> I see. But the problem is those tasks -- N is an aggregation of a bunch of tasks. So one of the tasks may be going through a bad neighborhood, right, so the cost on this particular one is going to be really high.
>> Adish Singla: Yes. So that's a simplifying assumption that we have. We consider this almost uniform across all the neighborhoods, so we are not making that distinction. That would be something like taking context into account: if you are in this neighborhood, maybe the prices should be higher. I'll get back to that actually -- to extending this so that we can vary these prices across different neighborhoods.
>> Is P a function of the distance, though?
>> Adish Singla: P is the function [indiscernible] distance.
>> It is or it's not?
>> Adish Singla: It's not.
>> So that's an aggregate too.
>> Adish Singla: Yes.
>> So coming back to Chris's question about truthfulness, does truthfulness mean that whenever you offer a person a price that is higher than their personal cost, they would accept it?
>> Adish Singla: Yes.
>> So they don't have any incentive to strategize that, okay, I'll wait for [indiscernible].
>> Adish Singla: No, no. But basically, since we don't know their private cost, we may end up paying more. So we make an offer, they say yes or no, and then we pay the price we offered them. Whereas in the non-strategic setting, we would really know their underlying cost and just give them that value. Okay?
So what learning means here is essentially just experimenting with different prices, right? As we offer a particular price, and if we offer it more times, we can get an estimate of the curve here as well as some confidence intervals. And then we can experiment with different prices as part of the process of learning, and the algorithm would essentially experiment with different prices to get mean estimates as well as confidence intervals.
>> You can do this across all your stations right now?
>> Adish Singla: Yes.
>> [Indiscernible] station and intrinsically getting the distribution of where people go and so on.
>> Adish Singla: Yes. So this is across all stations and all the users.
>> Interesting to look at the separate stations --
>> Adish Singla: Yes. More of a cross-stream kind of thing as well.
>> [Indiscernible] top of a mountain, for example -- the difference that occurs.
>> Right. It's really going to be a function of the pattern, right?
>> Yeah. But the comment is that the distribution would tend to go from bike stations with the current -- where the attentions are, for example. That could be captured by station location, could be time of day, location, [indiscernible] their patterns and so on, but something across everything is fine to start.
>> Adish Singla: And I will actually briefly mention taking context into account; that's the more realistic setting we can really optimize for. So here, basically, what is happening has a nice connection to the multi-armed bandit setting, where basically we want to experiment with different prices and learn their utility. The key challenge here is the additional notion of a budget constraint and this different notion of prices of the actions, which is not usually present there. So the question here, from a learning point of view, is that we want to do some kind of exploration-exploitation, but under this budget constraint dictated by this red curve here. Now, what does exploration mean here? That's where we would like to offer a price for which we have huge uncertainty at this point, so we can [indiscernible] the uncertainty and learn more about this underlying curve. And what does exploitation mean here? It's where our current predicted mean intersects this red curve; that's what you would do as the kind of greedy action, the exploitation. Now, how do we trade off this explore-exploit dynamic? We use ideas from online learning. What we do is use the optimistic estimate of the current predicted mean, given by this upper-bound dotted line. We intersect it with the red curve, and that's the price our mechanism proposes. Okay? This is essentially what is also done in multi-armed bandits: use the optimistic estimate to decide which action to take. The key challenge is more the budget constraint coming into the picture, which makes the analysis quite tricky. Okay. So I'll quickly jump
into some of the theoretical results and then go into the experimental
results. When designing this kind of mechanism, our goal is essentially to have this property of no regret. What is the regret? It is how well our mechanism M does compared to this optimal fixed-price mechanism with p*. So that's what we are trying to optimize: we are trying to get as close as possible to that optimal fixed price. And our goal is of course to minimize this regret, and what we can show here is that using this simple, intuitive idea of offering that price, we can bound the regret as follows. The bound has very nice, intuitive terms. The first part of the regret comes from prices where we offered low prices: here we fail to recruit, because we had a bound on the number of people we would interact with, so we failed to recruit people. And the second part of the regret comes from prices where we essentially overpaid. These are prices more than p*, so this part comes from inefficient use of the budget. More importantly, what we can see here is that the regret is only logarithmic [indiscernible], which means that as you increase B, as you learn more and more, you can essentially perform as well as this p* mechanism.
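To make the learning mechanism concrete, here is a minimal sketch of a posted-price loop in the spirit just described: keep optimistic (UCB-style) estimates of F over a discrete price grid and offer the price where the optimistic acceptance curve min(N·UCB(p), B/p) is largest. This is my simplified reading of the idea, not the exact algorithm from the paper or its carefully analyzed budget handling:

```python
import math

def posted_price_learning(users, prices, budget, n_users):
    """users: iterable of objects with accepts_offer(price) -> bool.
    prices: discrete grid of candidate prices."""
    counts = {p: 0 for p in prices}   # how often each price was offered
    accepts = {p: 0 for p in prices}  # how often it was accepted
    spent, t = 0.0, 0
    for user in users:
        t += 1

        def ucb(p):
            # Optimistic estimate of the acceptance probability F(p).
            if counts[p] == 0:
                return 1.0
            mean = accepts[p] / counts[p]
            return min(1.0, mean + math.sqrt(2.0 * math.log(t) / counts[p]))

        # Offer the price with the best optimistic value of min(N*F(p), B/p).
        offer = max(prices, key=lambda p: min(n_users * ucb(p), budget / p))
        if spent + offer > budget:
            break  # cannot afford another acceptance at this price
        counts[offer] += 1
        if user.accepts_offer(offer):
            accepts[offer] += 1
            spent += offer
    return accepts, spent
```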
>> Sorry. What is it?
>> Adish Singla: Some [indiscernible] notion of the utility of the mechanism. So here it would be the number of acceptances. So that's basically what we were trying to optimize on the Y axis.
>> So, quick question. When your budget was [indiscernible] UCB algorithm?
>> Adish Singla: It would be the UCB algorithm. This is anytime. This is for a fixed budget; you get this bound.
>> Only when [indiscernible], right? [Indiscernible].
>> Adish Singla: Yes. So as [indiscernible] goes to infinity it reduces to UCB, but basically, this is the fastest regret you can get anyway, for a bounded budget as well.
>> So you [indiscernible].
>> Adish Singla: Yes. Yes.
>> I have a question. Is there any assumption [indiscernible]? You have to assume the higher the price you offer, the higher the chance of acceptance, but is this a function such that, if you increase the price indefinitely, do you assume that eventually somebody is going to accept?
>> Adish Singla: Yes. So [indiscernible] bound on the -- essentially, we know the highest price and lowest price. So we know the support of this distribution.
>> But in reality, could we -- even for a very high price, nobody may accept.
>> Adish Singla: Yes.
>> This is the realistic --
>> Adish Singla: Yes, yes.
>> Does that break the analysis somehow?
>> Adish Singla: So it doesn't break, actually. In this analysis, we do assume that F goes from zero to 1. But the analysis would still hold if it actually capped at .8. In fact, when we did some surveys on this, you actually see a cap around .8, because after that, it doesn't matter how much you offer, people would just not accept it.
>> So in the story you were telling, it's the number of acceptances. Now, you have acceptances at each one of these time slices, and so what was not clear to me is how many potential acceptances there are between each of the successive time slices. Can you say --
>> Adish Singla: This result is from running this mechanism for a fixed time slice. That could be two hours or one day or one week. So this is a fixed time slice where you are given a budget, you are given some estimate of N, and then you run this mechanism.
>> And so it does not address the kind of multiple-time-slice analysis.
>> Adish Singla: It doesn't.
>> Okay? It assumes that everything about the world stays the same.
>> Adish Singla: Yes. So [indiscernible].
>> Wouldn't it be nice to know how sensitive the results would be to those kinds of considerations, like time --
>> Or how to formulate them into scenarios [indiscernible].
>> Yeah.
>> Adish Singla: I mean, one thing that definitely happens is accounting for weather, for instance. Then the whole dynamics change. And if you are using the same learning curve across two days and one day is sunny, one is rainy, it's not going to work. But if we have basically somewhat homogeneous time slices, then it should perform reasonably okay. Okay? So I will continue with some of the evaluations that we did. I'll focus on simulations and then discuss some results from deployment. Okay? So basically, we did very exhaustive simulations using a data set that we obtained from [indiscernible], and we also did some surveys, that is, to learn the underlying parameters of the users, and then we did the real deployment in Mainz, Germany with collaborators [indiscernible]. So let's start with the
simulation experiments. So to perform these experiments, first of all, we
need to simulate the dynamics of a bike sharing system. What we did for this was obtain a nice data set which was released by Boston Hubway. It's a data set over a one-year time frame with complete rental information for 95 stations and 650 bikes -- about half a million rentals -- as well as the filling level of every station recorded every minute. So it's a very detailed data set, and it basically allows us to simulate demand and supply in the bike sharing system. On top of this, we need some truck repositioning policy, because that's the baseline repositioning that is happening. So we implemented truck repositioning policies which are state of the art and also reflect what is being used in the real world. And finally, for simulations, we also need some parameters for the user models so that we can simulate how they would
react to the incentives. And what we did --
>> [Indiscernible].
>> Adish Singla: So truck repositioning is the default repositioning process. Trucks move the bikes from full stations to empty stations. So they have some kind of -- if you have €500 allocated to the policy for a day, then it would run for less than five hours, moving bikes around. And this is essentially what is happening right now. So that's what we want to compare against: instead of putting more budget into trucks, if we put some into incentives, how are the dynamics going to change? And for the user model, what we did is survey studies in Mainz, Germany. The idea was to get some kind of realistic distributions of their cost curves as well as walking distance distributions so that we can simulate them. Okay? So starting with the simulation
results: first of all, what this curve shows on the Y axis is the quality of service metric. That's the final metric that we are interested in maximizing. And what you see on the X axis is a tradeoff between budget allocated to trucks only, on the left side, and budget allocated to incentives only, on the right side. There are two different curves, one for €300 and one for €600, and then you vary this tradeoff. The most interesting fact we see here is that if you use just trucks only, or just incentives only, the system doesn't perform well. However, mixing them in some ratio -- the exact ratio doesn't really matter much, because the curve is mostly flat in the middle -- essentially brings in this nice boost in performance. So that shows this nice complementary effect, and this is kind of expected as well, because trucks do more of the macro-level repositioning and incentives do more of the micro-level repositioning depending on the current dynamics of the system. There is also a temporal complementary effect, because trucks are only moving bikes in the early morning and late afternoon, whereas incentives are more focused during the peak hours. Okay?
>> What is quality of service?
>> Adish Singla: So quality of service is based on the proportion of no-service events. As a user, you went to pick up a bike, but there was no bike there -- and similarly for the drop-off scenario.
>> So what do you do for that here? You're a distribution --
>> Adish Singla: So that's what we get from the survey studies. For the simulation, we need some kind of underlying distribution for the users, and we got it from this part. So we basically did survey studies with the real users to get some kind of distribution of their walking distance and the prices that they're willing to accept.
>> Do you have the [indiscernible] those results are to the --
>> Adish Singla: Actually, not much. So we did a lot of studies where we were just using some distribution that we could simulate without actually reflecting the real-world distributions, and then we actually did this with the user studies. So it's not sensitive in a sense if you are learning it. It's very sensitive if you are just using some heuristic -- let's say, min price or mean price.
>> [Indiscernible].
>> Adish Singla: Yes. So we [indiscernible] sample from there and assign it to the users. And it will also become clear [indiscernible] that the learning mechanisms are quite robust, but if you use some heuristic, it's not. So what this plot shows
is again, on the Y axis, quality of service, and on the X axis, we are now increasing the daily budget. Okay? There is an existing policy which is currently running for three hours, and on top of that, we are allocating more budget to trucks or to different incentive schemes. First of all, on the [indiscernible], what you see is that if I allocate more budget, going from zero to €600, the quality of service increases from .79 to .9. Okay? Now, we have two heuristic-based pricing schemes. One is min price, which is basically just offering everybody 10 cents -- the minimum of the support of this distribution. And what you see here is that, since a lot of people would just reject the offers, we won't be able to use our budget in an effective way. So that's the min price policy. The mean price is essentially just offering the mean of the distribution, and that already does slightly better than allocating more budget to trucks. And what you see in the green line is the learning mechanism that I showed before; it's able to do substantially better than the other incentive policies.
>> So the min price mechanism, are you getting a budget surplus because you --
>> Adish Singla: Yes.
>> Okay.
>> Adish Singla: So most of the budget is not being used. And this mean price -- mean price could be quite sensitive to the underlying distribution, whereas this learning policy is quite robust, actually. So we experimented with a lot of different policies. Okay? Finally, on the simulation side, what this shows is, again, quality of service, but on the X axis, it's the participation rate. Now, why is this important? Because when we deploy such a system, maybe in the beginning, you only have five percent of users who are using your incentive scheme. So what you would like to see is, depending on the adoption rate -- how many users are actually part of this new system -- what the system would look like. On the left, you have zero percent participation, which means there are no incentives; nobody is using this new system. At one, you have a hundred percent participation rate, and those are the results that I showed on the previous slides. And what this shows is that if you can get a participation rate of 30 percent of the users in the city using our new system, then we are already able to outperform the trucks.
>> So I'm sorry. We changed from N to quality of service. Are they equivalent?
>> Adish Singla: The quality of service is the final metric, and acceptance was more like an intermediate metric for the mechanism. Because quality of service depends on two decisions: which stations you are offering incentives for and when you are offering them; the mechanism is just the other aspect, where you are deciding how much you're offering. So we decouple these at that point, but what we are interested in finally is this overall quality of service.
>> So I can pick and choose my N and get different quality --
>> Adish Singla: N is something that's coming from the forecast. N says: in the next two hours, you have this many people who are going to be coming to these problematic stations, and these are the people you can interact with. So on a rainy day, maybe you would have fewer people to interact with, and on a busy day, you'll have more people to interact with. Okay? That was the summary of the simulations; now I will present some of the deployment results. So this is how the deployment looks, basically. This is in collaboration with ElectricFeel and [indiscernible]: a 30-day pilot study on a subset of users who agreed to participate. And this was for the pickup scenario only. So this is what you see in the app: basically, a user would pull up the app, it would show the current location, it would show the stations where you can pick up bikes, and it would also show these red balloons, which indicate that there is some extra incentive over there. So what a user can do is click on that red balloon and reserve the reward. And if he actually executes the action in the next 30 minutes, he'll get that reward in his account. Okay. So I'll show some of the
qualitative results from this deployment now. What this shows is a kind of heat map, the spatial distribution of accepted offers. Okay? A few things to note here. First of all, the distribution is quite inhomogeneous, as one would imagine. Nobody is willing to accept these kinds of offers outside the city, for example, so there are a lot more offers being accepted in the city center. The size of the [indiscernible] essentially shows how many offers were accepted there. But also interesting is the spread, at least in the main core city area, which means there's a lot of dynamics going on in terms of where people accept these offers. Okay?
>> How many accepts are there?
>> Adish Singla: So this is for 30 days with around ten percent of users.
>> How many incentives is that?
>> Adish Singla: So roughly 60 percent of offers were accepted.
>> [Indiscernible] know how many [indiscernible]?
>> Adish Singla: So it would be of --
>> A thousand?
>> Adish Singla: [Indiscernible] thousand.
>> Thousand.
>> Adish Singla: Yeah. What this shows is a different angle. What is quite interesting is that it shows the distribution of users: how many users accepted one offer, two offers, and so on. So interestingly, the mode is at one. There's a curiosity to download the app and try it at least once. More importantly, there are many more users who tried it more than once. And surprisingly, there are a few very active users who really used the system a lot. They were essentially moving the bikes around to use it as a marketplace to earn money, so that was quite interesting. And this leads to some new insights on how this could become a marketplace. One comment I would make here as well: in the New York bike sharing system, there is now something called valet parking, because the bike stations get so full that there are actually people standing in front. So if you want to drop off a bike and there's no dock available, you just give some money for this valet parking and they will figure out how to drop off the bike. So there are some interesting marketplaces that could arise.
>> Are you worried that people will game the system and they'll start creating congestion at the places where the chance exists to make money? Are you at all worried that people will game the system and start creating congestion?
>> Adish Singla: There are a lot of strategic aspects. We don't think people played strategically here, at least, because they were more kind of volunteers who wanted to do this pilot study, and we were also tracking how people were behaving behind the scenes. But if you think about it as a large marketplace, there are a lot of strategic aspects to how people could game the system. Okay? This is something more of a qualitative result as well. This is essentially, when we asked them to walk, how does the probability of acceptance change? Here you see that after 700 meters, the probability of acceptance decreases quite drastically, and for around 100 meters, the probability of acceptance is close to .85. And finally, this is the temporal distribution of the accepted offers. A few interesting things here. There is a peak in late morning, early afternoon. That's actually quite useful because it acts as a kind of complement to the trucks, since that's the time when it's difficult to deploy trucks. Trucks are mostly early morning, late afternoon, or late evening. And also, the offers are spread throughout the day, even late at night, so this was quite interesting for us to look at. So that's the summary of our
deployment results. Now, just coming back to the challenges that I
mentioned: what we looked at is this bike sharing system, and this is one concrete example; now you can connect it with the challenges that I mentioned earlier. Right? And as we can see, there are a lot of opportunities for user modeling and machine learning techniques to contribute more towards the success of these systems. Now, before moving on to the next theme of the talk, I'd quickly like to comment on two things. First of all, there is some work related to extending the mechanisms for context. Earlier, all the mechanisms were context-free. What is happening here is that, as a mechanism, you can get some context, let's say time, weather, or user features, and the mechanism can make a more informed decision based on this context. Another thing I would like to comment on here is that earlier, we modeled the learning as a multi-armed bandit problem. Sometimes it may not be possible to do so, because you may not be able to actually compute the loss associated with the action that you took. So what we have been exploring in this work is formulating these problems as partial monitoring games, which are essentially a much more generic and powerful framework for online learning. I'll be happy to discuss more
details about this offline. And another thing that I would like to comment on, which is very recent work, is learning incentives in more complex scenarios. Here, the application setting is inspired by review aggregation on sites such as Airbnb or Yelp. As you can imagine, a user seeks some recommendation and decides to go to some restaurant of type i -- let's say a Mexican restaurant located in Redmond with 50 reviews. And the system, say Yelp, may want to maximize the system utility, for instance to gather more reviews for a newly opened restaurant. So here, Yelp would like to make some discount offer to the user to instead go to this other restaurant of type j, say a Mediterranean restaurant in Bellevue center. Now, the price of this switching from i to j would depend on how similar or dissimilar these items are. And this leads to the kind of learning problem that we looked at before. Now, if you think about it, there are N different types of restaurants, right? And the goal is to learn some kind of [indiscernible] switching cost to switch from i to j. So it would look something like this: we are trying to learn these probability distributions for every pair. The key idea that we have been exploring in this work is that we want to exploit the structure of a hemi-metric, which is a relaxed notion of a metric without symmetry, and then we exploit the triangle inequalities defined by these probability distributions. This is very recent work where we have some very nice results, and we are working on a number of follow-ups. I will also be happy to provide more details about this offline.
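As a rough illustration of the structure being exploited, suppose we maintain upper bounds on each pairwise switching cost c(i, j). A hemi-metric satisfies the triangle inequality c(i, j) <= c(i, k) + c(k, j) without requiring symmetry, so bounds learned for some pairs can tighten the others. A toy sketch of that tightening step (entirely my illustration of the idea, not the paper's algorithm):

```python
def tighten_upper_bounds(upper):
    """upper[i][j]: current upper bound on the cost of switching from i to j.

    Repeatedly applies c(i, j) <= c(i, k) + c(k, j) -- effectively
    Floyd-Warshall on the directed (asymmetric) bound matrix.
    """
    n = len(upper)
    for k in range(n):
        for i in range(n):
            for j in range(n):
                if upper[i][k] + upper[k][j] < upper[i][j]:
                    upper[i][j] = upper[i][k] + upper[k][j]
    return upper
```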
Okay. So far, we have focused on learning about incentives and how machine learning techniques could be useful. Now, one could flip the question around and ask how incentives could
be useful to actually boost machine learning systems and to improve
information gathering. So this is the second theme of our research, and I will quickly give you a small peek at what we have been doing in this context. What we have been working on is especially inspired by the application of community sensing. We are part of the OpenSense project, which is a Swiss nationwide project with various collaborators and hospitals, and the goal is to build a community-sensing solution to build air quality maps at very high temporal and spatial resolution, and then understand their impact on public health. What we have been doing here is, first of all, developing truthful privacy-aware mechanisms so that users can participate in the system without fear of loss of privacy, and then we can compensate them for the data that they share. Additionally, we have been looking at a new modality of data collection, which is bike sensing. That is, we equip the bikes with small sensors which can then communicate with the smartphone apps of the users and transmit the data in real time -- concentrations of CO and CO2, humidity, and temperature. And briefly, I would also mention a few things that I've been working on at MSR during internships, generally under the theme of privacy-aware information gathering. The first project that we did was on stochastic privacy, which is a new approach to privacy for data collection in online services. The key idea of stochastic privacy is that we want to provide guarantees to the users on privacy risk, that is, a bound on the probability of how their data will be accessed and used by the system. Users can then choose this probability as desired, and they can opt for increased privacy risk in exchange for higher incentives.
Another project that I worked on last year during an internship was about information gathering in social networks with visibility constraints. What this basically means is that you have a social network that you want to query or get some information from. For instance, I'm here in Redmond and I want to inquire about some information about Redmond -- I want to find a set of people with some desired skills. And the challenge here is: how should I query, which nodes should I recruit and pass this query to, so that I can essentially get the required information under this visibility constraint that everybody can only see their local friends? Okay? So that's
about the second theme. And finally, a small peek at the third theme of our research, which is inspired by the fact that learning itself acts as a powerful incentive and essentially drives human curiosity and participation in a number of crowd-powered systems. For instance, on Duolingo, people are using the system to learn new languages. On citizen science projects, volunteers have some intrinsic motivation to learn more about the system. And from a recruiter's point of view, you may want to teach low-skill workers to essentially improve their productivity in the long run. So what we have been doing in this context is we developed some models of human learning as well as designed some teaching policies, especially inspired by image annotation and classification tasks, and we also looked into some personalized and adaptive teaching policies, where we can adapt the teaching based on the underlying skills of the users and based on the responses that we get from them. I'm also happy to discuss more details of this work offline. And finally, to conclude, what we looked at
is the spectrum of crowd-powered systems in our work, with the goal of improving their effectiveness. We focused on one specific application, a bike sharing system, and we looked at different challenges [indiscernible] of learning incentives and specifically focused on three dimensions of this interplay. There are a lot of new crowd-powered systems and applications coming up, and I think there's a huge opportunity for machine learning techniques to play a key role in the success of these crowd-powered systems; machine learning systems can also benefit from these systems and from the power of the crowd. With this, I conclude my talk, and thanks a lot.
[Applause]
>> I'll ask one question.
>> Adish Singla: Sure.
>> So the idea of using the number of accepts seems like it's problematic in practice because it doesn't model return policies. It's related to the question about gaming the system. Clearly, in the deployment you had, it's possible for somebody to do something counterproductive to make more money.
>> Adish Singla: Yes.
>> And I think that's tightly coupled to the idea of using the number of accepts as the criterion.
>> Adish Singla: Right.
>> Doing something counterproductive so you increase the number of accepts, possible accepts in the future.
>> Adish Singla: Yeah. So what we have here is kind of an approximation, because we decoupled these two steps: we said when and where to offer these incentives, and then decoupled that from maximizing the number of acceptances in the second phase, with the hope that this would somehow lead to quality of service. But this is especially problematic in real-world scenarios where the users can actually game the system. So the right way to set it up would be thinking of the whole system as one big box and then reasoning about the whole system. But we haven't yet figured out a nice way to reason about the whole system as one component. I think there are a lot of possible interesting directions one could think about here.
>> One other question. What are your thoughts about including a kind of return policy? You mainly focused on the pickup policy. So what are your thoughts about a return policy?
>> Adish Singla: So the experiments that we did, the simulation experiments, those were for pickup and return both. It's just --
>> What I mean is the incentives were for pickup.
>> Adish Singla: Pickup and drop off both.
>> Oh, there was an incentive for the return?
>> Adish Singla: Yeah. In the simulations, both; but in the deployment, we just focused on pickup, because drop off is a little more tricky than pickup, in the sense that some users want to decide the drop-off station beforehand. And it's easier to game the drop-off system, because for pickup, we know the true location of the user, assuming the user is not gaming the GPS coordinates. So pickup is easier to deploy in the real world in that sense. For drop off, we may need some kind of user tracking to figure out: if the user says this is my drop-off location, is this really the right one, or did the user just say it? But in the simulations, we had both, and they showed a complementary effect: the performance is equally improved by both scenarios separately.
>> [Indiscernible] some mixed-initiative exploration in this space, where there's a lot of information you don't know about the user. So you're making decisions on an average basis for the general population. But the user knows where the user's headed or where the user's coming from. So you could, for example, propose three different options to the user instead of proposing [indiscernible] if you take this bike to this location, I'll give you this much money. It could be like, here are three offers for you. You know? You can imagine that one of the three works for you, because you just want to distribute [indiscernible].
>> Adish Singla: Yes, yes, yes.
>> And then the user can say, okay, this is close to where I'm going, so -- have you thought about those --
>> Adish Singla: Yeah.
>> -- alternatives?
>> Adish Singla: We did quite a bit of brainstorming about how to set up this system, because you could think about maybe just giving a set of initial points and destination points, and the user can pick any initial and destination point. Then we actually did a lot of survey studies with the Mainz users, where this will actually -- we don't really decide the destination point in advance. So we kind of just make --
>> The users don't --
>> Adish Singla: The users don't. So we did a lot of this -- because the other initial plan was to actually think of something more like: give initial and destination points and pick a bike like this. But what we got from the survey studies was that mostly, during the pickup scenario, users just want to have some -- there's a current location and they want to pick up some nearby bike without thinking about the destination at that point. And similarly, during the drop-off scenario, they would just ride close to the destination and then open up the app to look for a station. But I think those are very interesting scenarios where you could reason about --
>> [Indiscernible] this plan [indiscernible] yet.
>> Adish Singla: Yes.
>> Like when they're actually doing the biking, they don't really think, okay, I'm going to get some offers, so I should check for that before I --
>> Adish Singla: Yeah. So this is what was said without the incentive part, right? This is how we plan our trips right now. But this could change if we actually put the system with incentives in place; then the planning strategy may change, actually. But we did very exhaustive survey studies in Mainz, because we also wanted to be very careful about how having this new system in there would affect the user experience. So we were quite conservative, staying with something that users usually do. But I think there are a lot of interesting questions in reasoning about more advanced policies.
>> Thank you, Adish.
>> Adish Singla: Thanks a lot.