>> Desney Tan: Sue and I have the pleasure of introducing Michael Bernstein, coming
out from MIT, working with Rob Miller. Michael really needs very little introduction. He's
been out here a couple of times for internships working with four or five different groups.
He's working now with MSR New England. So he's got plenty of experience with us.
Michael is amazingly decorated: he has multiple best paper awards and holds one of our fellowships, the MSR Graduate Fellowship. He'll tell us about his work over the last couple of years combining human and machine intelligence. So, Michael.
>> Michael Bernstein: Thanks. Great. So I'm excited today to talk to you about
crowd-powered systems. And crowd-powered systems are interactive systems that are
going to combine human intelligence from crowds of people collaborating online with
machine intelligence.
And to set the stage for why this might be a good idea I'll start with the word processor.
That's because the word processor might be the most heavily designed, heavily used
interactive system ever.
And like most interactive systems, it tries to support a very complex cognitive process,
writing, and it does so by helping with some really complex manipulation tasks. By now we've got some relatively efficient algorithms to help with layout, and we can build language models to help with spelling and grammar. But at some level, what we have really little support for is the core act of writing itself, or maybe even editing: think of questions like expressivity or word choice. Or even situations like this.
So how often have we all been in a situation like this, where you have an hour or two before the deadline, a strict page limit, and you're a little bit over length? I think we've collectively burned more cycles fixing this situation than we'd like to admit.
Now, historically, when you're in this kind of situation, you would turn to other humans, in
particular, editors, if you were a published author you could turn to an editor who would
help you shorten your text, who would catch things that Microsoft Word didn't, for
example, spelling or grammar errors.
And in a sense they were really a core part of the writer's toolbox. But this has never really been a part of the toolbox that we could make a permanent part of our software, because
if you wanted to do that, you would have needed these editors to be available at large
scale really at any time.
And that just hasn't been possible. But today we do have tons of people online taking on some really impressive tasks. This is known as crowdsourcing. Even within areas related to computer science, we're using crowds to help collect data for machine learning algorithms. We're running studies on our systems. Social scientists and behavioral economists are running large-scale studies using crowdsourcing platforms. We're folding proteins. We're even collectively writing an encyclopedia.
And this isn't a new phenomenon. In fact it goes back to the 1700s, when the British Astronomer Royal started distributing spreadsheets for the calculation of nautical charts through the mail. It reached its height in the 1930s, when a WPA project hired 450 so-called human computers, which is actually the source of the term "computer" that we use today.
But what I'd like to point out is that this lineage of distributed human computation has
acted as a batch platform. That is, you take a lot of your work, you push it over the wall.
You wait a while, hours, days, eventually bring it back and run some analysis on it.
What I'm going to talk about today are ways in which we can turn crowdsourcing from a batch platform into one that supports interactive systems. That is, rather than having a single human editor help you out with a situation like this, where we're stuck between what the user's willing to do and what the system can support, what if we had tens or thousands of individuals all look at your document and suggest ways it could be improved or shortened? We could algorithmically start to identify the best of their suggestions and give you access to them in an interactive system. That's what I mean when I talk about a crowd-powered system: an interactive system, a user interface, that combines machine intelligence, whatever we can design or build with AI, with crowd intelligence.
Now, let's say we thought this was a good idea. You're going to run into a couple of challenges when you try to build these kinds of systems. The first one is quality. A few
weeks ago I asked a thousand people online to flip a coin and to type H if they got heads
and T for tails.
Hopefully in this room it would be about 50/50. It turns out that on the Internet there's about a two-to-one ratio of heads to tails. And it's not exactly that the Internet is a biased coin. It's that people are trying to optimize for money. They start satisficing; they try to generate randomness themselves, and when they do, they do so in ways you might notice aren't really random. And fully 7 percent of the respondents didn't even type H or T. They went outside the grammar, if you will, and typed out the entire word, misspelled it, or typed an enigmatic "F". These are the kinds of interesting challenges we need to face when we start talking about integrating crowd contributions, voluntary or otherwise, into software systems, because algorithms may not expect this.
A second challenge is going to be speed, or latency. If we're building interactive systems, we expect them to react quite quickly, and crowdsourcing just does not react that quickly. In fact, when it first came out, people were very excited about it, saying that it's extremely fast, and then pointed out that it was 48 hours before they got a response.
In fact, some folks at U.C. Berkeley ran a survival analysis model and found that the half-life for responses in these kinds of systems varies between roughly 12 hours and two days, depending on how much you're offering in a paid crowdsourcing context. We really need to cut this down by orders of magnitude if we want to have interactive systems.
So today I'm going to show that we can in fact create these crowd-powered systems, that we can overcome these challenges with quality and with latency, and embed crowd intelligence into our everyday interactions.
And that in order to do that, I'm going to introduce several computationally motivated
techniques that help crowds accomplish these tasks that they wouldn't otherwise be able
to accomplish. Again, overcoming these challenges of quality and of latency.
Now, I'm going to focus for most of the talk on paid crowdsourcing. I'll come back near the end to talk about how we can use other kinds of crowds to take on lots of tasks that paid crowdsourcing would never be able to do.
But for the moment, you may have heard of Amazon Mechanical Turk, perhaps the most popular paid crowdsourcing platform. On Mechanical Turk, millions of tasks get done, things that look like this: label an image, transcribe a short audio clip, for small amounts, usually on the order of a few cents. People do a large number of these and hopefully make a reasonable amount of money.
If you look at the population on systems like this, it's about 40 percent in the U.S.,
40 percent in India, 20 percent elsewhere.
Across a variety of indices, gender, education, income, it mirrors the overall population
distribution. This suggests that you really have some relatively educated individuals on these platforms who are looking to supplement or completely replace their existing income.
We're going to use paid crowdsourcing to explore this notion of crowd-powered systems. I'm orienting my talk around two main systems. The first is Soylent, a word processor with a crowd inside, which will hopefully convince you that this entire idea is worth pursuing. The second is Adrenaline, which takes these concepts and makes them happen in real time, quite quickly. I'll start with Soylent. Soylent is people. It's a word processor that recruits crowds as core elements of the user interface.
Now, I'd like to point out before I start I'm actually not the first person to come up with this
name. You may know one Danyel Fisher, who handed the name off to me at some point.
What I hope you take away from this section is that we're really embedding crowd contributions as a core part of how this system works, and that we're going to take something that crowds aren't necessarily very good at and decompose it in such a way that we can actually focus individuals' efforts and get higher quality responses.
So rather than tell you about Soylent I'm actually going to show it to you. So this is
Soylent running on the Soylent paper, which is a little meta. But this is exactly the
situation we pointed out earlier, where we're a little bit over length. Let's decide this conclusion is a little bit too long. Rather than shortening it myself, I'm going to ask for help. I'll push it off to the part of Soylent called Shortn.
When I push this, Soylent's going to push a bunch of tasks off to Mechanical Turk. You
don't need to pay too much attention to the details here, but you can see that workers are
marking up the text, making some edits, doing some votes.
When it all comes back, we've collected all of the suggestions that the workers made and we can start to put them together. What you see on the left is our original text, and on the right is everything that has been marked up. Anything that's underlined in purple here is a section of the text that has been marked as shortenable; we call it a patch. You can see every patch has a number of different options that have been suggested as potential shorter rewrites. We can then consider the space of all possible paragraphs, order them by length, and give you a slider such that when you drag it, the text rewrites itself and becomes shorter, longer, or really anywhere in between. And when you're done, your text now fits on ten pages.
This shows you a new kind of interaction we can build by drawing on crowd intelligence. But we can also talk about how we might support existing AI systems. Crowdproof is effectively a crowdsourced copy editor. It's going to find errors that Microsoft Word didn't, suggest solutions, and give you plain-English explanations of what the problem is.
You can see here that the crowd has suggested that this paragraph has two potential
problems with it. You can see that they've explained that in one case a sentence is too long, and in the other case here there's actually a parallel sentence structure error, so "introducing" is the correct way of putting it. The interesting thing about this second one is that the error got past, I think, eight authors and six reviewers before Crowdproof caught it, right before the camera-ready deadline. The reason is that this is the bottom of page 5.
By the time we're reading and getting to the bottom of page 5, our eyes are getting a little
bit tired. But crowd members are coming in with many different perspectives. And
they're perhaps even seeing this text as the first thing they see.
So there are a lot of different reasons here (that was not crowd-powered) why we might actually want to draw on the crowd. Now, we can also talk about how crowds might support more natural kinds of input to a system.
And in particular, I tend to write like this. I leave my citations in brackets like this, and I
need to come back later, fill them in, and write out a bibliography. In particular, if I'm using something that takes BibTeX as input, I need to go find that metadata. Let's say I want help with that. We can push out a request with The Human Macro; not everything can be proofreading or shortening, and this allows you to make open-ended requests. In this case I might ask for help finding BibTeX for the citations in brackets.
Rather than writing this myself, I'm actually going to show you what one of our user study
participants created. You can tell it's a little bit unclear: "You can locate these by Google Scholar searches and clicking on BibTeX." Not the clearest thing in the world, but we'll go ahead and paste it in. We can say how many people we want, and how much we want to pay them. And when it comes back, you'll see something like this, where they've gone out to Google Scholar, figured out what we meant, and brought back the BibTeX.
That, in short, is Soylent. Soylent's goal is to reach out to these crowd contributions to create new kinds of interactive systems: interactive shortening, a new kind of interaction; aid with proofreading, supporting an AI system; and open-ended requests like The Human Macro: find me a figure, and because it's the Internet, they'll find you cats.
These are the kinds of things that we think are possible when you engage with crowds as
a core part of interaction. Now, if you were to try and build a system like Soylent, you
might come up against some interesting challenges as I pointed out earlier.
In particular, we've worked with a lot of Mechanical Turk workers on these kinds of
systems and we simply see that roughly a third of what we get back or 30 percent is just
not something you'd want to show to a user, particularly not a user who might be paying
for such a system.
And we need to actually deal with this quality issue if we want to make such a system deployable at large scale. Why is this happening? I'll explain it through a couple of personas. I took this paragraph off of a high school essay website. It's a really horrible paragraph; I've underlined a few of the issues here. We asked Mechanical Turk workers, in an open-ended sense, to edit it, make it better, proofread it.
You see two kinds of personas here. One we would call the lazy worker. This is someone who is trying to optimize for money: send a clear signal that they've done the work, but do no more than they really need to. Given this really error-filled paragraph, a lazy worker is going to do something like this. That is, they made a one-character change to the word "comradeship" to fix the spelling. It's not surprising they did that, because it was the only word that was underlined in their browser as being misspelled. But they made a clear edit. On the other end of the spectrum there's the eager beaver, someone trying to give a signal that they've done the work, but they go too far. They go outside the bounds of what the system might expect. Given the same paragraph, an eager beaver will make some good fixes. But then they're also going to insert new lines between each sentence, which I personally would not consider an improvement to the text.
And these personas are not specific to Mechanical Turk. You can think about, say, Wikipedia, where you have some contributors working quite hard to make edits but getting reverted because they don't know the rules, and other people who are just getting by. In my opinion, the state of programming with crowds in the loop is still in its very early days. In my view it's sort of similar to before we had patterns like model-view-controller and so on that started to codify best practices, such that you could get reliably better results.
So our goal here is going to be to start thinking about what such design patterns might be in the crowd computing space. I'll introduce one that we use in Soylent called Find-Fix-Verify. This design pattern is oriented toward open-ended problems, like the things Soylent tackles. I can contrast that with a closed-ended problem, something like a multiple-choice test, where you can seed in some small percentage of ground truth and use that to figure out whether the work is good. Here, there's a huge space of possibly correct answers. What we're going to do is decompose this very open-ended problem into three stages which are slightly less open-ended and give workers more direction. I'll explain it through an example.
We used this with both proofreading and shortening; I'll show you with shortening here. Rather than having workers directly edit the text, we're first going to have them just find areas of the text that can be edited. We're effectively going to get a heat map over, say, the paragraph. We're going to look for independent agreement across workers to certify an area of the text as a patch, and we're going to send each patch out in parallel to a Fix stage. In the Fix stage we're going to show another set of workers exactly one of these issues and ask them to fix it, that is, shorten the text or fix the typo, depending on the task. We're going to collect a bunch of these suggestions, randomize their order, and put them through a Verify stage. The Verify stage is going to try to enforce some invariants here. Basically we want to make sure we're not changing the intended meaning of the text and not introducing any new style or grammar errors. Anything that survives the Verify stage we can finally pass back to the application logic, and in particular here create something like Shortn.
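Here is a minimal sketch of the Find-Fix-Verify pattern as just described. It assumes a hypothetical post_task() helper that sends a task to a crowd platform and returns worker responses; the thresholds and the exact-span agreement check are simplifications of what Soylent actually does.

```python
from collections import Counter

def find_fix_verify(paragraph, post_task, find_workers=10, agreement=0.2):
    # FIND: independent workers mark spans that could be shortened or fixed.
    spans = []
    for _ in range(find_workers):
        spans.extend(post_task("find", paragraph))       # e.g. (start, end) pairs
    counts = Counter(spans)
    patches = [s for s, c in counts.items() if c / find_workers >= agreement]

    results = {}
    for patch in patches:
        # FIX: a fresh set of workers each rewrites exactly this one patch.
        candidates = post_task("fix", (paragraph, patch))
        # VERIFY: another set of workers votes out rewrites that change the
        # meaning or introduce new errors; keep only what survives.
        votes = post_task("verify", (paragraph, patch, candidates))
        results[patch] = [c for c, ok in zip(candidates, votes) if ok]
    return results
```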
Okay. So why is this a good idea? In particular, why would we split find from fix? Why
not just let workers go in and sort of improve the text?
Well, one major reason is that we're actually taking advantage of these two personas I introduced. In the Find stage we can require lazy workers to find two or three errors in the text, which gives them a lower bound that they can understand and we can understand. In the Fix stage we can point these lazy workers at a problem that is perhaps not the easiest one, and they can't get away without having fixed that particular problem.
So by giving them a more specific task, we actually end up producing higher quality
results. At the same time, we can focus the eager workers on a task that we really want
accomplished right now and hopefully keep them from going too far off the rails.
It also allows us to group the suggestions, so we can get that drop-down. If you've ever passed out a draft and gotten back a bunch of different, confusing edits, what we know now is that, given a particular problem, these three edits are all different ways to fix it. So you don't have to do that merging yourself. The notion behind the Verify stage is that we get higher quality output by putting these workers in productive tension with each other.
So you have one set of workers whose goal is to try to suggest options and another set of
workers whose goal is to consider critically whether those are correct.
I'd like to point out that while this was somewhat early in the space, there's really a growing literature, much of it done here in fact, that's playing into this larger question of what happens when we combine crowds and algorithms.
>>: Why a verify stage versus a more general ranking that would show two possible solutions to someone and have them pick the better one?
>> Michael Bernstein: It's actually agnostic. With Crowdproof we just try to pick the n-best, really the one best, because we want to make one edit. With Shortn, we actually want to filter out anything bad rather than get a full ranking, because we want a set of as many options as possible. But you can imagine doing a ranking, and then you'd have to assume where some cut-off would be. Certainly there are many different ways you can run a verification stage; this is just one way. Does that address your question?
>>: I think so.
>> Michael Bernstein: Come back later if I'm still being confusing. We wanted to know whether this worked. In particular, we wanted to know three things: how high is the quality, how long do you have to wait, and how much is it going to cost?
So we can throw a bunch of input texts at Soylent, in particular here at Shortn. Here are five different input texts we give it, ranging from TechCrunch, HCI papers, and OS papers, to my personal favorite, a rambling e-mail from the Enron corpus. We feed them through Soylent and get edits that look something like this. Across all these texts, we see that we cut about 15 percent of the original paragraph length on average.
What this means is that you can take an 11-page draft of a paper, hold constant the title, the figures, all that boilerplate, run it through Shortn, and get a ten-page paper back, without having changed any of your core arguments.
So why does this work? How does it work? Workers tend to avoid any sort of technical content they don't understand and instead focus on wordy phrases; in this case the phrase "are going to have to" can be changed to "have to" without changing the meaning of the text too much. These are exactly the kinds of phrases we can start to collect a corpus of and train a machine learning system on, so it can take over much faster and for free, with the crowd just handling a verification stage. Now, they make more complex edits as well, for example, merging sentences. This sentence now reads: the larger Tangible Bits project, which introduced the metaDESK and its companion platforms.
Okay. But this does not always work. Here are some interesting ways in which it fails. One is that workers are not members of your community of practice. That is, they're not experts, and they might overestimate their expertise. There's a signaling phrase in academia: we say "in this paper, we argue that..." Workers just find it boring, so they cut it, and you may legitimately disagree. So expertise is one issue. Another one, which is endemic, has to do with parallelism; that is, workers on one patch can't see what the workers on another patch are doing. In this case you have two list items, and they cut the main phrase from one and the parenthetical from the other, which leaves the resulting sentence meaningless. If you wanted to fix this, you would need to talk about enforcing global constraints, which is something Chi and Erica have been looking at, or merging patches when they get too close together.
So across these three stages we're recruiting hundreds of people for each of these texts. It costs about $1.25 a paragraph. If you're willing to wait longer it can get down to about 30 cents, and from there you can start talking about optimizing, in decision-theoretic terms, trying to minimize the number of workers you need for each stage subject to some global quality constraint.
Now, how long do you have to wait? There are two types of wait time in Soylent. The
first one is between when Soylent asks for help and when a worker says, okay, I'm going
to help you out.
Now, if you sum the median Find, Fix, and Verify times, you see this takes about 18 and a half minutes. It can take much longer; it can stall. But roughly you're looking at about 20 minutes.
The second wait time has to do with between when the worker says they'll help and they
actually complete the task. If you sum the medians, this is actually much faster, it's about
two minutes.
Now the second half of the talk I'm going to show ways in which we can get that 18 and a
half minutes down by several orders of magnitude. But you're looking at perhaps in the
limit about a two-minute wait between when you asked for help and when Soylent can get
back to you.
So we can do the same thing with Crowdproof. We can give it lots of different input texts: Wikipedia pages, text which passes Word's grammar checker, and the one at the top, which is an essay written by a nonnative English speaker. I'll focus on that one. You can see some of the edits it makes. Word by itself finds about one-third of these errors. Crowdproof finds about two-thirds of them.
Interestingly, they find different errors, which is to say that when you combine them you get about 82 percent coverage. When it finds an error, Crowdproof fixes it about 90 percent of the time. So when does it miss? Most commonly when there are two errors in the same patch, and the lazy workers come in and fix the obvious one but don't notice the subtler one.
So the same processes happen over and over. The Human Macro, same thing: you can see exactly the text that I pasted in earlier, and other requests like finding figures or changing the tense of a document. I'll focus again on the first one. The input text looks something like this. In fact, if you're familiar with the literature, you know that this is an incorrect citation: Duncan Watts is one person, not two. But the worker still managed to figure out the noisy input and command and get the correct answer.
Now, there's no verification stage here, which means the 30 percent rule comes back. We see that about 70 percent of the time these things are perfect, and about 90 percent of the time they have the right idea but some subtle error in them. So far I've introduced you to Soylent, which opened up this new space of interactive systems that are powered by crowd contributions. And in order to do this, I've introduced the Find-Fix-Verify design pattern, which has started to focus these contributions to address questions of quality.
>>: Can you go into what percentage of the workers end up actually contributing, versus getting filtered out by the verification?
>> Michael Bernstein: What percentage?
>>: If you have a lot of lazy workers giving you totally useless things.
>> Michael Bernstein: What you're asking is whether there's a high correlation: if you give me bad things now, will you give me bad things later?
>>: Yes.
>> Michael Bernstein: My sense is yes, but I don't have numbers for you there. Certainly there's a power law of contribution in most of these things, such that a small number of workers are actually producing a large amount of your results. There are lots of folks who think about this sort of global quality management; CrowdFlower is a good example of someone who maintains reputation across many tasks. And I think the first thing you'd want to do is start to build up a better notion of reputation, either at the platform level, since Mechanical Turk or oDesk can do this better than any individual requester, or as a requester, where we can get feedback from the user saying "I like that edit" or "that edit was really bad" and propagate it backward.
>>: What's their deal? Why are they --
>> Michael Bernstein: So, often -- [laughter] -- I don't have deep knowledge. That is, I think you'd want to spend more time talking to them to get a better sense, but there's this notion that they're overcompensating. Sometimes it has to do with whether it's an interesting task, or if it's the first one and they don't know the task parameters or boundary conditions yet.
I think that's one thing that's going on. I think often it has to do with them overestimating their abilities as well. Like, yes, I'll just insert new lines. Not a good idea.
>>: You mentioned the payment. Did the amount you paid affect the quality of the work?
>> Michael Bernstein: No. In fact, the Winter Mason and Duncan Watts paper demonstrated that paying more gets you more work, that is, faster work, but no known increase in quality. In general, that's what we see. You need to design your tasks better to get higher quality results, which, as an HCI person, I think is nice to hear. That means I can actually have an impact.
At this point I'll turn to Adrenaline, which is going to take these ideas and push them into the real-time space. The reason we want to do this is that the kinds of applications we can build are constrained in a very deep way by latency. Now, Soylent was one of the first crowd-powered systems, and I can actually turn to the broader research literature, which has started to explore the space in a much broader way than I could alone, showing that this approach is useful for design, health and nutrition, robotics, vision, and many other kinds of things.
But, fundamentally, all of these applications are constrained by the same limit as Soylent,
which is that sort of 20-minute wait time. In fact, the best result we've seen in the literature comes out of Jeff Bigham's group at the University of Rochester, which was able to get one response from a worker about 60 seconds after you ask. And that response isn't verified, because it's a singleton. If that's the best we can do, we've already lost, because usability psychology has demonstrated that users will only pay attention to an interaction for at most about ten seconds before they lose the flow.
What we really need to create are on-demand, flash, real-time crowds. That's our goal here. We're going to pick one motivating application, which is going to be Adrenaline, a camera for novices, sort of built into your cell phone. What it's going to do, for these kinds of situations, pushing out beyond what Mike's been working on, is try to find, aesthetically and subjectively, the right moment to take the photo.
It's sort of the moment camera, but crowd-powered. We want to do this in real time because, ever since the introduction of the digital camera, it's become a core part of our photo-taking experience that you take the photo, you see the result, you can take another one, you can share it with your friends. We don't want to go back to an era where you have to develop your film overnight.
So this is what Adrenaline looks like. You can see they're capturing a video of people doing high fives there. One of them, in fact, is me. Right now we've just made a request to the workers to help us choose the best frame. They're going to pop in along the bottom there, start exploring the space, and very quickly they'll focus in on the final frame.
So just a few seconds later we have a final frame. Here are a few other kinds of pictures that Adrenaline takes. You can see people trying different angles, different kinds of poses, action shots like people jumping off a bench, or just people being silly and hoping that the crowd will pick a cute moment.
And, again, we can collect data from these kinds of systems and start to train more
automatic ones as we push out.
So if we want to create Adrenaline, we need to solve two problems, and these correspond to the two wait times I pointed out before. The first is how we get the crowd there quickly. In order to do this, I'm going to introduce a new recruitment approach for crowdsourcing we call the retainer model.
The idea behind the retainer model is we're going to ask workers to come and sign up
before we need their help. And we're going to actually pay them a little bit extra while
they can go do anything else. They can work on other tasks. They can check their
e-mail, they can chat. But as soon as we have a task for them they have implicitly
agreed to come back.
When we have a task, we just pop up a simple JavaScript alert, which brings their attention to our browser tab, and we go from there. So does this bring people back quickly? It's an empirical question; in fact, in the space of HCI questions it's one of the most measurable.
So we ran a study on Mechanical Turk, counterbalanced across days of the week and times of day. What we did is have people sign up for a task, and then we called them back. So I'm going to draw a graph here. On the x-axis you're going to see how long it took between when workers saw the alert and when they clicked the OK button and started working. On the y-axis we're going to see a CDF: what percentage of all the workers clicked the OK button at least that quickly. Workers were randomized into different wait-time buckets. So if the workers hadn't been waiting very long, you saw a curve that looked something like this. If they'd been waiting a little bit longer, a curve that looks more like this. What you can take from this is that if the workers have been waiting under ten minutes, you get about half of them back two seconds after you ask, and in fact you get about three-quarters of them back three seconds after you ask.
Now, what happens if they've been assigned to wait longer? Now you see more attrition. But in a separate experiment we found that if you offer a small bonus, you can take a curve that looks something like this, a 25 percent chance of the worker coming back, and push it all the way back, as if the worker hadn't been waiting at all. We changed the incentives and we changed the behavior. I noted that we're paying about half a cent per minute of expected wait time, so it costs about 30 cents an hour to have someone on retainer. Now, even just with this, we can create some real-time kinds of applications. We built one called A/B, which sort of crowdsources instant votes. If I want to know which tie to wear today, or which of two designs people like better, over here on the right you'll see a go button. I'm going to click the go button and replay a result from one of our studies.
So this is something that took 20 minutes with Soylent and 60 seconds with Jeff Bigham's work, and here we get five votes in about five seconds. This is the kind of thing that the retainer model can do: crowds in two seconds, and traditional crowdsourcing kinds of tasks in about five seconds.
And that's great if what you're trying to do is choose between two photos. But Adrenaline is not; in particular, it's trying to choose between, say, 100 or more frames all at once. So what happens now is that the workers arrive quickly, but it takes them a long time to shuttle between those last few frames and choose the best one. So how are we going to help them find that decisive moment? How do we help the workers work together in order to overcome these slow work times? We're going to take advantage of one notion here, which is that we have really created synchronous crowds. For the first time you can assume all these crowd workers are arriving at once, not arriving and leaving at individual whim as you usually have on Mechanical Turk, and we can start to think about how we can get them to work together.
In particular, I'm going to claim that if we're smart about this, we can get the crowd to work faster collaboratively than even the single fastest member of that crowd. The way we're going to do that is through a technique called rapid refinement. In a continuous search space like the one we have with Adrenaline, the notion behind rapid refinement is to look for agreement early, as it's starting to emerge, before people would have made their final selection, and use that to reduce the search space quickly and focus everyone's attention. So I'll explain what we mean with pseudocode here. On the left you see what the server has. On the right there are three workers who get initialized to random positions in the video.
The server sees all of them, and we're going to loop until we get down to a single frame. We're going to look for agreement: we'll see how many workers are within a particular region of the video. Right now there are none. We'll wait until there's a certain amount of agreement, say two-thirds. As the workers navigate through the space, there's still no agreement. Eventually you'll see that two folks do come together, indicating that they're interested in the same region. Rather than immediately jumping forward, we're going to make sure it's not a false positive: we'll make sure they stay in that region for two seconds. If they do, we're going to certify this as a refinement and reduce the search space, so those folks who agreed stay exactly where they were, and anyone who disagreed doesn't get paid yet and gets reinitialized to a random new part of the now-smaller video. And we're going to keep doing this again and again until we get down to a single frame.
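A rough sketch of that loop in code form follows. The worker interface (current_position()), the region-shrinking factor, and the polling style are illustrative assumptions, not the actual Adrenaline implementation.

```python
import random
import time

def rapid_refine(num_frames, workers, shrink=4, quorum=2/3, dwell=2.0):
    lo, hi = 0, num_frames                         # current search space
    positions = {w: random.randrange(lo, hi) for w in workers}
    while hi - lo > 1:
        width = max(1, (hi - lo) // shrink)        # candidate region size
        refined = False
        for start in range(lo, hi, width):
            end = min(start + width, hi)
            inside = [w for w in workers if start <= positions[w] < end]
            if len(inside) >= quorum * len(workers):
                time.sleep(dwell)                  # guard against false positives
                positions = {w: w.current_position() for w in workers}
                still = sum(start <= positions[w] < end for w in workers)
                if still >= quorum * len(workers):
                    lo, hi = start, end            # certify the refinement
                    for w in workers:              # re-seed anyone left outside
                        if not (lo <= positions[w] < hi):
                            positions[w] = random.randrange(lo, hi)
                    refined = True
                    break
        if not refined:
            time.sleep(0.5)                        # poll the workers again shortly
            positions = {w: w.current_position() for w in workers}
    return lo                                      # the single remaining frame
```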
This is how rapid refinement works. I'm going to show you exactly the same video I showed you before, just focused on the bottom part. You're going to see that workers arrive and very quickly start agreeing on that central region; we get a refinement, then an overlapping refinement, down to a single frame. So, just a few seconds. Now, what came out of this? That is, does this work? Do we have some sort of quality-time trade-off happening here? And when we have low quality results, what's going on? Sumit.
>>: Seems like there's an underlying assumption that there's kind of a single best place, because if you have multiple error functions --
>>: So if there's a bimodal distribution.
>>: You have problems there. But is that generally the assumption, that the video is taken with one kind of optimal place?
>> Michael Bernstein: We assume there's one intent. But the nice thing about crowds is that there are many of them. What you can do, although we don't do it in the current implementation, is fork. You can imagine this as populating some probability distribution: if you see two peaks, you can put one set of people over here and another set over there, or focus on one and then wheel around when you have more time to explore the second one. And I'll point out another reason in a minute why that might be a good idea.
>>: One more quick question. What's the final dollar amount that you were paying for finding that frame, when you add --
>> Michael Bernstein: I'll show you in just a moment. So cost is going to be another element here. We actually had 34 folks, I think, from our university come in and capture videos for this, and we produced five different candidate frames from each of these input videos. One of those frames was generated using rapid refinement, as I described here.
A second one was effectively our ground truth: we had a professional photographer come in and choose the best moment. A third was an off-the-shelf, production-level computer vision algorithm choosing aesthetic or representative frames within a video; you can think of this as effectively what Google does when it chooses a frame for a video. The other two techniques were more crowdsourcing-oriented. Generate-and-vote looks a lot like Find-Fix-Verify: we call people in off retainer, they nominate frames, we call more people off retainer, and they vote among those frames. Generate-one just takes the first response we get in generate-and-vote. That is, the fastest member of the crowd: as soon as they produce anything, we take it.
So we can measure two things. One is -- sorry, three things: cost, latency, and quality. We'll start with quality. We had these people rate on a nine-point Likert scale how much they thought the photo was what they were looking for, how much they liked it. What we see is that rapid refinement tends to do statistically better than computer vision, which chooses a different moment, and is statistically indistinguishable from the photographer, due to large variance. Typically you see something that looks like this, where the crowd chooses something in the same general area, but not exactly the same frame. On the top row they were actually just one frame apart.
Sometimes you see something like this, though, where you notice this is a bad photo.
The guy's eyes are closed. It's blurry. So what happened here?
We actually had a false positive. You had two workers who were interested in nearby
regions of the video that didn't overlap. But they were close enough to each other that
the system thought they were interested in the intersection. It snapped down, and they were left with a region of the video that had nothing good in it. So if you wanted to catch this, you would need to notice that thrashing behavior, pop back out, and explore a different area.
Okay. So here I can answer your question about cost: rapid refinement was about 20 cents a photo, and it went up from there for the other techniques. What I hope you take from this is that rapid refinement was not only the fastest, but the most reliably fast. It statistically had the least variance, which I would claim is very important for interactive systems. You don't want something that reacts quickly sometimes; you want it to react quickly reliably. Really what's happening is that we're pulling up the tail. Sometimes you do have fast individuals, but sometimes you don't. Rapid refinement takes that longer tail and pushes its probability mass to the left. Generate-and-vote still performs in under a minute, which is much faster than Soylent, and in fact matches the quality of the photographer, which we thought was pretty cool.
>>: I'm a little confused about the second row there, because it seems like, I mean, generate-one, if you were using the same retainer scheme as in the first one, should be pushed even further toward zero, because it's the very first person that responds to anything, unless you're talking about some cascade.
>> Michael Bernstein: It is pushed a little bit farther toward zero. If you take out your eyeglasses, you can see that it's actually about one unit to the left.
>>: They still have to scan the whole thing?
>> Michael Bernstein: They still have to scan the entire thing. It takes them some time to get called off retainer, and we're just taking the first one. So we have five people on retainer, we call them all back, and we only pay attention to the first one. Sometimes you don't randomly have a fast person in your crowd. So that's really what's happening here.
Okay. So we make a few trade-offs. One strength is that we actually get fast preliminary results: within that ten-second boundary we can return something to the users, and on average that first refinement happens within ten seconds. We also don't need a separate verification stage, because verification is effectively built into the algorithm; we're looking for agreement as we go. But we do sacrifice some things. We're sacrificing some amount of quality to get the speed trade-off; we can think of it like randomized algorithms, where you're not necessarily getting the optimal result but something that's much faster. So you have this trade-off now. And more important, in my opinion, is the fact that we're actually stifling individual creativity in the system. This is not just Adrenaline and rapid refinement; Find-Fix-Verify is the same. Most crowdsourcing systems effectively have this regression to the mean happening.
Imagine you were the photographer in the crowd. You have no special ability to actually
pull the crowd toward what you know to be a good result. If you want to push forward on this, I think you want to start talking about automatically identifying these experts, as we were discussing earlier, and giving them a privileged position within these systems and these algorithms. Now, in terms of generalizability, we think rapid refinement applies largely to single-dimensional search spaces; just within photography you can think about brightness, contrast, color curves, these kinds of things.
So by combining the retainer model and rapid refinement, we're able to execute these really large searches in a human perceptual space within about ten seconds. This allows us to turn around and start asking the same kinds of questions about, say, creativity support applications.
This is Photoshop. Let's say you were creating a poster for a rock concert, and you wanted to have a crowd of screaming individuals in the audience. The Puppet Warp tool allows you to author control points, sort of like [inaudible], and you can drag them. Let's say we call people off of retainer and say, make that person look excited. We can have a bunch of individuals do that manipulation, take all of their suggestions, draw them back into a layer in Photoshop, and produce something that looks like this.
So, in particular, with about eight workers on retainer, you start getting feedback in a
couple of seconds. You get your first figure in a half minute. We went out to several
hundred figures and kept getting new ones every three seconds on average. So we think
we've really closed the loop here and connected this back to a productivity-style,
creativity support desktop application that's allowing you to sort of draw on this crowd
intelligence as you need for things that perhaps you would never think of.
Now, let me back off here for a moment and point out that the retainer model has started to systematize the recruitment process for crowdsourcing. We're changing that recruitment process, and by systematizing it we can actually begin to model it and ask what happens when we go from having, say, 20 people on retainer to huge numbers.
I won't go into too much detail here, but it turns out you can cast the retainer model in queueing theory. This is just a formal framework that lets you reason about the following: if tasks are arriving at some rate, and you can recruit new workers at some other rate, you can ask questions about how long the queue, the line, is. In particular this is an M/M/c/c queue, which is to say we have c workers on retainer, and if we get any more than that number of requests at once, we just give them a busy signal.
Now we can ask: what's the probability that when I need help, that is, when there's a task, there's no one left on retainer to help me? We can derive this from Erlang's loss formula in queueing theory; this blocking probability has a closed-form formula. You can also ask what the expected number of busy workers on retainer is, which gives you a better sense of cost. You can then plot those two things against each other and treat it as a minimization problem: how few workers do I need on retainer to have some guarantee of service, like a one-in-10,000 chance that when someone wants help, there's no one left to help them?
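As a back-of-the-envelope illustration of that sizing question, here is a small sketch that computes the Erlang loss (M/M/c/c) blocking probability and searches for the smallest retainer pool meeting a service guarantee. The arrival rate and task time below are made-up numbers, not figures from the talk.

```python
from math import factorial

def erlang_b(c, offered_load):
    """Probability that all c retainer workers are busy when a task arrives:
    B(c, a) = (a^c / c!) / sum_{k=0..c} a^k / k!"""
    terms = [offered_load ** k / factorial(k) for k in range(c + 1)]
    return terms[-1] / sum(terms)

def retainer_pool_size(arrival_rate, mean_task_minutes, blocking_target=1e-4):
    """Smallest c such that the chance of finding no one on retainer
    is below the target (e.g., one in 10,000)."""
    a = arrival_rate * mean_task_minutes    # offered load in Erlangs
    c = 1
    while erlang_b(c, a) > blocking_target:
        c += 1
    return c

# Hypothetical example: 2 tasks per minute, each occupying a worker for
# half a minute, with a one-in-10,000 tolerance for a busy signal.
print(retainer_pool_size(2.0, 0.5))
```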
This has lots of other applications. You can model what happens when you share retainer pools across applications. You can ask what happens when you start routing tasks to workers to avoid starvation. Or you can do what we call predictive recruitment, or precruitment, which is the notion that if we know that on average a task is going to arrive within the next ten seconds, and workers will maintain their attention for up to ten seconds, we can actually recall the worker before we have the task and show them a loading screen for a moment. When we do that, we see that we can get feedback in just half a second. It really starts to blur this cognitive boundary, so that pressing a button and seeing feedback feel like part of the same action.
So you can push farther on this. But I'll back off here. I'll say at this point I hope I've
convinced you we can create real time crowd-powered systems, and that we can
introduce techniques in order to support these. Things like the retainer model and rapid
refinement.
So, yes, question.
>>: So you created a model of human behavior by giving people these incentives to stick around and wait for your response, and you're basically the one economic entity in the system who has done this. The question is, just as in any type of arbitrage system, what happens when everyone else starts running the same arbitrage?
>>: Right, this is exactly --
>>: Does it stay the same?
>> Michael Bernstein: This is exactly why we want to start asking and modeling what happens when we combine retainers across requesters. What I want to put forward is that the platform could actually support this. You can imagine having two sets of tasks: the real-time tasks and the non-real-time, batch-style tasks. As a worker, you could agree to follow, in the Twitter sense, certain kinds of requests: I like that kind of task and that kind of task, just give them to me as they come; otherwise I'll pick up tasks on my own. The system can then consider the space of everything I've signed up for, everything everyone else has signed up for, and the tasks coming in, and route them to keep a globally optimal solution. So it's definitely possible right now on Mechanical Turk to do that kind of arbitrage, right? I'm trying to push forward and say: how would you design the next platform to avoid that kind of problem? But your concern is absolutely valid given where we are now.
>>: So you're assuming a monopolist buyer.
>> Michael Bernstein: I'm assuming what?
>>: A monopolist buyer, that there's one system that makes the rules and hands all the
requests out to the workers?
>> Michael Bernstein: I wouldn't call it a monopolist buyer, but you can imagine it that way: platform support, or you could be, like CrowdFlower, a middleman, where I'll help you get real-time workers, and you just sign up through me. But yes, effectively we're talking in that case about what happens when we centralize. I think if you start splitting it up and everyone's competing for real-time workers, it would work, just not as well, for exactly the reason behind your intuition.
Okay. So across these two systems, I hope I've convinced you so far that we can create these crowd-powered interfaces, these interactive systems that support the kinds of tasks we cannot support with traditional systems, blurring the line between user and system; that we can create interactive systems that embed crowd intelligence; and that in order to do that, we can start to look at computationally motivated techniques to help the crowds accomplish these tasks. Now, there's effectively a third dimension here, which is the kind of crowd.
Now, at the beginning I promised that I would push past paid crowds, and I'm going to do
that now. There are actually many different kinds of crowds out there on the Web. We can pay crowds, we can create new kinds of crowds, and we can mine the activities that crowds have already taken upon themselves. I just want to give you a brief tour through that space, because I like to play across all of it, with several citations to work I actually did here.
I'll start with designing new kinds of social computing systems, in particular if you want to create a crowd that never existed before. This is work that tends to appear at HCI conferences like CHI and UIST as well as social computing conferences like ICWSM. Our goal is to create new social systems that never existed and understand how to design those systems.
Now, I'll give one example. This is work I did with Eric and Desney and Greg and several others on friendsourcing. The notion here is that we may actually want information that a generic crowd would never know. In particular, if I wanted to know what to get Desney for his birthday, Mechanical Turk would have no idea, Yahoo! Answers has no idea, but the people in this room, his social network, really do. By creating incentives over the social network we could encourage people (and I should say Mary was also involved in this work; this is what happens when you're put on the spot) to share these tags. Many people in this room were, in fact, among [inaudible] users. We got tens of thousands of tags on thousands of individuals. In follow-up work, we created a system called Feed Me that effectively learns models of people's interests by riding on this activity of people sharing interesting news with each other.
When we do this, we can create systems like this. We can route questions like: does IUI research tend to appear at the UIST conference? These individuals are tagged with both IUI and UIST, but they never had to sit there and tag themselves with their interests. We were taking advantage of the fact that there's a power law here: we can take a small number of individuals who are really active on these social networks and spread out their interests and activities such that it's to the benefit of everyone else in the social network.
We can also ask what happens when we see unusual designs in the space of social computing systems. In particular, you may have heard of 4chan, or its /b/ board. They created the Anonymous hacker collective, which you may have heard of. It's a heterodox community, to say the least; it's an unusual space on the Internet, and I don't recommend checking it during the talk. Now, they make some really interesting design decisions. One is that by default all posts are anonymous. Two is that they don't keep archives; it's not Googleable. In fact, when new content comes in, it pushes off older content.
We got five and a half million posts from this site and simulated the dynamics of the site to ask what happens in a large-scale online community when you have anonymity and ephemerality as core design tenets. We saw that the median thread lasted just five seconds in the attentional sphere of most people, that is, on the first page, and it was pushed off the site completely within five minutes. We also found that over 90 percent of the posts were made completely anonymously. And we found some interesting ways in which, there was a suggestion that, these exact decisions of anonymity and ephemerality are what were behind 4chan's ability to drive Internet culture.
If you've seen a lolcat, or if you've ever been Rickrolled, you've experienced the output of 4chan. So we can think about these kinds of questions of how to design these online communities as well.
We can also talk about mining what crowds have already done. This is again work that tends to appear at HCI conferences like CHI and UIST. I'll focus here on some work I did over the summer with Sue and Jamie on Tail Answers. That's right, I missed another one. And Eric.
>>: [inaudible] [laughter].
>> Michael Bernstein: All right. So you may be familiar with answers, or the OneBox, something like this when you query for weather. In addition to the organic search results, you see something like this, which is a result that's been designed specifically (and is perhaps kind of sad in the case of Boston) for that kind of query. But we don't have any kind of direct response for a much less common kind of query, like what are the substitutes for molasses, even though these are collectively quite common; that is, they're in the tail. There's a large number of somewhat popular queries.
So what we created was something called Tail Answers, where we can again augment the organic search results with a direct response telling you exactly what you would replace molasses with. In fact, we can create hundreds or thousands of these through an automated process, answering questions like how long dissolvable stitches last, the story of the invention of the light bulb, how to turn up the volume on Windows XP, and many others that are all, I'm sorry, individually somewhat popular and collectively quite popular. So we really do turn to crowd data to make this happen.
We can look at search trails, like Ryan has been exploring, where we identify people starting a search and navigating through the Web, and find pages where searchers have an unusually high probability of ending their search session once they get there. If we combine that with looking for the canaries in the coal mine, the small number of searchers who use question words in their queries, like "what is the average body temperature of a dog," we can start to identify Web pages where people are finding concise informational answers to their needs. We can then use something that looks a lot like Find-Fix-Verify to extract that content from the Web and promote it into a direct response.
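A rough sketch of that kind of log mining follows, assuming search sessions are available as (query, list-of-visited-URLs) pairs. The thresholds, field layout, and question-word list are illustrative assumptions, not the actual Tail Answers pipeline.

```python
from collections import defaultdict

QUESTION_WORDS = {"what", "how", "why", "when", "who", "where"}

def answer_candidate_urls(sessions, min_visits=20, min_end_rate=0.6):
    """sessions: iterable of (query, trail) where trail is the ordered list
    of URLs the searcher visited before stopping."""
    visits = defaultdict(int)
    ends = defaultdict(int)
    question_hits = defaultdict(int)
    for query, trail in sessions:
        for url in trail:
            visits[url] += 1
        last = trail[-1]
        ends[last] += 1                      # the search session stopped here
        words = query.lower().split()
        if words and words[0] in QUESTION_WORDS:
            question_hits[last] += 1         # "canary" question-word query
    return [
        url for url in visits
        if visits[url] >= min_visits
        and ends[url] / visits[url] >= min_end_rate   # unusually likely to end a trail
        and question_hits[url] > 0
    ]
```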
So really there's a broad space here of crowd-powered interfaces and crowd-powered systems. We can talk about how we might pay people, how we can create new kinds of crowds to collect information that's never been collected before, and how to look to what crowds have already been doing. My goal, really, at the large scale is to integrate social and crowd intelligence directly as a core part of interaction, of software, and of computation more generally.
Now, focusing just on the paid crowd space, there are a lot of ways to get there. We want to think about how we integrate crowds with machine learning, for one. That is, we can already start to deploy these systems, collect the data, and train better machine learning systems. But we can also take those machine learning systems and use them to make the crowds more effective. For example, in Tail Answers we found that by using an open information extraction system, we can just have the crowds vet the answers, which ends up being much faster and costing less.
We can also think about what happens when you, say, start treating these workers like stump learners in an ensemble. We want to think about the platform: how would we change oDesk, Mechanical Turk, TopCoder, many of these systems? We've seen what happens at small scale, with hundreds of individuals online at once. What happens when everyone is effectively a contractor, when we have hundreds of thousands of people participating? How do we help them develop expertise and notice when they have it? How do we help with lifelong learning? What do benchmarks and complexity look like in this space? If you come up with a better algorithm, how do we actually compare and understand the ways in which it is better?
And at a high level we can start to talk about ways we can combine machine and social
intelligence to take on these really complex or high-level tasks. Think of sort of the big questions: helping you write a lecture, write a symphony.
Big questions. Now, this work opens up many cans of worms, and we don't have enough time to really get into an in-depth discussion here. But I want to give you a sense of the kinds of questions that I think about and that I think are important in this space.
First is this sort of returning notion of scientific management. How do we think about contract ethics in this space? How do we make sure that people, in expectation, make a living wage when they're doing piecework?
What happens when your software has goals and dreams? That is, there are individuals
participating as part of this system and you want to support their interactions socially.
You want to give them opportunities for career advancement. These are all important parts, and I would argue they will lead to a better result if we think from that platform side.
Finally, we've complicated notions of attribution. Should we have had thousands of
authors on the Soylent paper? On the flip side, if there's an error, who is now at fault?
So these are just a few of the issues that I think we need to push on.
So in the meantime, people have picked up find, fix, verify to start doing things like image segmentation, as you can see in the upper right there, and authoring maps. They've modeled it using formal crowd languages. It's been integrated into coursework at several universities.
And, more broadly, again, while Soylent was one of the first crowd-powered systems, I'd like to point out that there are lots of these systems that are really gaining traction in research and practice: things helping blind individuals, translation, databases.
It's a big space, and I hope you'll come play with me in it. So I hope I've convinced you in this hour that we can in fact create these crowd-powered systems that enable experiences you wouldn't be able to accomplish with machine intelligence alone, and that crowds on their own perhaps couldn't do either, within a symbiotic system that actually plays to both of their strengths. And, more generally, I hope I've convinced you that computation can become a critical component of what's known as the wisdom of crowds.
So I'm part of a small crowd of collaborators. My closest mentors are Rob Miller and David Karger at MIT, along with a variety of researchers across many institutions, including here, graduate students, undergraduates, and many others. So thanks to all of them.
And at this point I'd be happy to turn it around to discussion and questions. Thanks.
[applause]
Yes?
>>: One question about just your observation of the workers out there, the Mechanical Turk workers. What is the total population of workers? Is it evolving over time, moving east, moving west, spreading out across states?
>> Michael Bernstein: I think this will continue to be a question. From the most recent information I've seen, moving east, I guess, would be a good characterization. So there are more workers in India than there used to be.
I think the model you want to keep in your head here is that people in the U.S. are using it
to supplement their income. In India, Bill Thies is actually doing some great work at MSR India looking at ways in which people are replacing their income entirely, and at ways in which you can, say, use cell phone platforms, giving people cell phones as a way to actually start expanding this.
>>: Do you have any sense of what the total size is?
>> Michael Bernstein: That's a great question. Amazon doesn't say. My estimate would
be tens of thousands signed up, but perhaps hundreds, maybe low thousands, active at any given time. That might even be an overestimate. I think that these platforms have a
large space to grow.
>>: Still small.
>> Michael Bernstein: I said five million tasks a year. That's actually --
>>: I was surprised how small --
>> Michael Bernstein: So if you think about who is actually using these largely, it's researchers and companies like CrowdFlower that are using it to, like, verify business listings and so on.
Part of what I do in my role here is expressing the much broader space that crowds could
really tackle. And by doing so I hope that will push open the boundaries of what can
happen. And then you look to things like oDesk where there's real expertise. Like, I've hired music engineers to help me create a song for CHI Madness last year. LaTeX people, mathematicians, they exist on these platforms. You'll start to see a continuum, from Mechanical Turk, which is homogenous and sort of generic intelligence, out to things like oDesk where there's real expertise.
I promised to come back to you if I wasn't clear. Are you happy?
>>: So the question at the beginning was why verify rather than rank. So verify would
seem to return a Boolean. And rank would assume that it takes an input of multiple
things and can get the best [inaudible] doing one thing and just rank it.
So I guess I was saying why be specific if you could be more general?
>> Michael Bernstein: You absolutely could. Eric has spent time thinking about sorting with crowd sourcing, which is another way to do this; that is effectively ranking, where you now have sort of noisy comparators. What we're getting is a histogram for each of those pieces of text: a number of votes that it's bad.
So in a sense we can get a noisy rank from that, and it's just a matter of what you do with it. But you're right, you can push out more generally. Verify is a semantic notion; it means we're trying to effectively get a sense of what's good and bad, but you can imagine rank being a better term for that if you wanted.
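A minimal sketch of turning those per-item vote histograms into a noisy ranking; the smoothing constant and function name are assumptions for illustration, not part of Soylent.

    # Sketch: convert "this is bad" vote histograms into a best-first noisy rank.
    def noisy_rank(vote_histograms, smoothing=1.0):
        # vote_histograms: {rewrite_text: (bad_votes, total_votes)}
        def estimated_goodness(item):
            bad, total = vote_histograms[item]
            # Smoothed estimate of P(good) from the verify-stage votes.
            return 1.0 - (bad + smoothing) / (total + 2 * smoothing)
        return sorted(vote_histograms, key=estimated_goodness, reverse=True)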
Yeah?
>>: So one thing people often ask about crowd sourcing, what domains can you apply it
to? I think you gave a lot of really compelling examples but maybe you could talk a little
bit about generalizing some of the things you talked about, like task decomposition into find, fix, verify; you gave a few examples. But is task decomposition the key to sort of any domain? And how might that play into tasks you might try to do in advance at sort of the system level?
>> Michael Bernstein: So I think about this in the following way. Right now you can view this as a limit question, like theorists want to know what the limit of this is. But you can also view it as a research challenge: what can we engineer in the future? It's a weird space there.
One thing that crowds are poor at is anything that requires high level knowledge. That is,
if you wanted to have your entire paper shortened, there's this orthogonal element where we take every paragraph and have it shortened individually. But when I shorten my own text, it happens by saying this section feels wordy, or I could just cut this paragraph entirely.
If you want to make those kinds of assertions -- imagine you wanted to build a crowd-sourced personal assistant, someone to help order pizza, reserve rooms, set up meetings -- they would need to have some globally consistent knowledge of who you are and that you don't like anchovies. How do you build those very large-scale kinds of pieces of understanding across lots of distributed tasks? It seems like a very hard problem.
In particular, you can think about how much time it takes someone to get up to speed on a task, in comparison to how much it takes them to actually do the work. So if it takes me forever to figure out what I need to do, get the expertise, and read the text, and then I just hit yes or no, it's not right now a good match for crowd sourcing. But we don't have a good sense of that curve and how quickly it drops off. That would be a great thing to do. Eric?
>>: So, interesting topics you've covered. Which particular challenge really gets you excited for the next few years, in terms of what draws your attention?
>> Michael Bernstein: I think it's starting to push out bigger, exactly what I was suggesting: pushing out from, not toy systems, but things that are taking on simple tasks, to things that really start solving complex interdependent problems. That, I think, is really hard and really exciting, if we can make it happen.
>>: Get a sense of technologies or challenges?
>> Michael Bernstein: I mean, I think about whether you can start attacking that through
sort of sampling-based approaches or whether you could actually create mini
management structures within the crowd to start taking on those kinds of things. I have
no clue whether that will work but that's sort of what's exciting about it.
Certainly I also think really at a high level pushing at bringing crowd data into interactive
applications is a hugely underexplored space in HCI right now. And I'm perhaps
preaching to the choir here. But I really do think that that's another thing that has huge
legs that we can push on.
Yes?
>>: So you've been talking about this [inaudible] crowd intelligence into software
systems, right?
>> Michael Bernstein: Yes.
>>: Do you have any thoughts on how you might ensure reliability? Like the Word spell checker: I know it's not as good as editors, but I know it works. But with this crowd-sourced system, who knows, maybe the workers are not going to be as good as the [inaudible] workers.
>> Michael Bernstein: Great point. Another way to put this would be: if I run Crowdproof three times, I'm going to get a bunch of different suggestions. I may not get the same things twice. That's a hard problem.
But I think that it is something we need to start addressing. I pointed out that reliability in
terms of latency was really important. But you're absolutely right that in addition we need
to think about sort of reliability in terms of repeatability. Great point. I have nothing to
add, but I think it's important. Yes.
>>: As I think through all the different places you could have crowd sourcing systems -- you've shown at least one or two ways you can apply it to, say, Office, and I have a whole bunch of ideas; I think it totally changes the experience in PowerPoint [inaudible] and other kinds of desktop systems. But at the same time, I think one of the things software systems allow you to do is let you use them when you're trying to construct something where you're talking about what you're going to do against your competitor.
And so software systems are nice because they're kind of private to you, and they're reliable in that way. What do you think about how you have the crowd as a consultant under NDA, or how you have privacy? How do you prevent your creative works from being ripped off and released before you release them?
>> Michael Bernstein: I just published your next paper, by the way.
>>: I can appreciate that. If you're in the business of making a creative work.
>> Michael Bernstein: That's right.
>>: You don't want to have the first chart [inaudible] by the crowd and released on line.
>> Michael Bernstein: What you start to see already is that companies are getting
contracted crowds not from Mechanical Turk but under NDA. Several companies are
doing this. So you can sort of have your own on-demand crowd that you size dynamically as you need, based on the size of the enterprise.
You can also think about something like homomorphic crowd sourcing: what would it mean to take the content and reliably obfuscate the critical parts in a way in which the work can still be done? But you were also sort of pointing at what happens as you go off the desktop, when what's getting crowd sourced is stuff about where I am at any given point. We're just starting to see people think about this.
I think Jason Hong is starting to think about it and his students, but it's going to be a fun
ride for sure, at the very least.
>> Desney Tan: Any more questions? Thank you.
[applause]