Crowdsourcing Presentation

Crowdsourcing: Different Concepts and Platforms
Reshmi De
Bhargabi Chakrabarti
What is Crowdsourcing?
Obtaining services, ideas, content, etc. by soliciting contributions from an online community.
Crowdsourcing can apply to a wide range of activities. It can involve a division of labor in which tedious tasks are split up and outsourced to a crowd, but it can also apply to specific requests, such as crowdvoting, crowdfunding, broad-based competitions, or a general search for answers, solutions, or a missing person.
Crowdsourcing Typology
In his 2013 book, Crowdsourcing, Daren C. Brabham puts forth a problem-based typology of crowdsourcing approaches.
These types are:
Knowledge Discovery & Management - for information management
problems where an organization mobilizes a crowd to find and assemble
information. Ideal for creating collective resources.
Distributed Human Intelligence Tasking - for information management
problems where an organization has a set of information in hand and
mobilizes a crowd to process or analyze the information. Ideal for processing
large data sets that computers cannot easily process.
Broadcast Search - for ideation problems where an organization mobilizes a
crowd to come up with a solution to a problem that has an objective, provable
right answer. Ideal for scientific problem solving.
Peer-Vetted Creative Production - for ideation problems where an
organization mobilizes a crowd to come up with a solution to a problem which
has an answer that is subjective or dependent on public support. Ideal for
design, aesthetic, or policy problems.
Crowdfunding
Crowdfunding is the process of funding a project through many people, each
contributing a small amount, in order to reach a set monetary goal. Goals may
be for donations or for equity in a project.
A well-known crowdfunding tool is Kickstarter, the biggest website for funding
creative projects. It has raised over $100 million, despite its all-or-nothing
model, under which a project receives the pledged money only if it reaches its
stated funding goal.
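The all-or-nothing rule can be sketched in a few lines. This is a minimal illustration, not Kickstarter's implementation: the class and method names are hypothetical, and pledges are kept in whole cents so that the comparison with the goal is exact.

```java
import java.util.List;

// Hypothetical sketch of an all-or-nothing funding rule (not Kickstarter's code).
// Amounts are in whole cents so that the comparison with the goal is exact.
final class AllOrNothing {
    private AllOrNothing() {}

    /** Total of all pledges, in cents; every pledge must be positive. */
    static long totalPledged(List<Long> pledgesCents) {
        long total = 0;
        for (long pledge : pledgesCents) {
            if (pledge <= 0) {
                throw new IllegalArgumentException("pledge must be positive: " + pledge);
            }
            total += pledge;
        }
        return total;
    }

    /** The project collects every pledge if the goal is reached, otherwise nothing. */
    static long amountCollected(List<Long> pledgesCents, long goalCents) {
        long total = totalPledged(pledgesCents);
        return total >= goalCents ? total : 0L;
    }

    public static void main(String[] args) {
        List<Long> pledges = List.of(2_500L, 10_000L, 5_000L); // $25 + $100 + $50
        System.out.println(amountCollected(pledges, 15_000L)); // goal $150 reached: prints 17500
        System.out.println(amountCollected(pledges, 20_000L)); // goal $200 missed: prints 0
    }
}
```

A project that falls short of its goal by even one cent collects nothing, which is the risk the all-or-nothing model places on the creator.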
Crowdrise brings together volunteers to fundraise in an
online environment.
Crowdvoting
Crowdvoting occurs when a website gathers a large group's opinions
and judgment on a certain topic.
The Iowa Electronic Market is a prediction market that gathers
crowds' views on politics and tries to ensure accuracy by having
participants pay money to buy and sell contracts based on political
outcomes.
Threadless.com selects the t-shirts it sells by having users submit designs and
vote on the ones they like; the winning designs are then printed and made
available for purchase. Despite the company's small size, thousands of members
submit and vote on designs, so the website's products are created and selected
by the crowd rather than by the company alone.
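The selection step described for Threadless (members submit designs, the crowd votes, the most popular designs are printed) can be sketched as a vote tally. This is a hypothetical illustration: the Vote record, the one-vote-per-user rule and the top-k cutoff are assumptions, not Threadless's actual rules.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical crowdvoting sketch: one vote per (user, design), top-k designs win.
final class DesignVoting {
    private DesignVoting() {}

    record Vote(String userId, String designId) {}

    static List<String> topDesigns(List<Vote> votes, int k) {
        Map<String, Integer> counts = new HashMap<>();
        Set<Vote> seen = new HashSet<>();
        for (Vote v : votes) {
            // A repeated vote by the same user for the same design is ignored.
            if (seen.add(v)) {
                counts.merge(v.designId(), 1, Integer::sum);
            }
        }
        List<String> ranked = new ArrayList<>(counts.keySet());
        // Most votes first; ties broken alphabetically so the result is deterministic.
        ranked.sort(Comparator.comparing((String d) -> counts.get(d)).reversed()
                .thenComparing(Comparator.naturalOrder()));
        return List.copyOf(ranked.subList(0, Math.min(k, ranked.size())));
    }

    public static void main(String[] args) {
        List<Vote> votes = List.of(
                new Vote("alice", "d1"), new Vote("bob", "d1"), new Vote("carol", "d2"),
                new Vote("alice", "d1"), // duplicate, not counted
                new Vote("dave", "d3"), new Vote("erin", "d2"), new Vote("frank", "d1"));
        System.out.println(topDesigns(votes, 2)); // [d1, d2]
    }
}
```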
Some of the most famous examples have made use of social media
channels: Domino's Pizza, Coca-Cola, Heineken and Sam Adams
have crowdsourced a new pizza, bottle design, beer and song,
respectively.[29]
Crowdsearching
Chicago-based startup crowdfynd uses a version of crowdsourcing best termed
crowdsearching, which differs from microwork in that there is no mandatory
payment for taking part in the search.
Its platform uses geographic location anchoring to build a virtual search party
of smartphone and internet users who help find a lost item, pet or person, or
return a found item, pet or property.
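A minimal sketch of geographic anchoring, assuming it means selecting the users whose last known position lies within some radius of where the item was lost. crowdfynd's actual matching logic is not described here; the User record, the 5 km radius and the coordinates are illustrative, and distance is computed with the haversine formula on a spherical Earth.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of location anchoring for a virtual search party.
// Assumes positions in decimal degrees and a spherical Earth (mean radius 6371 km).
final class SearchParty {
    private SearchParty() {}

    static final double EARTH_RADIUS_KM = 6371.0;

    record User(String id, double lat, double lon) {}

    /** Great-circle distance in km between two points, via the haversine formula. */
    static double distanceKm(double lat1, double lon1, double lat2, double lon2) {
        double dLat = Math.toRadians(lat2 - lat1);
        double dLon = Math.toRadians(lon2 - lon1);
        double a = Math.pow(Math.sin(dLat / 2), 2)
                + Math.cos(Math.toRadians(lat1)) * Math.cos(Math.toRadians(lat2))
                * Math.pow(Math.sin(dLon / 2), 2);
        return 2 * EARTH_RADIUS_KM * Math.asin(Math.min(1.0, Math.sqrt(a)));
    }

    /** Ids of users within radiusKm of the point where the item was lost. */
    static List<String> searchParty(List<User> users, double lat, double lon, double radiusKm) {
        List<String> party = new ArrayList<>();
        for (User u : users) {
            if (distanceKm(lat, lon, u.lat(), u.lon()) <= radiusKm) {
                party.add(u.id());
            }
        }
        return party;
    }

    public static void main(String[] args) {
        List<User> users = List.of(
                new User("near", 41.8800, -87.6300),  // a few hundred metres away
                new User("far", 40.7128, -74.0060));  // roughly 1,100 km away
        System.out.println(searchParty(users, 41.8781, -87.6298, 5.0)); // [near]
    }
}
```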
Citizen science
Citizen science (also known as crowd science, crowd-sourced science, or
networked science) is scientific research conducted, in whole or in part, by
amateur or nonprofessional scientists, often through crowdsourcing and
crowdfunding.
It is sometimes called "public participation in scientific research."
Citizen-science activities can take many forms. For example:
Citizen scientists can help gather data that will be analyzed by
professional researchers. The American Association of Variable Star
Observers has gathered data on variable stars for educational and
professional analysis since 1911 and promotes participation beyond
its membership on its Citizen Sky website. On BugGuide.Net, an
online community of naturalists who share observations of
arthropods, amateurs and professional researchers contribute to
the analysis.
Popular Platforms
CrowdForge
Artigo
CrowdFlower
Crowd4U
Quadrant of Euphoria
CrowdDB
TurKit
Turkomatic
Amazon Mechanical Turk
CrowdForge
Task breakdown, roughly inspired by the MapReduce programming model:
partition - split a problem into sub-problems
map - solve a small unit of work
reduce - combine multiple results into one
Only the map task involves human intelligence.
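The partition/map/reduce breakdown can be sketched as follows. This illustrates the pattern only and is not CrowdForge's code: following the slide, partition and reduce run as ordinary code and only map is a human task, simulated here by a Function. The chunk size, the join-based reduce step and the simulated worker are assumptions; in a real deployment the map step would post one HIT per sub-problem and wait for workers' answers.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Function;

// Hypothetical CrowdForge-style pipeline. Following the slide, partition and reduce
// run as code and only map is a human task; a Function stands in for the worker.
final class CrowdPipeline {
    private CrowdPipeline() {}

    /** partition: split the problem into sub-problems of at most chunkSize items. */
    static <T> List<List<T>> partition(List<T> items, int chunkSize) {
        if (chunkSize <= 0) {
            throw new IllegalArgumentException("chunkSize must be positive");
        }
        List<List<T>> chunks = new ArrayList<>();
        for (int i = 0; i < items.size(); i += chunkSize) {
            chunks.add(items.subList(i, Math.min(i + chunkSize, items.size())));
        }
        return chunks;
    }

    /** map: each sub-problem becomes one small unit of work for a (simulated) worker. */
    static <T, R> List<R> map(List<List<T>> chunks, Function<List<T>, R> worker) {
        List<R> results = new ArrayList<>();
        for (List<T> chunk : chunks) {
            results.add(worker.apply(chunk));
        }
        return results;
    }

    /** reduce: combine the partial results into one. */
    static String reduce(List<String> partials) {
        return String.join(" ", partials);
    }

    public static void main(String[] args) {
        List<String> items = List.of("a", "b", "c", "d", "e");
        // Simulated worker: "processes" a chunk by joining and upper-casing it.
        Function<List<String>, String> worker = chunk -> String.join("", chunk).toUpperCase();
        System.out.println(reduce(map(partition(items, 2), worker))); // AB CD E
    }
}
```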
Artigo – Social Image tagging

An online gaming platform providing several GWAPs
(games with a purpose)

Provided by the University of Munich, Germany, it aims to supply artworks with
tags in order to automatically build up an artwork search engine, to
scientifically investigate the reception of artworks, and finally to provide
an artwork learning environment

The users learn about art while playing the various
games offered on the platform
CrowdFlower

CrowdFlower uses crowdsourcing techniques to
provide a wide range of enterprise solutions.

Has over 50 labor channel partners, among them
Amazon Mechanical Turk and TrialPay.

Peer review helps maintain high accuracy levels.
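The slide does not say how CrowdFlower's peer review works, so the following is only a hypothetical sketch of one simple form of it: several other workers check each submission, a strict majority decides, and a worker's accuracy is the share of their submissions that pass. The Submission record and both method names are invented for the example.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical peer-review sketch (not CrowdFlower's actual quality-control method):
// a submission is accepted only if a strict majority of its reviewers approve it.
final class PeerReview {
    private PeerReview() {}

    record Submission(String workerId, List<Boolean> reviewerVerdicts) {}

    static boolean accepted(Submission s) {
        long approvals = s.reviewerVerdicts().stream().filter(Boolean::booleanValue).count();
        return approvals * 2 > s.reviewerVerdicts().size(); // ties are rejected
    }

    /** Share of each worker's submissions that passed peer review. */
    static Map<String, Double> workerAccuracy(List<Submission> submissions) {
        Map<String, int[]> tally = new HashMap<>(); // workerId -> {accepted, total}
        for (Submission s : submissions) {
            int[] t = tally.computeIfAbsent(s.workerId(), k -> new int[2]);
            if (accepted(s)) {
                t[0]++;
            }
            t[1]++;
        }
        Map<String, Double> accuracy = new HashMap<>();
        tally.forEach((worker, t) -> accuracy.put(worker, (double) t[0] / t[1]));
        return accuracy;
    }

    public static void main(String[] args) {
        List<Submission> subs = List.of(
                new Submission("w1", List.of(true, true, false)),  // 2 of 3 approve: accepted
                new Submission("w1", List.of(false, false, true)), // 1 of 3 approve: rejected
                new Submission("w2", List.of(true, true)));        // accepted
        System.out.println(workerAccuracy(subs).get("w1")); // 0.5
    }
}
```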
Crowd4U
A project for developing an open crowdsourcing platform for academic purposes.
The project started in 2010 and the first system was launched in 2011. As of
December 2012, it was deployed in 14 universities in Japan.
Different crowdsourcing projects, such as those for information retrieval,
library problems, and help in disasters, are running on Crowd4U. It supports a
declarative programming language named CyLog for writing crowdsourcing
applications.
Quadrant of Euphoria
A crowdsourcing platform for Quality of Experience (QoE) assessment in network
and multimedia studies. It features low cost, participant diversity, meaningful
and interpretable QoE scores, subject consistency assurance, and a burdenless
experiment process.
CrowdDB

Answering Queries with Crowdsourcing
Some queries cannot be answered by machines
alone. Processing such queries requires human
input for providing information that is missing
from the database, for performing computationally
difficult functions, and for matching, ranking, or
aggregating results based on fuzzy criteria.
CrowdDB uses human input via crowdsourcing to
process queries that neither database systems nor
search engines can adequately answer. It uses SQL
both as a language for posing complex queries and
as a way to model data.
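The core idea of supplying information that is missing from the database through human input can be sketched as a lookup that falls back to the crowd and stores the answer. This is a hypothetical illustration only: CrowdDB expresses this through its SQL-based data model and query processing, whereas the CrowdColumn class, the lookup method and the simulated crowd function here are invented for the example.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Function;

// Hypothetical sketch of the CrowdDB idea of crowd-supplied data (not CrowdDB's API):
// a missing value is requested from the crowd once, then stored and reused.
final class CrowdColumn {
    private final Map<String, String> stored = new HashMap<>();
    private final Function<String, String> crowd; // stand-in for posting a HIT
    private int crowdRequests = 0;

    CrowdColumn(Map<String, String> known, Function<String, String> crowd) {
        stored.putAll(known);
        this.crowd = crowd;
    }

    /** Returns the value for key, asking the crowd only when the database lacks it. */
    String lookup(String key) {
        return stored.computeIfAbsent(key, k -> {
            crowdRequests++;
            return crowd.apply(k);
        });
    }

    int crowdRequests() {
        return crowdRequests;
    }

    public static void main(String[] args) {
        CrowdColumn urls = new CrowdColumn(
                Map.of("Dept A", "a.example.edu"),
                dept -> "answer-from-crowd-for-" + dept); // simulated worker answer
        urls.lookup("Dept A"); // already stored: no crowd request
        urls.lookup("Dept B"); // missing: one crowd request
        urls.lookup("Dept B"); // now stored: no new request
        System.out.println(urls.crowdRequests()); // 1
    }
}
```

Storing the crowd's answer matters because each crowd request costs money and time; later queries touching the same value are then answered by the database alone.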
TurKit

TurKit is a Java/JavaScript API for running
iterative tasks on Mechanical Turk.
You can safely re-execute TurKit programs without re-running costly side
effects on Mechanical Turk, such as creating new HITs, while still writing your
program in a straightforward imperative manner; there is no need to unravel the
program into a state machine.
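Safe re-execution can be illustrated with a small memoization helper: each costly side effect is recorded under a key, and a re-run returns the recorded result instead of repeating the side effect. This sketch is in Java rather than TurKit's JavaScript; the Memo class, the once method and the key strings are hypothetical names, and an in-memory map stands in for the persistent record a real system would need to survive a crash.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical sketch of safe re-execution: each costly side effect is recorded under a
// key, and a re-run returns the recorded result instead of repeating the side effect.
// A real system must keep this record persistently; an in-memory map stands in here.
final class Memo {
    private final Map<String, Object> record = new HashMap<>();

    @SuppressWarnings("unchecked")
    <T> T once(String key, Supplier<T> sideEffect) {
        if (record.containsKey(key)) {
            return (T) record.get(key); // replay: skip the side effect
        }
        T result = sideEffect.get();    // first run: perform it and remember the result
        record.put(key, result);
        return result;
    }

    public static void main(String[] args) {
        Memo memo = new Memo();
        int[] hitsCreated = {0};
        // The "program": it creates one HIT (simulated by a counter).
        Runnable program = () -> memo.once("create-hit-1", () -> "HIT-" + (++hitsCreated[0]));
        program.run();
        program.run(); // re-execution, e.g. after a crash: no second HIT is created
        System.out.println(hitsCreated[0]); // 1
    }
}
```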
TurKit is open source and is hosted on Google Code, where you can download the
source code:
http://code.google.com/p/turkit/source/checkout
Amazon Mechanical Turk (AMT, MTurk)
Background - “The Turk”
● A chess-playing automaton built in 1770
● ...which had a human hiding inside it
AMT
● Launched Nov 2, 2005
● Initially used to solve in-house problems at Amazon that required human judgment and intelligence
● Amazon soon realized this was a unique service and offered it as a web service
[Figure: AMT statistics]
AMT
● HIT (Human Intelligence Task)
● Requesters/Developers
● Workers
Designing HITs
Requesters can specify:
● Task
● Keywords
● Expiration date
● Reward
● Time allotted
● Qualifications
Creating HITs
● createHits()
- Method of the RequesterService class (com.amazonaws.mturk.service.axis.RequesterService).
● mturk.properties file
- Contains the requester credentials needed for creating the HITs.
● .question file
- Contains the questions in XML format.
● .properties file
- Specifies how long the HIT will remain active, how many assignments it has, etc.
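Settings such as the reward, the number of assignments and how long a HIT stays active are the kind of values a .properties file carries. The sketch below reads such text with the standard java.util.Properties class; the key names (title, reward, assignments, lifetimeSeconds) are hypothetical placeholders, not the exact keys the AMT SDK expects.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.UncheckedIOException;
import java.math.BigDecimal;
import java.time.Duration;
import java.util.Properties;

// Hypothetical sketch: reading HIT settings from .properties text with java.util.Properties.
// The keys below are placeholders, not the exact keys the AMT SDK expects.
final class HitSettings {
    final String title;
    final BigDecimal reward;   // in dollars, e.g. 0.05
    final int assignments;     // how many workers complete each HIT
    final Duration lifetime;   // how long the HIT remains active

    private HitSettings(Properties p) {
        title = require(p, "title");
        reward = new BigDecimal(require(p, "reward"));
        assignments = Integer.parseInt(require(p, "assignments"));
        lifetime = Duration.ofSeconds(Long.parseLong(require(p, "lifetimeSeconds")));
        if (assignments < 1) {
            throw new IllegalArgumentException("assignments must be at least 1");
        }
    }

    private static String require(Properties p, String key) {
        String value = p.getProperty(key);
        if (value == null) {
            throw new IllegalArgumentException("missing key: " + key);
        }
        return value.trim();
    }

    static HitSettings parse(String text) {
        Properties p = new Properties();
        try {
            p.load(new StringReader(text));
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return new HitSettings(p);
    }

    public static void main(String[] args) {
        HitSettings s = parse("title=Categorize Web Sites\nreward=0.05\nassignments=2\nlifetimeSeconds=604800\n");
        // prints: 2 assignments, active for 7 days
        System.out.println(s.assignments + " assignments, active for " + s.lifetime.toDays() + " days");
    }
}
```

Using BigDecimal for the reward avoids binary floating-point rounding on monetary amounts.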
Creating multiple HITs
Problem 1: Site categorization (e.g., Search Engine, News Site, Online Retailer, Other?)
URL1 – www.google.com
URL2 – www.amazon.com
URL3 – www.reuters.com
etc.
We need to create separate (but similar) HITs for each URL.
Solution:
Step 1 - Specify the URLs in the .input file.
Step 2 - Specify the template (for categorization) as HTML in the .question file.
Step 3 - The .question file uses an XML format; the $urls variable, defined as a field in the .input file, is included in the .question file.
Step 4 - Specify the number of assignments in the .properties file.
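Steps 1 to 3 amount to template expansion: each row of the .input file fills the $urls placeholder once, giving one question (and so one HIT) per URL. A minimal sketch, assuming a tab-separated .input file whose first line holds the field names; the template text is made up, and the SDK performs this substitution with its own template engine rather than this code. Values are XML-escaped because the .question file is XML.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch of steps 1-3: expand one question template per row of a
// tab-separated .input file whose first line holds the field names (e.g. "urls").
// The AMT SDK uses its own template engine; plain string replacement stands in here.
final class HitTemplates {
    private HitTemplates() {}

    /** Escapes the characters that are special in XML, since .question files are XML. */
    static String escapeXml(String s) {
        return s.replace("&", "&amp;").replace("<", "&lt;").replace(">", "&gt;")
                .replace("\"", "&quot;").replace("'", "&apos;");
    }

    static List<String> expand(String inputFile, String template) {
        String[] lines = inputFile.strip().split("\\R");
        String[] fields = lines[0].split("\t");
        List<String> questions = new ArrayList<>();
        for (int i = 1; i < lines.length; i++) {
            if (lines[i].isBlank()) {
                continue; // skip empty rows
            }
            String[] values = lines[i].split("\t", -1);
            String question = template;
            for (int f = 0; f < fields.length && f < values.length; f++) {
                question = question.replace("$" + fields[f], escapeXml(values[f]));
            }
            questions.add(question); // one question, i.e. one HIT, per row
        }
        return questions;
    }

    public static void main(String[] args) {
        String input = "urls\nwww.google.com\nwww.amazon.com\nwww.reuters.com\n";
        String template = "<Text>Categorize this site: $urls</Text>";
        expand(input, template).forEach(System.out::println);
    }
}
```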
Getting Responses Back
● .success file
- Contains the unique IDs of the HIT(s) created.
● getHITTypeResults(success)
- Method of the RequesterService class
- Input is an object of the HITDataInput class
- Return type is an object of the HITTypeResults class
● writeResults()
- Method of the HITTypeResults class
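Once the answers for a HIT with several assignments have been retrieved, a common next step is to reduce the workers' answers to one label per HIT. The sketch below uses simple majority voting; the Result record is a hypothetical stand-in for one row of downloaded results, not an SDK class, and the tie-breaking rule is an assumption.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical post-processing sketch: with several assignments per HIT, keep the most
// frequent answer for each HIT. Ties go to the alphabetically first answer.
final class MajorityVote {
    private MajorityVote() {}

    /** One row of downloaded results (hypothetical shape). */
    record Result(String hitId, String workerId, String answer) {}

    static Map<String, String> consensus(List<Result> results) {
        Map<String, Map<String, Integer>> counts = new TreeMap<>();
        for (Result r : results) {
            counts.computeIfAbsent(r.hitId(), k -> new TreeMap<>())
                  .merge(r.answer(), 1, Integer::sum);
        }
        Map<String, String> winners = new TreeMap<>();
        counts.forEach((hit, perAnswer) -> {
            String best = null;
            int bestCount = 0;
            // TreeMap iterates answers alphabetically, and only a strictly higher count
            // replaces the current best, so ties keep the alphabetically first answer.
            for (Map.Entry<String, Integer> e : perAnswer.entrySet()) {
                if (e.getValue() > bestCount) {
                    best = e.getKey();
                    bestCount = e.getValue();
                }
            }
            winners.put(hit, best);
        });
        return winners;
    }

    public static void main(String[] args) {
        List<Result> rows = List.of(
                new Result("H1", "w1", "Search Engine"), new Result("H1", "w2", "Search Engine"),
                new Result("H1", "w3", "Other"),
                new Result("H2", "w1", "Online Retailer"), new Result("H2", "w2", "News Site"));
        System.out.println(consensus(rows)); // {H1=Search Engine, H2=News Site}
    }
}
```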
Sentiment Projects in MTurk
Create a sentiment question, specify the number of Worker responses, and upload
your data (in .csv format; tags can be added).
The requester receives aggregated results showing how strong the sentiment is
for each item.
AMT sends an email when your project is
completed.
Good for comparing results and for evaluation.
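The per-item aggregation can be approximated with simple summary statistics: the mean rating shows the direction and strength of sentiment, and the standard deviation shows how much workers disagreed. This is a hypothetical sketch: the 1-5 scale, the Rating and Stats records and the choice of statistics are assumptions, not the format MTurk actually returns.

```java
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Hypothetical sketch of aggregating sentiment ratings per item. Assumes integer
// scores on a 1-5 scale (1 = very negative, 5 = very positive); the real scale may differ.
final class SentimentSummary {
    private SentimentSummary() {}

    record Rating(String item, int score) {}

    /** n = number of responses, mean = strength/direction, stdDev = disagreement. */
    record Stats(int n, double mean, double stdDev) {}

    static Map<String, Stats> summarize(List<Rating> ratings) {
        Map<String, double[]> acc = new LinkedHashMap<>(); // item -> {count, sum, sum of squares}
        for (Rating r : ratings) {
            if (r.score() < 1 || r.score() > 5) {
                throw new IllegalArgumentException("score out of range: " + r.score());
            }
            double[] a = acc.computeIfAbsent(r.item(), k -> new double[3]);
            a[0] += 1;
            a[1] += r.score();
            a[2] += (double) r.score() * r.score();
        }
        Map<String, Stats> out = new LinkedHashMap<>();
        acc.forEach((item, a) -> {
            double mean = a[1] / a[0];
            double variance = Math.max(0.0, a[2] / a[0] - mean * mean); // population variance
            out.put(item, new Stats((int) a[0], mean, Math.sqrt(variance)));
        });
        return out;
    }

    public static void main(String[] args) {
        List<Rating> ratings = List.of(
                new Rating("phone", 4), new Rating("phone", 5), new Rating("phone", 3),
                new Rating("laptop", 2), new Rating("laptop", 2));
        summarize(ratings).forEach((item, s) ->
                System.out.printf("%s: n=%d mean=%.2f sd=%.2f%n", item, s.n(), s.mean(), s.stdDev()));
    }
}
```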
[Figures: Instructions for workers; Scale of rating]
Criticism - AMT

Because HITs are typically simple, repetitive tasks and users are often paid
only a few cents to complete them, some have criticized Mechanical Turk as a
"digital sweatshop".

Because workers are paid as contractors rather than
employees, requesters do not have to file forms for, or pay,
payroll taxes, and they avoid laws regarding minimum
wage, overtime, and workers' compensation. Workers,
though, must report their income as self-employment
income.

Some requesters have taken advantage of workers by
having them do the tasks, then rejecting their submissions
in order to avoid paying.

Amazon.com does not monitor the service and refers all
complaints to the poster of the HIT.