Effect of Award on Worker Behaviors in
Competitive Crowdsourcing Tasks
Ye Yang, Razieh Saremi
Stevens Institute of Technology
CII Forum 2015 Nov. 16-18, 2015
Arlington, VA
Introduction and Definitions
• Crowdsourcing
―Coined in 2005 by Jeff Howe and Mark Robinson
―Howe and Robinson in 2006:
o Simply defined, crowdsourcing represents the act of a company or institution taking
a function once performed by employees and outsourcing it to an undefined (and
generally large) network of people in the form of an open call.
―Merriam Webster in 2012:
o The process of obtaining needed services, ideas, or content by soliciting
contributions from a large group of people, and especially from an online
community, rather than from traditional employees or suppliers.
• Software Crowdsourcing
―Stol and Fitzgerald in 2014
o The accomplishment of specified software development tasks on behalf of an
organization by a large and typically undefined group of external people with the
requisite specialist knowledge through an open call.
Introduction and Definitions (cont’d)
• New paradigm of
crowdsourced software
development (CSD)
―Reported benefits
o shortened schedule
o innovative solutions
o reduced cost
• Challenges
―Predictive models don’t work
[Figure: General Competitive CSD Processes]
CSD Decisions Needing Better Analytics
• Developer Recommendation (SOSE’15)
• Task Scheduling (ASE’15)
• Task Pricing (ICSE’13)
• Failure Prediction (under review, ICSE’16)
Conflicts with Traditional Model/Law
• Schedule reduction
―Compared with Parkinson’s law
o “Work expands so as to fill the time available for its completion.”
• Cost reduction
―Compared with COCOMO
o EFFORT = a × SIZE^b (see the sketch below)
Traditional predictive models don’t work for CSD!
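For reference, the basic COCOMO form above can be evaluated directly. A minimal sketch, assuming the classic organic-mode constants (a ≈ 2.4, b ≈ 1.05) and size measured in KLOC; these constants are illustrative, not taken from the slide:

```python
def cocomo_effort(size_kloc: float, a: float = 2.4, b: float = 1.05) -> float:
    """Basic COCOMO: EFFORT (person-months) = a * SIZE^b, with SIZE in KLOC."""
    return a * size_kloc ** b

# Example: a 10 KLOC component under the assumed organic-mode constants.
print(cocomo_effort(10))  # ~26.9 person-months
```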
Task Price Can Be Accurately Predicted! (ICSE-NIER’13)
Under-Priced Tasks Are More Failure Prone
Estimated vs. Actual Price on Failed Tasks
(82% under-pricing)
Estimated vs. Actual Price on Successful Tasks
(56% under-pricing)
Example Decision Scenarios
• How can I estimate how many potential workers might sign up for my competition?
• How can I ensure that interested workers will work on my task and make final submissions for the award I pay?
• How can I predict whether I will receive qualified submissions from the registered workers?
• How can I incentivize workers in order to obtain better results?
Conceptual Award-Worker Behavior Model
Yerkes–Dodson law
[Figure: Conceptual Award-Worker Behavior Model]
Research Questions
• RQ1: How does the award correlate with workers’ behavior in task selection and completion?
• RQ2: How consistently do workers behave from registering for tasks to submitting?
• RQ3: How does the number of registrants correlate with the quality of submissions?
• RQ4: For similar tasks, will the number of registrants and submissions increase as the award increases?
Dataset
• Datasets
―514 component development tasks from Sep 2003 to Sep 2012
―Extracted from the TopCoder website
―All tasks were successfully completed
• Data pre-processing (sketched in code below)
―Original dataset (514) → discretize into 30 award bins (514)
―Remove outliers → dataset “Main”: 10 bins, 494 tasks → Analysis I
―Aggregate → dataset “General” (10 bins) → Analyses II, III
―Stratify → APPL (33), COMM (34), DATA (142), DEVE (52) → Analysis IV
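Purely as an illustration (not the authors’ actual scripts), the binning, outlier-removal, and aggregation steps could look roughly like this in pandas, assuming a hypothetical tasks.csv with columns award, size, registrants, submissions, and score; the 1.5×IQR outlier rule is an assumption:

```python
import pandas as pd

# Hypothetical input: one row per TopCoder task.
tasks = pd.read_csv("tasks.csv")  # assumed columns: award, size, registrants, submissions, score

# Discretize awards into 30 equal-width bins (original step).
tasks["bin30"] = pd.cut(tasks["award"], bins=30, labels=False)

# Remove award outliers (1.5*IQR rule assumed here) and re-bin into 10 bins -> "Main".
q1, q3 = tasks["award"].quantile([0.25, 0.75])
iqr = q3 - q1
main = tasks[tasks["award"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)].copy()
main["bin10"] = pd.cut(main["award"], bins=10, labels=False)

# Aggregate per bin -> "General" (one row per award bin).
general = main.groupby("bin10")[["award", "size", "registrants", "submissions", "score"]].mean()
print(general)
```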
Metrics and Statistics
Summary of Basic Statistics

Metric   Min     Max     Median   Mean    STDEV
Award    112.5   3000    750      754     372
Size     310     21925   2290     2978    2268
#Reg     1       72      16       18      11
#Sub     —       44      4        5       5
Score    75      100     94.16    92.5    6.2

[Figure: Comparison of Award, Size, #Reg, #Sub, and Score across the 4 subsets (APPL, COMM, DATA, DEVE)]
Analysis I: overall correlation analysis on
dataset “Main”
Rationales and statistics of tasks in four regions

Region   Award                   #Registrants                #Tasks
I        <=750; cheaper          <=18; less competition      243
II       <=750; cheaper          >18; broader competition    185
III      >750; more expensive    >18; broader competition    22
IV       >750; more expensive    <=18; less competition      44

[Figure: Scatter-plot of #Registrants vs. Award; very weak negative correlation of -0.015]
• 63% of all tasks were priced at $750
• 85% of all tasks followed the top-7 award settings ($750, $450, $600, $900, $1050, $150, and $375)
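A minimal sketch of the kind of correlation computation behind the scatter-plot figure above, reusing the hypothetical `main` dataframe from the preprocessing sketch:

```python
# Pearson correlation between award and number of registrants on the "Main" dataset.
r = main["award"].corr(main["registrants"])
print(f"Award vs. #Registrants: r = {r:.3f}")  # the slide reports -0.015
```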
Analysis I: overall correlation analysis on
dataset “General”
There is a bigger pool of workers for cheaper tasks, since they are relatively easier and require less experience and fewer skill sets.

Bin    #Tasks   Award   Size   #Reg   #Sub   Score
1      32       142     2562   14     4      94
2      17       226     2582   18     4      95
3      19       353     2886   19     5      95
4      24       447     2766   20     8      96
5      25       612     2663   21     6      95
6      311      750     3129   19     5      92
7      23       913     3468   15     4      94
8      19       1050    2286   22     5      91
9      15       1210    2597   17     4      95
10     9        1500    2509   14     3      87
Sum    494

Correlation (Award vs.): Size -0.09, #Reg -0.13, #Sub -0.40, Score -0.71
Analysis II: Behavior consistency
[Figure: #Registrations vs. #Submissions for the four subsets (APPL, COMM, DATA, DEVE), with per-subset linear trend lines (R² 0.43–0.61); strong positive correlation of 0.71 overall]
[Figure: #Registrations vs. Submission Ratio for the four subsets, with per-subset logarithmic trend lines (R² 0.02–0.51); SR: Median 0.25, Mean 0.30]
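The trend lines summarized above are simple fits; a minimal sketch of how a logarithmic fit of submission ratio against registrations could be reproduced, again over the hypothetical `main` dataframe from earlier:

```python
import numpy as np

# Submission ratio per task; guard against zero registrants.
reg = main["registrants"].to_numpy(dtype=float)
sub = main["submissions"].to_numpy(dtype=float)
mask = reg > 0
ratio = sub[mask] / reg[mask]

# Fit SubmissionRatio = a*ln(#Reg) + b, the shape of the trend lines on the slide.
a, b = np.polyfit(np.log(reg[mask]), ratio, deg=1)
print(f"SubmissionRatio ≈ {a:.3f} * ln(#Reg) + {b:.3f}")
```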
Analysis III: Submission Quality
[Figure: #Registrants vs. winning submission score for the four subsets (APPL, COMM, DATA, DEVE)]
Tasks with more than 20 registrants have a higher chance (71.5%) of receiving better-quality submissions (score > 93).
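A minimal sketch of how that conditional percentage could be computed, again using the hypothetical `main` dataframe:

```python
# Share of tasks with more than 20 registrants whose winning score exceeds 93.
crowded = main[main["registrants"] > 20]
share = (crowded["score"] > 93).mean()
print(f"{share:.1%} of tasks with >20 registrants scored above 93")  # slide reports 71.5%
```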
Analysis IV: Similar Tasks
• Further breakdown by size
SubGroup     #Tasks   #Observed awards   #Outliers removed   Model fitness (registrants)   Model fitness (submissions)
<2kloc       56       13                 1                   0.69499                       0.44283
2kloc~3kloc  31       6                  1                   0.59802                       0.85468
3kloc~5kloc  27       9                  1                   0.76909                       0.0619
>5kloc*      26       11                 1                   0.46303                       0.57195
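The model-fitness values above come from fitting award–response curves within each size subgroup. Purely as an illustration (the slide does not state the exact model form), an inverted-U style quadratic fit with an R² fitness measure could be sketched as follows, over a hypothetical "<2 kloc" slice of the `main` dataframe:

```python
import numpy as np

def fit_inverted_u(award, response):
    """Fit a quadratic curve (an inverted-U candidate) and return coefficients and R^2.

    Illustrative only; the slide does not state the authors' exact model form.
    """
    coeffs = np.polyfit(award, response, deg=2)
    pred = np.polyval(coeffs, award)
    ss_res = float(np.sum((response - pred) ** 2))
    ss_tot = float(np.sum((response - np.mean(response)) ** 2))
    return coeffs, 1.0 - ss_res / ss_tot

# Example on a hypothetical "<2 kloc" subgroup.
small = main[main["size"] < 2000]
coeffs, r2 = fit_inverted_u(small["award"].to_numpy(dtype=float),
                            small["registrants"].to_numpy(dtype=float))
print("quadratic coefficients:", coeffs, "fitness (R^2):", round(r2, 3))
```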
Discussions – (1)
• Correlation between award and worker behaviors
―Award negatively correlates with all four metrics in the General dataset.
―This indicates that overall, as the award increases, the number of registrants, the number of submissions, and the quality of the final submission all decrease.
―This observation supports the negligible or negative role that award plays in worker behavior in the conceptual model.
―It may not be cost-effective to simply raise the award in order to attract broader competition, especially for competitive, 2-winner software crowdsourcing tasks.
• Answer to RQ1: Generally, in task selection, the number of registrants decreases as the award increases; in task completion, the number of submissions and the score decrease as the award increases.
Discussions – (2)
• Behavior consistency
―By attracting more registrants, there is a higher chance of receiving satisfactory submissions; however, each individual worker’s willingness to submit decreases.
―This reflects behavior inconsistency from task registration to task completion, which supports the assumption of distracting factors in the conceptual model.
―Possible distracting factors include competition pressure, insufficient time to complete the task, etc.
• Answer to RQ2: There is a strong positive correlation of 0.71 between the number of submissions and the number of registrants. However, there is a decreasing tendency to submit as the number of registrants increases.
Discussions – (3)
• Quality of submission
―The positive correlation between the number of registrants and submission scores confirms the quality improvement gained from leveraging worker diversity.
―However, the low correlation of 0.19 indicates that the previously reported strong impact of team diversity could be limited or weakened by distracting factors due to increased competition.
―Similar viewpoint: the “maximum point” reported by Tajedin et al.
• In summary, the answer to RQ3 is: there is a weak positive correlation of 0.19 between the number of registrants and the score of the winning submission.
Discussions – (4)
• Interaction of award and competition
―Analysis IV demonstrates some examples of optimal awards.
―Task requesters are recommended to design hybrid competitions combining both collaborative and competitive tasks.
―Future decision-support research directions:
o Pricing strategies targeting broader competition or higher quality;
o Sensitivity analysis for task requesters to explore different options with respect to their needs and preferences.
• Answer to RQ4: For similar tasks, the relationship between award and worker behavior follows a variety of inverted U-shaped curves.
Future Work
• Further evaluation
―Additional dataset from Jan. 2014-Feb. 2015
• In-depth causality analysis using more attributes;
―task complexity, worker registration order, worker availability (multitasking), worker rating, and so on;
• Predictive models to support strategic pricing.
Thank you!
Contact:
Ye Yang, ye.yang@stevens.edu