Effect of Award on Worker Behaviors in Competitive Crowdsourcing Tasks
Ye Yang, Razieh Saremi
Stevens Institute of Technology
CII Forum 2015, Nov. 16-18, 2015, Arlington, VA

Introduction and Definitions
• Crowdsourcing
―Coined in 2005 by Jeff Howe and Mark Robinson
―Howe and Robinson in 2006:
o Simply defined, crowdsourcing represents the act of a company or institution taking a function once performed by employees and outsourcing it to an undefined (and generally large) network of people in the form of an open call.
―Merriam-Webster in 2012:
o The process of obtaining needed services, ideas, or content by soliciting contributions from a large group of people, and especially from an online community, rather than from traditional employees or suppliers.
• Software Crowdsourcing
―Stol and Fitzgerald in 2014:
o The accomplishment of specified software development tasks on behalf of an organization by a large and typically undefined group of external people with the requisite specialist knowledge through an open call.

Introduction and Definitions (cont’d)
• New paradigm of crowdsourced software development (CSD)
―Reported benefits: shortened schedule, innovative solutions, reduced cost
• Challenges
―Predictive models don’t work
[Figure: general competitive CSD processes]

CSD Decisions Needing Better Analytics
• Developer recommendation (SOSE’15)
• Task scheduling (ASE’15)
• Task pricing (ICSE’13)
• Failure prediction (under review, ICSE’16)

Conflicts with Traditional Model/Law
• Schedule reduction
―Compared with Parkinson’s law:
o “Work expands so as to fill the time available for its completion.”
• Cost reduction
―Compared with COCOMO:
o EFFORT = a * SIZE^b
• Traditional predictive models don’t work for CSD!

Task Price Can Be Accurately Predicted! (ICSE-NIER’13)

Under-Priced Tasks Are More Failure Prone
• [Figure: estimated vs. actual price on failed tasks (82% under-pricing)]
• [Figure: estimated vs. actual price on successful tasks (56% under-pricing)]

Example Decision Scenarios
• How can I estimate how many potential workers might want to sign up for my competition?
• How can I make sure that interested workers work on my task and make final submissions for the money I pay?
• How can I predict whether I will get qualified submissions from the registered workers?
• How can I incentivize workers in order to obtain better results?

Conceptual Award-Worker Behavior Model
• Grounded in the Yerkes-Dodson law
[Figure: conceptual award-worker behavior model]

Research Questions
• RQ1: How does the award correlate with workers’ behavior in task selection and completion?
• RQ2: How consistently do workers behave from registering for tasks to submitting?
• RQ3: How does the number of registrants correlate with the quality of submissions?
• RQ4: For similar tasks, will the number of registrants and submissions increase as the award increases?

Dataset
• Datasets
―514 component development tasks from Sep. 2003 to Sep. 2012
―Extracted from the TopCoder website
―All tasks were completed successfully
• Data pre-processing
―Original dataset (514) → discretized into 30 bins (514): Analysis I
―Outlier removal → Main: 10 bins (494): Analysis I
―Aggregation → General (10 bins): Analyses II, III
―Stratification → APPL (33), COMM (34), DATA (142), DEVE (52): Analysis IV

Metrics and Statistics
Summary of basic statistics:

Metric   Min    Max    Median  Mean   STDEV
Award    112.5  3000   750     754    372
Size     310    21925  2290    2978   2268
#Reg     1      72     16      18     11
#Sub     n/a    44     4       5      5
Score    75     100    94.16   92.5   6.2

[Figures: comparison of Award, Size, #Reg, #Sub, and Score across the four subsets APPL, COMM, DATA, DEVE]

Analysis I: Overall Correlation Analysis on Dataset “Main”
[Figure: scatter-plot of #Registrants vs. Award; very weak negative correlation of -0.015]

Rationales and statistics of tasks in the four regions:

Region  Award                  #Registrants               #Tasks
I       <=750 (cheaper)        <=18 (less competition)    243
II      <=750 (cheaper)        >18 (broader competition)  185
III     >750 (more expensive)  >18 (broader competition)  22
IV      >750 (more expensive)  <=18 (less competition)    44

• 63% of all tasks were priced at $750
• 85% of all tasks followed the top-7 award settings ($750, $450, $600, $900, $1050, $150, and $375)

Analysis I: Overall Correlation Analysis on Dataset “General”
• There is a bigger pool of workers for cheaper tasks, since they are relatively easier and require less experience and fewer skill sets.

Bin  #Tasks  Award  Size  #Reg  #Sub  Score
1    32      142    2562  14    4     94
2    17      226    2582  18    4     95
3    19      353    2886  19    5     95
4    24      447    2766  20    8     96
5    25      612    2663  21    6     95
6    311     750    3129  19    5     92
7    23      913    3468  15    4     94
8    19      1050   2286  22    5     91
9    15      1210   2597  17    4     95
10   9       1500   2509  14    3     87
Sum  494
Correlation with Award: -0.09 (Size), -0.13 (#Reg), -0.40 (#Sub), -0.71 (Score)

Analysis II: Behavior Consistency
[Figures: per-subset scatter plots for APPL, COMM, DATA, and DEVE]
• #Registrations vs. #Submissions: strong positive correlation of 0.71
―Linear fits across the four subsets: y = 0.211x + 0.5918 (R² = 0.4819), y = 0.2694x - 0.5749 (R² = 0.614), y = 0.1995x + 0.8816 (R² = 0.4273), y = 0.3116x - 0.1672 (R² = 0.4303)
• #Registrations vs. submission ratio (SR): median 0.25, mean 0.30
―Logarithmic fits across the four subsets, all with negative slopes: y = -0.038ln(x) + 0.3425 (R² = 0.0245), y = -0.068ln(x) + 0.4484 (R² = 0.1028), y = -0.11ln(x) + 0.6187 (R² = 0.1148), y = -0.246ln(x) + 1.0021 (R² = 0.5123)

Analysis III: Submission Quality
[Figure: winning score vs. #Registrants for APPL, COMM, DATA, DEVE]
• Tasks with more than 20 registrants have a higher chance (71.5%) of receiving better-quality submissions (score > 93).
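The correlation rows above come from the standard sample Pearson coefficient. As an illustrative sketch (pure Python, no claim to the original analysis scripts), the snippet below computes the coefficient for Award vs. #Reg using the ten bin means copied from the “General” table; because the reported values were computed over the underlying 494 tasks, this bin-level figure only approximates the reported -0.13.

```python
import math

def pearson(xs, ys):
    """Sample Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Bin-level means taken from the "General" table (Award and #Reg columns).
award = [142, 226, 353, 447, 612, 750, 913, 1050, 1210, 1500]
registrants = [14, 18, 19, 20, 21, 19, 15, 22, 17, 14]

r = pearson(award, registrants)
print(round(r, 2))  # negative, consistent in sign with the table's #Reg correlation
```

The same helper applies unchanged to the Size, #Sub, and Score columns.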
Analysis IV: Similar Tasks
• Further breakdown by size

Subgroup     #Tasks  #Observed awards  #Outliers removed  Model fitness (registrants)  Model fitness (submissions)
<2kloc       56      13                1                  0.69499                      0.44283
2kloc~3kloc  31      6                 1                  0.59802                      0.85468
3kloc~5kloc  27      9                 1                  0.76909                      0.0619
>5kloc*      26      11                1                  0.46303                      0.57195

Discussions – (1)
• Correlation between award and worker behaviors
―Award negatively correlates with all four metrics in the General dataset.
―This indicates that overall, as the award increases, the number of registrants, the number of submissions, and the quality of the final submission all decrease.
―This observation supports the negligible-to-negative role that award plays in worker behavior in the conceptual model.
―It may not be cost-effective to simply raise the award in order to attract broader competition, especially for competitive, 2-winner software crowdsourcing tasks.
• Answer to RQ1: Generally, in task selection, the number of registrants decreases as the award increases; in task completion, the number of submissions and the score also decrease as the award increases.

Discussions – (2)
• Behavior consistency
―Attracting more registrants raises the chances of receiving satisfactory submissions; however, the willingness of each individual worker to submit decreases.
―This reflects a behavioral inconsistency from task registration to task completion, which supports the assumption of distracting factors in the conceptual model.
―Possible distracting factors include competition pressure, insufficient time to complete the task, etc.
• Answer to RQ2: There is a strong positive correlation of 0.71 between the number of submissions and the number of registrants. However, there is a decreasing tendency to submit as the number of registrants increases.
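The declining per-worker submission tendency was captured in Analysis II by logarithmic fits of the form y = a*ln(x) + b (e.g., y = -0.246ln(x) + 1.0021 for one subset). A minimal sketch of such a fit via ordinary least squares on ln(x); the data points here are hypothetical, shaped like the declining scatter plots, and are not the TopCoder data.

```python
import math

def fit_log(xs, ys):
    """Least-squares fit of y = a*ln(x) + b, i.e. simple linear regression on t = ln(x)."""
    ts = [math.log(x) for x in xs]
    n = len(ts)
    mt, my = sum(ts) / n, sum(ys) / n
    a = sum((t - mt) * (y - my) for t, y in zip(ts, ys)) / sum((t - mt) ** 2 for t in ts)
    b = my - a * mt
    return a, b

# Hypothetical (#registrants, submission ratio) points with a declining shape.
reg = [5, 10, 15, 20, 30, 40, 50]
ratio = [0.60, 0.45, 0.38, 0.33, 0.27, 0.23, 0.20]

a, b = fit_log(reg, ratio)
print(a < 0)  # True: a negative slope mirrors the falling willingness to submit
```

Substituting t = ln(x) linearizes the model, so the closed-form slope and intercept of simple regression apply directly.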
Discussions – (3)
• Quality of submission
―The positive correlation between the number of registrants and submission scores confirms the improved quality obtained by leveraging worker diversity.
―However, the low correlation of 0.19 indicates that the previously reported strong impact of team diversity could be limited or weakened by many distracting factors due to increased competition.
―A similar viewpoint is the “maximum point” reported by Tajedin et al.
• Answer to RQ3: There is a weak positive correlation of 0.19 between the number of registrants and the score of the winning submission.

Discussions – (4)
• Interaction of award and competition
―Analysis IV demonstrates some examples of optimal awards.
―It is recommended that task requesters design hybrid competitions combining both collaborative and competitive tasks.
―Future decision-support research directions:
o Pricing strategies, such as for broader competition or higher quality
o Sensitivity analysis for task requesters to explore different options with respect to their needs and preferences
• Answer to RQ4: For similar tasks, the relationship between award and worker behavior follows a variety of inverted U-shaped curves.

Future Work
• Further evaluation
―Additional dataset from Jan. 2014 to Feb. 2015
• In-depth causality analysis using more attributes
―Task complexity, worker registration order, worker availability (multitasking), worker rating, and so on
• Predictive models to support strategic pricing

Thank you!
Contact: Ye Yang, ye.yang@stevens.edu