WELCOME!
Amazon Mechanical Turk New York City Meet Up
September 1, 2009
© 2009 Amazon.com, Inc. or its Affiliates.

AGENDA
Welcoming Statements
Introductions
Dolores Labs – Video Directory Use Case
Knewton – Adaptive Learning Use Case
FreedomOSS – Enterprise Integration
New York University – Worker Quality Solution
Panel Questions and Answers

Amazon Mechanical Turk Requester Meetup
Howie Liu
Dolores Labs

Dolores Labs Introduction
Founded in 2008 by Lukas Biewald, Senior Scientist, Powerset (MSFT); Yahoo! Search; Stanford AI Lab
  – Recognized enormous potential of AMT platform
Dolores Labs develops quality control technology (CrowdControl™) to make AMT more accessible and reliable

Case Study
A large video directory needed to select relevant thumbnails for 200k+ videos

Why Mechanical Turk?
Size of project and turnaround speed made MTurk the obvious solution
  – Given the needs of the client, traditional outsourcing or hiring employees was not an option
  – However, the client was concerned about quality of results
Inherent variability of Mechanical Turk workers
  – Unlike other Amazon marketplaces, workers are not a perfect commodity
  – Significant variations in quality (accuracy)
  – Need to ensure workers diligently completed work
  – Intelligently aggregate multiple responses to find the single best thumbnail for a video

3 Step Process for Optimizing the Task
Baseline Performance
  • Create a custom interactive UI
  • 74% result accuracy
CrowdControl™
  • Apply statistical quality control
  • 90% result accuracy
CrowdControl™ + 2 pass
  • Second pass for Turkers to verify results
  • 98% result accuracy
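Illustration: aggregating multiple responses
The aggregation step named above (combining several workers' responses into a single best thumbnail) could be sketched as below. This is not CrowdControl™, whose algorithm the slides do not describe. It is a plain accuracy-weighted vote, in which each response counts in proportion to the log-odds of that worker's estimated accuracy. The worker IDs, accuracy estimates, thumbnail names, and the default accuracy of 0.7 are all hypothetical.

```python
import math
from collections import defaultdict


def weighted_vote(responses, accuracy, default_accuracy=0.7):
    """Return the answer with the highest total log-odds weight.

    responses: list of (worker_id, answer) pairs for one video.
    accuracy:  dict mapping worker_id to an estimated accuracy in (0, 1).
    default_accuracy: assumed accuracy for workers with no estimate (hypothetical).
    """
    scores = defaultdict(float)
    for worker, answer in responses:
        p = accuracy.get(worker, default_accuracy)
        # Clamp so that log-odds stay finite for estimates of 0 or 1.
        p = min(max(p, 0.01), 0.99)
        scores[answer] += math.log(p / (1 - p))
    return max(scores, key=scores.get)


# Hypothetical data: one reliable worker disagrees with two weaker ones.
responses = [("w1", "thumb_a"), ("w2", "thumb_b"), ("w3", "thumb_b")]
accuracy = {"w1": 0.95, "w2": 0.6, "w3": 0.6}
best = weighted_vote(responses, accuracy)
print(best)  # thumb_a
```

A simple head count would pick thumb_b here. The weighted vote picks thumb_a because one worker at 95% estimated accuracy outweighs two at 60%. The log-odds weight is exact only for two-way choices with symmetric errors, so for multi-way choices such as thumbnails it is an approximation.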
High Quality on Mechanical Turk: Best Practices
[Chart: CrowdControl™ vs Baseline Result Accuracy. Bars: Baseline Performance, CrowdControl™, CrowdControl™ + Custom Solutions. Axis: 70 to 100.]
Statistical inference algorithms to dynamically assess quality
  – …Of each worker, of each result
  – …While the task is live
  – Smart allocation of worker resources
    • Blindly increasing redundancy is expensive
Aggregating all responses from workers with varying quality into a single “best” answer
White paper with Stanford AI Lab about quality on AMT
  http://bit.ly/DLpaper

Other Insights
Clear task instructions are crucial for good results
  – Garbage in, garbage out
Intuitive and efficient task interface makes the task faster (read: cheaper) and more fun!
Mechanical Turk is an unprecedented, hyperefficient labor marketplace
  – Need to understand its dynamics through experience in order to harness its power

Amazon Mechanical Turk Requester Meetup
Dahn Tamir, Knewton Inc.

Knewton – Introduction
Live online GMAT and LSAT prep courses customized for each student, powered by the world’s most advanced adaptive learning engine.
Selected to the 2009 AlwaysOn Global 250 List.
Named Category Winner in the Digital Education field.

How We Use MTurk
Calibration for computer-adaptive testing
Quality assurance
Focus groups and surveys
Database building
Marketing

Why MTurk?
Speed
Cost
Appropriate worker population for each task
Quality

What We Learned
Turkers are a diverse and capable population
Use qualification tests
Invest in building good HITs
Hesitate to reject work (but do not hesitate to reject cheaters)
Meet Turker Nation

Thank you! Questions?
dahn@knewton.com
978-KNEWTON
Amazon Mechanical Turk Requester Meetup
Max Yankelevich, Chief Architect – Freedom OSS

Freedom OSS – Introduction
Freedom OSS is a professional services organization with a focus on Practical Implementations using Cloud Computing & Open Source Technologies
International Firm
  – US Offices: PA, NYC, GA, KC, NV, WA, NC
  – 4 Large Solution Centers in Eastern Europe (Russia, Belarus, Ukraine and Lithuania)
Practical Approach to Cloud Computing
  – Most successfully completed Enterprise Cloud Computing projects in the industry
Key Cloud Computing Partnerships
  – Top Amazon AWS Enterprise System Integrator
  – Top Eucalyptus Enterprise Partner
Key Open Source Partnerships
  – Top Red Hat Advanced Business Partner
  – #1 JBoss Advanced Business Partner in US
2008 “JBoss SOA Innovation” Award Winner
2007-08 “Practical SOA” Award Winner
2008 “Red Hat Extensive Ecosystem” Award Winner
Leading technology partner for many Fortune 2000 companies
Freedom is a privately held corporation

MTurk and Enterprise Integration
Most legacy systems are not architected to include human intervention
Providing a technological interface to maintain the workflow while inserting human intelligence and building self-adjudicating business flows
Leveraging Mechanical Turk programmatically in your everyday systems
Freedom OSS has leveraged the power of Enterprise Service Bus (ESB) & Practical Service Oriented Architecture (SOA) to make the process of on-boarding and managing MTurk workers rapid and cost-effective
Using its Professional Open Source ESB, freeESB, Freedom has developed many powerful Connectors for some of the most used Enterprise Systems and Technologies, such as SAP, Mainframe, Siebel, Java/J2EE, Oracle, IBM MQ, etc.
Master Data Cleansing & Validation Use Case
Keeping Master Customer Data File (Master Data Management)
  – Record de-duping
  – Contact information validation
Traditional MDM tactics
  – Expensive software
  – Big Bang approach
  – Invasive Code Changes to Legacy Applications
Clean and consistent customer data

[Architecture diagram. Labels, in the order extracted:]
Business Applications
Real-time access API
Real-time Events
AWS Cloud
Master Data
freeESB: Routing, Transformation, Connectivity, QoS
Business Process Orchestration & Workflow
  – First Turk Task – Simple Data Checking
  – Second Turk Task – Deeper Data Checking
  – Third Turk Task – Data Edit/Trusted Task
Business Rules Engine
Legacy Applications: Mainframe, Client-Server, Oracle, .NET, SAP, Siebel, etc.

Outcome
Low operational costs
Non-invasive data integration
High degree of accuracy due to multi-task distribution
Some Best Practices when integrating MTurk within an Enterprise
  – Deliver value incrementally
  – Inversion of Control

Thank you! Questions?

Amazon Mechanical Turk Requester Meetup
Panos Ipeirotis – New York University

Panos Ipeirotis – Introduction
New York University, Stern School of Business
“A Computer Scientist in a Business School”
http://behind-the-enemy-lines.blogspot.com/
Email: panos@nyu.edu

Example: Build an Adult Web Site Classifier
Need a large number of hand-labeled sites
Get people to look at sites and classify them as: G (general), PG (parental guidance), R (restricted), X (porn)
Cost/Speed Statistics
  – Undergrad intern: 200 websites/hr, cost: $15/hr
  – MTurk: 2500 websites/hr, cost: $12/hr

Bad news: Spammers!
Worker ATAMRO447HWJQ labeled X (porn) sites as G (general audience)
Improve Data Quality through Repeated Labeling
Get multiple, redundant labels using multiple workers
Pick the correct label based on majority vote
  – 1 worker: 70% correct
  – 11 workers: 93% correct
Probability of correctness increases with number of workers
Probability of correctness increases with quality of workers

But Majority Voting is Expensive
Single Vote Statistics
  – MTurk: 2500 websites/hr, cost: $12/hr
  – Undergrad: 200 websites/hr, cost: $15/hr
11-vote Statistics
  – MTurk: 227 websites/hr, cost: $12/hr
  – Undergrad: 200 websites/hr, cost: $15/hr

Using redundant votes, we can infer worker quality
Look at our spammer friend ATAMRO447HWJQ together with 9 other workers
We can compute error rates for each worker
Error rates for ATAMRO447HWJQ
  P[X → X]=9.847%    P[X → G]=90.153%
  P[G → X]=0.053%    P[G → G]=99.947%
Our “friend” ATAMRO447HWJQ mainly marked sites as G. Obviously a spammer…

Rejecting spammers and Benefits
Random answers error rate = 50%
Average error rate for ATAMRO447HWJQ: 45.2%
  P[X → X]=9.847%    P[X → G]=90.153%
  P[G → X]=0.053%    P[G → G]=99.947%
Action: REJECT and BLOCK
Results:
  – Over time you block all spammers
  – Spammers learn to avoid your HITs
  – You can decrease redundancy, as quality of workers is higher

After rejecting spammers, quality goes up
Spam keeps quality down
Without spam, workers are of higher quality
Need less redundancy for same quality
Same quality of results for lower cost
With spam
  – 1 worker: 70% correct
  – 11 workers: 93% correct
Without spam
  – 1 worker: 80% correct
  – 5 workers: 94% correct
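Illustration: majority-vote accuracy and cost
The accuracy and throughput figures on the slides above can be approximated with a short calculation. The slides do not state their model, so this sketch assumes a two-label task, an odd number of workers who vote independently, and the same accuracy for every worker. The function names are illustrative.

```python
from math import comb


def majority_accuracy(p, n):
    """Probability that a strict majority of n independent workers is correct.

    Assumes two possible labels, n odd, and each worker correct with
    probability p.
    """
    need = n // 2 + 1  # smallest number of correct votes that wins
    return sum(comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(need, n + 1))


def items_per_hour(single_vote_rate, votes_per_item):
    """Throughput when every item is labeled votes_per_item times."""
    return single_vote_rate / votes_per_item


def cost_per_item(hourly_cost, single_vote_rate, votes_per_item=1):
    """Cost of one finished item at the given hourly cost and labeling rate."""
    return hourly_cost / items_per_hour(single_vote_rate, votes_per_item)


print(round(majority_accuracy(0.7, 11), 3))  # 0.922
print(round(majority_accuracy(0.8, 5), 3))   # 0.942
print(round(items_per_hour(2500, 11)))       # 227
print(round(cost_per_item(12, 2500), 4))     # 0.0048 (MTurk, single vote)
print(round(cost_per_item(12, 2500, 11), 4)) # 0.0528 (MTurk, 11 votes)
print(round(cost_per_item(15, 200), 4))      # 0.075  (undergrad, single vote)
```

Under these assumptions, 11 workers at 70% accuracy give about 92%, close to the 93% on the slide. Five workers at 80% give about 94%, matching the slide. Dividing 2500 websites/hr by 11 votes gives the 227 websites/hr quoted for 11-vote labeling.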
Correcting biases
Classifying sites as G, PG, R, X
Sometimes workers are careful but biased
Error Rates for Worker: ATLJIK76YH1TF
  P[G → G]=20.0%   P[P → G]=0.0%     P[R → G]=0.0%     P[X → G]=0.0%
  P[G → P]=80.0%   P[P → P]=0.0%     P[R → P]=0.0%     P[X → P]=0.0%
  P[G → R]=0.0%    P[P → R]=100.0%   P[R → R]=100.0%   P[X → R]=0.0%
  P[G → X]=0.0%    P[P → X]=0.0%     P[R → X]=0.0%     P[X → X]=100.0%
Classifies G → P and P → R
Average error rate for ATLJIK76YH1TF: 45.0%
Is ATLJIK76YH1TF a spammer?
For ATLJIK76YH1TF, we simply need to compute the “non-recoverable” error rate (technical details omitted)
Non-recoverable error rate for ATLJIK76YH1TF: 9%

Too much theory?
Open source implementation available at: http://code.google.com/p/get-another-label/
Input:
  – Labels from Mechanical Turk
  – Cost of incorrect labelings (e.g., X → G costlier than G → X)
Output:
  – Corrected labels
  – Worker error rates
  – Ranking of workers according to their quality
Alpha version, more improvements to come!
Suggestions and collaborations welcomed!

Thank you! Questions?
“A Computer Scientist in a Business School”
http://behind-the-enemy-lines.blogspot.com/
Email: panos@nyu.edu
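Illustration: why a biased worker is recoverable
The "Correcting biases" slide omits the technical details, so the following is only a sketch of the underlying idea and not the get-another-label implementation. It applies Bayes' rule to the error rates shown for worker ATLJIK76YH1TF to ask which true classes are consistent with each label that worker assigns. The uniform class prior is an assumption, since the slides do not give class frequencies. For that reason the sketch does not attempt to reproduce the 9% figure.

```python
LABELS = ["G", "P", "R", "X"]

# CONFUSION[true][assigned] = P[true -> assigned], copied from the slide.
CONFUSION = {
    "G": {"G": 0.2, "P": 0.8, "R": 0.0, "X": 0.0},
    "P": {"G": 0.0, "P": 0.0, "R": 1.0, "X": 0.0},
    "R": {"G": 0.0, "P": 0.0, "R": 1.0, "X": 0.0},
    "X": {"G": 0.0, "P": 0.0, "R": 0.0, "X": 1.0},
}


def posterior(assigned, confusion, prior):
    """Distribution over true classes given the label the worker assigned."""
    joint = {t: prior[t] * confusion[t][assigned] for t in LABELS}
    total = sum(joint.values())
    if total == 0:
        return None  # the worker never produces this label
    return {t: joint[t] / total for t in LABELS}


# Assumption: all four classes are equally likely a priori.
uniform = {t: 0.25 for t in LABELS}

for assigned in LABELS:
    print(assigned, posterior(assigned, CONFUSION, uniform))
```

With these numbers, a site this worker labels G or P is certainly G, and a site labeled X is certainly X. Only the label R is ambiguous, because both P and R sites receive it. The worker's answers therefore carry far more information than the 45% average error rate suggests, which is the distinction the slide draws between a biased worker and a spammer.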