Search and the New Economy: Wisdom of the Crowds
Prof. Panos Ipeirotis

Summary from last session
• We can quantify unstructured, qualitative data. We need:
  – A context in which content is influential and not redundant (e.g., experiential content)
  – A measurable economic variable: price (premium), demand, cost, customer satisfaction, process cycle time
  – Methods for structuring unstructured content
  – Methods for aggregating the variables in a business-context-aware manner

Summary from last session
• What are good properties of opinion mining systems?
  – Structuring: opinions are expressed in many ways; summarize them uniformly
  – Independent summaries: not all scenarios have associated economic outcomes, or the outcomes are difficult to measure (e.g., discussion about a product pre-announcement)
  – Personalization: the weight of each person's opinion varies (an interesting future direction!)
  – Data collection: evaluations are rarely all in one place (the basic value proposition of BuzzMetrics)

Summary from last session
• Review/reputation systems gather the opinions of many users
• Review/reputation systems exhibit biases
• Prediction markets aggregate information from traders
• Prediction markets appear to be (?) robust to biases
• Today's question: can we harness any wisdom from the crowd?

Madness of the Crowds
• In the 19th century, the prevailing view was:
  – "The madness of the crowds" (Mackay, 1841)
  – "A sensible individual, as a member of a crowd, becomes a blockhead" (Baruch)
  – "Madness is the exception in individuals and the rule in groups" (Nietzsche)
  – "Crowds deliver verdicts with which every individual disagrees" (Le Bon)
• Were they wrong?

Case Study 1: InnoCentive

InnoCentive
• Started in 2001 by a VP of Eli Lilly
• A Craigslist for solutions to scientific problems
• Seekers post a problem and a bounty
• Solvers try to find a solution
• Double-blind process; identities hidden
• More than 1/3 of problems solved
  – 57% "reduction to practice"
  – 43% pencil-and-paper
• Problems unsolvable by the labs of P&G and others

The reach of InnoCentive
• Total Solvers: > 70,000
• Total Seekers: 34
• Scientific Disciplines: 40
• Challenges Posted: > 200
• Challenges Solved: > 58
• Countries With Registered Solvers

InnoCentive reaches talent pools
• "Wherever you work, most of the smart people are somewhere else." (Bill Joy)
• Traditional pools
  – US and EU academics
  – Contract labs (FTEs)
  – Individual networks
• Opportunity pools (nontraditional pools of intellectual capacity)
  – Global academia
  – Researchers in Russia, India, China
  – Scientists in other industries
  – Excess capacity
  – Retirees

Who are the solvers?
• INNOCENTIVE 216128 (Protein Crosslinks): head of an Indian research institute
• INNOCENTIVE 3109 (R-4-(4-Hydroxyphenyl) Butanoic Acid): retired head of Hoechst R&D
• INNOCENTIVE 96229 (Regio- and Stereocontrolled Tricyclic Alcohols): a Northern Ireland CRO and a US professor
• INNOCENTIVE 258382 (Paracrystalline Arrays): a solver from outside the discipline
• INNOCENTIVE 55195 (Substituted Isoquinoline): a Russian scientist
• Note: in other InnoCentive-like companies, solutions often come not only from individuals but from companies in other industries (the case of unused patents)

The InnoCentive proposition
• Seekers
  – Remove project roadblocks
  – Revive stalled projects or close "dead" projects
  – Culture: work outside the walls
• Solvers
  – Intellectual challenge
  – Business development opportunities
  – Financial reward
  – Recognition
• No direct involvement of a "crowd": each problem is solved by a single solver

Why does InnoCentive's approach work?
• Reason for success?
• Motivation for participants?
• Ways to improve?
Case Study 2: Procter & Gamble

P&G's collaborative approach
• The "P&G Advisors" program lets consumers try new products and offer suggestions and feedback to P&G for refining products and shaping national marketing plans.
• Before, P&G would spend $25,000 to field a new product concept test that took two months to complete. Now, by engaging its customers, the company spends $2,500 and gets results in about two weeks.

P&G spins profits with the SpinBrush
• Developed in 1998 by a startup (Dr. Johns Products)
• Originated as the "Spin Pop", a lollipop with a battery-operated handle that twirled the candy in the eater's mouth
• A price point of $5 was the breakthrough that made this a mass-market product
• In January 2001, Osher sold the idea to P&G
• P&G retained Osher and his team for a year to oversee the transition, and gave them lots of leeway in bending P&G's corporate rules in marketing the product
• The Crest SpinBrush is the best-selling toothbrush in the US; it generates over $300 million in annual revenues for P&G

Why did P&G's approach work?
• Reason for success?
• Motivation for participants?
• Ways to improve?

Case Study 3: Wikipedia
• What is it? Is it true that anyone can edit it? Is it good?

The "Wiki" in Wikipedia
• A shorter form of wiki wiki ("weekie weekie"), from the native language of Hawaii, where it commonly denotes something "quick" or "fast"
• A wiki is a collaborative website that can be directly edited by anyone with access to it

Wikipedia history
• Formally began on January 15, 2001 as a complement to the Nupedia project

Wikipedia in 2007
• Wikipedia continues to grow, with some 5 million registered editor accounts; the combined Wikipedias in all languages contain 1.74 billion words in 7.5 million articles in approximately 250 languages; the English Wikipedia gains a steady 1,700 articles a day; the wikipedia.org domain name ranks around the 10th busiest on the Internet

Nature (2005)
• Compared Wikipedia with Britannica Online
• 42 science entries blindly reviewed by experts
• Results: Britannica averaged 3 errors per entry, Wikipedia 4
• Nature: all entries were blinded
• Britannica: the study had numerous errors

Nature (2005), continued
• Economist (4-6-06): the study compares apples and oranges
• Britannica articles are shorter; omissions were counted against it
• Authorities were not favored, and were even viewed with suspicion
• Response: do we really need experts for most entries in a general reference source?

Nature (2005), continued
• Entries for pop-culture figures vs. those for great literary figures, scientists, etc.
  – The entry for Britney Spears is longer than the entry for St. Augustine
  – Seinfeld is longer than Shakespeare; Barbie is longer than Bellow
• A further drawback of the Nature study: no comparisons of style

Why does Wikipedia work?
• Reason for success?
• Motivation for participants?

Case Study 4: Collective Tagging

Flickr
• Online photo albums
• People describe photos using tags
• Tag search (a minimal sketch follows at the end of this page)

How to label ALL images on the Web?
• The slides that follow demonstrate two simple principles:
  – Motivate your users
  – Other people's procrastination can be your productivity

Labeling images with words
• [Slide shows an example image with candidate labels: MARTHA STEWART, FLOWERS, SUPER, EVIL]
• Still an open problem

Desiderata
• A method that can label all images on the Web, fast and cheap

Using humans cleverly
• The ESP Game could label all images on the Web in 30 days!
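To make the Flickr "tag search" idea concrete, here is a minimal sketch of tag-based photo search built on an inverted index from tag to photo ids. All names (`TagIndex`, `add_photo`, `search`) are illustrative assumptions, not Flickr's actual implementation.

```python
# A minimal sketch of tag-based photo search (the Flickr model), assuming
# an in-memory inverted index; names are illustrative, not Flickr's API.
from collections import defaultdict

class TagIndex:
    def __init__(self):
        self.index = defaultdict(set)          # tag -> set of photo ids

    def add_photo(self, photo_id, tags):
        """Index a photo under each of its user-supplied tags."""
        for tag in tags:
            self.index[tag.lower()].add(photo_id)

    def search(self, *tags):
        """Return photos carrying ALL of the given tags (AND semantics)."""
        sets = [self.index.get(t.lower(), set()) for t in tags]
        return set.intersection(*sets) if sets else set()

# Usage: users label their own photos; search then works across
# everyone's labels, which is the collective-tagging payoff.
idx = TagIndex()
idx.add_photo("p1", ["beach", "sunset", "hawaii"])
idx.add_photo("p2", ["beach", "surf"])
print(idx.search("beach"))            # {'p1', 'p2'}
print(idx.search("beach", "sunset"))  # {'p1'}
```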
The ESP Game
• A two-player online game
• Partners don't know each other and can't communicate
• The only thing they have in common is an image
• Object of the game: type the same word

The ESP Game: example round
• Player 1 guesses: CAR, HAT, KID
• Player 2 guesses: BOY, CAR
• Success! You agree on CAR (both players see this message)
• © 2004 Carnegie Mellon University, all rights reserved. Patent pending.

The ESP Game is FUN
• 4.1 million labels with 23,000 players
• Many people play over 20 hours a week
• 5,000 people playing simultaneously could label all images on Google in 30 days!
• Individual games on Yahoo! and MSN average over 5,000 players at a time

The ESP Game in single-player mode
• When two people play, we record every action with timing information, e.g., (0:08) BOY, (0:12) CAR, (0:15) HAT, (0:21) KID, (0:23) CAR
• A single person can then play with pre-recorded actions as their partner: we emulate a partner by replaying the pre-recorded moves
• Notice that this doesn't stop the labeling process!

What about cheating? Speed detection
• If a pair plays too fast, we don't record the words they agree on

What about cheating? Qualification test
• We give players test images for which we know all the common labels
• We only store a player's guesses if they successfully label the test images
• (A sketch of the agreement and anti-cheating logic appears at the end of this page.)

Sample labels
• Beach image: CHAIRS, SEA, PEOPLE, MAN, WOMAN, PLANT, OCEAN, TALKING, WATER, PORCH, BEACH
• Saddam image: SADDAM, MR. WILSON, MAN, FACE, MOUSTACHE
• Bush image: BUSH, PRESIDENT, DUMB, YUCK
• Coming soon: meet your soul mate through the game!

Why does ESP work?
• Reason for success?
• Motivation for participants?

Case Study 5: Amazon Mechanical Turk
• Some tasks at present can't be done by computers, or humans do them much better
• At Amazon, such a task is called a Human Intelligence Task (HIT)
• Also called "artificial artificial intelligence"
• HITs are taken by "Turkers"
• Examples of HITs:
  – Add keywords to images
  – Crop images
  – Distributed telemarketing
  – Spam identification
  – Subtitling, speech-to-text
  – Adult content analysis
  – Facial recognition
  – Proofreading
  – OCR correction/verification
  – Document labelling

MTurk: HITs
• "Turkers" can take Human Intelligence Tasks from the Amazon website
  – Paid whatever the HIT is worth, with a potential bonus
  – Can be required to "qualify" for the HIT
  – Results submission can be file upload, multiple choice, or freeform text
  – Demo HIT at mturk.com

MTurk: creating HITs
• Define the question/task and submission method (question/answer schema)
• Define the number of "assignments"
• Define qualifications for the HIT
• Define the value of the HIT

MTurk: quality control
• How do you ensure the work delivered by the Turkers is of good quality?
  – Accepting only "correct" answers (manually, or automatically?)
  – A reputation system for Turkers
  – Qualification tests
• (A majority-vote sketch appears at the end of this page.)

MTurk: qualifications
• Account-based qualifications
  – HIT abandonment rate (%)
  – HIT approval rate (%)
  – HIT rejection rate (%)
  – ... and combinations of the above
  – Location
• Create your own

Why does MTurk work?
• Reason for success?
• Motivation for participants?
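The sketch below illustrates the ESP Game mechanics described above: agreement detection, speed-based cheat detection, and a qualification test against seeded images. It is a minimal sketch under stated assumptions; the class and function names, and the 3-second threshold, are illustrative (the real game's parameters are not published).

```python
# A minimal sketch of the ESP Game's core loop; names are illustrative.
import time

MIN_AGREEMENT_SECONDS = 3.0   # assumed threshold; agreement faster than this looks like cheating

class Round:
    """One image shown to two anonymous partners who try to type the same word."""

    def __init__(self, image_id, taboo_words=()):
        self.image_id = image_id
        self.taboo = {w.lower() for w in taboo_words}   # already-known labels are off limits
        self.guesses = {1: set(), 2: set()}
        self.start = time.monotonic()
        self.labels = []                                # agreed-on words worth storing

    def guess(self, player, word):
        """Return True when the partners have agreed on `word`."""
        word = word.lower()
        if word in self.taboo:
            return False
        self.guesses[player].add(word)
        other = 2 if player == 1 else 1
        if word in self.guesses[other]:
            # Speed detection: agreement too soon after the image appears is
            # suspicious, so the pair advances but the label is NOT stored.
            if time.monotonic() - self.start >= MIN_AGREEMENT_SECONDS:
                self.labels.append(word)
            return True
        return False

def passed_qualification(player_labels, test_answers):
    """Qualification test: keep a player's labels only if they also produced
    the known common label on each seeded test image."""
    return all(test_answers[img] in player_labels.get(img, set())
               for img in test_answers)
```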
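For MTurk quality control, the standard automatic approach is redundancy: request several assignments per HIT and accept the majority answer. The sketch below assumes a simple answer format and an illustrative 60% agreement threshold; it is not Amazon's API, just the aggregation logic a requester might run on downloaded results.

```python
# A minimal sketch of majority-vote quality control for MTurk-style tasks.
from collections import Counter

def majority_answer(assignments, min_agreement=0.6):
    """Return (answer, workers_to_approve) if enough workers agree,
    else (None, []) so the HIT can be routed to manual review."""
    votes = Counter(a["answer"] for a in assignments)
    answer, count = votes.most_common(1)[0]
    if count / len(assignments) < min_agreement:
        return None, []
    approve = [a["worker_id"] for a in assignments if a["answer"] == answer]
    return answer, approve

# Example: 3 redundant assignments for one image-labeling HIT.
assignments = [
    {"worker_id": "W1", "answer": "cat"},
    {"worker_id": "W2", "answer": "cat"},
    {"worker_id": "W3", "answer": "dog"},
]
label, approved = majority_answer(assignments)
print(label, approved)   # cat ['W1', 'W2']  (W3's answer would be rejected)
```

Approval and rejection decisions made this way also feed the account-based qualifications listed above (approval rate, rejection rate), so redundancy doubles as a reputation signal.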
Case Study 6: Digg
• Kevin Rose, founder of Digg.com: $60 million in 18 months
• A new model for a newspaper
• Readers are also contributors
• Readers dig up interesting stories from all over the web and post brief synopses
• Other readers vote on them; the most popular ascend the page
• It is a community with a fairly homogeneous demographic: 80% male, mainly young techie readers

Execution
• The site harnesses the competitive instincts of the readers/contributors, who compete to see whose story will lead
• The site is dynamic: leading stories change by the minute or hour
• (A vote-ranking sketch appears at the end of this section.)

Key ideas
• Putting the human back in the loop!
• Who selects the stories that go up on Digg.com's site? People do.
• Who votes them up (or down)? People do (better than any algorithm).

Similar ideas
• Naver.com outcompeted Google in Korea! How did they do that?
  – By replacing the Google algorithm with a natural-language search tool that allows users to ask questions, which are then answered by other users!
  – The Mechanical Turk is inside the system!
• Yahoo! Answers (tries to) repeat this in the US
• Google Answers failed

Why is Digg not as successful as Wikipedia?
• Why are Digg and Naver successful, but not VERY successful?

Wisdom vs. Stupidity of the Crowds
• To get "wisdom" from a crowd, we need:
  – Diversity of opinion (same inputs, but each participant weights them differently)
  – Independence (no influencing of each other's decisions)
  – Decentralization (different inputs plus a bottom-up process)
  – Micro-contributions
  – An aggregation mechanism
    • Reputation/reward/quality control
    • Encourage micro-contributions
• If some of the above fail, we get stupidity:
  – No independence: bubbles
  – No diversity: no appreciation of some inputs
  – No decentralization: no access to vital information
  – No aggregation: the blind men and the elephant
• (A small aggregation demo appears at the end of this section.)

Practical problems
• Achieving independence:
  – Independence assumes no interaction
  – Independence assumes everyone has the information at the same time
  – Often incorrect information arrives first
  – People imitate "early winners" (especially after 2 observations)
  – Catch: doing "what the group does" is often good!
• Achieving diversity and decentralization:
  – Difficult when the group aims to satisfy individuals
  – When personalized results are preferred, the (single) result of a homogeneous group works better
• Aggregation:
  – Difficult to solicit actions from the crowd
  – The attention budget is limited
  – Models that capture implicit actions are inherently more scalable
  – Need for network effects: a better product from consumption

Wisdom of the Crowds
• In the 19th century, the prevailing view was:
  – "The madness of the crowds" (Mackay, 1841)
  – "A sensible individual, as a member of a crowd, becomes a blockhead" (Baruch)
  – "Madness is the exception in individuals and the rule in groups" (Nietzsche)
  – "Crowds deliver verdicts with which every individual disagrees" (Le Bon)
• So, were they wrong?

Design Your Own Application
• What decision would you "outsource" to a crowd?
• What design choices would you make?
  – How would you try to achieve the requirements?
  – How would you motivate users?
  – Risks?
  – Benefits?
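The Digg "Execution" slide describes a front page where popular stories rise and leading stories change by the minute. Digg's actual ranking algorithm is unpublished; the sketch below uses an assumed votes-vs-age score of the kind popularized by sites like Hacker News, purely to make the dynamic concrete. The `GRAVITY` constant and all names are illustrative.

```python
# A minimal sketch of Digg-style front-page ranking: votes push a story
# up, age pulls it down, so the leaders change hour by hour.
from dataclasses import dataclass, field
import time

GRAVITY = 1.8  # assumed decay exponent: how fast old stories sink

@dataclass
class Story:
    title: str
    url: str
    submitted: float = field(default_factory=time.time)
    diggs: int = 0     # up-votes from readers

    def score(self, now=None):
        age_hours = ((now or time.time()) - self.submitted) / 3600
        return self.diggs / (age_hours + 2) ** GRAVITY

def front_page(stories, k=10):
    """The most popular recent stories ascend the page."""
    now = time.time()
    return sorted(stories, key=lambda s: s.score(now), reverse=True)[:k]
```

The design choice worth noting: the time decay is what keeps the page dynamic; with raw vote counts alone, early winners would sit on top forever, which is exactly the herding failure discussed under "Practical problems".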
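Finally, a small simulated demo of why the aggregation requirement matters: many independent, diverse, individually noisy estimates average out close to the truth (Galton's classic "guess the weight of the ox" effect), and breaking independence breaks the result. All numbers here are simulated assumptions, not data from Galton's experiment.

```python
# Simulated demo: aggregation of independent noisy guesses vs. a herded crowd.
import random

random.seed(7)
TRUE_WEIGHT = 1198   # pounds; the quantity the crowd is estimating

# Independent, diverse guesses: each person is noisy and idiosyncratically biased.
guesses = [TRUE_WEIGHT + random.gauss(0, 120) + random.uniform(-60, 60)
           for _ in range(800)]

crowd_estimate = sum(guesses) / len(guesses)
typical_individual_error = sum(abs(g - TRUE_WEIGHT) for g in guesses) / len(guesses)
print(f"crowd estimate:           {crowd_estimate:.0f}")   # close to 1198
print(f"crowd error:              {abs(crowd_estimate - TRUE_WEIGHT):.0f}")
print(f"typical individual error: {typical_individual_error:.0f}")  # much larger

# Break independence: everyone herds toward the first public guess
# (the "early winner"), and the aggregate drifts toward that anchor.
anchor = guesses[0]
herded = [0.8 * anchor + 0.2 * g for g in guesses]
print(f"herded crowd estimate:    {sum(herded) / len(herded):.0f}")
```

Run it and the crowd's average error is a small fraction of the typical individual's; after herding, the "crowd" is only as good as its anchor. This is the bubbles failure mode from the "Wisdom vs. Stupidity" slide in miniature.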