
Search and the New Economy
Wisdom of the Crowds
Prof. Panos Ipeirotis
Summary from last session
• We can quantify unstructured, qualitative data. We need:
– A context in which content is influential and not redundant (experiential content, for instance)
– A measurable economic variable: price (premium), demand, cost, customer satisfaction, process cycle time
– Methods for structuring unstructured content
– Methods for aggregating the variables in a business context-aware manner
Summary from last session
• What are good properties of opinion mining systems?
– Structuring: Opinions are expressed in many ways; summarize them uniformly
– Independent summaries: Not all scenarios have associated economic outcomes, or the outcomes are difficult to measure (e.g., discussion about a product pre-announcement)
– Personalization: The weight of each person's opinion varies (interesting future direction!)
– Data collection: Evaluations are rarely in one place (the basic value proposition of BuzzMetrics)
Summary from last session
• Review/reputation systems gather the opinion of many users
• Review/reputation systems exhibit biases
• Prediction markets aggregate information from traders
• Prediction markets appear to be (?) robust to biases
• Today’s question: Can we harness any wisdom from the crowd?
Madness of the Crowds
• In the 19th century, the prevailing view was:
– "The madness of the crowds" (Mackay, 1841)
– "A sensible individual, as a member of a crowd, becomes a blockhead" (Baruch)
– "Madness is the exception in individuals and the rule in groups" (Nietzsche)
– "Crowds deliver verdicts with which every individual disagrees" (Le Bon)
Were they wrong?
Case Study 1: InnoCentive
InnoCentive
• Started in 2001 by a VP of Eli Lilly
• A Craigslist for solutions to scientific problems
• Seekers post a problem and a bounty
• Solvers try to find a solution
• Double-blind process: identities are hidden
• More than 1/3 of problems are solved
• 57% "reduction to practice"
• 43% pencil-and-paper
• Includes problems that the labs of P&G, etc. could not solve
The reach of InnoCentive
• Total solvers: > 70,000
• Total seekers: 34
• Scientific disciplines: 40
• Challenges posted: > 200
• Challenges solved: > 58
• Registered solvers in countries around the world (world map in the original slide)
InnoCentive reaches talent pools
"Wherever you work, most of the smart people are somewhere else." --- Bill Joy
• Traditional pools (traditional networks)
– US, EU academics
– Contract labs (FTE)
– Individual networks
• Opportunity pools (nontraditional pools of intellectual capacity)
– Global academia: researchers in Russia, India, China
– Scientists in other industries
– Excess capacity
– Retirees
Who are the solvers?
• InnoCentive 216128 (protein crosslinks): head of an Indian research institute
• InnoCentive 3109 ((R)-4-(4-hydroxyphenyl)butanoic acid): retired head of Hoechst R&D
• InnoCentive 96229 (regio-/stereocontrolled tricyclic alcohols): a Northern Ireland CRO and a US professor
• InnoCentive 258382 (paracrystalline arrays): a solver from outside the discipline
• InnoCentive 55195 (substituted isoquinoline): a Russian scientist
Note: In other InnoCentive-like companies, solutions often come not only from individuals but also from companies in other industries (the case of unused patents).
InnoCentive Proposition
• Seekers
– Project Roadblock
– Revive Stalled Projects or Close “Dead” Projects
– Culture – work outside the walls
• Solvers
– Intellectual challenge
– Business development opportunities
– Financial reward
– Recognition
• No direct involvement of a "crowd": a single solver
Why does InnoCentive's approach work?
• Reason for success?
• Motivation for participants?
• Ways to improve?
Case Study 2: Procter & Gamble
P&G’s Collaborative Approach
The "P&G Advisors” program allows consumers to try
new products and offer suggestions and feedback to P&G
for refining their products and shaping national
marketing plans.
Before, P&G would spend $25,000 to field a new
product concept test that took two months to
complete.
Now, by engaging the customers, the company spends
$2,500 and gets results in about two weeks.
P&G spins profits with the SpinBrush
• Developed in 1998 by a startup (Dr. John's Products)
• Originated as the "Spin Pop", a lollipop with a battery-operated handle that twirled the candy in the eater's mouth
• A price point of $5 was a breakthrough in making this a mass-market product
• In January 2001, Osher sold the idea to P&G
• P&G retained Osher and his team for a year to oversee the transition and gave them lots of leeway in bending P&G's corporate rules in marketing the product
• The Crest SpinBrush is the best-selling toothbrush in the US; it generates over $300 million in annual revenues for P&G
Why did P&G's approach work?
• Reason for success?
• Motivation for participants?
• Ways to improve?
Case Study 3: Wikipedia
• What is it?
• Is it true that anyone can edit it?
• Is it good?
The “Wiki” in Wikipedia
• A shorter form of wiki wiki (pronounced "weekie weekie"), from the Hawaiian language, where it commonly means "quick" or "fast"
• A wiki is a collaborative website that can be directly edited by anyone with access to it
Wikipedia History
• Formally began on January 15, 2001 as a complement to
the Nupedia project
Wikipedia in 2007
• Wikipedia continues to grow, with some 5 million registered editor accounts
• The combined Wikipedias in all languages contain 1.74 billion words in 7.5 million articles across approximately 250 languages
• The English Wikipedia gains a steady 1,700 articles a day
• The wikipedia.org domain is ranked around the 10th busiest on the Internet
Nature (2005)
• Compared Wikipedia with Britannica Online
• 42 science entries blindly reviewed by experts
• Results: Britannica averaged 3 errors per entry, Wikipedia 4
• Nature: all entries were blinded
• Britannica: the study itself had numerous errors
Nature (2005)
• Economist (4/6/06): the study compares apples and oranges
• Britannica articles are shorter; omissions were counted against it
• Authorities are not favored, and are even viewed with suspicion
• Response: do we really need experts for most entries in a general reference source?
Nature (2005)
• Entries for pop-culture figures vs. those for great literary figures, scientists, etc.
• The entry for Britney Spears is longer than the entry for St. Augustine
• Seinfeld is longer than Shakespeare; Barbie is longer than Bellow
• A further drawback of the Nature study: no comparisons of style
Why does Wikipedia work?
• Reason for success?
• Motivation for participants?
Case Study 4: Collective Tagging
Flickr
• Online photo albums
• People describe photos using tags
• Tag search
How to label ALL images on the Web?
• The slides that follow demonstrate simple principles:
– motivate your users
– other people’s procrastination can be your productivity
Labeling Images with Words
(Example image with labels: MARTHA STEWART, FLOWERS, SUPER EVIL)
• Automatically labeling images with words is still an open problem
Desiderata
• A method that can label ALL images on the web, fast and cheap
Using Humans CLEVERLY
• The ESP Game could label all images on the web in 30 days!
The ESP Game
• A two-player online game
• Partners don't know each other and can't communicate
• Object of the game: type the same word
• The only thing the partners have in common is an image
The ESP Game
• Example: both players see the same image and type guesses
• Player 1 guesses: CAR, HAT, KID
• Player 2 guesses: BOY, CAR
• Success! Both players agree on CAR, which becomes a label for the image
(© 2004 Carnegie Mellon University, all rights reserved. Patent pending.)
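The agreement rule itself is simple enough to sketch in a few lines. Below is a minimal Python sketch of that rule, for illustration only (not the actual ESP Game code); the function name, the data structures, and the lower-casing of guesses are assumptions.

```python
# Minimal sketch of the ESP Game agreement rule (illustrative only,
# not the actual ESP Game implementation).
from itertools import zip_longest

def play_round(image_id, guesses_p1, guesses_p2, taboo_words=()):
    """Return the first word both players typed, or None if no match.

    guesses_p1 / guesses_p2: words in the order each player typed them.
    taboo_words: labels already collected for this image; agreeing on them
    earns nothing, so they are ignored here.
    """
    seen_p1, seen_p2 = set(), set()
    # Interleave the two guess streams to approximate real-time order.
    for g1, g2 in zip_longest(guesses_p1, guesses_p2, fillvalue=""):
        for word, own, other in ((g1, seen_p1, seen_p2), (g2, seen_p2, seen_p1)):
            word = word.strip().lower()
            if not word or word in taboo_words:
                continue
            own.add(word)
            if word in other:          # agreement!
                return word            # store it as a label for image_id
    return None

# Example from the slide: player 1 types car, hat, kid; player 2 types boy, car.
print(play_round("img-001", ["car", "hat", "kid"], ["boy", "car"]))  # -> "car"
```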
The ESP Game is FUN
• 4.1 million labels collected from 23,000 players
• Many people play over 20 hours a week
• 5,000 people playing simultaneously could label all images on Google in 30 days!
• Individual games at Yahoo! and MSN average over 5,000 players at a time
The ESP Game in Single-Player Mode
• A single person can play with pre-recorded actions as their partner
• When two people play, we record every action with timing information, e.g., (0:08) BOY, (0:12) CAR, (0:15) HAT, (0:21) KID, (0:23) CAR
• We emulate the partner by replaying the pre-recorded moves
• Notice that this doesn't stop the labeling process!
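A minimal sketch of how single-player mode can work under these assumptions: the recorded session is a list of (seconds, word) moves, and a live guess counts as an agreement if the pre-recorded partner has already "typed" that word by that point in the round. Names and the timing format are illustrative, not the actual implementation.

```python
# Sketch: single-player mode replays a move list recorded from an earlier
# two-player game ((seconds, word) pairs, as on the slide). Illustrative only.

RECORDED_MOVES = [(8, "boy"), (12, "car"), (15, "hat"), (21, "kid"), (23, "car")]

def check_agreement(live_guess, seconds_elapsed, recorded_moves=RECORDED_MOVES):
    """Return True if the live player's guess matches any word the
    pre-recorded partner has 'typed' by this point in the round."""
    partner_so_far = {word for t, word in recorded_moves if t <= seconds_elapsed}
    return live_guess.strip().lower() in partner_so_far

# Example: at second 14 the recorded partner has typed "boy" and "car",
# so a live guess of "car" is an agreement (and a new label for the image).
print(check_agreement("car", 14))   # True
print(check_agreement("kid", 14))   # False (the partner types "kid" at 0:21)
```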
What about Cheating?
• Speed detection: if a pair plays too fast, we don't record the words they agree on
What about Cheating?
• Qualification test: we give players test images for which we already know the common labels
• We only store a player's guesses if they successfully label the test images
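The two checks can be sketched together as a filter applied before an agreed-upon label is stored. The threshold, the gold-label table, and the function name below are assumptions made for illustration; the slides do not give the real system's parameters.

```python
# Sketch of the two anti-cheating checks described on the slides:
# (1) speed detection: discard agreements from pairs that play too fast;
# (2) qualification test: only keep a player's guesses if they correctly
#     label "test" images whose labels we already know.

MIN_SECONDS_PER_AGREEMENT = 4                     # assumed threshold
GOLD_LABELS = {"test-img-7": {"dog", "puppy"}}    # hypothetical test image

def keep_agreement(image_id, agreed_word, seconds_to_agree, player_gold_answers):
    """Decide whether to store an agreed-upon word as a label."""
    # Speed detection: a suspiciously fast agreement suggests a cheating pair.
    if seconds_to_agree < MIN_SECONDS_PER_AGREEMENT:
        return False
    # Qualification test: the player must have labeled the known test
    # images correctly for any of their guesses to be stored.
    for gold_image, known_labels in GOLD_LABELS.items():
        if player_gold_answers.get(gold_image, "").lower() not in known_labels:
            return False
    return True

# Example: a pair that agreed on "car" after 9 seconds, with a correct
# answer on the test image, has its label stored.
print(keep_agreement("img-001", "car", 9, {"test-img-7": "dog"}))   # True
print(keep_agreement("img-001", "car", 1, {"test-img-7": "dog"}))   # False (too fast)
```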
Sample Labels
• Image 1: BEACH, CHAIRS, SEA, PEOPLE, MAN, WOMAN, PLANT, OCEAN, TALKING, WATER, PORCH
• Image 2: SADDAM, MR. WILSON, MAN, FACE, MOUSTACHE
• Image 3: BUSH, PRESIDENT, DUMB, YUCK
• Coming soon: meet your soul mate through the game!
Why does the ESP Game work?
• Reason for success?
• Motivation for participants?
Case Study 5: Amazon Mechanical Turk
• Some tasks at present can't be done by computers, or humans do them much better
• At Amazon, such a task is called a Human Intelligence Task (HIT)
• Also called "artificial artificial intelligence"
• HITs are taken by "Turkers"
• Examples of HITs:
– Adding keywords to images
– Cropping images
– Distributed telemarketing
– Spam identification
– Subtitling, speech-to-text
– Adult content analysis
– Facial recognition
– Proofreading
– OCR correction/verification
– Document labeling
MTurk – HITs
• "Turkers" can take Human Intelligence Tasks from the Amazon website
– Paid whatever the HIT is worth, with a potential bonus
– Can be required to "qualify" for the HIT
– Results can be submitted as a file upload, multiple choice, or free-form text
– Demo HIT at mturk.com
MTurk – Creating HITs
• Define the question/task and the submission method
– Question/answer schema
• Define the number of "assignments" (how many workers answer the same HIT)
• Qualifications for the HIT
• Value of the HIT
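For concreteness, here is a hedged sketch of what creating such a HIT looks like programmatically with the present-day boto3 SDK (which postdates these slides). The title, reward, URL, and other values are placeholders, and the ExternalQuestion form is just one of the supported question types.

```python
# Illustrative sketch: creating a HIT with boto3 (values are placeholders).
import boto3

mturk = boto3.client(
    "mturk",
    region_name="us-east-1",
    # Sandbox endpoint for testing; omit for the production marketplace.
    endpoint_url="https://mturk-requester-sandbox.us-east-1.amazonaws.com",
)

# The question: an ExternalQuestion pointing at a (hypothetical) task page.
question_xml = """<ExternalQuestion xmlns="http://mechanicalturk.amazonaws.com/AWSMechanicalTurkDataSchemas/2006-07-14/ExternalQuestion.xsd">
  <ExternalURL>https://example.com/label-image?img=123</ExternalURL>
  <FrameHeight>600</FrameHeight>
</ExternalQuestion>"""

hit = mturk.create_hit(
    Title="Add keywords to an image",
    Description="Look at the image and type 5 descriptive keywords.",
    Keywords="image, labeling, tagging",
    Reward="0.05",                      # value of the HIT, in USD
    MaxAssignments=3,                   # number of independent workers per HIT
    AssignmentDurationInSeconds=600,
    LifetimeInSeconds=86400,
    Question=question_xml,
)
print(hit["HIT"]["HITId"])
```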
MTurk – Quality Control
• How do you ensure that the work delivered by the Turkers is of good quality?
– Accepting only "correct" answers
• Manually
• Automatically?
– A reputation system for Turkers
– Qualification tests
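One common automatic strategy combines the "assignments" knob from the previous slide with the "accept only correct answers" idea here: ask several Turkers the same question and keep the majority answer. The sketch below illustrates that policy; the approve/reject rule is an assumption, not Amazon's built-in behavior.

```python
# Sketch of majority-vote quality control across redundant assignments
# (an assumed requester-side policy, not part of the MTurk API itself).
from collections import Counter

def majority_vote(assignments):
    """assignments: list of (worker_id, answer). Returns the majority
    answer and the workers to approve / reject under this policy."""
    counts = Counter(answer for _, answer in assignments)
    winner, _ = counts.most_common(1)[0]
    approve = [w for w, a in assignments if a == winner]
    reject = [w for w, a in assignments if a != winner]
    return winner, approve, reject

# Example: three workers label the same image.
answers = [("w1", "cat"), ("w2", "cat"), ("w3", "dog")]
print(majority_vote(answers))   # ('cat', ['w1', 'w2'], ['w3'])
```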
MTurk - Qualifications
• Account-based qualifications
– HIT abandonment rate (%)
– HIT approval rate (%)
– HIT rejection rate (%)
– … more combinations of the above
– Location
• Create your own
Why does MTurk work?
• Reason for success?
• Motivation for participants?
Case Study 6: Digg
• Kevin Rose, founder of Digg.com
– $60 million in 18 months
• A new model for a newspaper
• Readers are also contributors
• Readers dig up interesting stories from all over the web and post brief synopses
• Other readers vote on them; the most popular ascend the page
• It is a community made up of a fairly homogeneous demographic: 80% are male, mainly young techie readers
Execution
• The site harnesses the competitive instincts of readers/contributors, who compete to see whose story will lead
• The site is dynamic: leading stories change by the minute or hour
Key ideas
• Putting the human back in the loop!
• Who selects the stories that go up on Digg.com's site?
– People do that
• Who votes them up (or down)?
– People do that (better than any algorithm)
Similar ideas
• Naver.com outcompeted Google in Korea!
• How did they do that?
• By replacing the Google algorithm with a natural-language search tool that allows users to ask questions
• These questions are then answered by other users!
– The Mechanical Turk is inside the system!
– Yahoo! Answers (tries to) repeat this in the US
– Google Answers failed
Why is Digg not as successful as Wikipedia?
• Why are Digg and Naver successful, but not VERY successful?
Wisdom vs Stupidity of the Crowds
• To get "wisdom" from a crowd, we need:
– Diversity of opinion (same inputs, but each participant weighs them differently)
– Independence (no influencing of each other's decisions)
– Decentralization (different inputs + a bottom-up process)
– Micro-contributions
– An aggregation mechanism
• Reputation / reward / quality control
• Encourage micro-contributions
• If some of the above fail, we get stupidity:
– No independence: bubbles
– No diversity: no appreciation of some inputs
– No decentralization: no access to vital information
– No aggregation: the blind men and the elephant
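A toy simulation makes the first conditions concrete: many independent, diverse, unbiased guesses averaged together land close to the truth, while herding (broken independence) drags the aggregate toward a shared anchor. All numbers below are illustrative.

```python
# Toy simulation of the "wisdom of crowds" conditions: independent, noisy,
# unbiased guesses aggregated by a simple mean beat a typical individual.
import random
import statistics

TRUE_VALUE = 800            # e.g., guessing the weight of an ox, in pounds
random.seed(42)

# Independent, diverse guesses: each person is noisy but unbiased.
guesses = [random.gauss(TRUE_VALUE, 150) for _ in range(1000)]

crowd_estimate = statistics.mean(guesses)
typical_individual_error = statistics.mean(abs(g - TRUE_VALUE) for g in guesses)

print(f"Crowd estimate: {crowd_estimate:.1f} (error {abs(crowd_estimate - TRUE_VALUE):.1f})")
print(f"Typical individual error: {typical_individual_error:.1f}")

# Breaking independence (everyone anchors on one loud early guess) destroys
# the effect: the aggregate inherits the shared bias.
anchor = 1100
herded = [0.7 * anchor + 0.3 * g for g in guesses]
print(f"Herded crowd estimate: {statistics.mean(herded):.1f}")
```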
Practical Problems
• Achieving independence:
– Independence assumes no interaction
– Independence assumes everyone has the information at the same time
– Often, incorrect information arrives first
– People imitate "early winners" (especially after 2 observations)
– Catch: doing what the group does is often good!
• Achieving diversity and decentralization:
– Difficult when the group aims to satisfy individuals
– When personalized results are preferred, the (single) result of homogeneous groups works better
• Aggregation:
– Difficult to solicit actions from the crowd
– The attention budget is limited
– Models that capture implicit actions are inherently more scalable
– Need for network effects: the product gets better from consumption
Wisdom of the Crowds
• In the 19th century, the prevailing view was:
– "The madness of the crowds" (Mackay, 1841)
– "A sensible individual, as a member of a crowd, becomes a blockhead" (Baruch)
– "Madness is the exception in individuals and the rule in groups" (Nietzsche)
– "Crowds deliver verdicts with which every individual disagrees" (Le Bon)
So, were they wrong?
Design Your Own Application
• What decision would you "outsource" to a crowd?
• What design choices would you make?
– How would you try to achieve the requirements?
– How would you motivate users?
– Risks?
– Benefits?