Fighting Fire With Fire:
Crowdsourcing Security Solutions
on the Social Web
Christo Wilson
Northeastern University
cbw@ccs.neu.edu
High Quality Sybils and Spam

We tend to think of spam as "low quality", e.g.:

  "MaxGentleman is the bestest male enhancement system avalable. http://cid-ce6ec5.space.live.com/"

  [Screenshot: Sybil profile built from stock photographs]

What about high quality spam and Sybils?
Black Market Crowdsourcing

• Large and profitable
  • Growing exponentially in size and revenue in China
  • $1 million per month on just one site
  • Cost effective: $0.21 per click
• Starting to grow in the US and other countries
  • Mechanical Turk, Freelancer
  • Twitter follower markets
• Huge problem for existing security systems
  • Little to no automation to detect
  • Turing tests fail
Crowdsourcing Sybil Defense

• Defenders are losing the battle against OSN Sybils
• Idea: build a crowdsourced Sybil detector
  • Leverage human intelligence
  • Scalable
• Open questions:
  • How accurate are users?
  • What factors affect detection accuracy?
  • Is crowdsourced Sybil detection cost effective?
User Study

• Two groups of testers
  • Experts – CS professors, masters, and PhD students
  • Turkers – crowdworkers from Mechanical Turk and Zhubajie (sites also used by spammers)
• Three ground-truth datasets of full user profiles
  • Renren – given to us by Renren Inc.
  • Facebook US and India – crawled
    • Legitimate profiles – 2 hops from our own profiles
    • Suspicious profiles – profiles with stock profile images
    • Banned suspicious profiles = Sybils

[Screenshot: test interface showing a profile (links cannot be clicked), "real or fake?" buttons with a "why?" field, navigation buttons, and a progress bar; testers may skip around and revisit profiles]
Experiment Overview

  Dataset         Profiles (Sybil / Legit.)   Test Group       # of Testers   Profiles per Tester
  Renren          100 / 100                   Chinese Expert   24             100
                                              Chinese Turker   418            10
  Facebook US     32 / 50                     US Expert        40             50
                                              US Turker        299            12
  Facebook India  50 / 49                     India Expert     20             100
                                              India Turker     342            12

• Renren data provided by Renren Inc.; Facebook data crawled
• Fewer experts than turkers, but more profiles per expert
Individual Tester Accuracy

[Figure: CDF (%) of accuracy per tester (%), for Chinese Turkers, US Turkers, US Experts, and Chinese Experts]

• Experts prove that humans can be accurate – 80% of experts have >90% accuracy!
• Turkers are much less accurate and need extra help
Accuracy of the Crowd

• Treat each classification by each tester as a vote
• Majority makes the final decision

  Dataset         Test Group      False Positives   False Negatives
  Renren          Chinese Expert  0%                3%
                  Chinese Turker  0%                63%
  Facebook US     US Expert       0%                10%
                  US Turker       2%                19%
  Facebook India  India Expert    0%                16%
                  India Turker    0%                50%

• False positive rates are excellent – almost zero; experts perform okay overall
• Turkers miss lots of Sybils: they need extra help against false negatives
• What can be done to improve accuracy?
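The voting scheme above can be sketched in a few lines. This is a minimal illustration, with "sybil"/"legit" label names and the conservative tie-break being my own choices:

```python
from collections import Counter

def majority_vote(classifications):
    """Aggregate per-tester votes ('sybil' or 'legit') into a final decision.

    Ties are broken in favor of 'legit' to keep false positives low
    (an assumption of this sketch, not stated in the talk).
    """
    counts = Counter(classifications)
    return "sybil" if counts["sybil"] > counts["legit"] else "legit"

# Example: 5 testers vote on one profile
votes = ["sybil", "legit", "sybil", "sybil", "legit"]
print(majority_vote(votes))  # sybil
```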
How Many Classifications Do You Need?

[Figure: false positive and false negative error rates (%) vs. classifications per profile (2-24), for China, India, and US]

• Only need 4-5 classifications per profile for the vote to converge
• Few classifications = less cost
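The fast convergence has a simple analytical analogue: if each vote is wrong independently with probability p, the chance that a majority of votes is wrong falls off quickly with the number of votes. A toy binomial model (my illustration; the talk's curves come from real trace data, not this formula):

```python
from math import comb

def majority_error(p, k):
    """Probability that a strict majority of k independent votes is wrong,
    where each vote is wrong with probability p (use odd k to avoid ties)."""
    return sum(comb(k, i) * p**i * (1 - p)**(k - i)
               for i in range(k // 2 + 1, k + 1))

# With 20% per-vote error, most of the gain arrives by ~5 votes
for k in (1, 3, 5, 7):
    print(k, round(majority_error(0.2, k), 3))
```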
Eliminating Inaccurate Turkers

[Figure: false negative rate (%) vs. turker accuracy threshold (0-70%), for China, India, and US – dramatic improvement: false negatives drop from 60% to 10%]

• Most workers are >40% accurate, so only a subset of workers is removed (<50%)
• Getting rid of inaccurate turkers is a no-brainer
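The thresholding step itself is simple to implement. A sketch with hypothetical worker IDs and accuracies measured against ground-truth profiles:

```python
def filter_turkers(accuracy_by_turker, threshold):
    """Drop turkers whose measured accuracy (fraction of ground-truth
    profiles classified correctly) falls below the threshold; their
    votes can then be discarded and the majorities recomputed."""
    return {tid: acc for tid, acc in accuracy_by_turker.items()
            if acc >= threshold}

workers = {"t1": 0.95, "t2": 0.30, "t3": 0.70, "t4": 0.45}
print(sorted(filter_turkers(workers, 0.40)))  # ['t1', 't3', 't4']
```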
How to turn our results into a system?

1. Scalability – OSNs with millions of users
2. Performance – improve turker accuracy, reduce costs
3. Privacy – preserve user privacy when giving data to turkers
System Architecture

[Diagram: a Filtering Layer feeds a Crowdsourcing Layer. Social network heuristics and user reports flag suspicious profiles; turker selection promotes workers from all turkers to accurate turkers to very accurate turkers, rejecting inaccurate ones, and experts make the final call on Sybils]

• Filtering layer: leverage existing techniques, help the system scale
• Turker selection: filter out inaccurate turkers, maximize usefulness of high-accuracy turkers
• Continuous quality control; locate malicious workers
Trace Driven Simulations

• Simulate 2000 profiles; error rates drawn from survey data
• Vary 4 parameters
  • Classifications by accurate turkers (5) and by very accurate turkers (2)
  • Controversial range (20-50%)
  • Accuracy threshold (90%)
• Results: average of 6 classifications per profile, <1% false positives, <1% false negatives
• Results++: average of 8 classifications per profile, <0.1% false positives, <0.1% false negatives
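A toy version of the simulation loop might look like the following. The error rates, vote counts, and thresholds here are illustrative placeholders, not the survey-derived values the talk actually used:

```python
import random

def simulate(n_profiles=2000, votes_acc=5, votes_vacc=2,
             err_acc=0.25, err_vacc=0.05,
             controversial=(0.20, 0.50), seed=1):
    """Each profile gets votes_acc votes from 'accurate' turkers; if the
    fraction voting sybil lands in the controversial range, the profile
    is escalated to votes_vacc 'very accurate' turkers before the final
    majority decision. Returns the overall error rate."""
    rng = random.Random(seed)
    wrong = 0
    for i in range(n_profiles):
        truth = i % 2  # alternate legit (0) and sybil (1) profiles

        def vote(err):
            return truth if rng.random() > err else 1 - truth

        votes = [vote(err_acc) for _ in range(votes_acc)]
        if controversial[0] <= sum(votes) / len(votes) <= controversial[1]:
            votes += [vote(err_vacc) for _ in range(votes_vacc)]
        decision = 1 if sum(votes) > len(votes) / 2 else 0
        wrong += decision != truth
    return wrong / n_profiles

print(simulate())  # deterministic for a fixed seed
```

Escalating only the controversial profiles is the key design choice: the expensive, very accurate turkers are spent exclusively on the borderline cases.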
Estimating Cost

• Estimated cost in a real-world social network: Tuenti
  • 12,000 profiles to verify daily
  • 14 full-time employees
  • Minimum wage ($8 per hour) → $890 per day
• Crowdsourced Sybil detection
  • 20 sec/profile, 8-hour day → 50 turkers
  • Facebook wage ($1 per hour) → $400 per day
• Cost with malicious turkers
  • Estimate that 25% of turkers are malicious → 63 turkers
  • $1 per hour → $504 per day
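The slide's arithmetic can be reproduced with a small cost model. The 6 classifications per profile and the one-for-one padding for malicious workers are my reading of how the slide's numbers fit together:

```python
from math import ceil

def daily_cost(profiles_per_day, votes_per_profile, secs_per_vote,
               wage_per_hour, hours_per_day=8, malicious_frac=0.0):
    """Turkers needed to clear the daily queue, and what they cost:
    total labeling hours divided by hours per turker-day, padded so
    the work of malicious turkers can be redone by honest ones."""
    total_hours = profiles_per_day * votes_per_profile * secs_per_vote / 3600
    turkers = ceil(total_hours / hours_per_day * (1 + malicious_frac))
    return turkers, turkers * hours_per_day * wage_per_hour

print(daily_cost(12_000, 6, 20, 1.0))                       # (50, 400.0)
print(daily_cost(12_000, 6, 20, 1.0, malicious_frac=0.25))  # (63, 504.0)
```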
Takeaways

• Humans can differentiate between real and fake profiles
• Crowdsourced Sybil detection is feasible
• Designed a crowdsourced Sybil detection system
  • False positives and negatives <1%
  • Resistant to infiltration by malicious workers
  • Sensitive to user privacy
  • Low cost
  • Augments existing security systems

Questions?
Survey Fatigue

[Figure: accuracy (%) and time per profile (s) vs. profile order, for US Turkers and US Experts]

• All testers speed up over time
• Accuracy stays flat – no fatigue
Sybil Profile Difficulty

[Figure: average accuracy per Sybil (%) for turkers and experts, with Sybil profiles ordered by turker accuracy; a few profiles are really difficult]

• Some Sybils are more stealthy than others
• Experts catch more tough Sybils than turkers, and perform well on even the most difficult Sybils
Preserving User Privacy

• Showing profiles to crowdworkers raises privacy issues
• Solution: reveal profile information in context
  • Public profile information → crowdsourced evaluation by ordinary crowdworkers
  • Friend-only profile information → crowdsourced evaluation by the user's friends