Fighting Fire With Fire:
Crowdsourcing Security Solutions
on the Social Web
Christo Wilson
Northeastern University
cbw@ccs.neu.edu
High Quality Sybils and Spam

We tend to think of spam as “low quality”:

  "MaxGentleman is the bestest male enhancement system avalable. http://cid-ce6ec5.space.live.com/"
  [accompanied by stock photographs]

What about high quality spam and Sybils?
Black Market Crowdsourcing

• Large and profitable
  - Growing exponentially in size and revenue in China
  - $1 million per month on just one site
  - Cost effective: $0.21 per click

• Starting to grow in the US and other countries
  - Mechanical Turk, Freelancer
  - Twitter follower markets

• Huge problem for existing security systems
  - Little to no automation to detect
  - Turing tests fail
Crowdsourcing Sybil Defense

• Defenders are losing the battle against OSN Sybils

• Idea: build a crowdsourced Sybil detector
  - Leverage human intelligence
  - Scalable

• Open Questions
  - How accurate are users?
  - What factors affect detection accuracy?
  - Is crowdsourced Sybil detection cost effective?

User Study

• Two groups of users
  - Experts: CS professors, masters, and PhD students
  - Turkers: crowdworkers from Mechanical Turk and Zhubajie (a site also used by spammers)

• Three ground-truth datasets of full user profiles
  - Renren: given to us by Renren Inc.
  - Facebook US and India: crawled
    - Legitimate profiles: 2 hops from our own profiles
    - Suspicious profiles: use stock profile pictures
    - Banned suspicious profiles = Sybils

[Screenshot of the study interface: a profile screenshot is shown (links cannot be clicked), with navigation buttons, a progress bar, and panels for browsing and classifying profiles. For each profile, testers answer "Real or fake?" and "Why?", and they may skip around and revisit profiles.]
Experiment Overview

Datasets:

  Dataset        | Sybil Profiles | Legit. Profiles
  Renren         | 100            | 100
  Facebook US    | 32             | 50
  Facebook India | 50             | 49

  (Renren data provided by Renren Inc.; Facebook data crawled.)

Test groups:

  Test Group     | # of Testers | Profiles per Tester
  Chinese Expert | 24           | 100
  Chinese Turker | 418          | 10
  US Expert      | 40           | 50
  US Turker      | 299          | 12
  India Expert   | 20           | 100
  India Turker   | 342          | 12

  (Fewer experts, but more profiles per expert.)
Individual Tester Accuracy

[CDF of accuracy per tester (%) for Chinese Turkers, US Turkers, US Experts, and Chinese Experts]

• Experts prove that humans can be accurate: 80% of experts have >90% accuracy
• Turkers are not as accurate and need extra help
Accuracy of the Crowd

• Treat each classification by each tester as a vote
• Majority makes the final decision

  Dataset        | Test Group     | False Positives | False Negatives
  Renren         | Chinese Expert | 0%              | 3%
  Renren         | Chinese Turker | 0%              | 63%
  Facebook US    | US Expert      | 0%              | 10%
  Facebook US    | US Turker      | 2%              | 19%
  Facebook India | India Expert   | 0%              | 16%
  Facebook India | India Turker   | 0%              | 50%

• False positive rates are excellent: almost zero false positives
• Experts perform okay, but turkers miss lots of Sybils (high false negatives)
• Turkers need extra help against false negatives: what can be done to improve accuracy?
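
For illustration only (not code from the talk), the vote-then-majority rule can be sketched in a few lines of Python; the "sybil"/"legit" labels and the example votes are hypothetical.

from collections import Counter

def majority_vote(votes):
    # votes: one label ("sybil" or "legit") per tester for a single profile
    counts = Counter(votes)
    return counts.most_common(1)[0][0]

# Hypothetical example: five testers classify one profile.
print(majority_vote(["sybil", "legit", "sybil", "sybil", "legit"]))  # -> sybil

The per-group false positive and false negative rates above are what this rule yields when applied to each group's classifications.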
How Many Classifications Do You Need?

[Error rate (%) vs. classifications per profile for China, India, and US: false negative rates fall as votes are added, while false positive rates stay low]

• Only need 4-5 classifications per profile to converge
• Fewer classifications = lower cost
Eliminating Inaccurate Turkers

[False negative rate (%) vs. turker accuracy threshold (%) for China, India, and US]

• Most workers are >40% accurate, so only a subset of workers (<50%) is removed
• Dramatic improvement: false negatives drop from 60% to 10%
• Getting rid of inaccurate turkers is a no-brainer
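
A minimal sketch of this filtering step, assuming each turker's accuracy has already been measured; the data layout (tuples of turker, profile, label) and the 40% cutoff are illustrative assumptions, not the study's implementation.

def drop_inaccurate_turkers(classifications, turker_accuracy, threshold=0.40):
    # classifications: list of (turker_id, profile_id, label) tuples
    # turker_accuracy: dict mapping turker_id -> accuracy in [0, 1]
    # Keep only votes cast by turkers at or above the accuracy threshold.
    return [c for c in classifications
            if turker_accuracy.get(c[0], 0.0) >= threshold]

The surviving votes then feed the same majority vote as before.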
How to turn our results into a system?

1. Scalability
   - OSNs with millions of users
2. Performance
   - Improve turker accuracy
   - Reduce costs
3. Privacy
   - Preserve user privacy when giving data to turkers
System Architecture

[Diagram of the two-layer architecture]

• Filtering layer: the social network's existing heuristics and user reports flag suspicious profiles (leverage existing techniques; help the system scale)
• Crowdsourcing layer: turker selection filters out inaccurate turkers (rejected); accurate turkers classify the suspicious profiles and very accurate turkers handle the hard cases, maximizing the usefulness of high-accuracy turkers
• Experts provide continuous quality control and locate malicious workers
• Output: confirmed Sybils
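
A rough sketch of how the two layers might compose in code, assuming helper callables for pieces the OSN already has (heuristic_filter, ask_turker); all names and thresholds here are placeholders, not an API from the talk, and the controversial-profile escalation and expert spot checks are omitted.

def detect_sybils(profiles, heuristic_filter, user_reports,
                  turkers, turker_accuracy, ask_turker,
                  accuracy_threshold=0.40, votes_per_profile=5):
    # Filtering layer: existing heuristics and user reports flag suspects.
    suspicious = [p for p in profiles
                  if heuristic_filter(p) or p in user_reports]

    # Turker selection: reject turkers below the accuracy threshold.
    selected = [t for t in turkers
                if turker_accuracy.get(t, 0.0) >= accuracy_threshold]

    # Crowdsourcing layer: selected turkers vote on each suspect profile.
    # (For simplicity the same turkers handle every profile here.)
    sybils = []
    for profile in suspicious:
        votes = [ask_turker(t, profile) for t in selected[:votes_per_profile]]
        if votes.count("sybil") > len(votes) / 2:
            sybils.append(profile)
    return sybils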
Trace Driven Simulations

• Simulate 2,000 profiles
• Error rates drawn from survey data
• Vary 4 parameters: classifications per profile from accurate turkers (5) and from very accurate turkers (2), the controversial range (20-50%), and the turker accuracy threshold (90%)

Results:
• Average 6 classifications per profile
• <1% false positives
• <1% false negatives

Results++:
• Average 8 classifications per profile
• <0.1% false positives
• <0.1% false negatives
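
A toy Monte Carlo analogue of such a simulation: assume a per-vote error rate for each kind of profile and count how often the majority vote gets a profile wrong. The error rates, vote count, and 50/50 profile mix below are placeholder assumptions, not the survey-derived values used in the talk.

import random

def simulate(n_profiles=2000, votes=5, p_false_neg=0.20, p_false_pos=0.02, seed=0):
    rng = random.Random(seed)
    fn = fp = 0
    for i in range(n_profiles):
        is_sybil = (i % 2 == 0)                      # half Sybil, half legitimate
        per_vote_error = p_false_neg if is_sybil else p_false_pos
        correct = sum(rng.random() >= per_vote_error for _ in range(votes))
        if correct <= votes / 2:                     # majority got it wrong
            if is_sybil:
                fn += 1
            else:
                fp += 1
    return fn / (n_profiles / 2), fp / (n_profiles / 2)

print(simulate())  # (false negative rate, false positive rate) of the simulated crowd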
Estimating Cost

• Estimated cost in a real-world social network: Tuenti
  - 12,000 profiles to verify daily
  - 14 full-time employees
  - Minimum wage ($8 per hour) → $890 per day

• Crowdsourced Sybil detection
  - 20 sec/profile, 8-hour day → 50 turkers
  - Facebook wage ($1 per hour) → $400 per day

• Cost with malicious turkers
  - Estimate that 25% of turkers are malicious → 63 turkers
  - $1 per hour → $504 per day

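The arithmetic behind the crowdsourcing estimate, written out; assuming roughly 6 classifications per profile at about 20 seconds each (which reproduces the 50-turker figure), with a 25% padding of the workforce to absorb malicious turkers as on this slide.

import math

profiles_per_day = 12_000
votes_per_profile = 6            # assumed average, consistent with the simulations
seconds_per_vote = 20
hours_per_shift = 8
wage_per_hour = 1.00             # "Facebook wage" from the slide

work_hours = profiles_per_day * votes_per_profile * seconds_per_vote / 3600
turkers = math.ceil(work_hours / hours_per_shift)               # 50 turkers
daily_cost = turkers * hours_per_shift * wage_per_hour          # $400 per day

# Pad the workforce by 25% to absorb malicious turkers.
padded_turkers = math.ceil(turkers * 1.25)                      # 63 turkers
padded_cost = padded_turkers * hours_per_shift * wage_per_hour  # $504 per day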
Takeaways

• Humans can differentiate between real and fake profiles
• Crowdsourced Sybil detection is feasible
• Designed a crowdsourced Sybil detection system
  - False positives and negatives <1%
  - Resistant to infiltration by malicious workers
  - Sensitive to user privacy
  - Low cost
• Augments existing security systems
Questions?
Survey Fatigue

[Accuracy (%) and time per profile (s) vs. profile order, shown separately for US Turkers and US Experts; panels annotated "Fatigue" and "No fatigue"]

• All testers speed up over time
Sybil Profile Difficulty
[Average accuracy per Sybil profile (%), with Sybil profiles ordered by turker accuracy; separate curves for Turkers and Experts]

• Some Sybils are more stealthy: a handful of profiles are really difficult
• Experts catch more tough Sybils than turkers and perform well even on the most difficult Sybils
Preserving User Privacy

• Showing profiles to crowdworkers raises privacy issues
• Solution: reveal profile information in context
  - Friends see friend-only profile information
  - The crowdsourced evaluation sees only public profile information
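
A minimal sketch of the "reveal in context" idea, assuming each profile field carries a visibility flag; the field names and flags are hypothetical, and a real deployment would strip anything that only friends can see before a snapshot reaches crowdworkers.

def crowd_view(profile):
    # Only expose fields a non-friend could already see.
    return {name: value
            for name, (value, visibility) in profile.items()
            if visibility == "public"}

# Hypothetical profile: each field is (value, visibility).
profile = {
    "name": ("Alice", "public"),
    "photos": (["beach.jpg"], "public"),
    "phone": ("555-0100", "friends-only"),
}
print(crowd_view(profile))   # only "name" and "photos" reach the crowd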