Fighting Fire With Fire: Crowdsourcing Security Solutions on the Social Web
Christo Wilson, Northeastern University, cbw@ccs.neu.edu

High Quality Sybils and Spam
• We tend to think of spam as "low quality," e.g.: "MaxGentleman is the bestest male enhancement system avalable. http://cid-ce6ec5.space.live.com/"
• [Example Sybil profile assembled from stock photographs]
• What about high quality spam and Sybils?

Black Market Crowdsourcing
• Large and profitable
  - Growing exponentially in size and revenue in China
  - $1 million per month on just one site
  - Cost effective: $0.21 per click
• Starting to grow in the US and other countries
  - Mechanical Turk, Freelancer, Twitter follower markets
• Huge problem for existing security systems
  - Little to no automation to detect
  - Turing tests fail

Crowdsourcing Sybil Defense
• Defenders are losing the battle against OSN Sybils
• Idea: build a crowdsourced Sybil detector
  - Leverage human intelligence
  - Scalable
• Open questions
  - How accurate are users?
  - What factors affect detection accuracy?
  - Is crowdsourced Sybil detection cost effective?

User Study
• Two groups of users
  - Experts: CS professors, masters, and PhD students
  - Turkers: crowdworkers from Mechanical Turk and Zhubajie (a crowdsourcing site also used by spammers)
• Three ground-truth datasets of full user profiles
  - Renren: given to us by Renren Inc.
  - Facebook US and India: crawled
    - Legitimate profiles: 2 hops from our own profiles
    - Suspicious profiles: stock profile pictures
    - Banned suspicious profiles = Sybils

Study Interface
[Screenshot of the classification tool: a profile screenshot (links cannot be clicked), "real or fake?" buttons with a "why?" field, navigation buttons, and a progress bar. Testers may skip around and revisit profiles.]

Experiment Overview

  Dataset          Sybil Profiles   Legit. Profiles   Source
  Renren                100              100          Data from Renren
  Facebook US            32               50          Crawled data
  Facebook India         50               49          Crawled data

  Test Group       # of Testers   Profiles per Tester
  Chinese Expert         24              100
  Chinese Turker        418               10
  US Expert              40               50
  US Turker             299               12
  India Expert           20              100
  India Turker          342               12

• Fewer experts than turkers, but each expert evaluated more profiles

Individual Tester Accuracy
[CDF of accuracy per tester for Chinese and US experts and turkers]
• 80% of experts have >90% accuracy: experts prove that humans can be accurate
• Turker accuracy is not so good: turkers need extra help

Accuracy of the Crowd
• Treat each classification by each tester as a vote; the majority makes the final decision (a sketch of this vote aggregation follows the "Eliminating Inaccurate Turkers" slide)

  Dataset          Test Group       False Positives   False Negatives
  Renren           Chinese Expert        0%                 3%
  Renren           Chinese Turker        0%                63%
  Facebook US      US Expert             0%                10%
  Facebook US      US Turker             2%                19%
  Facebook India   India Expert          0%                16%
  Facebook India   India Turker          0%                50%

• False positive rates are excellent: almost zero false positives
• Experts perform okay, but turkers miss lots of Sybils
• Turkers need extra help against false negatives: what can be done to improve accuracy?

How Many Classifications Do You Need?
[Plot: error rate (%) vs. classifications per profile for China, US, and India; false negative curves sit well above false positive curves]
• Only need 4-5 classifications per profile to converge
• Few classifications = less cost

Eliminating Inaccurate Turkers
[Plot: false negative rate (%) vs. turker accuracy threshold (%) for China, US, and India]
• Most workers are >40% accurate, so only a subset of workers is removed (<50%)
• Dramatic improvement: false negatives drop from 60% to 10%
• Getting rid of inaccurate turkers is a no-brainer (a sketch of this filter follows below)
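The crowd accuracy numbers above come from simple majority voting over per-tester classifications. Below is a minimal sketch of that aggregation, assuming classifications are recorded as (profile_id, tester_id, vote) tuples; the schema and helper names are illustrative assumptions, not the study's actual data format.

```python
from collections import defaultdict

def majority_vote(classifications):
    """classifications: iterable of (profile_id, tester_id, vote) tuples,
    where vote is True for "fake" and False for "real".
    Returns {profile_id: decision} by simple majority."""
    votes = defaultdict(list)
    for profile_id, _tester_id, vote in classifications:
        votes[profile_id].append(vote)
    return {p: sum(v) > len(v) / 2 for p, v in votes.items()}

def error_rates(decisions, ground_truth):
    """ground_truth: {profile_id: True if the profile is a Sybil}.
    Returns (false positive rate, false negative rate)."""
    legit = [p for p, is_sybil in ground_truth.items() if not is_sybil]
    sybils = [p for p, is_sybil in ground_truth.items() if is_sybil]
    fp = sum(decisions[p] for p in legit) / len(legit)        # real profiles flagged fake
    fn = sum(not decisions[p] for p in sybils) / len(sybils)  # fakes passed as real
    return fp, fn
```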
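The "eliminating inaccurate turkers" step can be expressed as a filter over the same hypothetical records before re-voting; sweeping `threshold` corresponds to the x-axis of the plot above. This is only a sketch of the idea, reusing majority_vote() from the previous snippet, not the system's actual implementation.

```python
from collections import defaultdict

def tester_accuracy(classifications, ground_truth):
    """Per-tester accuracy, measured only on profiles with known labels
    (e.g. gold-standard profiles)."""
    correct, total = defaultdict(int), defaultdict(int)
    for profile_id, tester_id, vote in classifications:
        if profile_id in ground_truth:
            total[tester_id] += 1
            correct[tester_id] += (vote == ground_truth[profile_id])
    return {t: correct[t] / total[t] for t in total}

def drop_inaccurate_turkers(classifications, ground_truth, threshold=0.5):
    """Discard every classification made by a tester below the accuracy threshold."""
    accuracy = tester_accuracy(classifications, ground_truth)
    return [c for c in classifications
            if accuracy.get(c[1], 0.0) >= threshold]

# Usage (hypothetical data):
# decisions = majority_vote(drop_inaccurate_turkers(all_votes, gold_labels, threshold=0.5))
```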
How to Turn Our Results Into a System?
1. Scalability: OSNs have millions of users
2. Performance: improve turker accuracy and reduce costs
3. Privacy: preserve user privacy when giving data to turkers

System Architecture
[Diagram: a filtering layer feeds a crowdsourcing layer]
• Filtering layer: leverages existing techniques (social network heuristics, user reports) to flag suspicious profiles and help the system scale
• Crowdsourcing layer: turker selection filters out inaccurate turkers (rejected); the remaining pool is split into accurate and very accurate turkers, maximizing the usefulness of high-accuracy turkers; experts provide continuous quality control and locate malicious workers; confirmed Sybils come out the end

Trace-Driven Simulations
• Simulate 2,000 profiles; error rates drawn from survey data; vary 4 parameters
• Diagram: profiles first receive 2 classifications from accurate turkers; profiles in the controversial range (20-50%) receive 5 more classifications from very accurate turkers, with a 90% decision threshold
• Results: average of 6 classifications per profile, <1% false positives, <1% false negatives
• Results++: average of 8 classifications per profile, <0.1% false positives, <0.1% false negatives
(A simplified simulation sketch appears after the backup slides.)

Estimating Cost
Estimated cost for a real-world social network: Tuenti
• Today: 12,000 profiles to verify daily; 14 full-time employees at minimum wage ($8 per hour) = $890 per day
• Crowdsourced Sybil detection: 20 seconds per profile, 8-hour day, 50 turkers at Facebook wage ($1 per hour) = $400 per day
• Cost with malicious turkers: estimating that 25% of turkers are malicious, 63 turkers at $1 per hour = $504 per day
(The arithmetic behind these figures is worked through after the backup slides.)

Takeaways
• Humans can differentiate between real and fake profiles: crowdsourced Sybil detection is feasible
• Designed a crowdsourced Sybil detection system
  - False positives and negatives <1%
  - Resistant to infiltration by malicious workers
  - Sensitive to user privacy
  - Low cost
  - Augments existing security systems

Questions?

Survey Fatigue
[Plots: time per profile and accuracy vs. profile order, for US experts and US turkers]
• All testers speed up over time
• Accuracy does not degrade: no fatigue effect

Sybil Profile Difficulty
[Plot: average accuracy per Sybil profile, profiles ordered by turker accuracy, turkers vs. experts]
• Some Sybils are more stealthy: a few profiles are really difficult
• Experts perform well even on the most difficult Sybils and catch more tough Sybils than turkers

Preserving User Privacy
• Showing profiles to crowdworkers raises privacy issues
• Solution: reveal profile information in context
  - Crowdsourced evaluation by friends: friend-only profile information
  - Crowdsourced evaluation by the public: public profile information
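The following is a simplified trace-driven simulation in the spirit of the two-layer design on the "Trace-Driven Simulations" slide. The exact layering, the per-vote error probabilities, and every parameter value here are stand-in assumptions; the actual simulation draws error rates from the survey traces and varies four parameters.

```python
import random

def noisy_vote(is_sybil, p_err):
    """One simulated vote: correct with probability 1 - p_err."""
    return is_sybil if random.random() >= p_err else (not is_sybil)

def simulate(num_profiles=2000, sybil_fraction=0.5,
             first_round_votes=2, escalation_votes=5,
             p_err_accurate=0.20, p_err_very_accurate=0.05,
             decisive_margin=0.9):
    fp = fn = n_legit = n_sybil = 0
    for _ in range(num_profiles):
        is_sybil = random.random() < sybil_fraction
        votes = [noisy_vote(is_sybil, p_err_accurate)
                 for _ in range(first_round_votes)]
        frac_fake = sum(votes) / len(votes)
        # Escalate "controversial" profiles (no decisive majority) to the
        # very accurate turker group for additional votes.
        if 1 - decisive_margin < frac_fake < decisive_margin:
            votes += [noisy_vote(is_sybil, p_err_very_accurate)
                      for _ in range(escalation_votes)]
        decided_fake = sum(votes) > len(votes) / 2
        if is_sybil:
            n_sybil += 1
            fn += (not decided_fake)
        else:
            n_legit += 1
            fp += decided_fake
    return fp / max(n_legit, 1), fn / max(n_sybil, 1)

print(simulate())  # prints (false positive rate, false negative rate)
```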
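The cost figures on the "Estimating Cost" slide can be sanity-checked from the numbers given there (12,000 profiles per day, roughly 6 classifications per profile, 20 seconds per classification, 8-hour days, $1 per hour). The script below is just that arithmetic; the talk's exact accounting may differ slightly.

```python
# Worked version of the cost estimate, using figures stated on the slide.
PROFILES_PER_DAY  = 12_000
SECONDS_PER_VOTE  = 20
VOTES_PER_PROFILE = 6          # average from the trace-driven simulation
WORKDAY_SECONDS   = 8 * 3600
TURKER_WAGE       = 1.0        # dollars per hour ("Facebook wage")

votes_needed   = PROFILES_PER_DAY * VOTES_PER_PROFILE                 # 72,000 votes/day
turkers_needed = votes_needed * SECONDS_PER_VOTE / WORKDAY_SECONDS    # 50 turkers
daily_cost     = turkers_needed * 8 * TURKER_WAGE                     # $400/day

# With an estimated 25% of turkers malicious, the slide budgets 63 turkers:
daily_cost_with_malicious = 63 * 8 * TURKER_WAGE                      # $504/day

print(turkers_needed, daily_cost, daily_cost_with_malicious)          # 50.0 400.0 504.0
```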