Fighting Fire With Fire: Crowdsourcing Security Threats and Solutions on the Social Web
Gang Wang, Christo Wilson, Manish Mohanlal, Ben Y. Zhao
Computer Science Department, UC Santa Barbara. gangw@cs.ucsb.edu

A Little Bit About Me
• 3rd Year PhD @ UCSB
• Intern at MSR Redmond, 2011
• Intern at LinkedIn (Security Team), 2012
• Research interests: security and privacy, online social networks, crowdsourcing, data-driven analysis and modeling

Recap: Threats on the Social Web
• Social spam is a serious problem
  – 10% of wall posts with URLs on Facebook are spam; 70% of those are phishing
• Sybils underlie many attacks on online social networks
  – Spam, spear phishing, malware distribution
  – Sybils blend completely into the social graph
• Existing countermeasures are ineffective
  – Blacklists only catch 28% of spam
  – Sybil detectors from the literature do not work

Sybil Accounts on Facebook
• In-house estimates
  – Early 2012: 54 million
  – August 2012: 83 million (8.7% of the user base)
• Fake likes
  – VirtualBagel: a useless site that drew 3,000 likes in 1 week
  – 75% from Cairo, ages 13-17
• Sybil attacks operate at large scale; advertisers are fleeing Facebook

Sybil Accounts on Twitter
• Follower markets
  – 4,000 new followers/day; 100,000 new followers in 1 day
  – 92% of Newt Gingrich's followers are Sybils
• Twitter is vital infrastructure: Russian political protests played out on Twitter
  – 25,000 Sybils sent 440,000 tweets
  – 1 million Sybils controlled overall
• Sybils are usurping Twitter for political ends

Talk Outline
1. Malicious crowdsourcing sites – crowdturfing [WWW'12]
   – Spam and Sybils generated by real people
   – Huge threat in China; growing threat in the US
2. Crowdsourced Sybil detection [NDSS'13]
   – If attackers can do it, why not defenders?
   – Can humans detect Sybils? (user study)
   – Is this cost effective?
   – Design of a crowdsourced Sybil detection system

Outline
• Intro
• Crowdturfing
  – Crowdsourcing overview
  – What is crowdturfing?
  – How bad is it?
  – Crowdturfing in the US
• Crowdsourced Sybil Detection
• Conclusion

High Quality Sybils and Spam
• We tend to think of spam as "low quality", e.g.:
  "MaxGentleman is the bestest male enhancement system avalable. http://cid-ce6ec5.space.live.com/"
• What about high quality spam and Sybils?
  – Example shown: a convincing "Gang Wang" profile built from stock photographs
• Open questions
  – What is the scope of this problem?
  – Is it generated manually or mechanically?
  – What are the economics?

Black Market Crowdsourcing
• Amazon's Mechanical Turk: admins remove spammy jobs
• Black market crowdsourcing websites
  – Spam and fake accounts, generated by real people
  – Major force in China; expanding in the US and India

Crowdturfing = Crowdsourcing + Astroturfing

Crowdturfing Workflow
• Customers: initiate campaigns; may be legitimate businesses
• Agents: manage campaigns and workers (tasks out, reports back); verify completed tasks
• Workers: complete tasks for money; control Sybils on other websites

Crowdturfing in China

Site     | Active Since | Total Campaigns | Workers | Reports | $ for Workers | $ for Site
Zhubajie | Nov. 2006    | 76K             | 169K    | 6.3M    | $2.4M         | $595K

[Figure: Site growth over time, in campaigns per month and dollars per month, for Zhubajie and Sandaha, Jan. 2008 to Jan. 2011]

Spreading Spam on Weibo
[Figure: CDF of approximate audience size per campaign]
• 50% of campaigns reach >100,000 users
• 8% reach >1 million users
• Campaigns reach huge audiences; but how effective are these campaigns?
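Since the deck's figures did not survive extraction, here is a minimal sketch of how the audience-reach statistics above could be computed from per-campaign audience estimates. This is illustrative, not the paper's measurement code; the function names and the sample data are invented.

```python
# Minimal sketch: reach fractions and an empirical CDF from per-campaign
# audience estimates. `sizes` below is made-up example data.

def reach_fractions(sizes, thresholds=(100_000, 1_000_000)):
    """Fraction of campaigns whose estimated audience exceeds each threshold."""
    n = len(sizes)
    return {t: sum(s > t for s in sizes) / n for t in thresholds}

def empirical_cdf(sizes):
    """Sorted sizes paired with cumulative fractions, ready for plotting."""
    xs = sorted(sizes)
    n = len(xs)
    return [(x, (i + 1) / n) for i, x in enumerate(xs)]

sizes = [5_000, 120_000, 80_000, 2_500_000, 400_000]
print(reach_fractions(sizes))  # {100000: 0.6, 1000000: 0.2}
```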
How Effective is Crowdturfing?
• We initiated our own campaigns as a customer
  – 4 benign ad campaigns promoting real e-commerce sites (comparison point: web display ads, CPC = $0.01)
  – All clicks route through our measurement server
• Campaign: Vacation (advertise a discount vacation through a travel agent). Cost: $15; Tasks: 100.

Target | Reports | Clicks | Cost per Click
Weibo  | 108     | 28     | $0.21
QQ     | 118     | 187    | $0.09
Forums | 123     | 3      | $0.90

• The travel agency reported sales statistics
  – 2 sales/month before our campaign
  – 11 sales within 24 hours after our campaign
  – Each trip sells for $1,500!

Crowdturfing in America

Site            | % Crowdturfing
Mechanical Turk | 12%  (legitimate US site)
MinuteWorkers   | 70%  (black market)
MyEasyTasks     | 83%  (black market)
Microworkers    | 89%  (black market)
ShortTasks      | 95%  (black market)

• Other studies support these findings
  – Freelancer: 28% spam jobs; bulk OSN accounts, likes, spam; connections to botnet operators
  – Twitter follower markets ("poultry"): $20 for 1,000 followers; Ponzi scheme

Takeaways
• Identified a new threat: crowdturfing
  – Growing exponentially in size and revenue in China: $1 million per month on just one site
  – Cost effective: $0.21 per click
• Starting to grow in the US and other countries
  – Mechanical Turk, Freelancer, Twitter follower markets
• Huge problem for existing security systems
  – Little to no automation to detect; Turing tests fail

Outline
• Intro
• Crowdturfing
• Crowdsourced Sybil Detection
  – Open questions
  – User study
  – Accuracy analysis
  – System design
• Conclusion

Crowdsourcing Sybil Defense
• Defenders are losing the battle against OSN Sybils
• Idea: build a crowdsourced Sybil detector
  – Leverage human intelligence; scalable
• Open questions
  – How accurate are users?
  – What factors affect detection accuracy?
  – Is crowdsourced Sybil detection cost effective?

User Study
• Two groups of users
  – Experts: CS professors, masters, and PhD students
  – Turkers: crowdworkers from Mechanical Turk and Zhubajie (a crowdturfing site)
• Three ground-truth datasets of full user profiles
  – Renren: given to us by Renren Inc.
  – Facebook US and India: crawled
    • Legitimate profiles: 2 hops from our own profiles
    • Suspicious profiles: stock profile pictures; banned suspicious profiles = Sybils
• Testers may skip around and revisit profiles

[Figure: Screenshot of the classification interface: a profile screenshot (links cannot be clicked), "Real or fake?" and "Why?" prompts, navigation buttons, and a progress indicator for browsing and classifying profiles]

Experiment Overview

Dataset        | Sybil Profiles | Legit. Profiles
Renren         | 100            | 100
Facebook US    | 32             | 50
Facebook India | 50             | 49

(Renren data from Renren Inc.; Facebook data crawled)

Test Group     | # of Testers | Profiles per Tester
Chinese Expert | 24           | 100
Chinese Turker | 418          | 10
US Expert      | 40           | 50
US Turker      | 299          | 12
India Expert   | 20           | 100
India Turker   | 342          | 12

(Fewer experts, but more profiles per expert)

Individual Tester Accuracy
[Figure: CDF of accuracy per tester for Chinese and US turkers and experts]
• 80% of experts have >90% accuracy; turkers fare worse
• Experts prove that humans can be accurate; turkers need extra help

Accuracy of the Crowd
• Treat each classification by each tester as a vote; the majority makes the final decision

Dataset        | Test Group     | False Positives | False Negatives
Renren         | Chinese Expert | 0%              | 3%
Renren         | Chinese Turker | 0%              | 63%
Facebook US    | US Expert      | 0%              | 10%
Facebook US    | US Turker      | 2%              | 19%
Facebook India | India Expert   | 0%              | 16%
Facebook India | India Turker   | 0%              | 50%

• False positive rates are excellent: almost zero
• Experts perform okay, but turkers miss lots of Sybils (high false negatives)
• What can be done to improve accuracy?
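The majority-vote aggregation above is simple to express in code. A minimal sketch with hypothetical data structures (the study's actual tooling is not shown in the deck): each tester's label is one vote, and the majority decides.

```python
# Minimal sketch of crowd voting and the resulting error rates.
# Ties break toward the first label seen.

from collections import Counter

def classify(votes):
    """votes: list of 'sybil' / 'legit' labels for one profile."""
    return Counter(votes).most_common(1)[0][0]

def error_rates(profiles):
    """profiles: list of (votes, truth) pairs; returns (FP rate, FN rate)."""
    fp = sum(classify(v) == "sybil" and t == "legit" for v, t in profiles)
    fn = sum(classify(v) == "legit" and t == "sybil" for v, t in profiles)
    n_legit = sum(t == "legit" for _, t in profiles) or 1
    n_sybil = sum(t == "sybil" for _, t in profiles) or 1
    return fp / n_legit, fn / n_sybil

votes = ["sybil", "legit", "sybil", "sybil", "legit"]
print(classify(votes))  # 'sybil'
```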
Eliminating Inaccurate Turkers
[Figure: false negative rate vs. turker accuracy threshold for the China, India, and US turker groups]
• Most workers are >40% accurate
• Dramatic improvement: false negatives drop from 60% to 10%
• Only a subset of workers (<50%) is removed
• Getting rid of inaccurate turkers is a no-brainer

How Many Classifications Do You Need?
[Figure: false positive and false negative rates vs. classifications per profile for China, India, and US]
• Only 4-5 classifications per profile are needed for results to converge
• Few classifications = less cost

How to Turn Our Results into a System?
1. Scalability: OSNs with millions of users
2. Performance: improve turker accuracy; reduce costs
3. Privacy: preserve user privacy when giving data to turkers

Our System Architecture
• Filtering layer: social network heuristics and user reports flag suspicious profiles
  – Leverages existing techniques; helps the system scale
• Crowdsourcing layer: turker selection splits all turkers into accurate and very accurate groups; inaccurate turkers are rejected
  – Maximizes the usefulness of high-accuracy turkers
  – Continuous quality control; locates malicious workers
  – Detected Sybils are handed to an OSN employee

Trace Driven Simulations
• Simulated 2,000 profiles, with error rates drawn from our survey data
• Varied 4 parameters, including the classifications from accurate vs. very accurate turkers (2 and 5), the controversial range (20-50%), and the majority threshold (90%)
• Results: on average 6 classifications per profile; <1% false positives; <1% false negatives
• Results++ (stricter settings): on average 8 classifications per profile; <0.1% false positives; <0.1% false negatives

Estimating Cost
• Estimated cost for a real-world social network: Tuenti
  – 12,000 profiles to verify daily
  – 14 full-time employees at an annual salary of 30,000 EUR (~$20 per hour): $2,240 per day
• Crowdsourced Sybil detection
  – 20 sec/profile, 8-hour day, 50 turkers at Facebook wages ($1 per hour): $400 per day
• Cost with malicious turkers
  – Estimating that 25% of turkers are malicious: 63 turkers at $1 per hour, $504 per day
  – (The arithmetic is worked through in the sketch after the project-ideas slide below)

Takeaways
• Humans can differentiate between real and fake profiles; crowdsourced Sybil detection is feasible
• Designed a crowdsourced Sybil detection system
  – False positives and negatives <1%
  – Resistant to infiltration by malicious workers
  – Sensitive to user privacy
  – Low cost; augments existing security systems

Outline
• Intro
• Crowdturfing
• Crowdsourced Sybil Detection
• Conclusion
  – Summary of my work
  – Future work

Key Contributions
1. Identified a novel threat: crowdturfing
   – End-to-end spam measurements, from customers to the web
   – Insider knowledge of social spam
2. Novel defense: crowdsourced Sybil detection
   – User study proves the feasibility of this approach
   – Built an accurate, scalable system
   – Possible deployment in real OSNs: LinkedIn and Renren

Ongoing Work
1. Twitter follower markets
   – Locate customers who purchase Twitter followers in bulk
   – Study the unfollow dynamics of customers
   – Develop systems to detect customers in the wild
2. Sybil detection using server-side clickstreams
   – Build click models based on clickstream logs
   – Extract the click patterns of Sybil and normal users
   – Develop systems to detect Sybils

Questions? Thank you!

Potential Project Ideas
• Malware distribution in cellular networks
  – Identify malware-related cellular network traffic
  – Coordinated malware distribution campaigns
  – Feature-based detection
• Advertising traffic analysis in mobile apps
  – Characterize ad traffic
  – How effective are app-displayed ads at drawing click-throughs?
  – Is malware delivered through ads?
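As flagged on the "Estimating Cost" slide above, the cost arithmetic can be worked through explicitly. A sketch assuming roughly 6 classifications per profile (the average from the trace-driven simulations); under that assumption it reproduces the slide's 50-turker / $400-per-day and 63-turker / $504-per-day figures. This is our reading of the numbers, not a published cost model.

```python
# Worked version of the "Estimating Cost" arithmetic. Assumptions: ~6
# votes per profile (from the simulation slide), 20 s per vote, 8-hour
# shifts, $1/hour wages; over-provision the workforce by the estimated
# fraction of malicious turkers.

import math

def daily_cost(profiles, votes=6, sec_per_vote=20, shift_hours=8,
               wage=1.0, malicious_fraction=0.0):
    work_hours = profiles * votes * sec_per_vote / 3600
    turkers = math.ceil(work_hours / shift_hours * (1 + malicious_fraction))
    return turkers, turkers * shift_hours * wage

print(daily_cost(12_000))                           # (50, 400.0)
print(daily_cost(12_000, malicious_fraction=0.25))  # (63, 504.0)
```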
Preserving User Privacy
• Showing profiles to crowdworkers raises privacy issues
• Solution: reveal profile information in context
  – Crowdsourced evaluation by friends: friend-only profile information
  – Crowdsourced evaluation by the public crowd: public profile information only

Clickstream Sybil Detection
[Figure: click-transition graphs for Sybil and normal clickstreams, with states such as Initial, Browse Profiles, Share, Message, Photo, Friend Invite, and Final, annotated with transition percentages]
• Model features: (1) absolute number of clicks, (2) time between clicks, (3) page traversal order
• Goal: clickstream detection of Sybils
• Challenges: real-time operation, massive scalability, low overhead

Are Workers Real People?
[Figure: % of reports from workers vs. hour of the day for Zhubajie and Sandaha, annotated with late night/early morning, work day/evening, lunch, and dinner periods]
• Worker activity follows human daily rhythms, suggesting the workers are real people

Crowdsourced Sybil Detection
• How to detect crowdturfed Sybils?
  – They blur the line between real and fake; difficult to detect algorithmically
• Anecdotal evidence that people can spot Sybils: 75% of friend requests from Sybils are rejected
• Can people distinguish real from fake in general?
  – User studies: experts, turkers, undergrads
  – What features give Sybils away?
  – Are certain Sybils tougher than others?
  – Integration of human and machine intelligence

Survey Fatigue
[Figure: time per profile and accuracy vs. profile order for US experts and US turkers]
• All testers speed up over time
• Accuracy does not noticeably degrade: no evidence of fatigue

Sybil Profile Difficulty
[Figure: average accuracy per Sybil profile, ordered by turker accuracy, for experts vs. turkers]
• Some Sybils are more stealthy; a few profiles are really difficult
• Experts perform well even on the most difficult Sybils, and catch more tough Sybils than turkers
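The "Clickstream Sybil Detection" slide above models Sybil and normal sessions as transitions between click event types. A minimal sketch of that idea, with hypothetical event names (the transition percentages in the original figure did not survive extraction): estimate a transition-probability model per class, then score new sessions against each model.

```python
# Minimal sketch: first-order click-transition models for Sybil vs. normal
# users. Event names ('initial', 'browse', ...) are hypothetical.

import math
from collections import defaultdict

def transition_probs(sessions):
    """sessions: iterable of click sequences; returns P(next | current)."""
    counts = defaultdict(lambda: defaultdict(int))
    for seq in sessions:
        for a, b in zip(seq, seq[1:]):
            counts[a][b] += 1
    return {a: {b: c / sum(nxt.values()) for b, c in nxt.items()}
            for a, nxt in counts.items()}

def log_likelihood(seq, probs, floor=1e-6):
    """Fit of one session under a model; unseen transitions get `floor`."""
    return sum(math.log(probs.get(a, {}).get(b, floor))
               for a, b in zip(seq, seq[1:]))

sybil_model = transition_probs([["initial", "browse", "invite", "final"]])
normal_model = transition_probs([["initial", "photo", "share", "final"]])
session = ["initial", "browse", "invite", "final"]
print(log_likelihood(session, sybil_model) >
      log_likelihood(session, normal_model))  # True
```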