Mainack Mondal
Peter Druschel †
† Bimal Viswanath
Krishna Gummadi †
† Allen Clement †
Alan Mislove ‡ Ansley Post
†*
† MPI-SWS ‡ Northeastern University *Now at Google
CoNEXT, December 2012
Lots of personal data on Online Social Networks (OSNs)
CoNEXT, December 2012 2
What is the concern with
of this
?
Aggregators can mine this large data
To infer attributes missing in the data, e.g. sexual orientation
Aggregators can republish this data in easily accessible form
Neither user nor OSN has control over usage of crawled data
Problem for OSN operators
In 2010, 171 M Facebook user’s data
User data is valuable asset to OSN operators published in BitTorrent
OSN operators are blamed for misuse of user data [NYTimes ’10]
OSNs need to limit large-scale aggregation of user data
CoNEXT, December 2012 3
Challenge
We are defending against a crawler who
Wants to crawl as many accounts as possible
Wants to crawl as fast as possible
Our goal is
Limit the rate of crawling
Make the crawlers as slow as possible
CoNEXT, December 2012 4
Existing solution: Simple rate-limiting
OSNs rate-limit on per-account or per IP address basis
Crawlers can defeat rate-limit using multiple accounts
The crawlers can create multiple fake accounts or Sybils
CoNEXT, December 2012 5
Our solution: Genie
Assumption: Social links to good users are harder to get than accounts
Replace user-account-based rate-limiting with link-based rate-limiting
CoNEXT, December 2012 6
Outline
Background and key idea
Genie design
Credit networks
How to use credit networks to defend against crawlers
Using difference between user and crawler activity
Genie evaluation
CoNEXT, December 2012 7
Credit Networks [EC ‘11]
Nodes trust each other by providing pair-wise credit
Credit is used to pay for the services received
1
2
A
4
5
B
CoNEXT, December 2012 8
Credit Networks [EC ‘11]
Nodes trust each other by providing pair-wise credit
Credit is used to pay the services received
2
3
5
6
A
2
3 C
3
4 B
To obtain a service, find path(s) with sufficient credits
CoNEXT, December 2012 9
How can we map OSN to credit networks ?
OSN operator forms credit network from the social network
Operator replenishes credit on each link at a fixed rate
Credit deducted from links to view another user’s profile
A
2
3
2
3
C
5
6
2
3
D
3
4
3
4
B
CoNEXT, December 2012 10
How do credit network defend against crawlers?
Rest of the
Network
(normal users)
Attack cut is small
Attack cut may be larger
Amount of crawling is proportional to attack cut
CoNEXT, December 2012 11 11
Difference between normal users and crawlers
Reciprocity in profile views
Normal users are more reciprocal than crawlers
Repeated profile views
Normal users repeatedly visit the same set of profiles
Locality of views
CoNEXT, December 2012 12
Difference in locality between normal users and crawlers
Renren graph and user browsing trace [IMC ‘10]
33 K users, 96 K activities (2 weeks)
90
80
70
40
30
20
10
0
1-hop 2-hop > 2-hop
Most of the normal views are local
CoNEXT, December 2012 normal user activity crawler activity
Flickr: Mislove et al. [WOSN ‘08]
Orkut: Cha et al. [IMC ‘09]
13
Genie design principles
Use a credit network to rate limit links
Exploit difference between normal and crawler activity to discriminate crawlers
Charge more for views further away
CoNEXT, December 2012 14
Genie design
New charging model: Pay more to view profiles far away
Credit charged per link = Shortest path distance between two nodes -1
3
1
- 2 6
4
- 2 4
2
- 2
2 + 2
4
2 + 2
4
3 + 2
5
A C D B
Rate of crawling decreases with increased path length
CoNEXT, December 2012 15
Outline
Background and key idea
Genie design
Credit networks
How to use credit networks to defend against crawlers
Using difference between user and crawler activity
Genie evaluation
CoNEXT, December 2012 16
Genie evaluation
Does Genie limit attackers while allowing normal users ?
The parameter to tweak: Credit replenishment rate per link
Replenishment rate too high: Crawlers will be allowed
Replenishment rate too low: Users will be heavily penalized
CoNEXT, December 2012 17
Experimental setup
Genie simulator written in C++
Input: social graph and user activity trace
Output: allowed/flagged for each activity
Normal user activity trace from Renren
Generated multiple synthetic traces for other graphs
We model a strong and efficient crawler
Crawler controls compromised user accounts
Each good user profile is crawled once
Crawlers try to crawl as many profiles as possible
CoNEXT, December 2012 18
Does Genie limit crawlers?
% of users crawled per week
Only 2.7% of the network is crawled in 1 week
Credits/week per link
The crawlers are slowed down ~3000 times
CoNEXT, December 2012 19
Does Genie penalize good users?
% of user activity flagged
2.6% of total activities from 0.8 %users flagged
Credit/week per link
CoNEXT, December 2012 20
Does Genie penalize good users?
% of user activity flagged
Trade-off point
Credit/week per link
CoNEXT, December 2012
2
0
10
8
6 % of users crawled
4 per week
21
Who are these flagged users?
3 Users with very high number of random profile views
Shows crawler like behavior
70% of the flagged activity are by these users
Users with normal # of profile views but very few friends
99% of flagged users have less than 5 friends
Adding 4 more friends unflags 97% of these users
CoNEXT, December 2012 22
Efficiency of Genie
In our Genie simulator
To scale up Genie we used Canal library [EuroSys ’12]
Multithreaded implementation
Used a 24-core, 48 GB physical memory machine for evaluation
For a million node social graph
Memory overhead 5 GB
Each view request processed in 0.65 ms on average
CoNEXT, December 2012 23
Summary
We propose rate-limiting links to defend against crawlers
We strengthen our defense using difference between normal user and crawler activities
We evaluated Genie on real world user activity trace
CoNEXT, December 2012 24
CoNEXT, December 2012 25