Robust overlays for privacy-preserving data
dissemination over a social graph
Abhishek Singh¹, Guido Urdaneta², Maarten van Steen² & Roman Vitenberg¹
¹ University of Oslo, Norway
² VU University Amsterdam, The Netherlands
20/06/2012
1 / 33
Motivation
Data exchange on sensitive topics, e.g., politics or chronic illness, requires a data exchange mechanism.
- A user's interest or data can be leaked.
- An attacker can exploit this knowledge to make a profit or to harm users.
- Privacy leakages have been observed in centralized architectures.
- Decentralized architectures, in turn, have to address both privacy and robustness.
Problem: Ensure privacy-preserving and robust interest-based data exchange within a social community
2 / 33
Our contribution
Robust communication overlay based on social relationships
- Robust
  - Short path lengths
  - Low probability of partitions
- Preserves privacy
- Decentralized
- Fast repair under churn
- Low overhead
3 / 33
Basic idea: exploit trust relations between users
A friend-to-friend network in which edges represent mutual trust.
- Trust: a mutual agreement between nodes not to disclose each other's identity and interests to unauthorized entities.
- Trust between nodes is symmetric and non-transitive: A and B trust each other, B and C trust each other, yet A and C do not trust each other (see the sketch below).
- Nodes disclose their interests to their neighbors out of band.
Trust graph: a connected friend-to-friend network formed by social interactions. Example: the Facebook graph.
4 / 33
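To make the trust-graph model concrete, here is a minimal Python sketch of an undirected adjacency structure for the three-node example above (the dictionary layout and the trusts helper are illustrative assumptions, not part of the system):

```python
# Trust is symmetric (undirected edges) and non-transitive:
# the A-B and B-C edges below do not imply an A-C edge.
trust_graph = {
    "A": {"B"},
    "B": {"A", "C"},
    "C": {"B"},
}

def trusts(graph, x, y):
    """Symmetric check: x and y trust each other iff they share an edge."""
    return y in graph.get(x, set())

assert trusts(trust_graph, "A", "B")
assert trusts(trust_graph, "B", "C")
assert not trusts(trust_graph, "A", "C")   # non-transitive
```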
Naive overlay solutions that mimic the trust graph are not robust
Example: Freenet in darknet mode [Clarke et al. 2010], where each node exchanges messages with only those nodes that it trusts.
[Plot: fraction of disconnected nodes vs. node availability (fraction of time alive), for a Facebook graph sample of 1000 nodes]
Insight: Social networks exhibit 'small-world' properties. Social networks tend to get partitioned when a small number of high-degree nodes are removed [Mislove et al. IMC 2007].
5 / 33
Problem statement
For a group of users, provide a scalable approach to create and maintain a robust overlay for data dissemination while preserving the following privacy requirements:
- Content of messages: communication providers and external users must not be able to access it.
- Participation of a user: only the trusted neighbors of the user are aware of it.
- Trust relations of a participating user: trusted neighbors of the user are only aware of their own relation with him.
Assumption: Participating users do not actively disrupt communication
6 / 33
Underlying idea for the solution
Extend the overlay based on the trust graph towards a random-like graph by adding edges in a privacy-preserving manner.
- With only the edges of the trust graph, the participating online nodes get partitioned when other nodes go offline.
- Adding privacy-preserving links keeps the online nodes connected.
Example (see the sketch below):
- The fan-out of each node in the overlay is increased to 4.
- The extra links are selected randomly.
7 / 33
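For illustration only, a centralized Python sketch of this extension step: keep every trust edge and add randomly chosen extra links until each node reaches the example fan-out of 4. In the actual system these extra links are pseudonym links built in a decentralized, privacy-preserving way; the function below is an assumed simplification.

```python
import random

def extend_overlay(trust_graph, target_fanout=4, seed=0):
    """Extend a trust graph with randomly chosen extra links.

    trust_graph: dict mapping node -> set of trusted neighbors (undirected).
    Returns a new adjacency dict in which every node keeps its trust edges
    and gains random extra links until it has at least target_fanout neighbors.
    """
    rng = random.Random(seed)
    overlay = {n: set(neigh) for n, neigh in trust_graph.items()}
    nodes = list(overlay)
    for n in nodes:
        candidates = [m for m in nodes if m != n and m not in overlay[n]]
        rng.shuffle(candidates)
        while len(overlay[n]) < target_fanout and candidates:
            m = candidates.pop()
            overlay[n].add(m)   # extra link; in the real system this is a
            overlay[m].add(n)   # pseudonym link, so identities stay hidden
    return overlay
```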
Solution architecture
- Application layer: application-specific data-dissemination protocols.
- Overlay layer (our main contribution): pseudonym creation and removal, pseudonym sampling, pseudonym distribution.
- Privacy-preserving link layer: anonymity service, pseudonym service.
(A skeleton sketch of the layering follows.)
8 / 33
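A rough skeleton of how the three layers could fit together; all class and method names here are assumptions made for illustration and do not reflect the authors' implementation.

```python
class AnonymityService:
    """Privacy-preserving link layer: hides who talks to whom."""
    def send(self, recipient, message):
        raise NotImplementedError  # e.g., route via Tor, I2P, or a remailer

class PseudonymService:
    """Privacy-preserving link layer: reachable identities decoupled from nodes."""
    def create(self, owner_public_key, lifetime):
        raise NotImplementedError
    def forward(self, pseudonym_id, message):
        raise NotImplementedError

class OverlayLayer:
    """Overlay layer: maintains overlay links via pseudonym creation/removal,
    pseudonym distribution (shuffling), and pseudonym sampling."""
    def __init__(self, anonymity: AnonymityService, pseudonyms: PseudonymService):
        self.anonymity = anonymity
        self.pseudonyms = pseudonyms
        self.links = set()          # current overlay links (trust + pseudonym)

class Application:
    """Application layer: data-dissemination protocol built on the overlay."""
    def __init__(self, overlay: OverlayLayer):
        self.overlay = overlay
```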
Privacy-preserving link layer
(Architecture recap: this layer provides the anonymity service and the pseudonym service.)
9 / 33
Anonymity service
Allows two nodes to hide the communication between them from adversaries who monitor traffic: node A sends msg1 to the anonymity service, which delivers it to node B as msg2; an attacker monitoring the communication links is unable to infer that A sent a message to B.
- Examples of anonymity services: Tor [Dingledine et al. SSYM 2004], I2P, email remailers [Danezis et al. 2003]
- Requirement: node A must know the identity of node B
- Implementations trade off latency against security
- Possible to decentralize [Schiavoni et al. ICDCS 2011]
10 / 33
Pseudonyms and pseudonym service
Allows a node A to establish a short-lived identity PA (a pseudonym) such that no one else is aware of the identity of the node that created PA.
- Pseudonym PA is created at a node P of the pseudonym service.
- Node A is the owner of pseudonym PA; node P is not aware of the identity of node A.
- Another node B can send a message to A through PA: the message is routed to P, which forwards it to A (a sketch follows).
Examples of pseudonym implementations: Tor rendezvous point, I2P eepsite, email address, DHT key.
11 / 33
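A minimal in-process Python sketch of this interaction: node A registers a pseudonym at a proxy node P, and B reaches A through it. The callback-based delivery channel stands in for a reply path over the anonymity service, and all names are assumptions for illustration.

```python
import secrets

class PseudonymNode:
    """A node P of the pseudonym service: forwards messages addressed to a
    pseudonym to its (unknown) owner."""
    def __init__(self):
        self._routes = {}            # pseudonym id -> anonymous delivery channel

    def create_pseudonym(self, deliver_anonymously):
        """deliver_anonymously is an opaque channel (e.g., a reply path through
        the anonymity service); P learns nothing about who is behind it."""
        p_id = secrets.token_hex(8)
        self._routes[p_id] = deliver_anonymously
        return p_id

    def forward(self, p_id, message):
        self._routes[p_id](message)  # P relays without knowing the owner


# Usage: A creates pseudonym P_A at P; B sends to A through P_A.
inbox_a = []
proxy = PseudonymNode()
p_a = proxy.create_pseudonym(inbox_a.append)   # stand-in for an anonymous channel
proxy.forward(p_a, b"hello from B")            # B only needs to know P_A
assert inbox_a == [b"hello from B"]
```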
Need for limited pseudonym lifetime
- Node P is aware that it acts as a proxy for some unknown user.
- If P is malicious, the probability of a successful traffic analysis is higher when node A uses P as a proxy for a long duration.
- Thus, pseudonyms should last a limited time.
Privacy versus robustness tradeoff:
- For privacy, pseudonyms should have a short duration.
- For robustness of the overlay, pseudonyms should last longer to minimize churn handling.
12 / 33
Overlay layer
(Architecture recap: the overlay layer comprises pseudonym creation and removal, pseudonym sampling, and pseudonym distribution.)
13 / 33
Pseudonym creation and removal
Each node A maintains its own pseudonym, a pseudonym cache (PC), and its overlay links.
- The pseudonym-creation module generates a new pseudonym.
- The pseudonym-removal module purges expired pseudonyms.
A generated pseudonym is composed of {PA, Tlife, PK(PA)}:
- PA: pseudonym identity
- Tlife: lifetime of the pseudonym
- PK(PA): public key used to encrypt messages sent to PA
When a pseudonym PR expires, it is purged from the data structures of all nodes; when the pseudonym PA expires, node A generates a new pseudonym (see the sketch below).
14 / 33
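A minimal sketch of the pseudonym record {PA, Tlife, PK(PA)} and the creation/removal bookkeeping, with time measured in shuffle periods. Field and function names are assumptions.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Pseudonym:
    identity: str        # PA: pseudonym identity
    expires_at: int      # absolute expiry time, derived from Tlife
    public_key: bytes    # PK(PA): used to encrypt messages sent to PA

def purge_expired(pseudonym_cache, now):
    """Pseudonym removal: drop expired entries from the cache."""
    return {p for p in pseudonym_cache if p.expires_at > now}

def refresh_own_pseudonym(own, now, t_life, create_pseudonym):
    """If the node's own pseudonym has expired, generate a new one.
    create_pseudonym stands in for the call to the pseudonym service."""
    if own is None or own.expires_at <= now:
        return create_pseudonym(expires_at=now + t_life)
    return own
```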
Pseudonym distribution
A modified version of the shuffle protocol in [Stavrou et al. ICNP 2002] is used. At each shuffle period, a node exchanges data with a random neighbor in the overlay:
1. Node A sends a shuffle request to node B containing SampleA, a random sample of {PA} ∪ PCA (its own pseudonym and its pseudonym cache).
2. Node B replies with a shuffle response containing SampleB, a random sample of {PB} ∪ PCB.
3. Both nodes update their pseudonym caches:
   PCA ← random sample of PCA ∪ SampleB
   PCB ← random sample of PCB ∪ SampleA
4. At both nodes, the pseudonym-sampling module is invoked to update their overlay links (see the sketch below).
15 / 33
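A minimal sketch of one shuffle exchange between neighbors A and B, following the steps above. Pseudonyms are plain strings here, and sample_size and cache_size are illustrative values, not parameters taken from the paper.

```python
import random

def random_sample(items, k, rng):
    items = list(items)
    return set(items if len(items) <= k else rng.sample(items, k))

def shuffle_exchange(own_a, cache_a, own_b, cache_b,
                     sample_size=5, cache_size=20, rng=None):
    """One shuffle between neighbors A and B; returns the updated caches."""
    rng = rng or random.Random()
    sample_a = random_sample({own_a} | cache_a, sample_size, rng)  # shuffle request
    sample_b = random_sample({own_b} | cache_b, sample_size, rng)  # shuffle response
    new_cache_a = random_sample(cache_a | sample_b, cache_size, rng)
    new_cache_b = random_sample(cache_b | sample_a, cache_size, rng)
    return new_cache_a, new_cache_b
```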
Pseudonym sampling
Uses the pseudonym cache to update the node's overlay links:
- Edges in the trust graph are retained in the overlay.
- The fan-out of nodes is extended to decrease the impact of the skewed degree distribution in the trust graph.
Pseudonym sampling ensures that the pseudonym links of a node are selected uniformly at random. Pseudonyms are sampled using a protocol similar to Brahms [Bortnikov et al. CN 2009] (a minimal sketch follows).
16 / 33
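A minimal sketch of a Brahms-style min-wise sampler: each sampler keeps the pseudonym with the smallest keyed hash seen so far, so over time it converges to a uniform choice over all pseudonyms streamed through it. The keyed SHA-256 hash stands in for Brahms' random hash functions, and this omits Brahms' validation and history mechanisms.

```python
import hashlib, os

class MinWiseSampler:
    """Keeps the pseudonym with the smallest keyed hash seen so far.
    Over time this is a uniform sample of all pseudonyms streamed in,
    which is how the overlay keeps its pseudonym links random-like."""
    def __init__(self):
        self._seed = os.urandom(16)       # independent randomness per sampler
        self._best = None
        self._best_hash = None

    def _hash(self, pseudonym_id: str) -> bytes:
        return hashlib.sha256(self._seed + pseudonym_id.encode()).digest()

    def offer(self, pseudonym_id: str):
        h = self._hash(pseudonym_id)
        if self._best_hash is None or h < self._best_hash:
            self._best, self._best_hash = pseudonym_id, h

    def sample(self):
        return self._best

# One sampler per extra overlay link; feed in every pseudonym seen during shuffles.
samplers = [MinWiseSampler() for _ in range(4)]
for pid in ["p1", "p2", "p3", "p4", "p5"]:
    for s in samplers:
        s.offer(pid)
pseudonym_links = {s.sample() for s in samplers}
```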
Privacy preservation analysis
Recap of the privacy requirements (from the problem statement):
- Content of messages: communication providers and external users must not be able to access it.
- Participation of a user: only the trusted neighbors of the user are aware of it.
- Trust relations of a participating user: trusted neighbors of the user are only aware of their own relation with him.
17 / 33
Non-disclosure of message content
Attack by external observers:
- Learn the content of the messages being exchanged.
Defense:
- Messages from a node A to its trusted neighbor B are encrypted with B's public key PK(B).
- Messages from node A to a pseudonym PB are encrypted with the pseudonym's public key PK(PB) (see the sketch below).
18 / 33
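A minimal sketch of this defense using PyNaCl sealed boxes as a stand-in public-key scheme; the paper does not prescribe a particular library or cipher.

```python
# pip install pynacl
from nacl.public import PrivateKey, SealedBox

# B's key pair; PK(B) is shared with B's trusted neighbors, and analogously
# PK(PB) is published together with the pseudonym PB.
sk_b = PrivateKey.generate()
pk_b = sk_b.public_key

# A encrypts with the recipient's public key, so external observers
# (and intermediate proxies) only ever see ciphertext.
ciphertext = SealedBox(pk_b).encrypt(b"interest-based data for B")

# Only B (or, for PK(PB), the pseudonym owner) can decrypt.
plaintext = SealedBox(sk_b).decrypt(ciphertext)
assert plaintext == b"interest-based data for B"
```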
Non-disclosure of participating nodes
Attack by internal or external observers:
- Traffic analysis attack to identify participating nodes.
Defense:
- We rely on the anonymity and pseudonym services.
- Distributing pseudonyms does not require low-latency guarantees.
- More secure anonymity services, such as email remailers, can be used.
19 / 33
More privacy preservation analysis
Non-disclosure of the edges in the trust graph:
- Timing analysis attack by a set of colluding internal observers.
- Low probability of a successful attack.
- A successful timing analysis attack is easier
  - if the internal observers form a vertex cut in the trust graph, and
  - if the internal observers deviate from the protocol.
Non-disclosure of participating nodes:
- A participating node sends an anomalous message or anomalously timed messages.
- To counter this attack, we can use high-latency anonymity services.
20 / 33
Evaluation methodology
Goal: Evaluate whether we can efficiently produce a robust overlay in a privacy-preserving manner.
Methodology:
- Friend-to-friend network sampled from the Facebook social graph.
- Evaluate the robustness of the overlay under different node availabilities.
- Baselines for comparison:
  - Trust graph (the sampled Facebook social graph)
  - Erdős–Rényi random graph with an average out-degree similar to that of our solution
21 / 33
System parameters
Churn settings:
- We use the churn model proposed in [Yao et al. ICNP 2006].
- Both online and offline times are modeled by exponential distributions.
- Average node availability: α = Ton / (Ton + Toff)
Parameters:
  Parameter                                      Value
  Number of nodes in the trust graph             1000
  Toff, mean offline time (in shuffle periods)   30
  α, average node availability                   varies
  Ton, mean online time (in shuffle periods)     derived from α
  r, ratio of pseudonym lifetime to Toff         3
  Target number of overlay links per node        50
Assumption: the anonymity and pseudonym services are always available. (A churn-model sketch follows.)
22 / 33
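A minimal sketch of the exponential on/off churn model for a single node, using the slide's Toff = 30 shuffle periods and deriving Ton from the target availability α; the simulation horizon and seed are arbitrary choices.

```python
import random

def simulate_node_uptime(alpha, t_off_mean=30.0, horizon=10_000, rng=None):
    """Sample one node's online/offline sessions under the exponential
    on/off churn model, for a target average availability alpha.

    alpha = Ton / (Ton + Toff)  =>  Ton = alpha * Toff / (1 - alpha).
    Times are measured in shuffle periods."""
    rng = rng or random.Random(42)
    t_on_mean = alpha * t_off_mean / (1.0 - alpha)
    t, online_time = 0.0, 0.0
    online = rng.random() < alpha            # initial state drawn by availability
    while t < horizon:
        dur = rng.expovariate(1.0 / (t_on_mean if online else t_off_mean))
        dur = min(dur, horizon - t)
        if online:
            online_time += dur
        t += dur
        online = not online
    return online_time / horizon             # empirical availability, close to alpha

print(simulate_node_uptime(alpha=0.25))
```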
Robust overlay
[Plot: fraction of disconnected nodes vs. node availability (fraction of time alive), for the trust graph, the overlay, and the reference random graph]
The fraction of disconnected nodes in the overlay is closer to the reference random graph than to the trust graph.
23 / 33
Short path lengths
[Plot: normalized average path length (log scale) vs. node availability (fraction of time alive), for the trust graph, the overlay, and the reference random graph]
The overlay has path lengths similar to the reference random graph.
24 / 33
Fast convergence
The overlay bootstraps from the trust graph and is extended towards a more robust overlay.
[Plot: fraction of disconnected nodes vs. time (in shuffle periods), for the trust graph and the overlay with r = 3 and r = 9, at 25% node availability]
The overlay converges in approximately 200 shuffle periods.
25 / 33
More experimental results
Degree distribution of nodes in the overlay:
- Different from the reference random graph.
- Biased by the degree of a node in the trust graph.
Low message overhead:
- The average number of messages sent per shuffle period is 2.
- Nodes with a higher degree in the trust graph send more messages.
- However, this skew is not large.
Decreasing the lifetime of pseudonyms:
- Decreases the robustness of the overlay.
- Increases the number of links that have to be replaced in the overlay.
26 / 33
Conclusion
Problem:
- Provide a scalable approach to construct and maintain robust overlays for a group of users while preserving their privacy.
Our approach:
- Extends the overlay from the underlying friend-to-friend network towards a regular random graph.
- Increases robustness.
- Preserves privacy.
- Decentralized solution.
- Low maintenance and fast convergence.
Questions?
27 / 33
State of the art
Centralized approach:
- Example: Twitter
- Single point of failure w.r.t. availability as well as privacy.
Use social relationships as a bootstrap mechanism:
- Example: MCON [Vasserman et al. CCS 2009]
- The overlay need not correspond to social relationships.
- Relies on a trusted third party.
Use social relationships as the communication overlay:
- Example: Freenet in darknet mode [Clarke et al. 2010]
- The resulting overlay may not be robust under node churn.
28 / 33
Non-disclosure of the edges of the trust graph
Attack by an internal observer: predict whether A and B have a trust relationship between them.
- Methodology: the attacker uses timing analysis to detect the presence of an overlay link between A and B.
- If A and B have more neighbors in the overlay, then detecting the presence of an overlay link between them is more difficult.
- Also, an overlay link between A and B can be the result of pseudonym distribution over a path that does not involve the attacker.
- Thus, an attacker cannot be sure whether an overlay link between A and B is also a trust relationship.
29 / 33
Degree distribution at 50% availability
[Plot: number of nodes (log scale) vs. degree, for the trust graph, the overlay, and the reference random graph]
Nodes in the overlay have more neighbors than in the trust graph. The degree distribution of nodes in the overlay is not similar to a regular random graph, as it is biased by the degree of nodes in the trust graph.
30 / 33
Message overhead
[Plot: per-node degree in the trust graph, maximum out-degree in the overlay, and average messages sent per shuffle period (log scale), with nodes ordered by rank on the x-axis]
Average number of messages sent per shuffle period by each node at 50% availability: nodes with a higher degree in the trust graph send more messages, but the maximum number of messages sent is only approximately twice the overall average (= 2).
31 / 33
Effect of the pseudonym lifetime
[Plot: fraction of disconnected nodes vs. node availability (fraction of time alive), for the trust graph, the overlay with r = 1, 3, 9, and infinite, and the reference random graph]
A lower pseudonym lifetime decreases the robustness of the overlay.
32 / 33
Number of links replaced over time
[Plot: number of links replaced per node per shuffle period vs. time (in shuffle periods), for r = 3, 9, and infinite, at 25% availability]
A lower pseudonym lifetime increases the number of links replaced in the overlay.
33 / 33