Messaging anonymity & the traffic analysis of hardened systems
FOSAD 2010 – Bertinoro, Italy

Networking
 Relation between identity and efficient routing
 Identifiers: MAC, IP, email, screen name
 No network privacy = no privacy!

The identification spectrum today
Full Anonymity – Pseudonymity – “The Mess” we are in! – Strong Identification

NO ANONYMITY: weak identifiers everywhere:
 IP, MAC
 Logging at all levels
 Login names / authentication
 PK certificates in clear
 Application data leakage

NO IDENTIFICATION:
 Expensive / unreliable logs
 IP / MAC address changes
 Open wifi access points
 Botnets

Partial solution:
 Authentication
Also:
 Location data leaked
 Weak identifiers easy to modulate

Open issues:
 DoS and network level attacks

(Figure: Anthony F. J. Levi - http://www.usc.edu/dept/engineering/eleceng/Adv_Network_Tech/Html/datacom/)
Weak identifiers at the link and network layer:
 MAC address – no integrity or authenticity
 IP header fields – link different packets together

3.1. Internet Header Format

A summary of the contents of the internet header follows:

    0                   1                   2                   3
    0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |Version|  IHL  |Type of Service|          Total Length         |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |         Identification        |Flags|      Fragment Offset    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |  Time to Live |    Protocol   |         Header Checksum       |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                       Source Address                          |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Destination Address                        |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
   |                    Options                    |    Padding    |
   +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

   Figure 4. Example Internet Datagram Header

Weak identifiers – the same for TCP, SMTP, IRC, HTTP, ...
No integrity / authenticity.

Objective: foundations of anonymity & traffic analysis

Why anonymity?

Lecture 1 – Unconditional anonymity
 DC-nets in detail, Broadcast & their discontents

Lecture 2 – Black box, long term attacks
 Statistical disclosure & its Bayesian formulation

Lecture 3 – Mix networks in detail
 Sphinx format, mix-networks

Lecture 4 – Bayesian traffic analysis of mix networks

Specialized applications
 Electronic voting
 Auctions / bidding / stock market
 Incident reporting
 Witness protection / whistle blowing
 Showing anonymous credentials!

General applications
 Freedom of speech
 Profiling / price discrimination
 Spam avoidance
 Investigation / market research
 Censorship resistance

Sender anonymity
 Alice sends a message to Bob. Bob cannot know
who Alice is.

Receiver anonymity
 Alice can send a message to Bob, but cannot find
out who Bob is.

Bi-directional anonymity
 Alice and Bob can talk to each other, but neither
of them know the identity of the other.

3rd party anonymity
 Alice and Bob converse and know each other, but
no third party can find this out.

Unobservability
 Alice and Bob take part in some communication,
but no one can tell if they are transmitting or
receiving messages.

Unlinkability
 Two messages sent (received) by Alice (Bob)
cannot be linked to the same sender (receiver).

Pseudonymity
 All actions are linkable to a pseudonym, which is
unlinkable to a principal (Alice)
Simple receiver anonymity: broadcast the encrypted message E(Message)
to everyone, alongside E(Junk) for all others; only the intended
receiver can decrypt the real message.

Point 1: Do not re-invent this.
Point 2: Many ways to do broadcast – rings, trees, buses. It has all been done.
Point 3: Is your anonymity system better than this?
Point 4: What are the problems here?
 Coordination
 Sender anonymity
 Latency
 Bandwidth

DC-nets
 Dining Cryptographers (David Chaum 1985)

Multi-party computation resulting in a message
being broadcast anonymously

2 twists
 How to implement DC-nets through broadcast trees
 How to achieve reliability?

Communication cost...
“Three cryptographers are sitting down to
dinner at their favourite three-star restaurant.
 Their waiter informs them that arrangements
have been made with the maitre d'hotel for the
bill to be paid anonymously.
 One of the cryptographers might be paying for
the dinner, or it might have been NSA (U.S.
National Security Agency).
 The three cryptographers respect each other's
right to make an anonymous payment, but they
wonder if NSA is paying.”

Ron, Adi and Wit each share a secret coin with each of the other two:
c_ar (Adi–Ron), c_rw (Ron–Wit), c_aw (Adi–Wit). Each broadcasts the sum
of their coins and their bit (1 = “I paid”, 0 = “I didn’t”):

 b_a = m_a + c_ar + c_aw   (Adi: “I didn’t”, m_a = 0)
 b_r = m_r + c_ar + c_rw   (Ron: “I paid”, m_r = 1)
 b_w = m_w + c_rw + c_aw   (Wit: “I didn’t”, m_w = 0)

Combine: B = b_a + b_r + b_w = m_a + m_r + m_w = m_r (mod 2)

Each coin is added in twice and cancels: the NSA did not pay, but the
payer remains anonymous.
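The round above can be simulated in a few lines; `dc_net_round` is a hypothetical helper name, with XOR standing in for addition mod 2:

```python
import secrets

def dc_net_round(messages, pairs):
    """One DC-net round: each participant XORs its message bit with the
    coins it shares with its neighbours, then all broadcasts are combined."""
    coins = {pair: secrets.randbits(1) for pair in pairs}  # shared secret coins
    broadcasts = []
    for i in range(len(messages)):
        b = messages[i]
        for pair in pairs:
            if i in pair:
                b ^= coins[pair]   # each coin is mixed in by both of its holders
        broadcasts.append(b)
    combined = 0
    for b in broadcasts:
        combined ^= b              # every coin appears twice and cancels
    return combined

# Ron (index 0) pays; Adi and Wit send 0; all three pairs share a coin
result = dc_net_round([1, 0, 0], [(0, 1), (0, 2), (1, 2)])
assert result == 1   # the payment bit is revealed, the payer is not
```

The result is deterministic despite the random coins, because every coin cancels in the combined broadcast.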

Generalise
 Many participants
 Larger message size
▪ Conceptually many coins in parallel (xor)
▪ Or: use +/− arithmetic (mod 2^|m|)
 Arbitrary key (coin) sharing
▪ Graph G:
▪ nodes - participants,
▪ edges - keys shared

What security?

Each pair now tosses m coins; participants add keys shared one way and
subtract keys shared the other way (mod 2^m). Ron wants to send
m_r = message; Adi and Wit send m_a = m_w = 0:

 b_a = m_a − c_ar + c_aw (mod 2^m)
 b_r = m_r + c_ar − c_rw (mod 2^m)
 b_w = m_w + c_rw − c_aw (mod 2^m)

Combine: B = b_a + b_r + b_w = m_a + m_r + m_w = m_r (mod 2^m)

Derive coins
 c_ab,i = H[K_ab, i] for round i
 Or: a stream cipher keyed with K_ab

A shares key K_ab with B and K_ac with C, and broadcasts
 b_a = c_ab + c_ac + m_a

If B and C are corrupt, the adversary’s view includes c_ab and c_ac,
so m_a can be recovered from b_a: No Anonymity.
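A sketch of this coin derivation, with SHA-256 standing in for the unspecified hash H (the function name and parameters are illustrative):

```python
import hashlib

def round_coins(shared_key: bytes, round_no: int, m_bits: int = 32) -> int:
    """Derive the coins for one round as H[K_ab, i], as on the slide;
    SHA-256 in place of the unspecified hash / stream cipher."""
    digest = hashlib.sha256(shared_key + round_no.to_bytes(8, "big")).digest()
    return int.from_bytes(digest, "big") % (1 << m_bits)  # m coins as one integer

# Both holders of K_ab derive identical coins for every round,
# and different rounds yield fresh pseudo-random coins.
k_ab = b"shared key between A and B"
assert round_coins(k_ab, 1) == round_coins(k_ab, 1)
assert round_coins(k_ab, 1) != round_coins(k_ab, 2)
```

This replaces the off-line sharing of fresh coins for every round with a single shared key, at the cost of downgrading unconditional to computational security.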

Adversary nodes partition the key-sharing graph into a blue and a green
sub-graph.

Calculate:
 B_blue = ∑ b_j, j is blue
 B_green = ∑ b_i, i is green

Subtract the keys known to the adversary:
 B_blue + K_red-blue = ∑ m_j
 B_green + K’_red-green = ∑ m_i

This discovers the originating sub-graph: anonymity set size = 4
(not 11 or 8!) – a reduction in anonymity.

Reduction in anonymity

b_i broadcast graph
 Tree – independent of the key-sharing graph
 = Key-sharing graph – no DoS unless the graph is split
 Peer-to-peer? Ring? Tree?

An aggregator can combine: B = ∑ b_i = m_r (mod 2^m)

Communication in 2 phases:
 1) Key sharing (off-line)
 2) Round synchronisation & broadcast
Collisions: two participants transmit in the same round. Ron wants to
send m_r = message1 and Wit wants to send m_w = message2 (Adi sends
m_a = 0):

 b_a = m_a − c_ar + c_aw (mod 2^m)
 b_r = m_r + c_ar − c_rw (mod 2^m)
 b_w = m_w + c_rw − c_aw (mod 2^m)

Combine: B = b_a + b_r + b_w = m_a + m_r + m_w = collision (mod 2^m)

Ethernet:
 Detect collisions (redundancy)
 Exponential back-off, with random re-transmission

Collisions do not destroy all information
 B = b_a + b_r + b_w = m_a + m_r + m_w
   = collision (mod 2^m)
   = message1 + message2 (mod 2^m)
 N collisions can be decoded in N transmissions
Trick 1: run 2 DC-net rounds in parallel, transmitting (count, sum).
Trick 2: retransmit only if your message is greater than the average.

Example with messages m1, m2, m4, m5, m6:
 Round 1: (5, m1+m2+m4+m5+m6) – a 5-way collision
 Round 2: senders above the average retransmit, splitting the collision
 into (3, m1+m2+m4) and (2, m5+m6)
 Rounds 3–5: the splits repeat – (1, m1), (2, m2+m4), then (1, m2),
 (1, m4), (1, m5), (1, m6) – until every transmission has count 1 and
 decodes; (0, 0) rounds carry nothing.
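The splitting strategy above can be sketched as a recursion over what each DC-net “round” reveals, namely only the pair (count, sum). This is a simplified model, assuming distinct message values (random padding makes ties unlikely), not the exact retransmission schedule of the slide:

```python
def resolve(messages):
    """Collision resolution sketch: a round reveals (count, sum);
    senders above the average retransmit, splitting the collision,
    until every count is 1 and the sum IS the message.
    Assumes all message values are distinct."""
    count, total = len(messages), sum(messages)   # what one round reveals
    if count == 0:
        return []
    if count == 1:
        return [total]                            # a lone message decodes directly
    avg = total / count
    above = [m for m in messages if m > avg]      # these retransmit next round
    below = [m for m in messages if m <= avg]     # these wait their turn
    return resolve(above) + resolve(below)

assert sorted(resolve([5, 1, 9, 3, 7])) == [1, 3, 5, 7, 9]
```

The real protocol can also subtract an already-resolved sub-sum from its parent collision instead of re-running a round, which is how N collisions decode in N transmissions.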

Reliability & DoS resistance
 How to protect a DC-network against disruption?

Solved problem?
 DC in a Disco – “trap & trace”
 Modern crypto: commitments, secret sharing, and banning

Note the homomorphism: collisions = addition
 Can tally votes / histograms through DC-nets

Security is great!
 Full key-sharing graph -> perfect anonymity

Communication cost – BAD
 (N broadcasts for each message!)
 Naive: O(N^2) cost, O(1) latency
 Not so naive: O(N) messages, O(N) latency
 ▪ Ring structure for broadcast
 Expander graph: O(N) messages, O(log N) latency?
 Centralized: O(N) messages, O(1) latency

Not practical for large(r) N!
 Local wireless communications?

Perfect Anonymity
“In the long run we are all dead” – Keynes

DC-nets – setting
 Single message from Alice to Bob, or between a set
group

Real communications
 Alice has a few friends that she messages often
 Interactive stream between Alice and Bob (TCP)
 Emails from Alice to Bob (SMTP)

Alice is not always on-line (network churn)

Repetition – patterns -> Attacks

Even perfect anonymity systems leak
information when participants change
 Remember: DC-nets do not scale well

Setting:
 N senders / receivers – Alice is one of them
 Alice messages a small number of friends:
▪ RA in {Bob, Charlie, Debbie}
▪ Through a MIX / DC-net
▪ Perfect anonymity of size K
 Can we infer Alice’s friends?
Each round, Alice plus K−1 other senders (out of the N−1 others) send
through the anonymity system to K receivers. Alice’s receiver
r_A ∈ R_A = {Bob, Charlie, Debbie}; the other K−1 receivers (out of N
others) are modelled as random receivers.

Alice sends a single message to one of her friends

Anonymity set size = K
Entropy metric EA = log K

Perfect!
Observe many rounds T1, T2, T3, T4, ... in which Alice participates,
each sending one receiver r_A1, r_A2, r_A3, r_A4, ... through the
anonymity system among the others’ messages.

Rounds in which Alice participates will output a message to one of her
friends!

Infer the set of friends!

Guess the set of friends of Alice (RA’)
 Constraint |RA’| = m

Accept if an element is in the output of each
round

Downside: Cost
 N receivers, m size – (N choose m) options
 Exponential – Bad

Good approximations...

Note that the friends of Alice will be in the sets
more often than random receivers

How often? Expected number of messages per
receiver:
 μother = (1 / N) ∙ (K-1) ∙ t
 μAlice = (1 / m) ∙ t + μother

Just count the number of messages per receiver
when Alice is sending!
 μAlice > μother
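The counting attack can be sketched directly with the slide’s parameters; the friend set [0, 13, 19] is ground truth used only to drive the simulation, and the variable names are illustrative:

```python
import random

random.seed(0)
N, m, K, t = 20, 3, 5, 45            # population, friends, round size, rounds
friends = [0, 13, 19]                # Alice's true friends (simulation input)

counts = [0] * N                     # messages per receiver over Alice's rounds
for _ in range(t):
    receivers = [random.choice(friends)]                      # Alice's message
    receivers += [random.randrange(N) for _ in range(K - 1)]  # other senders
    for r in receivers:
        counts[r] += 1

# SDA: the m most frequent receivers in Alice's rounds are her likely friends
guess = sorted(range(N), key=lambda i: -counts[i])[:m]
```

With these parameters the expected counts match the formulas above: μ_Alice = 45/3 + 9 = 24 messages per friend against μ_other = (1/20)·4·45 = 9, so the top-m count recovers the friend set with high probability.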

Parameters: N=20, m=3, K=5, t=45. Alice’s true friend set is {0, 13, 19}.

[Table: for each of the 45 rounds, the observed multi-set of K=5
receivers (e.g. round 1: [15, 13, 14, 5, 9]; round 16: [18, 19, 19, 8,
11]), the current SDA guess of the m=3 friends, the SDA error count,
and the number of candidate hitting sets remaining. The hitting-set
count falls from 685 through 395, 257, 203, ... down to 1; the SDA
guess converges on [0, 13, 19] but occasionally drifts to wrong sets
such as [0, 13, 18].]

Round 16: both attacks give the correct result K_A = {[0, 13, 19]}.

SDA: can give wrong results – needs more evidence.

Counter-intuitive
 The larger N, the easier the attack

Hitting-set attacks
 More accurate, need less information
 Slower to implement
 Sensitive to Model
▪ E.g. Alice sends dummy messages with probability p.

Statistical disclosure attacks
 Need more data
 Very efficient to implement (vectorised) – Faster partial results
 Can be extended to more complex models (pool mix, replies, ...)

Bayesian modelling of the problem
 A systematic approach to traffic analysis
Round T1: Alice sends with profile Θ_Alice, the other senders with
profile Θ_Other; the anonymity system outputs Alice’s receiver
r_A1 ~ Θ_Alice and the others’ receivers r_other ~ Θ_Other, under an
unobserved matching M1. Observation O1 = senders (uniform) + receivers
(as above).

Full generative model:
 Pick a profile Θ_Alice for Alice and a profile Θ_Other for the others
 from prior distributions (once!)
 For each round i:
 ▪ Pick a multi-set of senders in O_i uniformly from the multi-sets of size K of possible senders
 ▪ Pick a permutation M_i of the K messages uniformly at random
 ▪ For each sender pick a receiver (r_Alice or r_other) according to the probability in their respective profiles

The Disclosure attack inference problem:

“Given all known and observed variables (the priors and the
observations O_i), determine the distribution over the hidden random
variables (the profiles Θ_Alice, Θ_Other and the matchings M_i)”

Using information from all rounds, assuming profiles are stable.

How? (a) maths = Bayesian inference; (b) computation = Gibbs sampling

What are profiles?
 Keep it simple: a multinomial distribution over all users.
 E.g. Θ_Alice = [Bob: 0.4, Charlie: 0.3, Debbie: 0.3]
      Θ_Other = [Uniform(1/(N−1))]
 Prior distribution on multinomials: the Dirichlet distribution, i.e.
 samples from a Dirichlet distribution are multinomial distributions
 Other profile models are also possible: restrictions on the number of
 contacts, cliques, mutual friends, …

Other quantities have known conditional probabilities.
What are those? For round 1:
 Pr[r_A1 | Θ_Alice, M_1, O_1] and Pr[r_other | Θ_Other, M_1, O_1]
 Pr[Θ_Alice | prior] and Pr[Θ_Other | prior]
 Pr[M_1] and Pr[O_1 | K]

What we have:
 Pr[Θ_Alice, Θ_Other, M_i, r_iA, r_iother | O_i, priors, K] =
  Pr[r_iA | Θ_Alice, M_i, O_i] × Pr[r_iother | Θ_Other, M_i, O_i] ×
  Pr[M_i] × Pr[Θ_Alice | prior] × Pr[Θ_Other | prior]

What we really want:
 Pr[Θ_Alice, Θ_Other, M_i | r_iA, r_iother, O_i, priors, K]

Derivation (inverse probability):
 Pr[A=a, B=b] = Pr[A=a | B=b] × Pr[B=b]
              = Pr[B=b | A=a] × Pr[A=a]

Meaning:
 Pr[A=a | B=b] = Pr[B=b | A=a] × Pr[A=a] / Pr[B=b]
 (forward probability × prior / normalising factor)

Normalising factor:
 Pr[B=b] = ∑_a Pr[A=a, B=b] = ∑_a Pr[B=b | A=a] × Pr[A=a]

Large spaces = no direct computation of the normalising factor.
factor

Applying Bayes’ theorem to our model:

 Pr[Θ_Alice, Θ_Other, M_i | r_iA, r_iother, O_i, priors, K] =
   Pr[r_iA, r_iother | Θ_Alice, Θ_Other, M_i, O_i, priors, K] ×
   Pr[Θ_Alice, Θ_Other, M_i | O_i, priors, K]
   / ∑ Pr[r_iA, r_iother | Θ_Alice, Θ_Other, M_i, O_i, priors, K] ×
       Pr[Θ_Alice, Θ_Other, M_i | O_i, priors, K]
 where the sum in the normalising factor ranges over the hidden
 variables Θ_Alice, Θ_Other, M_i.

Large spaces = no direct computation.

Good: we can compute the probability sought up to a multiplicative
constant:
 Pr[Θ_Alice, Θ_Other, M_i | r_iA, r_iother, O_i, priors, K] ∝
  Pr[r_iA, r_iother | Θ_Alice, Θ_Other, M_i, O_i, priors, K] ×
  Pr[Θ_Alice, Θ_Other, M_i | O_i, priors, K]

Bad: forget about computing the constant – in general.

Ugly: use computational methods to estimate the quantities of interest
for traffic analysis.

Full hidden state: Θ_Alice, Θ_Other, M_i

Marginal probabilities & distributions
 Pr[Alice -> Bob] – are Alice and Bob friends?
 M_x – who is talking to whom at round x?

How can we get information about those?
 Without computing the exact probability!

Consider a distribution Pr[A=a]

Direct computation of the mean:
 E[A] = ∑_a Pr[A=a] × a

Computation of the mean through sampling:
 Samples A1, A2, …, An ~ A
 E[A] ≈ ∑_i A_i / n

More:
 Estimates of variance
 Estimates of percentiles – confidence intervals
 (0.5%, 2.5%, 50%, 97.5%, 99.5%)
The Disclosure attack inference problem, as a sampling problem:

“Given all known and observed variables (the priors and the
observations O_i), sample from the distribution over the hidden random
variables (the profiles Θ_Alice, Θ_Other and the matchings M_i)”

Using information from all rounds, assuming profiles are stable.

How? (a) maths = Bayesian inference; (b) computation = Gibbs sampling

Typical approaches
 Direct sampling: draw elements according to their probabilities
 Rejection sampling: draw elements according to another distribution,
 keep each with a probability proportional to that of the element

Markov-Chain Monte Carlo (MCMC):
 Gibbs Sampling (simpler)
 Metropolis Hastings (more flexible)

Allows sampling from complex distributions
when their marginal distributions are easy to
sample from.

Example: sample from Pr[A, B | O]

For sample s in (0, SAMPLES):
 For iteration j in (0, ITERATIONS):
  ▪ a_j <- A with Pr[A | B=b_{j−1}, O]
  ▪ b_j <- B with Pr[B | A=a_j, O]
 Samples[s] = (a_ITERATIONS, b_ITERATIONS)
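A minimal sketch of this pseudocode on a toy target: a bivariate Gaussian with correlation 0.8, whose conditionals are one-dimensional Gaussians that are easy to sample. Restarting the chain for every sample mirrors the pseudocode; real implementations usually thin one long chain instead.

```python
import random

random.seed(1)
RHO = 0.8                       # target correlation of the bivariate Gaussian
SD = (1 - RHO ** 2) ** 0.5      # conditional standard deviation

def gibbs(n_samples, n_iter=50):
    """Gibbs sampling: alternate draws from the two conditionals,
    keep only the final iterate of each (restarted) chain."""
    samples = []
    for _ in range(n_samples):
        a, b = 0.0, 0.0
        for _ in range(n_iter):
            a = random.gauss(RHO * b, SD)   # a_j ~ Pr[A | B = b_{j-1}]
            b = random.gauss(RHO * a, SD)   # b_j ~ Pr[B | A = a_j]
        samples.append((a, b))              # (a_ITERATIONS, b_ITERATIONS)
    return samples

samples = gibbs(2000)
```

Averages over the samples then estimate means, variances and correlations of the target, exactly as in the sampling slide above.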

Our distribution:
 Pr[Θ_Alice, Θ_Other, M_i | r_iA, r_iother, O_i, priors, K]

Marginal distributions:
 Profiles:
  Pr[Θ_Alice, Θ_Other | M_i, r_iA, r_iother, O_i, priors, K]
  (direct sampling by sampling a Dirichlet distribution)
 Mappings:
  Pr[M_i | Θ_Alice, Θ_Other, r_iA, r_iother, O_i, priors, K]
  (direct sampling of the matching link by link)
SAMPLING
 Get typical values of all hidden variables
 Pros: determine the distribution and confidence intervals of each
 quantity of interest.
 Cons: not the “best solution”

OPTIMIZING
 Get the values of the hidden variables that maximize the probability
 Pros: “best solution”
 Cons: information about other solutions is lost!

Near-perfect anonymity is not perfect enough!
 High-level patterns cannot be hidden forever
 Unobservability / maximal anonymity set size needed

Flavours of attacks
 Very exact attacks – expensive to compute
 ▪ The model is inexact anyway
 Statistical variants – very fast!
 Bayesian approach – systematic
 ▪ Define a “forward” generative probability model
 ▪ “Invert” the model using Bayes’ theorem
 ▪ Use sampling to estimate quantities of interest
 ▪ Find their confidence intervals

References

Limits of Anonymity in Open Environments
by Dogan Kesdogan, Dakshi Agrawal, and Stefan Penz. In the Proceedings of Information Hiding Workshop (IH 2002),
October 2002.

Statistical Disclosure Attacks: Traffic Confirmation in Open Environments
by George Danezis. In the Proceedings of Security and Privacy in the Age of Uncertainty, (SEC2003), Athens, May
2003, pages 421-426.

The Hitting Set Attack on Anonymity Protocols
by Dogan Kesdogan and Lexi Pimenidis. In the Proceedings of 6th Information Hiding Workshop (IH 2004), Toronto,
May 2004.

Statistical Disclosure or Intersection Attacks on Anonymity Systems
by George Danezis and Andrei Serjantov. In the Proceedings of 6th Information Hiding Workshop (IH 2004), Toronto,
May 2004.

Practical Traffic Analysis: Extending and Resisting Statistical Disclosure
by Nick Mathewson and Roger Dingledine. In the Proceedings of Privacy Enhancing Technologies workshop (PET
2004), May 2004, pages 17-34.

Two-Sided Statistical Disclosure Attack
by George Danezis, Claudia Diaz, and Carmela Troncoso. In the Proceedings of the Seventh Workshop on Privacy
Enhancing Technologies (PET 2007), Ottawa, Canada, June 2007.

Perfect Matching Statistical Disclosure Attacks
by Carmela Troncoso, Benedikt Gierlichs, Bart Preneel, and Ingrid Verbauwhede. In the Proceedings of the Eighth
International Symposium on Privacy Enhancing Technologies (PETS 2008), Leuven, Belgium, July 2008, pages 2-23.

Vida: How to Use Bayesian Inference to De-anonymize Persistent Communications
by George Danezis and Carmela Troncoso. In the Proceedings of Privacy Enhancing Technologies, 9th International
Symposium, PETS 2009, Seattle, WA, USA, August 5-7, 2009, 2009, pages 56-72.

Traffic analysis exercise
 http://www.cl.cam.ac.uk/~sjm217/projects/anon/ta-exercise.html
 Anonymity, cryptography, network security and attacks

FINANCIAL CRYPTOGRAPHY & DATA SECURITY 2011
 St. Lucia – Caribbean
 Submission deadline: October 1, 2010
 Event: Feb–Mar 2011

PRIVACY ENHANCING TECHNOLOGIES 2011
 Waterloo – Canada
 Submission deadline: March 2011
 Event: July 2011

David Chaum (concept 1979 – published 1981)
 The reference is the zero marker in the anonymity bibliography

Makes use of cryptographic relays (mixes)
 Break the link between sender and receiver

Cost
 O(1) – O(logN) messages
 O(1) – O(logN) latency

Security
 Computational (public key primitives must be secure)
 Threshold of honest participants
Alice sends A->M: {B, Msg}_Mix; the Mix outputs M->B: Msg to Bob.
The adversary cannot see inside the Mix.

1) Bitwise unlinkability
2) Traffic analysis resistance

Bitwise unlinkability
 Ensure adversary cannot link messages in and out
of the mix from their bit pattern
 Cryptographic problem

Traffic analysis resistance
 Ensure the messages in and out of the mix cannot
be linked using any meta-data (timing, ...)
 Two tools: delay or inject traffic – both add cost!

Broken bitwise unlinkability
 The `stream cipher’ mix (Design 1):
 {M}_Mix = {fresh k}_PKmix, M xor Stream_k

 Alice sends A->M: {B, Msg}_Mix; the Mix outputs M->B: Msg.

Active attack? The tagging attack:
 The adversary intercepts {B, Msg}_Mix and injects
 {B, Msg}_Mix xor (0, Y).
 The mix outputs the message M->B: Msg xor Y,
 and the attacker can link them.
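The tagging attack can be demonstrated concretely; the SHA-256 counter-mode keystream below is an assumed stand-in for the unspecified stream cipher of Design 1:

```python
import hashlib

def keystream(key: bytes, n: int) -> bytes:
    """Toy stream cipher (SHA-256 in counter mode) in place of Stream_k."""
    out, ctr = b"", 0
    while len(out) < n:
        out += hashlib.sha256(key + ctr.to_bytes(4, "big")).digest()
        ctr += 1
    return out[:n]

def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))

k = b"fresh session key"           # the k inside {fresh k}_PKmix
plaintext = b"B|Hello"             # routing header "B" plus message
ciphertext = xor(plaintext, keystream(k, len(plaintext)))

# Tagging: the adversary XORs (0, Y) into the ciphertext without knowing k
delta = b"\x00\x00" + b"\x20" * (len(plaintext) - 2)
tampered = xor(ciphertext, delta)

# The mix decrypts and outputs a recognisably tagged message
output = xor(tampered, keystream(k, len(plaintext)))
assert output == xor(plaintext, delta)   # Msg xor Y appears at the output
assert output[:2] == plaintext[:2]       # header untouched: still routed to B
```

Because XOR is malleable, the bit-flips pass straight through the mix, which is exactly what CCA2 detection (Defence 1) or all-or-nothing designs (Defence 2) prevent.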

Mix acts as a service
 Everyone can send messages to it; it will apply an
algorithm and output the result.
 That includes the attacker – decryption oracle,
routing oracle, ...

(Active) Tagging attacks
 Defence 1: detect modifications (CCA2)
 Defence 2: lose all information (Mixminion, Minx)
Input -> Processing inside Mix -> Output
George Danezis & Ian Goldberg. Sphinx: A Compact and Provably Secure Mix Format. IEEE S&P ‘09.

Broken traffic analysis resistance
 The `FIFO*’ mix (Design 2)
 Mix sends messages out in the order they came in!
Alice sends A->M: {B, Msg}_Mix; the Mix outputs M->B: Msg to Bob in
FIFO* order. (* FIFO = First in, First out)
Passive attack?
The adversary simply counts the
number of messages, and assigns
to each input the corresponding
output.

Mix strategies – ‘mix’ messages together
 Threshold mix: wait for N messages and output them in a random order.
 Pool mix: Pool of n messages; wait for N inputs; output N out of N+n; keep
remaining n in pool.

Timed, random delay, ...
[Diagrams: a threshold mix, and a pool mix with its internal pool.]

“Hell is other people” – J.P. Sartre
 Anonymity security relies on others
 Problem 1: Mix must be honest
 Problem 2: Other sender-receiver pairs to hide amongst

Rely on more mixes – good idea
 Distributing trust – some could be dishonest
 Distributing load – fewer messages per mix

Two extremes
 Mix Cascades
▪ All messages are routed through a preset mix sequence
▪ Good for anonymity – poor load balancing
 Free routing
▪ Each message is routed through a random sequence of mixes
▪ Security parameter: L, the length of the sequence
A->M2: {M4, {M1, {B, Msg}_M1}_M4}_M2

Free-route mix network: Alice picks a random sequence of mixes (here
M2 -> M4 -> M1, out of mixes M1–M7) to reach Bob.
(The adversary should get no more information than before!)

Bitwise unlinkability
 Length invariance
 Replay prevention

How to find mixes?
 Lists need to be authoritative, comprehensive & common

Additional requirements – corrupt mixes
 Hide the total length of the route
 Hide the step number
 (From the mix itself!)

Length of paths?
 Good mixing in O(log(|Mix|)) steps = log(|Mix|) cost
 Cascades: O(|Mix|)

We can manage “Problem 1 – trusting a mix”

The (n-1) attack – active attack
 Wait for, or flush, the mix.
 Block all incoming messages (trickle) and inject your own messages
 (flood) until Alice’s message comes out: the mix processes Alice’s
 message together with n-1 attacker messages, so the attacker
 recognises her output.

Strong identification to ensure distinct identities
 Problem: user adoption

Message expiry
 Messages are discarded after a deadline
 Prevents the adversary from flushing the mix, and injecting
messages unnoticed

Heartbeat traffic
 Mixes route messages in a loop back to themselves
 Detect whether an adversary is blocking messages
 Forces adversary to subvert everyone, all the time

General instance of the “Sybil Attack”

Malicious mixes may be dropping messages
 Special problem in elections

Original idea: receipts (unworkable)

Two key strategies to prevent DoS
 Provable shuffles
 Randomized partial checking

Bitwise unlinkability: El-Gamal re-encryption
 El-Gamal public key (g, g^x) for private x
 El-Gamal encryption (g^k, g^kx ∙ M)
 El-Gamal re-encryption (g^k’ ∙ g^k, g^k’x ∙ g^kx ∙ M)
 ▪ No need to know x to re-encrypt
 ▪ Encryption and re-encryption unlinkable
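A toy sketch of these three operations over a small prime-order subgroup; the parameters are illustrative only, real deployments use large groups:

```python
import random

random.seed(42)
# Toy parameters: p = 2q + 1 with q = 233 prime; g = 4 generates the
# order-q subgroup. Illustrative only.
p, q, g = 467, 233, 4

x = random.randrange(1, q)          # private key
h = pow(g, x, p)                    # public key g^x

def enc(m: int, k: int):
    """El-Gamal encryption (g^k, h^k * m)."""
    return (pow(g, k, p), (pow(h, k, p) * m) % p)

def reenc(c, k2: int):
    """Re-encryption: multiply in a fresh (g^k', h^k') - no x needed."""
    c1, c2 = c
    return ((c1 * pow(g, k2, p)) % p, (c2 * pow(h, k2, p)) % p)

def dec(c):
    c1, c2 = c
    return (c2 * pow(pow(c1, x, p), p - 2, p)) % p   # m = c2 / c1^x

c = enc(9, 5)
c2 = reenc(c, 7)
assert dec(c) == dec(c2) == 9       # same plaintext...
assert c != c2                      # ...under unlinkable-looking ciphertexts
```

Note that `reenc` uses only the public key, which is what lets every mix in the cascade re-randomise ciphertexts without the decryption key.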

Architecture – re-encryption cascade
 Output a proof of correct shuffle at each step:
 Alice’s input (El-Gamal encryption) -> Mix 1 (re-enc + proof) ->
 Mix 2 (re-enc + proof) -> Mix 3 (re-enc + proof) ->
 threshold decryption (+ proof)
Proof of correct shuffle
 Outputs are a permutation of the decrypted inputs
 (Nothing was inserted, dropped, otherwise modified!)
 Upside: Publicly verifiable – Downside: expensive

Applicable to any mix system

Two-round protocol
 Mix commits to inputs and outputs
 Gets a challenge
 Reveals half of the correspondences at random
 Everyone checks correctness

Pair mixes to ensure messages get some anonymity:
 mix i reveals one half, mix i+1 reveals the other half.

Rogue mix can cheat with probability at most ½

Messages are anonymous with overwhelming
probability in the length L
 Even if no pairing is used – safe for L = O(logN)

Cryptographic reply address
 Alice sends to Bob: M1, {M2, k1, {A, {K}_A}_M2}_M1
 ▪ Memory-less: k1 = H(K, 1), k2 = H(K, 2)
 Bob replies:
 ▪ B->M1: {M2, k1, {A, {K}_A}_M2}_M1, Msg
 ▪ M1->M2: {A, {K}_A}_M2, {Msg}_k1
 ▪ M2->A: {K}_A, {{Msg}_k1}_k2
 Security: indistinguishable from other messages

Anonymity requires a crowd
 Difficult to ensure it is not simulated – (n-1) attack

DC-nets – Unconditional anonymity at high
communication cost
 Collision resolution possible

Mix networks – practical anonymous messaging
 Bitwise unlinkability / traffic analysis resistance
 Crypto: decryption vs. re-encryption mixes
 Distribution: cascades vs. free-route networks
 Robustness: partial checking

The anonymity set (size)
 Dining cryptographers
▪ Full key sharing graph = (N - |Adversary|)
▪ Non-full graph – size of graph partition
 Assumption: all equally likely

Mix network context
 Threshold mix with N inputs: Anonymity = N
(A threshold mix with N = 4 inputs: anonymity = 4.)

Example: 2-stage mix. Alice, Bob and Charlie send through Mix 1 and
Mix 2; for the target output, Alice and Bob each have probability ¼
and Charlie has probability ½.

Option 1: count possible participants
 3 possible participants => N = 3
 Note the probabilities!

Option 2: arbitrary minimum probability
 Problem: ad-hoc

Better: define the distribution of senders (as above), and take the
entropy of the distribution as the anonymity:
 E = −∑ p_i log2 p_i

Example:
 E = −2 ∙ ¼ ∙ (−2) − ½ ∙ (−1) = 1 + ½ = 1.5 bits
 (NOT N = 3 => E = log2 3 = 1.58 bits)

Intuition: the entropy measures the missing information needed for
full identification!
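The entropy metric is a one-liner; the example below reproduces the 2-stage mix calculation (the helper name is illustrative):

```python
from math import log2

def anonymity_entropy(probs):
    """Effective anonymity E = -sum(p_i * log2(p_i)) of a sender distribution."""
    return -sum(p * log2(p) for p in probs if p > 0)

# The 2-stage mix example: Alice 1/4, Bob 1/4, Charlie 1/2
assert abs(anonymity_entropy([0.25, 0.25, 0.5]) - 1.5) < 1e-12
# A uniform distribution over 3 senders would instead give log2(3) = 1.58 bits
assert abs(anonymity_entropy([1 / 3] * 3) - log2(3)) < 1e-12
```

The skewed distribution yields 1.5 bits, strictly less than the 1.58 bits of the anonymity-set-size metric, matching the slide.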

Only the attacker can measure the anonymity of a
system.
 Need to know which inputs, output, mixes are controlled

Anonymity of single messages
 How to combine them to define the anonymity of a system?
 Min-anonymity over all messages

How do you derive the probabilities? (Hard!)
 Complex systems – not just examples

Core:
 The Dining Cryptographers Problem: Unconditional Sender and Recipient
Untraceability by David Chaum.
In Journal of Cryptology 1, 1988, pages 65-75.
 Mixminion: Design of a Type III Anonymous Remailer Protocol by George
Danezis, Roger Dingledine, and Nick Mathewson.
In the Proceedings of the 2003 IEEE Symposium on Security and Privacy, May
2003, pages 2-15.
 Sphinx: A Compact and Provably Secure Mix Format by George Danezis and Ian
Goldberg. In the Proceedings of the 30th IEEE Symposium on Security and Privacy
(S&P 2009), 17-20 May, Oakland, California, USA, 2009, pages 269-282.

More
 A survey of anonymous communication channels by George Danezis, Claudia
Diaz and Paul Syverson
http://homes.esat.kuleuven.be/~gdanezis/anonSurvey.pdf
 The anonymity bibliography
http://www.freehaven.net/anonbib/
A systematic approach to measuring anonymity
Remixed slides contributed by Carmela Troncoso

Uncover who speaks to whom
 Observe all links (Global Passive Adversary)
 Long term disclosure attacks:
 Exploit persistent patterns
 Disclosure Attack [Kes03], Statistical Disclosure Attack [Dan03], Perfect Matching
Disclosure Attacks [Tron-et-al08]
 Restricted routes [Dan03]
 Messages cannot follow any route
 Bridging and Fingerprinting [DanSyv08]
 Users have partial knowledge of the network

Based on heuristics and specific models, not generic.

Determine probability distributions input-output.

Threshold mix: collect t messages and output them, changing their
appearance, in a random order.

Example (senders A, B, C; threshold mixes MIX 1, MIX 2, MIX 3): at each
2-way mixing step an output is either of its two inputs with
probability ½, so the distributions over senders (A, B, C) at the three
final outputs are:
 Q = (3/8, 3/8, 1/4)
 R = (3/8, 3/8, 1/4)
 S = (1/4, 1/4, 1/2)

Constraints, e.g. path length = 2, change the distributions over
senders (A, B, C):
 Q = (1/4, 1/4, 1/2)
 R = (1/4, 1/4, 1/2)
 S = (1/2, 1/2, 0)
Non-trivial given the observation!!

How to compute probabilities systematically??
[Figure: senders on the left, threshold-3 mixes in the middle,
receivers on the right.]

Find “hidden state” of the mixes
A
B
M1
Q
Pr[ HS | O, C ]
R
M3
C
S
M2
Prior information
Pr[O | HS , C ]  Pr[ HS | C ]
Pr[O | HS , C ]  K
Pr[ HS | O, C ] 

 Pr[ HS , O | C ] Too large to 
HS
enumerate!!

“Hidden State” + Observation = Paths

A hidden state, together with the observation, assigns each message a
path P1, P2, P3 from its sender (A, B, C) through the mixes (M1, M2,
M3) to its output (Q, R, S), and

 Pr[O | HS, C] × Pr[HS | C] = K × Pr[Paths | C]

Actually… we want marginal probabilities such as Pr[A -> Q | O, C]
(e.g. given the distributions Q = R = (3/8, 3/8, 1/4) and
S = (1/4, 1/4, 1/2) from before):

 Pr[A -> Q | O, C] = ∑_HS I_{A->Q}(HS) × Pr[HS | O, C]

But… we cannot obtain a full distribution on HS directly, and we
cannot enumerate all hidden states.

Using sampling:
 HS1, HS2, HS3, HS4, …, HSN ~ Pr[HS | O, C]
 (A -> Q)?  0, 1, 0, 1, …, 1

 Pr[A -> Q | O, C] ≈ ∑_j I_{A->Q}(HS_j) / N

Markov Chain Monte Carlo methods sample from
 Pr[HS | O, C] ∝ Pr[Paths | C]

What does Pr[Paths | C] look like?

Evaluation: information-theoretic metrics for anonymity
 H = −∑_{Ri} Pr[A -> Ri | O, C] × log Pr[A -> Ri | O, C]
 Or min-entropy, max-probability, (k-anonymity), …

Operations: estimating the probability of arbitrary events
 Input message to output message?
 Alice speaking to Bob, ever?
 Two messages having the same sender?
How do we define Pr[Paths | C]?

Example: uniform path length from Lmin to Lmax:
 Pr[L = l | C] = 1 / (Lmax − Lmin + 1)

Example: random sequence of l distinct mixes out of Nmix:
 Pr[Mx | L = l, C] = (Nmix − l)! / Nmix!

 I_set(Mx) = 1 if all mixes distinct, 0 otherwise

Context C: the senders and the time / round at which they send.

Basic generative model:
 Users decide on paths independently
 Path length sampled from a path-length distribution
 Set of nodes chosen as a random choice of distinct nodes

 Pr[Px | C] = Pr[L = l | C] × Pr[Mx | L = l, C] × I_set(Mx)
 Pr[Paths | C] = ∏_x Pr[Px | C]
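A sketch of Pr[Px | C] under this basic model, assuming the uniform-length normaliser is 1/(Lmax − Lmin + 1); the function names are illustrative:

```python
from math import factorial

def pr_path(path, l_min, l_max, n_mix):
    """Pr[Px | C] under the basic generative model: uniform length in
    [l_min, l_max], a random sequence of distinct mixes out of n_mix,
    and the indicator I_set enforcing distinctness."""
    l = len(path)
    if not (l_min <= l <= l_max):
        return 0.0
    pr_length = 1 / (l_max - l_min + 1)                  # Pr[L = l | C]
    pr_nodes = factorial(n_mix - l) / factorial(n_mix)   # Pr[Mx | L = l, C]
    i_set = 1 if len(set(path)) == l else 0              # all mixes distinct?
    return pr_length * pr_nodes * i_set

# 5 mixes, lengths 1..3: a 2-hop path has probability (1/3) * (3!/5!) = 1/60
assert abs(pr_path([0, 1], 1, 3, 5) - 1 / 60) < 1e-12
assert pr_path([0, 0], 1, 3, 5) == 0.0                   # repeated mix rejected
```

The extensions on the following slides (imputed destinations, bridging, non-compliant clients, profiles) each multiply in one more factor or sum over one more unknown.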
The problem of unknown destinations: the observer has to stop at some
point (end of observation), so some paths have not yet reached their
destinations. This is a problem of imputed observation; the Bayesian
approach fills in the data:

 Pr[Px | C] = ∑_{l = Lobs}^{Lmax} Pr[L = l | C] × Pr[Mx | L = l, C] × I_set(Mx)


Bridging: users can only send through mixes they know
Pr[ Px | C ]  Pr[ L  l | C ]  Pr[ M x | L  l , C ]  I set ( M x )  I bridging ( M x | x )

Non-compliant clients (each with probability p_cp):
- Do not respect the length restrictions (Lmin,cp, Lmax,cp)
- Choose their l nodes out of the Nmix available, allowing repetitions:

Pr[M_x | L = l, C, I_cp(Path)] = 1 / Nmix^l

As before, paths are chosen independently: Pr[Paths | C] = Π_x Pr[P_x | C].

Decide whether each client is compliant or not when calculating the probability:

Pr[Paths | C] = Π_{i ∈ P_cp} p_cp · Pr[P_i | C, I_cp(P_i)] · Π_{j ∉ P_cp} (1 − p_cp) · Pr[P_j | C]
Social network information: assuming we know the sending profiles Pr(Sen_x → Rec_x),

Pr[P_x | C] = Pr[L = l | C] · Pr[M_x | L = l, C] · I_set(M_x) · Pr[Sen_x → Rec_x]

This augments the traffic analysis of mix networks with long-term profiles.
Other constraints:
- Unknown origin
- Dummies
- Other mixing strategies
- …

Take-home lesson:
- Integrate more constraints / more complexity by augmenting the generative model
- More constraints = less anonymity

How do we sample from Pr[Paths | C]?
We must sample from a distribution that is difficult to sample from directly:

Pr[HS | O, C] = Pr[O | HS, C] · Pr[HS | C] / Σ_HS Pr[HS, O | C] = K · Pr[Paths | C] / Σ_HS Pr[HS, O | C]

where Pr[Paths | C] = Π_x Pr[P_x | C].
Best paths vs. path samples: 3 key points
- Requires only a generative model / likelihood
- Provides full & true distributions, and hence good error estimates
  - No false positives or false negatives
- Systematic & extensible
We want to sample from Pr[A], which is hard to sample from directly
- But we can calculate Pr[A] up to a multiplicative constant Z

Metropolis-Hastings algorithm:
- Define a random walk Q on A that is easy to sample from, and for which we can calculate Q(A = a_i | A = a_{i−1}) and Q(A = a_{i−1} | A = a_i)
- Select a random initial sample A = a_0
- Given a_{i−1}, draw a candidate sample a_i from Q(a_i | a_{i−1})
- Calculate

α = Pr[A = a_i] · Q(a_{i−1} | a_i) / (Pr[A = a_{i−1}] · Q(a_i | a_{i−1}))

- Accept a_i with probability min(1, α)
This constructs a Markov chain with stationary distribution Pr(HS | O, C).

From the current state HS_current, the walk Q proposes a candidate state HS_candidate:

1. Compute α = Pr(HS_candidate) · Q(HS_current | HS_candidate) / (Pr(HS_current) · Q(HS_candidate | HS_current))
2. If α ≥ 1, accept: HS_current ← HS_candidate;
   otherwise draw u ~ U(0, 1): if u ≤ α then HS_current ← HS_candidate, else keep HS_current
   (i.e. accept with probability α)

Since Pr[HS | O, C] = Pr[Paths | C] / Z, the unknown constant Z cancels, and the acceptance ratio over paths is

α = Pr[Paths_candidate] · Q(Paths_current | Paths_candidate) / (Pr[Paths_current] · Q(Paths_candidate | Paths_current))
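A minimal Metropolis-Hastings sketch over the toy threshold-mix model, not the paper's sampler: states are assignments of senders A, B, C to receivers Q, R, S, the target is known only up to a constant (here every consistent assignment gets equal weight, so the true marginal Pr[A → Q] is 1/3), and the proposal Q swaps two receivers, which is symmetric so the Q terms cancel in the acceptance ratio.

```python
import random

def unnorm_target(state):
    """Pr[state] / Z: here every consistent assignment has weight 1."""
    return 1.0

def mh_samples(n, rng):
    """Run Metropolis-Hastings with a symmetric swap proposal."""
    state = ["Q", "R", "S"]  # receivers of A, B, C (initial sample)
    out = []
    for _ in range(n):
        cand = state[:]
        i, j = rng.sample(range(3), 2)
        cand[i], cand[j] = cand[j], cand[i]     # swap transition
        alpha = unnorm_target(cand) / unnorm_target(state)
        if rng.random() < min(1.0, alpha):      # accept with prob min(1, a)
            state = cand
        out.append(state[:])
    return out

samples = mh_samples(30000, random.Random(1))
p_AQ = sum(s[0] == "Q" for s in samples) / len(samples)
print(round(p_AQ, 2))  # close to the true marginal 1/3
```

With a non-uniform target the same loop works unchanged: only `unnorm_target` needs to return the (unnormalised) path probability Pr[Paths | C].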
Transition Q: a swap operation on the current paths.

[Figure: two paths through mixes M1, M2, M3 exchange their assignments at a mix, yielding a new candidate set of paths.]

- More complicated transitions are needed for non-compliant clients.

The sampler produces a sequence of path sets Paths_1, Paths_2, …: consecutive samples are dependent, but taking sufficiently separated samples guarantees (near) independence.
It works!
C. Troncoso - UTA - Nov 2009
Events should happen with the predicted probability:
1. Create an instance of a network
2. Run the sampler and obtain Paths_1, Paths_2, …
3. Choose a target sender Sen and receiver Rec
4. Predict the probability: Pr(Sen → Rec) = Σ_j I_{Sen→Rec}(Paths_j) / N
5. Check whether Sen actually chose Rec as receiver: I_{Sen→Rec}(network)
6. Choose a new network and go to 2

Compare the predicted probability with the empirical one, E(I_{Sen→Rec}(network)), estimated as

Pr_empirical(X → Y) ~ Beta(Σ I_{X→Y} + 1, Σ (1 − I_{X→Y}) + 1)
Validation results (experiments with 100 and 1 000 messages):
- It scales well as networks get larger
- As expected, mix networks offer good protection

Memory use grows with the size of the network and population, since results are kept in memory during the simulation:

Nmix | t  | Nmsg   | Samples | RAM (MB)
-----|----|--------|---------|---------
3    | 3  | 10     | 500     | 16
3    | 3  | 50     | 500     | 18
10   | 20 | 50     | 500     | 18
10   | 20 | 1 000  | 500     | 24
10   | 20 | 10 000 | 500     | 125
Timing:

Nmix | t  | Nmsg  | iter | Full analysis (min) | One sample (ms)
-----|----|-------|------|---------------------|----------------
3    | 3  | 10    | 6011 | 4.24                | 509.12
3    | 3  | 50    | 6011 | 4.80                | 576.42
5    | 10 | 100   | 7011 | 5.34                | 641.28
10   | 20 | 1 000 | 7011 | 5.97                | 706.12

- Operations should be O(1)
- Writing of the results to a file
- Different numbers of iterations
- Models with partial adversaries
- Onion routing: low-latency models
- Location privacy
- Use the models to derive (design) anonymity systems
- Traffic analysis is non-trivial when there are constraints
- Probabilistic model: incorporates most attacks
  - Non-compliant clients
  - Integrates with other inferences: long-term attacks & others
- Markov Chain Monte Carlo methods extract the marginal probabilities
  - Systematic
  - Only a generative model is needed
- Future work:
  - Model more constraints / less information for the attacker
  - Added value?
More info:
- Vida: How to Use Bayesian Inference to De-anonymize Persistent Communications. George Danezis and Carmela Troncoso. Privacy Enhancing Technologies Symposium, 2009.
- The Bayesian Analysis of Mix Networks. Carmela Troncoso and George Danezis. 16th ACM Conference on Computer and Communications Security, 2009.
- The Application of Bayesian Inference to Traffic Analysis. Carmela Troncoso and George Danezis. Microsoft Technical Report.
Sampling error, the Beta distribution, …

Joint probability: Pr(X, Y) = Pr(X | Y) · Pr(Y) = Pr(Y | X) · Pr(X)

Applied to the hidden state: Pr(O, HS | C) = Pr(HS | O, C) · Pr(O | C) = Pr(O | HS, C) · Pr(HS | C), hence

Pr(HS | O, C) = Pr(O | HS, C) · Pr(HS | C) / Pr(O | C) = Pr(O | HS, C) · Pr(HS | C) / Σ_HS Pr(HS, O | C)
We need to specify the prior knowledge, e.g. Pr(θ) or Pr(HS | C):
- It expresses our uncertainty
- It conforms to the nature of the parameter, i.e. θ is continuous but bounded between 0 and 1

A convenient choice is the Beta distribution:

P(θ) = Beta(a, b) = Γ(a + b) / (Γ(a) · Γ(b)) · θ^(a−1) · (1 − θ)^(b−1)

[Figure: densities of Beta(0.5, 0.5), Beta(1, 1), Beta(5, 1), Beta(5, 5), Beta(5, 20) and Beta(50, 20) over θ ∈ [0, 1].]

Combining a Beta prior (the prior knowledge) with the binomial likelihood gives a Beta posterior distribution:

p(θ | total, successes) ∝ p(successes | θ, total) · p(θ)
                       ∝ θ^(successes + a − 1) · (1 − θ)^(total − successes + b − 1)
                       = Beta(successes + a, total − successes + b)
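The conjugate update above fits in a few lines (the function names are my own):

```python
# Beta-binomial conjugate update, as in the derivation above.
# Prior Beta(a, b); observe `successes` out of `total` Bernoulli trials.

def beta_posterior(a, b, successes, total):
    """Return the (a, b) parameters of the Beta posterior."""
    return a + successes, b + (total - successes)

def beta_mean(a, b):
    """Mean of Beta(a, b)."""
    return a / (a + b)

# Uniform prior Beta(1, 1), then 2 successes in 5 trials:
a, b = beta_posterior(1, 1, 2, 5)
print(a, b)                       # 3 4, i.e. Beta(3, 4)
print(round(beta_mean(a, b), 3))  # 0.429
```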
From the sampler we obtain path sets Paths_1, Paths_2, Paths_3, … and evaluate the indicator (A → Q)? on each (e.g. 1, 0, 1, …):

Pr(A → Q) ≈ Σ_i I_{A→Q}(Paths_i) / N

Error estimation: the indicators I_{A→Q}(P_1), I_{A→Q}(P_2), I_{A→Q}(P_3), … are Bernoulli draws with parameter Pr(A → Q). With the uniform prior Beta(1, 1), the posterior

Pr[Pr(A → Q) | I_{A→Q}(P_1), I_{A→Q}(P_2), I_{A→Q}(P_3), …]

is therefore

Pr(A → Q) ~ Beta(Σ_i I_{A→Q}(P_i) + 1, Σ_i (1 − I_{A→Q}(P_i)) + 1)
Example: studying an event predicted to have Pr(Sen → Rec) = 0.4.

Network 1: the samples give I_{A→B}(P_1) = 0, I_{A→B}(P_2) = 1, I_{A→B}(P_3) = 0, I_{A→B}(P_4) = 0, I_{A→B}(P_5) = 1, so the predicted probability is

Pr(A → B) = Σ_j I_{A→B}(P_j) / 5 = 0.4

Networks 2–5 are drawn and analysed in the same way, each predicting Pr(X → Y) = 0.4. Checking the actual outcomes I_{X→Y}(Network_i), the event occurred in 2 of the 5 networks, so the empirical probability is

Pr(X → Y) ~ Beta(2 + 1, 3 + 1)

Finally, compare the X% confidence intervals of the sampled (predicted) and empirical distributions.
Anonymous web browsing and peer-to-peer

Anonymising streams of messages. Example: Tor.

As for mix networks:
- Alice chooses a (short) path
- It relays a bi-directional stream of traffic (cells) to Bob

[Figure: Alice → onion router → onion router → onion router → Bob; a bi-directional stream of traffic cells.]

- Set up the route once per connection, then use it for many cells: this saves on public-key operations
- No time for delaying:
  - Usable web latency is a 1–2 second round trip
  - Short routes (Tor default: 3 hops)
  - No batching (no threshold, …)
Passive attacks!

The adversary observes all inputs and outputs of an onion router. The objective is to link the incoming and outgoing connections (to trace from Alice to Bob).

Key observation: the timing of packets is correlated.

Two techniques:
- Correlation
- Template matching
[Figure: cells per time interval (from T = 0) entering the onion router, e.g. IN_i = (1, 3, 2, 1, 2, 2), and leaving it, e.g. OUT_i = (1, 2, 3, 0, 3).]

Quantise the input and output load in time, then compute:
- Corr = Σ_i IN_i · OUT_i

Downside: precision is lost by quantising.
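A sketch of the correlation score; the interval counts echo the figure above, and the second output stream is a hypothetical non-matching one:

```python
def corr(IN, OUT):
    """Corr = sum_i IN_i * OUT_i over aligned time intervals."""
    return sum(a * b for a, b in zip(IN, OUT))

IN_alice = [1, 3, 2, 1, 2]    # quantised load on Alice's input stream
OUT_bob = [1, 2, 3, 0, 3]     # output stream carrying Alice's cells
OUT_other = [0, 1, 0, 2, 1]   # unrelated output stream (hypothetical)

# The linked input/output pair scores highest.
print(corr(IN_alice, OUT_bob))    # 19
print(corr(IN_alice, OUT_other))  # 7
```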
[Figure: input stream → onion router → output stream, compared against a template.]

Template matching:
- Use the input stream and the router's delay curve to build a template: a prediction of what the output will look like
- Assign to each output cell the template value v_i for its output time
- Multiply the values together to get a score (Π_i v_i)
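A sketch of the template score; the template values v(t) below are invented for illustration, standing in for the prediction derived from the input stream and delay curve:

```python
import math

# Hypothetical template v(t): expected output intensity per time slot,
# derived (in a real attack) from the input stream and the delay curve.
template = {0: 0.1, 1: 0.6, 2: 0.9, 3: 0.4, 4: 0.1}

def template_score(output_times):
    """Score = prod_i v_i; computed in log space for numeric stability."""
    return math.exp(sum(math.log(template[t]) for t in output_times))

matching = [1, 2, 2, 3]      # cells arrive when the template expects them
mismatching = [0, 0, 4, 4]   # cells arrive when it does not

print(template_score(matching) > template_score(mismatching))  # True
```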
Onion routing cannot withstand a global passive adversary (tracing attacks are too expensive to foil).

Partial adversary:
- Can see some of the network
- Can control some of the nodes

A connection is secure if the adversary cannot see both its first and last node:
- If c is the fraction of corrupt servers,
- the compromise probability is c²

Hence there is no point making routes too long.
Forward secrecy:
- In mix networks Alice uses the mixes' long-term keys:
  A → M2: {M4, {M1, {B, Msg}_M1}_M4}_M2
- In onion routing a bi-directional channel is available
- Alice can perform authenticated Diffie-Hellman exchanges to extend the anonymous channel

As a result, OR provides better security against compulsion.

Telescoping the circuit Alice – OR1 – OR2 – OR3 – Bob:
1. Authenticated DH, Alice – OR1 → K1
2. Authenticated DH, Alice – OR2, encrypted with K1 → K2
3. Authenticated DH, Alice – OR3, encrypted with K1, K2 → K3
4. TCP connection with Bob, encrypted with K1, K2, K3
- Encryption of the input and output streams under different keys provides bitwise unlinkability
  - As for mix networks
  - Is it really necessary?
- Authenticated Diffie-Hellman:
  - One-sided authentication: Alice remains anonymous
  - Alice needs to know the signature keys of the onion routers
  - Scalability issue: 1000 routers × 2048-bit keys

Exercise. Show that:
- If Alice knows only a small subset of all onion routers, the paths she creates using them are not anonymous.
- Assume the adversary knows Alice's subset of nodes.
- Hint: consider collusion between a corrupt middle and last node, then a corrupt last node only.

The real problem: we need to ensure that all clients know the full, most up-to-date list of routers.
Can we have anonymous routing immune to tracing, with reasonable latency?

Yes, we can!
- Tracing is possible because of input-output correlations
- Strategy 1: fixed sending of cells (e.g. 1 every 20–30 ms)
- Strategy 2: fix any sending schedule, as long as it is chosen independently of the input streams
Mixes and OR are heavy on cryptography.

Crowds assumes a lighter threat model:
- No network adversary
- Only a small fraction of corrupt nodes
- Goal: anonymity of web access

Crowds: a group of nodes (jondos) cooperates to provide anonymous web browsing. A jondo that receives a request sends it out to the website (Bob) with probability p, and relays it to a random jondo in the crowd with probability 1 − p; the reply travels back along the same path.

[Figure: Alice's request hopping through the crowd to Bob (the website); example: p = 1/4.]
- The final website (Bob) or a corrupt node does not know who the initiator is
  - It could be the node that passed on the request
  - Or one before it
- How long do we expect paths to be?
  - Mean of the geometric distribution: L = 1/p (example: p = 1/4 gives L = 4)
  - This determines the latency of the request / reply
Consider the case of a corrupt insider:
- A fraction c of the nodes are in fact corrupt
- When they see a request they have to decide whether the predecessor is the initiator, or merely a relay
- Note: corrupt insiders will never pass the request on to an honest node again!

[Figure: a corrupt node in the crowd asks: what is the probability that my predecessor is the initiator?]

Follow the forwarding process: the node after the initiator is corrupt with probability c; otherwise it is honest (probability 1 − c) and relays with probability 1 − p, after which the next node is corrupt with probability c, and so on. Conditioning on a corrupt node seeing the request, the probability that its predecessor is the initiator is

p_I = c / (c + c · Σ_{i=1}^{∞} (1 − p)^i (1 − c)^i) = 1 − (1 − p)(1 − c)

p_I grows as (1) c grows and (2) p grows.

Exercise: what is the information-theoretic amount of anonymity of Crowds in this context?
What about repeated requests?
- Alice always visits Bob, e.g. a repeated SMTP connection to microsoft.com
- The adversary observes the tuple (Alice, Bob) n times
- The probability Alice is identified as the initiator (at least once) is P = 1 − [(1 − p)(1 − c)]^n
- The probability of compromise reaches 1 very fast!
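The two Crowds formulas above can be checked with a small simulation of the forwarding process; the structure follows the tree described earlier, and the parameter values p = 0.25, c = 0.2 are illustrative:

```python
import random

def p_initiator(p, c):
    """p_I = 1 - (1 - p)(1 - c): Pr[predecessor of the first corrupt node
    is the initiator | some corrupt node sees the request]."""
    return 1 - (1 - p) * (1 - c)

def p_linked_after_n(p, c, n):
    """Pr[Alice is identified at least once over n repeated requests]."""
    return 1 - ((1 - p) * (1 - c)) ** n

def simulate(p, c, trials, rng):
    """Empirical p_I: follow hops until a corrupt node sees the request
    or an honest node sends it out to the website."""
    seen_by_corrupt = pred_is_initiator = 0
    for _ in range(trials):
        hop = 0  # honest relays passed so far
        while True:
            if rng.random() < c:   # next node is corrupt: it sees the request
                seen_by_corrupt += 1
                pred_is_initiator += (hop == 0)
                break
            if rng.random() < p:   # honest node sends out to the website
                break
            hop += 1               # honest node relays onwards
    return pred_is_initiator / seen_by_corrupt

p, c = 0.25, 0.2
print(round(p_initiator(p, c), 2))                         # 0.4
print(round(simulate(p, c, 100000, random.Random(3)), 2))  # ~0.4
print(round(p_linked_after_n(p, c, 10), 3))                # 0.994
```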
- Fast routing = no mixing = traffic analysis attacks
- Weaker threat models:
  - Onion routing: partial observer
  - Crowds: insiders and remote sites
- Repeated patterns:
  - Onion routing: streams vs. time
  - Crowds: initiator-request tuples
- PKI overheads are a barrier to p2p anonymity
Core:
- Tor: The Second-Generation Onion Router by Roger Dingledine, Nick Mathewson, and Paul Syverson. In the Proceedings of the 13th USENIX Security Symposium, August 2004.
- Crowds: Anonymity for Web Transactions by Michael Reiter and Aviel Rubin. In ACM Transactions on Information and System Security 1(1), June 1998.

More:
- An Introduction to Traffic Analysis by George Danezis and Richard Clayton. http://homes.esat.kuleuven.be/~gdanezis/TAIntro-book.pdf
- The anonymity bibliography: http://www.freehaven.net/anonbib/