PIR-Tor: Scalable Anonymous Communication Using Private Information Retrieval

Prateek Mittal
Femi Olumofin
Carmela Troncoso
Nikita Borisov
Ian Goldberg
Presented by Justin Chester

Background
◦ Tor – Anonymous Routing
◦ PIR: Private Information Retrieval







Related Work
PIR-Tor
Attack Resistance
Performance
Discussion
Conclusion
References




As more and more user services shift online,
privacy is a growing concern.
Traditional routing makes it possible to trace
online communication back to its source and
discover the identity of a sender.
Is this good or bad?
They who can give up essential liberty to
obtain a little temporary safety, deserve
neither liberty nor safety – Benjamin Franklin



How do we achieve anonymous routing?
The Tor framework is a popular onion-based
solution that uses a network of ~2000 volunteer
relay nodes to move terabytes of data every day for
hundreds of thousands of users, including
businesses, law enforcement and other government
agencies, journalists, whistleblowers, and many
private citizens.
Tor routing works by obtaining a global view of the
relay nodes (network consensus) and selecting
three to build a circuit. New circuits are
constructed every 10 minutes.
www.torproject.org




What is wrong with this approach?
It is inefficient! “In the near future the Tor
network could be spending more bandwidth
for maintaining a global view of the system
than for anonymous communication itself.”
Existing solutions to the scalability problem
advocate a peer-to-peer paradigm, but these
have all been proven insecure.
Instead, use Private Information Retrieval (PIR)


Naively, the only way to hide what records are wanted from a
server is to download the whole database. This is inefficient.
PIR comes in two flavors:
◦ Information-Theoretic PIR (ITPIR) allows a user to retrieve a
database record from a group of servers, each with a duplicate of
the database. Privacy is guaranteed irrespective of the
computational power of the servers assuming there is no
collusion.
◦ Computational PIR (CPIR) allows a user to privately retrieve records
from a single, computationally bound server.

PIR adds a small communication overhead as compared to
non-obfuscated database queries while achieving a large
savings over the naïve approach.
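To make the multi-server idea concrete, here is a minimal sketch of the classic two-server XOR construction. It is a toy for illustration only and not necessarily the ITPIR scheme PIR-Tor uses; the record contents and function names are made up. Each server sees a uniformly random bit vector, so neither learns the wanted index unless the two collude.

```python
import secrets

def make_queries(n_records: int, wanted: int) -> tuple[list[int], list[int]]:
    """Split the request for index `wanted` into two bit vectors, one per server."""
    q1 = [secrets.randbelow(2) for _ in range(n_records)]  # uniformly random mask
    q2 = q1.copy()
    q2[wanted] ^= 1  # the two queries differ only at the wanted index
    return q1, q2

def answer(database: list[bytes], query: list[int]) -> bytes:
    """Server side: XOR together every record whose query bit is 1."""
    acc = bytes(len(database[0]))  # all-zero accumulator
    for record, bit in zip(database, query):
        if bit:
            acc = bytes(a ^ b for a, b in zip(acc, record))
    return acc

def reconstruct(a1: bytes, a2: bytes) -> bytes:
    """Client side: everything except the wanted record cancels out."""
    return bytes(x ^ y for x, y in zip(a1, a2))

# Hypothetical equal-length records standing in for relay descriptors.
database = [b"relay-A!", b"relay-B!", b"relay-C!", b"relay-D!"]
q1, q2 = make_queries(len(database), wanted=2)
record = reconstruct(answer(database, q1), answer(database, q2))
print(record)  # b'relay-C!'
```

In this toy the query is as long as the database has records; practical schemes reduce that cost, which is where the savings over the naive download come from.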


Background
Related Work
◦ Distributed Hash Table
◦ Random Walk






PIR-Tor
Attack Resistance
Performance
Discussion
Conclusion
References

Distributed Hash Table (DHT) based Peer to Peer
alternatives use a variety of mechanisms to obtain
relay info, but are all vulnerable to attack
(anonymity compromise).
◦ Salsa – information leak attack (lookup observation)
◦ NISAN – structure analysis (deterministic lookup)
◦ Torsk – zero-day buddy node compromise

Random Walk based Peer to Peer alternatives have
no global relay view, but are still vulnerable
◦ MorphMix – route capture attack (dirty relay collusion)
◦ ShadowWalker – hybrid approach, complicated weakness



Background
Related Work
PIR-Tor
◦ Design Goals
◦ System Architecture
◦ Database Organization





Attack Resistance
Performance
Discussion
Conclusion
References





Scalable Architecture – more relays = greater
anonymity
Security – preserve anonymity through easily
analyzed, well-understood mechanisms
Efficient Circuit Creation – reduce latency
Minimal Changes – leverage existing
implementations for incremental adoptions
Preserving Tor Constraints – uphold all
existing system policies


CPIR at Directory Servers – directory servers
are presumed to be untrustworthy
ITPIR at Guard Relays – guards are already
trusted in existing model
◦ Beats end-to-end timing analysis and selective
denial of service attacks in existing architecture
◦ All three guards must be compromised to learn which
exit relays were selected
◦ Anonymity not broken unless exit node is also
compromised. No worse than existing architecture.

Tor constrains how relays are selected when forming a circuit
◦ First node – one of three entry guards
◦ Middle node – any relay
◦ Last node – exit relay with matching exit policy
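The three constraints above can be sketched as a small selection routine. This is an illustrative sketch, not Tor's path selection code: the relay names, the port-set model of an exit policy, and `build_circuit` are all hypothetical.

```python
import random

# Hypothetical relay records; names and exit policies are made up.
guards = ["guard1", "guard2", "guard3"]           # the client's three entry guards
middles = ["mid1", "mid2", "mid3", "mid4"]        # any relay may serve as middle
exits = {"exit1": {80, 443}, "exit2": {22}, "exit3": {443}}  # relay -> allowed ports

def build_circuit(dest_port: int, rng: random.Random) -> tuple[str, str, str]:
    """Pick (entry, middle, exit) subject to the constraints listed above."""
    entry = rng.choice(guards)
    # Only exits whose policy allows the destination port are eligible.
    candidates = sorted(name for name, ports in exits.items() if dest_port in ports)
    if not candidates:
        raise ValueError(f"no exit relay allows port {dest_port}")
    exit_relay = rng.choice(candidates)
    middle = rng.choice(middles)
    return entry, middle, exit_relay

circuit = build_circuit(443, random.Random())
print(circuit)
```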




The authors advocate separating relays into three databases: entry guard,
middle, and exit. A node that can serve as both guard and exit is
entered in both.
Databases should be organized by bandwidth to assist with
load distribution. Relays with high bandwidth are chosen with
a higher probability.
Exit database should be organized by exit policy first and
then by bandwidth. Policies should have a higher degree of
standardization than the existing model.
Databases should be block based.
Figure from Mittal et al.
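A minimal sketch of a bandwidth-ordered, block-based database follows. The block size, relay names, and bandwidth values are hypothetical, and the sketch assumes the client has some public summary of relay weights to sample from; how that summary is obtained is not covered here.

```python
import random

BLOCK_SIZE = 2  # relays per block; a hypothetical value

# (name, bandwidth) pairs are made up for illustration.
relays = [("r1", 50), ("r2", 400), ("r3", 120), ("r4", 900), ("r5", 10), ("r6", 300)]

# Organize the database by bandwidth (descending), then cut it into blocks.
ordered = sorted(relays, key=lambda r: r[1], reverse=True)
blocks = [ordered[i:i + BLOCK_SIZE] for i in range(0, len(ordered), BLOCK_SIZE)]

def pick_block_index(rng: random.Random) -> int:
    """Choose a relay position with probability proportional to bandwidth,
    then map that position to the block that holds it."""
    weights = [bw for _, bw in ordered]
    position = rng.choices(range(len(ordered)), weights=weights, k=1)[0]
    return position // BLOCK_SIZE

index = pick_block_index(random.Random())
print(index, blocks[index])
```

The block index is the only thing the client needs to request, and it is the value a PIR query would keep hidden from the server.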




Background
Related Work
PIR-Tor
Attack Resistance
◦ Traffic Analysis & Route Fingerprinting
◦ Traffic Confirmation & Behavioral Profiling
◦ Reuse Analysis




Performance
Discussion
Conclusion
References

Traffic Analysis
◦ Adversary can observe part of the network and
disrupt traffic.
◦ Adversary can corrupt relays and insert relays.

Route Fingerprinting
◦ If clients only know a unique set of relays, this can
be used to identify them.
◦ Not a problem in current Tor (global view).
◦ Would be a problem for Peer-to-Peer methods.
◦ Not a problem for PIR-Tor. Even though clients only
have limited knowledge of the network, PIR
techniques prevent attackers from learning which relays a client knows.

Traffic Confirmation
◦ Adversary must control first and last relay in circuit.
Can happen with probability c^2, where c is the
fraction of compromised bandwidth in network.
◦ This is the same probability as existing Tor
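A quick worked example of the c^2 figure, using hypothetical values of c (no measured compromise fractions are implied):

```python
def confirmation_probability(c: float) -> float:
    """Chance that both the first and last relay are adversarial, assuming
    each is picked independently with probability c (compromised bandwidth share)."""
    return c * c

for c in (0.05, 0.10, 0.20):
    print(f"c = {c:.2f} -> c^2 = {confirmation_probability(c):.4f}")
```

Doubling the compromised bandwidth fraction quadruples the chance of a successful traffic confirmation attack.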

Behavioral Profiling
◦ Existing problem in Tor
◦ Partitioning
◦ Cookies
◦ Session Timing
◦ Frequently Accessed Hosts





PIR-Tor allows the reuse of relays when constructing circuits.
An adversary observing an exit relay can assemble an
aggregate behavioral profile for all clients using that exit.
The more clients that share knowledge of a block of relay
records, and therefore an exit relay, the less likely an
adversary can construct an accurate profile for a given user.
This means there is a tradeoff between load balancing and
potential loss of anonymity.
Analysis for b = 1
◦ Query returns a set B containing b blocks
◦ e: given exit relay, Pr[e] is probability that the returned B contains e. This
depends on the selection algorithm for relays.
◦ Fraction of clients that have knowledge of e is α
  ◦ α = 1 − (1 − Pr[e])^b
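The expression for α can be evaluated directly. The values of Pr[e] and b below are hypothetical and chosen only to show how the fraction grows with the number of blocks fetched.

```python
def fraction_knowing_exit(pr_e: float, b: int) -> float:
    """alpha = 1 - (1 - Pr[e])^b, the fraction of clients that know exit e."""
    return 1.0 - (1.0 - pr_e) ** b

for b in (1, 5, 10):
    print(f"b = {b:2d}: alpha = {fraction_knowing_exit(0.01, b):.4f}")
```

For b = 1 the fraction is simply Pr[e]. Fetching more blocks raises α, so more clients share any given exit and an aggregate profile says less about each one.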
BW – bandwidth-based relay selection
SB(s) – Snader-Borisov relay selection
SB(1) offers best protection, but does not load balance the network well
If, besides the exit relay, the adversary controls the destination of the client
traffic, then it can determine the middle relay from the circuit. The probability
of two clients sharing knowledge of an exit and middle is much smaller than
just sharing an exit. This greatly increases the chance of compromising
anonymity.
Figure from Mittal et al.





Background
Related Work
PIR-Tor
Attack Resistance
Performance
◦ CPIR
◦ ITPIR



Discussion
Conclusion
References





Standard CPIR security parameters (l_0 = 19 and N = 50)
Hardware – dual Intel Xeon E5420 2.50 GHz quad-core
machine running Ubuntu Linux 10.04.1 using only one core
Relay descriptor size 2100 bytes (max found in current Tor)
Exit database set to half the size of middle database
Varied number of relays in PIR database and measured
a) Server computation
b) Total communication
c) Client computation
R denotes the CPIR recursion parameter. The communication cost in this
scheme is proportional to 8^R * n^(1/(R+1)), where n is the number of relays in the
database.
Increasing R reduces communication (and client computation) drastically while
having only a small impact on server computation.
Figure from Mittal et al.




Hardware – dual Intel Xeon E5420 2.50 GHz quad-core
machine running Ubuntu Linux 10.04.1 using only one core
Relay descriptor size 2100 bytes (max found in current Tor)
Exit database set to half the size of middle database
Varied number of relays in PIR database and measured
a) Server computation
b) Total communication
c) Client computation
Compare the performance when 18 circuits are set up vs. just 1
New downloads every 3 hours, new circuit setup every 10 min (6/hour)
3 * 6 = 18
In this architecture, blocks are not reused, so security is equivalent to Tor if all
guards are honest
Figure from Mittal et al.








Background
Related Work
PIR-Tor
Attack Resistance
Performance
Discussion
Conclusion
References









CPIR vs. ITPIR
Robustness
Scaling Strategies
Preventing Denial of Service (DoS)
Churn
Number of Circuits
Path Constraints
Optional Global View
Limitations




CPIR is more easily integrated into existing
Tor architecture.
CPIR is ideal for short browsing sessions or
when the client doesn’t care about circuit
linkability.
ITPIR results in great communication savings
for the client.
ITPIR supports a variety of workloads while
maintaining security



Each block of the descriptor database is
signed by a trusted directory authority to
prevent malicious servers from manipulating
values.
However, malicious servers can simply return
garbage or refuse to reply.
This type of attack is easily detected and
avoided by CPIR and ITPIR


Download new relay descriptors on demand
instead of periodically. This reduces
overhead, but increases circuit setup time.
Use micro-descriptors, relay descriptors that
contain rarely changing information.
Frequently changing info resides in the
network consensus (global view). This
reduces PIR database sizes with
computational and communication savings


Whenever a PIR server begins to get congested
with queries, it can send a computationally
hard puzzle for the client to solve.
The server will not spend resources to service
the query until a correct answer is received.
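One common way to build such a puzzle is a hash-based partial preimage search (hashcash style). The sketch below assumes that construction; the slides do not say which puzzle PIR-Tor would use, and the function names and difficulty value are hypothetical.

```python
import hashlib
import itertools
import secrets

def leading_zero_bits(digest: bytes) -> int:
    """Number of zero bits at the front of the digest."""
    value = int.from_bytes(digest, "big")
    return len(digest) * 8 - value.bit_length()

def solve(challenge: bytes, difficulty: int) -> int:
    """Client side: brute-force a nonce whose hash has `difficulty` leading zero bits."""
    for nonce in itertools.count():
        digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
        if leading_zero_bits(digest) >= difficulty:
            return nonce

def verify(challenge: bytes, nonce: int, difficulty: int) -> bool:
    """Server side: a single hash checks the answer before any PIR work is done."""
    digest = hashlib.sha256(challenge + nonce.to_bytes(8, "big")).digest()
    return leading_zero_bits(digest) >= difficulty

challenge = secrets.token_bytes(16)      # fresh per request, issued by the server
nonce = solve(challenge, difficulty=12)  # about 2^12 hashes on average
print(verify(challenge, nonce, 12))      # True
```

The asymmetry is the point: solving costs the client thousands of hashes, while checking costs the server one, and the server can raise the difficulty as congestion grows.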


As the current Tor network experiences
churn, the number of network consensus
downloads increases.
As long as the rate of database updates is
less than 10 min, the number of PIR queries
made should not increase since only a small
number of directory servers or guards will
need to have the network consensus.



Communication overhead of PIR-Tor is
directly proportional to the number of circuits
constructed.
Tor developers are proposing the use of
separate circuits for each application used to
prevent certain types of profiling attacks.
To compensate, the timeout period can be
increased from 10 min to balance overhead

There are several proposals to increase the
constraints on relay selection
◦ Minimize end-to-end timing attacks
◦ Applications choose based on performance
  ◦ Node based selection
  ◦ Link based selection
  ◦ End-to-end based selection

PIR-Tor is easily able to incorporate these
constraints by adjusting the relay selection
algorithms

There are cases where it may be beneficial to
support global view download in addition to
selective download.
◦ Research
◦ Development
◦ Paranoia

Directory servers can be easily modified to
support this option.



Tor is supported by volunteer relays. PIR-Tor
requires a tradeoff between bandwidth use and
computation, requiring a different commitment
from volunteers. However, PIR-Tor reduces overall
resource consumption.
Peer-to-Peer alternatives offer better scaling
properties, but PIR-Tor offers much better security
properties.
The presented analysis of PIR-Tor relies on an
assumption about the standardization of exit
policies, though outliers can be tolerated.








Background
Related Work
PIR-Tor
Attack Resistance
Performance
Discussion
Conclusion
References

This paper presents PIR-Tor, a new architecture for
the Tor network that leverages existing Private
Information Retrieval techniques.
◦ Reduces communication overhead by orders of magnitude
◦ Slightly increases computational requirement
◦ Preserves or improves security of existing Tor

Compares two roughly equivalent flavors
◦ Computational PIR-Tor (CPIR-Tor)
◦ Information Theoretic PIR-Tor (ITPIR-Tor)
Tor Project: Overview. 2011.
https://www.torproject.org/about/overview.html.en
Benjamin Franklin. 2011.
http://en.wikiquote.org/wiki/Benjamin_Franklin
Prateek Mittal, Femi Olumofin, Carmela Troncoso, Nikita Borisov, and
Ian Goldberg. PIR-Tor: Scalable anonymous communication using
private information retrieval. Technical Report CACR 2011-05,
Centre for Applied Cryptographic Research, 2011.
http://www.cacr.math.uwaterloo.ca/techreports/2011/cacr201105.pdf