An anonymous P2P file sharing system

CS 265 Project Report
Anonymous Peer-to-Peer File Sharing
Student: Jianning Yang (tomyang13@yahoo.com)
Advisor: Dr. Mark Stamp (stamp@cs.sjsu.edu)
1 Introduction
Peer-to-Peer (P2P) networks, a distributed framework, have recently attracted a lot of attention. A P2P network
consists of a large number of interconnected peers that share all kinds of digital content. In a P2P network, there
is no central or dedicated server and all peers are independent of each other. A key weakness of most existing P2P
systems is the lack of anonymity. Without anonymity, third parties can identify the participants
involved. As a result, a considerable amount of research has been done in the field of anonymity.
In their seminal article, Bennett and Grothoff [4] state the following anonymity requirements for a P2P system.
1. An anonymous P2P system should make it impossible for third parties to identify the participants involved.
2. An anonymous P2P system should guarantee that only the content receiver knows the content.
3. An anonymous P2P system should allow the content publisher to plausibly deny that the content originated from
him or her.
The aim of this project is to design an Anonymous P2P File Sharing System (APTPFS) using a variety of cryptographic
techniques learned in CS 265. The system operates as a distributed file system across many computers and allows
files to be stored and requested anonymously.
2 Overview of P2P Network
2.1 Network Topology
There are two major P2P network topologies: centralized and decentralized. In a centralized
topology, network functionality depends largely on a central server: each peer accesses the server to upload
and download information, and the server coordinates the communication among peers. A decentralized P2P
topology does not contain any central server. In this kind of topology, peers are normally organized in an unstructured
fashion. The most popular decentralized P2P systems are Freenet, GNUnet, Gnutella, and MUTE. Decentralized
networks have many advantages over centralized ones. One advantage is that decentralized networks
eliminate single points of failure. Another is that decentralized networks are very scalable.
2.2 Peer Addressing
Like other kinds of networks, a P2P network provides an addressing scheme to identify each individual peer. Because
communication is always initiated by an individual peer, peers are not required to have static addresses. For
example, each peer in the MUTE network is identified by a randomly generated virtual address [9].
2.3 Discovering Peers
In a P2P network, a new peer can discover existing peers via host listing and host caching. Host listing uses a
centralized server to maintain a list of active peers. When a new peer wants to join the network, it first registers with
the server. Following the registration, the server returns contact information for some active peers to which the new
peer should connect. Because the server selects active peers randomly, host listing encourages sparseness and creates a
nearly optimal network structure [1] [6]. In contrast, the host caching mechanism does not require any centralized server.
In host caching, information about active peers is exchanged among peers. The information is then stored in a local
cache and can be used later. However, the local cache is initially empty, so host listing must be used for the
initial peer discovery. Once the initial peers have been discovered, host caching can be used exclusively.
Therefore, discovering peers in P2P networks always relies on some degree of centralization [1].
3 APTPFS – An Anonymous P2P File Sharing System
3.1 System Architecture
Figure 1 depicts the basic system architecture of APTPFS. When a new peer wants to join the APTPFS system, it first
establishes an SSL connection to a Peer Listing Server (PLS) and sends a registration request to that server. The main
purpose of the PLS is to maintain a list of the peers that are currently active in the system. Upon
receiving the request, the PLS adds the new peer’s information (e.g. IP and port) to the list. The PLS also randomly
selects a few peers from the list and sends their information back to the new peer. The new peer then creates and
maintains SSL connections to those selected peers, which thereby become its neighbors.
Because APTPFS is a decentralized P2P system, there is no single server in the system to coordinate the file sharing
process. The PLS is not considered such a server since it only serves as a registry and is consulted only when a peer wants to
join or leave the system. Each peer, identified by a randomly generated virtual address, keeps direct connections only
with its neighbors. All queries are propagated through the network hop-by-hop, and responses are routed back through
the paths discovered by the queries.
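The registration step described above can be sketched as a small in-memory registry. This is only an illustration: the class name PeerListingServer, the sample_size parameter, and the choice to exclude the newcomer from its own neighbor list are assumptions, and the SSL transport is left out entirely.

```python
import random


class PeerListingServer:
    """Hypothetical in-memory registry of active peers (no networking, no SSL)."""

    def __init__(self, sample_size=3, rng=None):
        self.sample_size = sample_size   # assumed value for "a few peers"
        self.rng = rng or random.Random()
        self.active = []                 # (ip, port) tuples of registered peers

    def register(self, ip, port):
        """Add the new peer and return a random selection of other active peers."""
        candidates = [p for p in self.active if p != (ip, port)]
        k = min(self.sample_size, len(candidates))
        neighbors = self.rng.sample(candidates, k)
        if (ip, port) not in self.active:
            self.active.append((ip, port))
        return neighbors

    def unregister(self, ip, port):
        """Remove a peer that is leaving the system."""
        if (ip, port) in self.active:
            self.active.remove((ip, port))
```

The first peer to register receives an empty neighbor list; later peers receive up to sample_size randomly chosen entries.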
Figure 1: APTPFS system architecture. Peers A through F register with the Peer Listing Server (PLS) over SSL and exchange queries and files with their neighbors over SSL connections.
3.1.1 Virtual Address
Inspired by MUTE [9], APTPFS hides each peer’s identity behind a virtual address. When a peer starts up, it is assigned a
randomly generated virtual address. During startup, the peer generates a random number, appends its IP address to the
random number, and then applies the SHA1 algorithm (i.e. SHA1(RNG() + IP), where RNG is a random number
generation function) to produce the virtual address. The virtual address is the sole identification of a peer in APTPFS.
Because the virtual address is calculated through one-way hashing, peers cannot determine each other’s true identity (e.g.
IP) from the virtual address.
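A minimal sketch of the SHA1(RNG() + IP) computation is shown here. The function name make_virtual_address, the 16-byte size of the random number, and the hex encoding of the result are assumptions made for illustration.

```python
import hashlib
import secrets


def make_virtual_address(ip: str) -> str:
    """Return SHA1(RNG() + IP) as a 40-character hex string."""
    nonce = secrets.token_bytes(16)  # assumed size of the random number
    return hashlib.sha1(nonce + ip.encode("ascii")).hexdigest()
```

Because a fresh random number is drawn on every call, the same IP address yields a different virtual address each time the peer starts.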
In support of virtual addresses, each peer maintains a routing table that maps virtual addresses to its neighbor
connections (e.g. SSL connections). To populate the routing table, each APTPFS peer records the neighbor connections
on which it has recently received messages from other peers. When the routing table is full, the least used entry is
purged. When a peer receives a message, it checks the virtual address inside the message header and tries to match that
virtual address with one of the routing table entries. If it finds an entry, it randomly chooses a neighbor
connection from that entry and forwards the message to that neighbor; otherwise it rejects the message.
Figure 2: Example network with peers A, B, W, X, Y, and Z.
In Figure 2, assume that peer X has received messages sent by peer A through two of its neighbors (peer Y and peer
Z). Peer X would then populate its routing table as shown in Table 1. Later, when peer X receives a message from peer B,
it checks the virtual address of the target peer inside the message header. Suppose that the target peer is peer A. Based on
its routing table, peer X knows that it should forward this message to peer Z or peer Y. Peer X then randomly chooses one of
these two neighbors and forwards the message to it.
Peer X’s Routing Table
Peers                     | Neighbor Connections
                          | SSL Connection to Peer Z | SSL Connection to Peer Y | SSL Connection to Peer W
Peer A’s virtual address  | √                        | √                        |
Note: √ indicates that messages addressed to a particular peer could be forwarded to the neighbor connection.
Table 1
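The routing table behavior described in this section can be sketched as follows. The class name RoutingTable, the capacity parameter, and the use of a simple use counter to decide which entry is "least used" are assumptions; the text does not specify the table size or how usage is measured.

```python
import random


class RoutingTable:
    """Maps a virtual address to the neighbor connections it was heard on."""

    def __init__(self, capacity=100, rng=None):
        self.capacity = capacity          # assumed limit; not given in the text
        self.rng = rng or random.Random()
        self.entries = {}                 # address -> set of neighbor names
        self.uses = {}                    # address -> how often the entry was used

    def record(self, sender_address, neighbor):
        """Note that a message from sender_address arrived via neighbor."""
        if sender_address not in self.entries:
            if len(self.entries) >= self.capacity:
                least_used = min(self.uses, key=self.uses.get)
                del self.entries[least_used], self.uses[least_used]
            self.entries[sender_address] = set()
            self.uses[sender_address] = 0
        self.entries[sender_address].add(neighbor)

    def next_hop(self, target_address):
        """Pick a random neighbor for the target, or None to reject the message."""
        neighbors = self.entries.get(target_address)
        if not neighbors:
            return None
        self.uses[target_address] += 1
        return self.rng.choice(sorted(neighbors))
```

In the Table 1 scenario, peer X would call record for peer A's virtual address once with neighbor Z and once with neighbor Y, after which next_hop returns Z or Y at random.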
3.1.2 Hop-by-Hop Connection
Each APTPFS peer maintains SSL connections to all of its neighbors, and these connections are used for transporting
APTPFS messages (both queries and responses) hop-by-hop. In other words, peers have no direct connections
other than the connections to their neighbors.
In Figure 2, peer A maintains two connections to its neighbors (peer Z and peer Y). Suppose peer A wants to send
a message to peer B. It cannot directly establish a connection to peer B since it only knows peer B’s virtual address.
Therefore, it inserts peer B’s virtual address into the message header and forwards the message to one of its neighbors.
Let’s assume that it forwards the message to peer Z, which in turn forwards the message to peer X. Upon receiving the
message from Z, peer X forwards it to peer B, which happens to be the message destination. In this example, the route of
this message is A->Z->X->B.
3.1.3 Query Hashing
In APTPFS, queries are hashed so that only the query sender knows the query content. Suppose that peer A in Figure 2 wants to
download a file called “dreams.mid”. Peer A first calculates the hash value of “dreams.mid” and then
broadcasts the query along with the hash value. When an intermediary peer (e.g. peer Z or peer X) receives that query,
it does not know what file peer A is looking for since hashing is a one-way function. When the query is
propagated to peer B, peer B finds a match since it registered a file under the same hash value. Similarly, peer B
does not know the name of the file that peer A is looking for.
One way for an attacker to reveal the original query is a dictionary attack, in which he or she pre-computes the hash
values of popular keywords and saves them in a dictionary for lookup. To hinder dictionary attacks, APTPFS
uses multiple rounds of SHA1 hashing (i.e. Hash(filename) = SHA1(SHA1(filename) + VirtualAddress)). Because an
attacker with enough computation power could still reveal the original query by simply re-computing the dictionary,
query hashing only serves to prevent normal users from revealing the query contents. However,
exposing the query content does not break anonymity as long as the identity of the query sender or query receiver is not
exposed; this holds in APTPFS since it uses virtual addresses to hide peer identities.
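The two-round hash can be sketched as below. The function name hash_query is hypothetical, and the byte-level details (UTF-8 file name, raw inner digest concatenated with an ASCII virtual address, hex output) are assumptions, since the text gives only the formula.

```python
import hashlib


def hash_query(filename: str, virtual_address: str) -> str:
    """Hash(filename) = SHA1(SHA1(filename) + VirtualAddress), hex encoded."""
    inner = hashlib.sha1(filename.encode("utf-8")).digest()
    outer = hashlib.sha1(inner + virtual_address.encode("ascii"))
    return outer.hexdigest()
```

Mixing in the virtual address means that a dictionary pre-computed for one address cannot be reused for another, although it can still be rebuilt per address by an attacker with enough computation power.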
3.1.4 FORWARD_HASH Flag
One problem with the broadcast scheme is that it may expose the query sender’s identity. For example, if the
maximum TTL is 7, then receiving a query with a TTL value of 7 from a neighbor means that the neighbor is the
query’s originator.
MUTE provides a clever approach to deal with this issue. The idea is to send each broadcast message along with a
hash value in FORWARD mode for the first few hops. Each intermediary peer rehashes the hash value and uses the
last 2 digits of the hash as a random value to determine whether or not the message should be switched out of
FORWARD mode. If the FORWARD mode should continue, the new hash value should replace the old hash value.
While in FORWARD mode, the TTL will be kept intact [9].
APTPFS uses a similar approach to protect the query sender’s privacy. FORWARD mode is signaled by a
FORWARD_ flag and a 20-byte SHA1 hash value. Each intermediary peer simply re-hashes the current hash value,
which generates another hash. A trigger value is taken from the last byte of the new hash. For example, to switch out of
FORWARD mode with roughly a 1 in 3 chance, a peer looks at the last byte of the newly generated hash and switches
the mode if the last byte is less than or equal to 85 (there are 256 possible values for the last byte). SHA1 is a
cryptographically secure one-way function, so it is very difficult to obtain previous values from the current value, and
thus it is difficult to determine how far a message has traveled [9].
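The re-hash and trigger test can be sketched as follows. The function names and the way the originator draws its starting value are assumptions. With a threshold of 85, the values 0 through 85 leave FORWARD mode, which is 86 of 256 possible last bytes, or about 33.6%.

```python
import hashlib
import secrets

TRIGGER = 85  # last byte <= 85 leaves FORWARD mode (86 of 256 values)


def new_forward_hash() -> bytes:
    """Originator: start from a random 20-byte value (assumed construction)."""
    return hashlib.sha1(secrets.token_bytes(20)).digest()


def step_forward_hash(current: bytes):
    """Re-hash the received value; return (new_hash, stay_in_forward_mode)."""
    new_hash = hashlib.sha1(current).digest()
    stay = new_hash[-1] > TRIGGER
    return new_hash, stay


def hops_in_forward_mode(start: bytes, limit: int = 1000) -> int:
    """Count the peers that re-hash, including the one that switches modes."""
    value, hops = start, 0
    while hops < limit:
        value, stay = step_forward_hash(value)
        hops += 1
        if not stay:
            break
    return hops
```

Every peer that receives the same hash reaches the same decision, because the outcome depends only on the received value and not on any local randomness.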
Figure 3 shows an example of how FORWARD_HASH works in APTPFS. For simplification, it only shows the last
byte of the forward hash value. The shaded nodes are colluding nodes while the dotted lines indicate hop-by-hop
message propagation paths.
Figure 3: FORWARD_HASH example with peers A, B, C, W, X, Y, and Z. Edge labels: FORWARD_231, FORWARD_155, FORWARD_155, FORWARD_211, TTL: 5, and TTL: 4.
Suppose that peer A is the query sender in Figure 3. Peer A randomly generates a forward hash and sends the query
with the forward hash to peer Z and peer Y. Peer Z and peer Y cannot determine the origin of this query from the
forward hash. To decide what to do with the query, they rehash the forward hash to generate a new forward hash.
Because the last byte of the new forward hash is 155, which is greater than the trigger value 85, they forward the query
to peer X with the new forward hash. Once it receives the query, peer X again rehashes the received forward hash to
generate a new forward hash. It then sends the query with the new forward hash to peer B since the last byte of the
new forward hash is 211, which is still greater than the trigger value 85. Now suppose peer B rehashes the received
forward hash and finds that the last byte of the new hash is 76, which is less than the trigger value 85. Peer B then
removes the forward flag from the query and sends the query to peer C in normal TTL mode.
3.1.5 DROP_CHAIN Flag
The FORWARD_HASH scheme ensures that an attacker cannot easily determine the sender of a message, but it does
not ensure the anonymity of peers that respond to messages. For example, an attacker could send a search request
with a TTL value of 0 to force a neighbor to send back its own results without passing the message further [9].
MUTE provides an efficient way to thwart this kind of attack. Once a MUTE node receives a message with a TTL
value of 0, it switches the message into CHAIN mode, and the message still travels through several peers before
being dropped. Each of these peers could send back results, so the results that the attacker receives cannot be associated with
the neighbor node. Whenever a peer starts up, it decides randomly whether it will drop chain messages or pass them.
Each peer also picks one neighbor and passes chain messages to it as long as that neighbor remains connected [9].
APTPFS takes the same approach to protect the message responder’s anonymity. It uses the DROP_CHAIN flag to indicate
CHAIN mode. Whenever a peer starts up, it randomly chooses a neighbor to which it will later forward chain
messages. In APTPFS, each peer has a 70% probability of accepting chain messages (that is, passing them on rather
than dropping them). Therefore the typical tail chain is about 2 peers long (the probability that two consecutive peers
both pass the message is 70%*70%=49%). Note that query messages are only
broadcast while in FORWARD mode and TTL mode.
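The per-peer chain decision can be sketched as below. The names ChainPolicy and chance_of_at_least are hypothetical, and the sketch assumes that "accepting" a chain message means passing it on to the chosen neighbor.

```python
import random

PASS_PROBABILITY = 0.7  # chance that a peer passes chain messages on


class ChainPolicy:
    """Per-peer DROP_CHAIN behavior, fixed once at startup."""

    def __init__(self, neighbors, rng=None):
        rng = rng or random.Random()
        self.passes_chain = rng.random() < PASS_PROBABILITY
        self.chain_neighbor = rng.choice(list(neighbors))

    def handle_chain_message(self):
        """Return the neighbor to forward to, or None if the message is dropped."""
        return self.chain_neighbor if self.passes_chain else None


def chance_of_at_least(hops: int, p: float = PASS_PROBABILITY) -> float:
    """Probability that `hops` consecutive peers all pass the message on."""
    return p ** hops
```

Under this reading, chance_of_at_least(2) reproduces the 70%*70%=49% figure: a chain message is passed on by two consecutive peers about half of the time.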
Figure 4 shows an example of how DROP_CHAIN works in APTPFS. The shaded node is a colluding node and dotted
lines indicate the drop chain path.
Figure 4: DROP_CHAIN example with peers A, B, S, X, Y, and Z. Edge labels: TTL: 0 and three DROP_CHAIN hops.
Suppose that peer S sends a query with a TTL value of 0 to peer A. Instead of sending the response back immediately to
peer S, peer A sets DROP_CHAIN mode in the query message and forwards the message to one of its neighbors
(e.g. peer Z). The query message then travels through peer X before being dropped by peer B. Because any one
of the peers on the drop chain path can send back the response, peer S cannot link the response to peer A by simply
setting the TTL value to 0 in the query message.
Putting it all together, a typical APTPFS query message travels through a random number of hops (ranging from 1 to 3) in
FORWARD mode, switches to TTL mode and travels through a maximum of 5 hops, and then continues through a
random number of hops (ranging from 1 to 4) in CHAIN mode before being dropped.
3.1.6 Anonymous Content Publishing
To publish a file, the user only needs to specify the name of that file, which should be inside a special APTPFS
folder. Upon receiving the publishing request, the peer uses the file name to calculate a hash value H1. It then
generates a random number to serve as the AES secret key SK, encrypts the file with it, and uses the encrypted file to
calculate a hash value H2. Finally it invokes a two-step process to distribute the encrypted file and the secret key to other peers. In
the first step, it randomly chooses two peers from its routing table and sends the hash values H1/H2 and the secret key
SK to each of those peers. In the second step, it randomly chooses another two peers from its routing table and sends
the hash values H1/H2 and the encrypted file to each of those peers. If the publishing process is successful, it
deletes the encrypted file from the local APTPFS folder; otherwise, it shows the user an error message.
In support of content publishing, each peer keeps two tables: a Hash-to-Key table and a Hash-to-File table. The Hash-to-Key
table maps each hash pair (e.g. H1/H2) to the secret key (e.g. SK) used to decrypt the file. The Hash-to-File table maps
each hash pair (e.g. H1/H2) to an encrypted file.
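The publishing steps and the two tables can be sketched as follows. All function names are hypothetical. The sketch assumes H1 is a plain SHA1 of the file name and that the key is 16 bytes long, and it takes the cipher as a caller-supplied encrypt(key, plaintext) function, because the Python standard library has no AES. A real peer would plug in an AES implementation there.

```python
import hashlib
import random
import secrets


def sha1_hex(data: bytes) -> str:
    return hashlib.sha1(data).hexdigest()


def publish(filename, plaintext, known_peers, encrypt, rng=None):
    """Return (key_messages, file_messages) as {peer: payload} dictionaries.

    `encrypt(key, plaintext)` is supplied by the caller; in APTPFS it would be AES.
    """
    rng = rng or random.Random()
    h1 = sha1_hex(filename.encode("utf-8"))      # H1: hash of the file name
    sk = secrets.token_bytes(16)                  # SK: random key (size assumed)
    ciphertext = encrypt(sk, plaintext)
    h2 = sha1_hex(ciphertext)                     # H2: hash of the encrypted file
    chosen = rng.sample(list(known_peers), 4)     # two key holders, two file holders
    key_messages = {peer: (h1, h2, sk) for peer in chosen[:2]}
    file_messages = {peer: (h1, h2, ciphertext) for peer in chosen[2:]}
    return key_messages, file_messages


def store(table, message):
    """Receiving peer: file the payload under its (H1, H2) pair."""
    h1, h2, payload = message
    table[(h1, h2)] = payload
```

A receiving peer would call store with its Hash-to-Key table for key messages and with its Hash-to-File table for file messages, so that both tables are indexed by the same H1/H2 pair.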
This approach is similar to GNUnet’s content publishing approach [10]. The difference is that GNUnet splits the file
into small chunks and sends those chunks to selected peers so that each of those peers keeps only a portion of the
file, while in APTPFS each selected peer keeps a complete encrypted file.
Figure 5 illustrates an anonymous content publishing scenario in which peer A distributes an encrypted file to
peer F and peer H and a secret key to peer G and peer J.
Figure 5: Anonymous content publishing among peers A through J. H1/H2 and AES (file, RN) travel along the file distribution paths; H1/H2 and RN travel along the key distribution paths.
H1: The hash of file name
H2: The hash of encrypted file
RN: The random AES key
AES (file, RN): the encrypted file
Note: Dotted lines form file distribution paths while thick links form key distribution paths
Using this scheme, the content publisher can plausibly deny that the content originated from him or her, because
the key and the file reside only on other peers. In turn, the administrators of the peers on which the
key or the encrypted file resides can deny any knowledge of the published file, since the key or
the file is sent to those peers without the administrators’ awareness. Furthermore, because it is unlikely that a single peer
holds both the key and the file, they can claim that they lack either the key or the encrypted file needed for decryption.
3.1.7 Anonymous Content Retrieval
To download a file, the content receiver CR first broadcasts a search query with the hashed file name
SHA1(SHA1(name), CR’s virtual address). When a peer in the network receives such a query, it consults its Hash-to-File
table to see if there is a match. If it finds an H1 value that matches SHA1(SHA1(name), CR’s virtual address) in
the Hash-to-File table, it sends the relevant file information (e.g. file size) along with the H1/H2 hash pair back to CR.
Needless to say, the search result is routed back hop-by-hop.
Because CR may receive query results from many peers, it prompts the user to select one file to download. After
the user selects a file, CR generates two queries: the download file query and the download key query. CR first
broadcasts the download key query along with SHA1(H1, CR’s virtual address) and H2. When a peer in the network
receives such a query, it consults its Hash-to-Key table to see if there is a match. If it finds an entry that matches the
SHA1(H1, CR’s virtual address)/H2 pair in the Hash-to-Key table, it sends the corresponding secret key back to the
content receiver. After receiving the key, CR sends a download file query along with SHA1(H1, CR’s virtual address)
and H2 to the peer that is holding the selected file. Upon receiving the download file query, the peer consults its
Hash-to-File table and sends the corresponding encrypted file to CR hop-by-hop. After CR receives both the encrypted file
and the secret key, it decrypts the file and copies the decrypted file into a public folder so the user can use it. Finally,
it randomly chooses to keep either the secret key or the encrypted file in order to serve future requests from other
peers.
3.1.8 Miscellaneous
All communication links in APTPFS are secured by Secure Sockets Layer (SSL) to prevent network eavesdropping.
Each APTPFS message has a fixed size of 1KB. Messages larger than 1KB are split into
1KB chunks that are delivered one by one. Messages smaller than 1KB are padded up to 1KB.
The purpose of the uniform message size is to make it very difficult for an attacker to distinguish
different kinds of messages through traffic analysis. In addition, each peer randomly generates bogus messages
and forwards them to its neighbors to further thwart traffic analysis.
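The fixed message size can be sketched as a split-and-pad routine. The 2-byte length header and the zero-byte padding are assumptions introduced so that the receiver can strip the padding again; the text does not describe the actual message layout.

```python
MESSAGE_SIZE = 1024          # fixed APTPFS message size (1KB)
HEADER_SIZE = 2              # assumed: 2-byte big-endian payload length
PAYLOAD_SIZE = MESSAGE_SIZE - HEADER_SIZE


def to_fixed_messages(data: bytes) -> list:
    """Split data into 1KB messages, zero-padding the last one."""
    messages = []
    for start in range(0, max(len(data), 1), PAYLOAD_SIZE):
        payload = data[start:start + PAYLOAD_SIZE]
        header = len(payload).to_bytes(HEADER_SIZE, "big")
        padding = bytes(PAYLOAD_SIZE - len(payload))
        messages.append(header + payload + padding)
    return messages


def from_fixed_messages(messages) -> bytes:
    """Reassemble the original data, discarding the padding."""
    parts = []
    for message in messages:
        length = int.from_bytes(message[:HEADER_SIZE], "big")
        parts.append(message[HEADER_SIZE:HEADER_SIZE + length])
    return b"".join(parts)
```

Every message produced this way is exactly 1024 bytes long, whether it carries a short query or one chunk of a large file.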
4 Conclusion
This paper presents an anonymous P2P file sharing system that ensures anonymity for both producers and consumers
of files. By using SSL, the APTPFS system prevents network eavesdropping. Because each APTPFS peer uses a virtual
address to hide its identity and always avoids direct connections to others, it is highly unlikely that third parties can
identify the participants involved in a file sharing session [4]. Additionally, APTPFS’s anonymous content publishing
allows the publisher to plausibly deny that the content originated from him or her [5] [3]. Finally, APTPFS takes several
measures (e.g. forward hash [9] and drop chain [9]) to protect anonymity against traffic analysis attacks.
5 References
1. Andy Oram, Peer-To-Peer: Harnessing the Power of Disruptive Technologies, O’Reilly & Associates, 2001
2. Brendon Wilson, JXTA, New Riders Publishing, 2002
3. Christian Grothoff, An Excess-Based Economic Model for Resource Allocation in Peer-to-Peer Networks, Purdue University, 2003
4. Christian Grothoff, Ioana Patrascu, Krista Bennett, Tiberiu Stef, and Tzvetan Horozov, The GNet Whitepaper, Purdue University, 2002
5. David Goldschlag, Michael Reed, and Paul Syverson, Onion Routing for Anonymous and Private Internet Connections, Naval Research Laboratory, 1999
6. Dana Moore and John Hebeler, Peer-to-Peer: Building Secure, Scalable, and Manageable Networks, McGraw-Hill Osborne Media, 2001
7. Dennis Kügler, An Analysis of GNUnet and the Implications for Anonymous, Censorship-Resistant Networks, Federal Office for Information Security, 2003
8. Ian Clarke, Oskar Sandberg, Brandon Wiley, and Theodore W. Hong, Freenet: Distributed Anonymous Information Storage and Retrieval System, In Proc. of the ICSI Workshop on Design Issues in Anonymity and Unobservability, Berkeley, CA, 2000
9. Jason Rohrer, MUTE: Simple, Anonymous File Sharing, http://mutenet.sourceforge.net, 2004
10. Krista Bennett, Christian Grothoff, Tzvetan Horozov, Ioana Patrascu, and Tiberiu Stef, GNUNET – A truly anonymous networking infrastructure, Purdue University, 2002
11. Michael J. Freedman, Emil Sit, Josh Cates, and Robert Morris, Introducing Tarzan: A Peer-To-Peer Anonymizing Network Layer, MIT Laboratory for Computer Science, 2002
12. Michael K. Reiter and Aviel D. Rubin, Crowds: Anonymity for Web Transactions, AT&T Labs, 1998
13. National Security Agency, CNSS Policy No. 15, Fact Sheet No. 1, National Security Agency, 2003
14. Rags Srinivas, Using AES with Java Technology, http://java.sun.com/developer/technicalArticles/Security/AES/AES_v1.html, 2003
15. RSA Laboratories, RSA Laboratories' Frequently Asked Questions About Today's Cryptography, Version 4.1, http://www.rsasecurity.com/rsalabs/node.asp?id=2152, 2004
16. Taher Elgamal, The Secure Sockets Layer Protocol (SSL), Danvers IETF Meeting, 1995