Privacy Issues in Peer-To-Peer Systems
Raj Dandage, Tim Gorton,
Ngozika Nwaneri, Mark Tompkins
6.805-p2p@mit.edu
4/26/01
Agenda
Introduction & Status Report
Definition of peer-to-peer
  What it is, what it isn’t, what it used to be, what it should do
Privacy Concerns (Threat Model)
  What do we care about?
Legal Issues affecting privacy on P2P systems
  What does the law care about?
A few examples of current P2P systems
  Analyze w.r.t. goals, privacy concerns, legal issues, etc.
Recommendations, Synthesis, and Conclusion

Status Report: Goals





develop criteria for evaluating peer-to-peer applications and architectures with regard to technical, business, and public policy goals
identify different peer-to-peer applications and architectures
evaluate these applications and architectures in terms of the goals set forth and privacy issues
explore legal issues surrounding p2p architectures
develop recommendations for the modification and design of peer-to-peer systems in order to resolve privacy concerns and encourage the design of privacy-enhancing systems
What is P2P? What isn’t?

Old-school “P2P”
Usenet
 DNS
 WWW Hyperlinks


Today’s P2P
Leveraging a new Internet usage model
 Transient connectivity at the “fringes”

Peer-to-Peer Defined
Peer-to-peer is NOT simply illegally sharing
copyrighted material.
 Peer-to-peer computing is sharing of
computer resources and services by direct
exchange. It is about decentralized
networking applications. The “litmus test” for
peer-to-peer:



“does it allow for variable connectivity and
temporary network addresses?
does it give the nodes at the edges of the network
significant autonomy?”
Clay Shirky in Peer-to-Peer
Peer-to-Peer: Hybrid Systems

A hybrid system (brokered peer-to-peer system) uses a centralized server to connect two computers together before a direct exchange takes place.


Repeater – someone who publicly shares files they did not author, i.e. republishes someone else’s work.
Metadata - the collection of information from
various sources, related and managed in a central
directory for the use of linkage and file sharing.
Privacy Concerns (Threat Models)

Anonymity
… of your identity
 … of your online activity
 … of your publications

Authentication
 Access to your data

data on your local machine
 data transmitted on the ‘net

Possible “Attackers”
Malicious hacker
 Governments (court order, wiretapping)
 Employers
 ISP’s
 Operators of P2P systems (e.g. Napster)
 Another everyday user

Legal Issues affecting P2P privacy

Arenas of Concern




Copyright
Libel
Censorship (more political than legal)
Who is liable/in danger?




ISP’s?
Service operators?
Individual developers?
End users?
Copyright

Direct Infringement
  when end users do Bad Things
Contributory Infringement
  Some act of direct infringement by someone else
  Defendant “knew or should have known” of infringement
  Defendant “materially contributed” to infringement
Vicarious Infringement (Napster)
  Some act of direct infringement by someone else
  Defendant had the “right or ability to control” the infringer
  Defendant derived a “direct financial benefit” from the infringement (Napster has no business model.)
Digital Millennium Copyright
Act of 1998 (DMCA)
Prohibits “circumvent[ing] a technological measure that effectively controls access to a work protected under this title”
Exempts “service providers” from copyright liability if:
  they block copyrighted material after they are notified by a copyright holder,
  they identify an infringing user to a copyright holder upon being issued a subpoena,
  and they don’t interfere with “standard technical measures” used to protect or identify copyrighted material
Who are “service providers”?
“an entity offering the transmission, routing, or providing of connections for digital online communications, between or among points specified by a user, of material of the user's choosing, without modification to the content of the material as sent or received.” Sec. 512(k)(1)
Also “provider of online services or network access, or the operator of facilities therefor”
 ISP’s, P2P system operators… end users?

Libel: CDA
CDA immunizes providers and users of “interactive computer services” from being treated as speakers or publishers of information provided by a 3rd party
“‘interactive computer service’ means any information service, system, or access software provider that provides or enables computer access by multiple users to a computer server, including specifically a service or system that provides access to the Internet and such systems operated or services offered by libraries or educational institutions.”
 so… your computer might be a “server”

Censorship
Subverting censorship of authoritarian governments
by providing anonymous publication is a stated goal
of several P2P systems
Examples of authoritarian governments:
  Australian law would make supplying R-rated material illegal
  US Courts have ruled that the DMCA makes supplying the DeCSS code or linking to a site that supplies the DeCSS code illegal
  Naturally, there are others…
Who’s in legal trouble?

P2P system operators
  Must disable access when notified of copyright infringement, may serve as a circumvention of a TPM as per DMCA
ISP’s
  Users’ copyright violations: ISP’s must disable access when notified by copyright holder
P2P system developers
  DMCA: they may produce TPM circumvention technology
P2P users
  They’re often doing Bad Things. But what if they’re just forwarding content, perhaps unknowingly? Libel? Copyright? Targeted by authoritarian regimes?
Example P2P Systems
Possible threats to privacy and usability
 Example P2P systems/protocols:

What is it?
 How does it work?
 What are its business and public policy
goals?
 How does it address the threats in our
model?

Possible Privacy Threats to
P2P Systems

Monitoring of transactions



Tracking systems placed on network
Monitoring of data at or going through a node
Manipulation of transactions


Forgery of data
Filtration of transaction information
Impersonation and misrepresentation
 Identification of individuals or nodes
 Legal action
 Social pressure and external action

Possible Usability Threats to
P2P Systems
Denial of service
Unreliability and transient availability of resources
Blocking of access to network resources
  Firewalls
  NATs
Malicious content
  Viruses
Freeloading and inequitable use of resources
Example P2P Applications and
Networks








Napster
Gnutella (BearShare)
SETI@home
Freenet (Espra)
FreeHaven
Mojo Nation
Jabber / AOL Instant Messenger
Groove.net
Napster: What is it?
“The largest, most diverse online community of music lovers in history.”
 A file transfer system for music lovers to
search for and trade mp3’s
 Also features:

user hotlist
 chatrooms
 instant messaging

Napster: How does it work?

“hybrid” P2P architecture
centralized server takes all file requests,
searches dynamically updated database
 server brokers connections between clients
for decentralized downloads
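
The brokered model above can be sketched in a few lines. This is a minimal illustration, not Napster's actual protocol: the `CentralIndex` class, its method names, and the peer addresses are all hypothetical. The point it shows is that the server holds only a searchable listing of who shares what; the file transfer itself would happen directly between the two peers it introduces.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class CentralIndex:
    """Hypothetical stand-in for the central server: it stores listings, never files."""

    # filename -> set of peer addresses currently sharing that file
    listings: dict[str, set[str]] = field(default_factory=dict)

    def register(self, peer: str, filenames: list[str]) -> None:
        """A peer logs in and announces the files it is sharing."""
        for name in filenames:
            self.listings.setdefault(name, set()).add(peer)

    def unregister(self, peer: str) -> None:
        """A peer disconnects, so its entries leave the database."""
        for peers in self.listings.values():
            peers.discard(peer)

    def search(self, term: str) -> list[tuple[str, str]]:
        """Return sorted (filename, peer) pairs whose filename contains the term."""
        term = term.lower()
        return sorted(
            (name, peer)
            for name, peers in self.listings.items()
            if term in name.lower()
            for peer in peers
        )


index = CentralIndex()
index.register("10.0.0.5:6699", ["song_a.mp3", "live_set.mp3"])
index.register("10.0.0.9:6699", ["live_set.mp3"])

hits = index.search("live")          # both peers are offered as download sources
index.unregister("10.0.0.5:6699")    # transient peer goes offline
hits_after = index.search("live")    # only the remaining peer is listed
```

Because every search passes through `CentralIndex`, whoever operates it can see each user's queries and shared file list, which is the monitoring concern raised for hybrid systems.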

Napster: Original Business
and Public Policy Goals
create an easy way to search for and
share music for free over the internet
 take advantage of latent disk space on
edges of internet
 avoid copyright issues by having each
user responsible for their own content

Napster: Current Business
and Public Policy Goals

Avoid lawsuits!
Metallica
 Filename filtering
 Monthly fee?


Get musicians on their side


“empower yourself!”
Get activists on their side

Napster Action Network
Napster: How does it address
the threats in our model?

Monitoring of transactions, identifying
individuals




Tracking programs
Users can log usernames/files downloaded from
them
Possible to search entire shared file directory of a
user (hotlist)
Impersonation and misrepresentation

Only one username per program – cannot change
Napster: How does it address the
threats in our model? (cont’d)

Legal action
  Very vulnerable, as we have seen
Denial of Service Attack
  Would prevent searches, but not file transfers
Malicious Content
  Everything is mp3 format
Gnutella: What is it?
A protocol, not an actual program
 Completely decentralized architecture –
“pure” P2P
 Used for file transfer
 Open source, so many other programs have
built off of it




BearShare
LimeWire
GnuFrog
Gnutella: How does it work?
Works like the real world (gossip, word-of-mouth)
 Makes initial connection to other hosts
in cache (ping)
 Broadcasts, propagates queries to these
hosts
 Responses travel back along same path
 Connects directly to transfer files
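
The broadcast-and-return behaviour listed above can be simulated on a small in-memory graph. This is a sketch under stated assumptions, not the Gnutella wire protocol: `flood_query`, the `ttl` parameter, and the peer names are hypothetical, and real nodes forward messages concurrently rather than through one queue. It shows a query spreading hop by hop, duplicates being dropped, and each hit recording the reverse path its response would follow.

```python
from collections import deque


def flood_query(neighbors, shared, origin, term, ttl):
    """Flood a query outward from origin; return {peer: path back to origin}."""
    came_from = {origin: None}           # doubles as the "already seen" set
    queue = deque([(origin, ttl)])
    hits = {}
    while queue:
        peer, remaining = queue.popleft()
        if peer != origin and term in shared.get(peer, ()):
            # Walk the chain of previous hops: the response retraces this path.
            path, hop = [], peer
            while hop is not None:
                path.append(hop)
                hop = came_from[hop]
            hits[peer] = path
        if remaining == 0:
            continue                     # time-to-live exhausted, stop forwarding
        for nxt in neighbors.get(peer, ()):
            if nxt not in came_from:     # drop duplicate copies of the query
                came_from[nxt] = peer
                queue.append((nxt, remaining - 1))
    return hits


neighbors = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
shared = {"D": {"song.mp3"}, "E": {"song.mp3"}}

near = flood_query(neighbors, shared, "A", "song.mp3", ttl=2)
far = flood_query(neighbors, shared, "A", "song.mp3", ttl=3)
```

In this model an intermediate peer such as `B` sees the query but only knows it arrived from a neighbour, which matches the later point that users can see requests passing through their node but not the original sender.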

Gnutella: How does it work?
(cont’d)
Gnutella: Business and Public
Policy Goals

“internet on top of the internet”
  New real-time search engine model
Decentralization
  No single point of failure
Open source code
  Allows for new innovations, freelance application development
Gnutella: How does it address
the threats in our model?

Monitoring of Transactions, Identification



Tracking programs
Users can see requests passed through their node,
but not original sender
Users can log IP’s of nodes with whom they
transfer files


Zeropaid.com’s Wall of Shame
Legal Action

Who can copyright holders realistically sue?
Gnutella: How does it address the
threats in our model? (cont’d)
Denial of Service Attacks
Unreliability of resources
  Finding initial group of peers
Malicious content
  Mandragore scare
  Know what you’re downloading
  Trust who you’re downloading from
Freeloading
  Increases the length of search requests
  Some software, like LimeWire, allows users to give “preference” to nodes that are also sharing material
Gnutella: Scalability Issues and
Bandwidth Inequity

Clip2 Reflectors – “super peers”
Gnutella: Scalability Issues and
Bandwidth Inequity (cont’d)

BearShare v. 3.0.0 Alpha

3 modes
Client (low bandwidth)
 Server/Defender (high bandwidth)
 Peer (normal)


Centralizes system somewhat, provides
targets, but increases efficiency
Copyright Violation Trackers
on Napster and Gnutella

Copyright Agent


Roy Orbison fans beware!
Media Tracker
Masquerades as a user
 Logs IP’s, ISP’s, files
 Operated from outside US, so not subject
to US privacy laws

Monitoring of Transactions on
Napster and Gnutella (cont’d)

Screenshot of Media-Tracker
SETI@home: What is it?
Allows PC owners to help in the search
for extraterrestrial intelligence
 Free screensaver, analyzes radio
telescope data when PC is idle

SETI@home: How does it
work?

Not “pure” P2P
Central server sends data to hosts
 Hosts compute FFT’s on data, send results
back to server
 No inter-host communication


Example of how processing power can
be shared among computers
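
The work-unit flow described on this slide can be illustrated with a toy model. Everything here is an assumption made for illustration: `strongest_frequency` is a naive O(n²) discrete Fourier transform standing in for the real FFT-based analysis, `run_server` is a hypothetical dispatcher, and the "hosts" are plain functions rather than networked screensavers. It shows the central server handing out data and collecting results, with hosts never talking to each other.

```python
import cmath
import math


def strongest_frequency(samples):
    """Host-side work: index of the strongest non-DC bin of a naive DFT."""
    n = len(samples)
    best_bin, best_power = 0, -1.0
    for k in range(1, n // 2 + 1):
        coeff = sum(
            s * cmath.exp(-2j * cmath.pi * k * t / n)
            for t, s in enumerate(samples)
        )
        power = abs(coeff) ** 2
        if power > best_power:
            best_bin, best_power = k, power
    return best_bin


def run_server(work_units, hosts):
    """Server-side loop: assign each unit to a host and keep only its result."""
    results = {}
    for i, (unit_id, samples) in enumerate(work_units.items()):
        host = hosts[i % len(hosts)]       # round-robin assignment
        results[unit_id] = host(samples)   # the host sends back a small result
    return results


work_units = {
    "wu-1": [math.cos(2 * math.pi * 3 * t / 16) for t in range(16)],
    "wu-2": [math.sin(2 * math.pi * 5 * t / 16) for t in range(16)],
}
results = run_server(work_units, [strongest_frequency, strongest_frequency])
```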
SETI@home: What are its
business and public policy goals?
Find more aliens in less time
 Create a community of extraterrestrial
enthusiasts using a participatory
medium
 Other possible applications for
distributed computing

Code breaking
 Genetic analysis

SETI@home: How does it
address the threats in our model?

Manipulation of Transactions

Doctored versions
Trying to find better ways to compute FFT’s
 No open source code


Doctored result files

Encryption, checksums
SETI@home: How does it address
the threats in our model? (cont’d)
Identification of individuals or nodes
 Denial of Service
 Unreliability of resources



Redundant data units distributed
Malicious content

Downloads data, not executables
Freenet: What is it?
Distributed, decentralized, anonymous
publishing system
 Like one enormous, shared hard drive

Freenet: How does it work?

Every piece of data has a key



Need to know key to access data
No effective search mechanism yet
Key search: uses a depth-first search along
nodes



If a node does not have a key, it directs to node
with “closest” key
Unique ID’s, routing data back, nodes cache data
along way
more scalable, efficient than broadcast – routes
you closer each hop
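
A rough sketch of the key search described above follows. It makes several simplifying assumptions that are not from the slides: keys are small integers and "closest" means smallest absolute difference, the `Node` class and `hops_to_live` parameter are hypothetical names, and the whole network lives in one process. It shows the depth-first walk toward the closest known key, backtracking from a dead end, and caching along the return path.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.store = {}      # key -> data held (or cached) locally
        self.routes = {}     # key -> neighbouring Node believed to hold that key

    def request(self, key, hops_to_live, visited=None):
        """Depth-first search for key; returns the data or None."""
        visited = set() if visited is None else visited
        visited.add(self.name)
        if key in self.store:
            return self.store[key]
        if hops_to_live == 0:
            return None
        # Try neighbours in order of how close their known key is to the target.
        for known in sorted(self.routes, key=lambda k: abs(k - key)):
            nxt = self.routes[known]
            if nxt.name in visited:
                continue                     # never loop back through a node
            data = nxt.request(key, hops_to_live - 1, visited)
            if data is not None:
                self.store[key] = data       # cache on the way back
                self.routes[key] = nxt       # remember who answered
                return data
        return None                          # dead end: the caller backtracks


a, b, c, d = Node("a"), Node("b"), Node("c"), Node("d")
d.store[80] = "document"
a.routes = {78: b, 40: c}    # b looks closest to key 80 but turns out to be a dead end
c.routes = {85: d}

found = a.request(80, hops_to_live=5)
```

After the request, `a` and `c` both hold a cached copy, so popular data spreads toward the nodes that ask for it, while `b` never learns what was requested beyond the key.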
Freenet: How does it work?
(cont’d)
Every node allocates space to be used by
network
 Cannot update files
 Sends key request w/ unique ID


InsertRequest


Checks if data already exists
DataRequest
If next node contains key, returns data along
same path
 If not, finds the “closest key”, forwards to
that node

Freenet: How does it work?
(cont’d)

Key/data stack model
Freenet: What are its business
and public policy goals?
Prevent censorship of documents
 Provide anonymity of users
 Plausible deniability for node operators


Must trace back requests through every
node in path
Remove any single point of control
 Keep most requested data, not most
“acceptable” data

Freenet: How does it address
the threats in our model?

Monitoring of transactions
  Hard unless you have control of many nodes
Manipulation of transactions
  Attacker cannot forge data or update it
  Every node checks key for validity of document while it is being forwarded back
Impersonation and misrepresentation

No way to know where data comes from anyway
Identification of individuals or nodes
 Legal action


Plausible deniability for requests
Raj’s pictures
FreeHaven: What is it?
Network that allows users to publish
documents
 Provides anonymity, server
accountability, and equitability of
resource distribution

FreeHaven: How does it work?





Distributed network of servers
Servers communicate through anonymous
channels, such as reply blocks sent via remailers
Data enters and propagates through the
network through the process of trading
Files are divided into pieces and distributed
among servers, only a subset of which are
needed to reconstruct the file
All data is encrypted and signed before
transfer or storage
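
The idea that only a subset of the pieces is needed to reconstruct a file can be shown with the simplest possible scheme: two data halves plus an XOR parity piece, where any two of the three pieces rebuild the original. This is a stand-in chosen for brevity, not the dispersal algorithm FreeHaven itself uses (the slides do not specify one), and the function names are hypothetical. Encryption and signing of the pieces are deliberately left out.

```python
def split_2_of_3(data: bytes) -> dict[str, bytes]:
    """Split data into shares 'a', 'b', 'p'; any two rebuild the original."""
    original_len = len(data)
    if original_len % 2:
        data += b"\x00"                          # pad to an even length
    half = len(data) // 2
    a, b = data[:half], data[half:]
    parity = bytes(x ^ y for x, y in zip(a, b))
    header = original_len.to_bytes(4, "big")     # every share records the true length
    return {"a": header + a, "b": header + b, "p": header + parity}


def rebuild(shares: dict[str, bytes]) -> bytes:
    """Rebuild the original from any two (or all three) shares."""
    if len(shares) < 2:
        raise ValueError("need at least two shares")
    first = next(iter(shares.values()))
    original_len = int.from_bytes(first[:4], "big")
    body = {name: share[4:] for name, share in shares.items()}
    if "a" not in body:
        body["a"] = bytes(x ^ y for x, y in zip(body["b"], body["p"]))
    if "b" not in body:
        body["b"] = bytes(x ^ y for x, y in zip(body["a"], body["p"]))
    return (body["a"] + body["b"])[:original_len]


document = b"an unpopular pamphlet"
shares = split_2_of_3(document)

# Simulate one server at a time being unavailable.
recovered = [
    rebuild({name: share for name, share in shares.items() if name != lost})
    for lost in ("a", "b", "p")
]
```

The same property is what later lets the system tolerate unavailable servers: losing one holder of a piece does not lose the document.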
FreeHaven: What are its business
and public policy goals?

Business goals


To be used in conjunction with services such as
FreeHaven to provide long-term, popularity
independent data storage
Public policy goals



Anonymity of author, publisher, reader, document,
server, and query
System accountability (as opposed to user
accountability)
Equity of resource distribution
FreeHaven: How does it address
the threats in our model?

Monitoring of transactions
  All FreeHaven traffic is encrypted in transit and in storage
  Document requests are forwarded through the system via anonymous re-mailers
Manipulation of transactions
  All data segments are signed
  Only a subset of the segments are required to reconstruct the data
Impersonation and misrepresentation
FreeHaven: How does it address
the threats in our model? (cont’d)

Identification of individuals or nodes



Author/publisher anonymity through trading
Server anonymity through pseudonyms and
anonymous communication via re-mailer reply
blocks
Legal action, social pressure, external action




No central authority to be held accountable
“Plausible deniability:” server does not know what
data it is storing or what is being requested
Only a subset of the servers must be available to
reconstruct the data
Data cannot be revoked from the network
FreeHaven: How does it address the
threats in our model? (Cont’d)

Denial of service, unreliability of resources


Only a subset of the servers must be available to
reconstruct data
Accountability mechanisms for servers
Blocking of access to network resources
 Malicious content
 Freeloading and inequitable resource use


Must donate space to publish data
Mojo Nation: What is it?
Distributed, micro-payment based
publishing/resource distribution system
 Resource consumers and providers
make “capitalist” exchanges of
resources (storage space, computation)

Mojo Nation: How does it
work?
Content trackers keep list of content
pieces and addresses of nodes that
have them
 Query different nodes until you have all
of the parts needed to reconstruct the
file
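
The lookup described above might be sketched as follows. The data layout is an assumption for illustration only: `tracker` maps a file to an ordered list of (piece id, holder addresses), `nodes` models which nodes are currently reachable, and `fetch_file` is a hypothetical helper. Micropayments are not modelled.

```python
def fetch_file(file_id, tracker, nodes):
    """Collect every piece of file_id, trying each listed holder in turn."""
    pieces = []
    for piece_id, holders in tracker[file_id]:
        for address in holders:
            blob = nodes.get(address, {}).get(piece_id)
            if blob is not None:             # this node is online and has the piece
                pieces.append(blob)
                break
        else:
            raise LookupError(f"no reachable node holds {piece_id}")
    return b"".join(pieces)


# Content tracker: which nodes are believed to hold which piece.
tracker = {"report": [("p0", ["n1", "n2"]), ("p1", ["n3"])]}

# Reachable nodes and what they store; n1 is offline, so it is absent here.
nodes = {"n2": {"p0": b"mojo "}, "n3": {"p1": b"nation"}}

content = fetch_file("report", tracker, nodes)
```

Note that no single node in `nodes` holds enough to reconstruct the file, which is the basis of the plausible-deniability argument made for this design.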

Mojo Nation: What are its business
and public policy goals

Business goals

Public policy goals
Mojo Nation: How does it address
the threats in our model?
Monitoring of transactions
 Manipulation of transactions
 Impersonation and misrepresentation
 Identification of individuals



May be addressed in future by payment for “hops”
over a number of nodes, but not currently
addressed
Legal action

“Plausible deniability” because server does not
have enough of a document to reconstruct it
Jabber/AIM: What are they?
Instant messaging platforms
 Jabber provides universal connectivity
to other IM services, including AIM,
ICQ, MSN Messenger
 Jabber designed as protocol to allow for
person-to-person as well as app-to-app
communication

Jabber/AIM: How do they
work?

AIM


Client/server: almost all data relayed through AOL
servers
Jabber



Distributed system of servers, each presiding over
a namespace
When a server receives a message, it will forward
it to its peers if recipient not in its namespace
Communicate via XML or proprietary protocols
where necessary
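
Jabber's namespace-based forwarding can be illustrated with an in-process model. This is a sketch, not the Jabber protocol: the `Server` class, the shared `network` dictionary standing in for server discovery, and the `user@domain` addresses are assumptions, and no XML is involved. It shows a server delivering locally when the recipient is in its own namespace and handing the message to the responsible peer server otherwise.

```python
class Server:
    def __init__(self, domain, network):
        self.domain = domain
        self.network = network          # shared dict: domain -> Server
        self.inboxes = {}               # local user -> list of (sender, body)
        network[domain] = self

    def add_user(self, user):
        self.inboxes[user] = []

    def route(self, sender, recipient, body):
        """Deliver locally or forward to the recipient's server; True on delivery."""
        user, _, domain = recipient.partition("@")
        if domain != self.domain:
            # Recipient is outside this namespace: hand off to the peer server.
            peer = self.network.get(domain)
            if peer is None:
                return False
            return peer.route(sender, recipient, body)
        if user not in self.inboxes:
            return False
        self.inboxes[user].append((sender, body))
        return True


network = {}
server_a = Server("a.example", network)
server_b = Server("b.example", network)
server_a.add_user("alice")
server_b.add_user("bob")

delivered = server_a.route("alice@a.example", "bob@b.example", "hello")
unknown = server_a.route("alice@a.example", "carol@c.example", "hello")
```

Both servers see the message body in this model, which mirrors the concern that data is generally sent in clear text through possibly untrusted servers.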
Jabber/AIM: What are their
business and public policy goals?

AIM

Business goals




Large scale IM solution, centralized
Supported by advertisements
Public policy goals
Jabber

Business goals



Open source, open structure for naming, presence, and
"roster" (buddy list) information
Allow users to have one client for multiple IM protocols
Public policy goals
Jabber/AIM: How do they address
the threats in our model?

Monitoring of transactions


Data generally sent clear-text through (possibly)
untrusted servers
Jabber’s XML structure allows for security for
certain apps using encryption and vCard, but not
supported in the standard
Manipulation of transactions
 Impersonation and misrepresentation




There have been several cases of ID theft and
password fraud on AIM
Jabber allows for dialback to prevent spoofing
Identification of individuals or nodes
Jabber/AIM: How do they address
the threats in our model? (Cont’d)

Legal action, social pressure, denial of
service
AIM servers all centralized
 Jabber servers distributed, each presides
over separate namespace

Blocking of access to resources
 Unreliability of resources
 Malicious content

Groove: What is it?
“Shared space” for real-time
collaboration
 Chat, IM, whiteboard, group web
browsing, calendar, discussion board,
integration with other applications

Groove: How does it work?
End-user application connects directly
with peers, but can use gateway
servers if necessary
 All data in XML format
 Different modes of operation to provide
different levels of anonymity of
participants

Groove: What are its business
and public policy goals?

Business goals

Public policy goals
Groove: How does it address
the threats in our model?

Monitoring of transactions
  All data is encrypted in transit and in storage
Manipulation of transactions
  All data is signed so it cannot be manipulated
Impersonation and misrepresentation
  Key distribution system uses SDSI-type attributes
  All invitation messages are signed and sent with signer’s public key
  Recipient can compute “fingerprint” from public key and check it against previously known value
Identification, legal action, etc.
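
The fingerprint check mentioned on this slide can be sketched as below. The hash function, truncation length, and grouping are assumptions (the slides do not say how Groove derives its fingerprints), and `fingerprint` and `matches_known` are hypothetical names. Verifying the signature on the invitation itself is a separate step that is not shown.

```python
import hashlib
import hmac


def fingerprint(public_key: bytes) -> str:
    """Short hex digest of a public key, grouped so it can be read aloud."""
    digest = hashlib.sha256(public_key).hexdigest()[:32]
    return ":".join(digest[i:i + 4] for i in range(0, len(digest), 4))


def matches_known(public_key: bytes, known_fingerprint: str) -> bool:
    """Compare the key's fingerprint with a value obtained out of band."""
    return hmac.compare_digest(fingerprint(public_key), known_fingerprint)


# Placeholder bytes standing in for a real encoded public key.
inviter_key = b"example public key bytes"
known = fingerprint(inviter_key)         # e.g. confirmed earlier over the phone

genuine = matches_known(inviter_key, known)
forged = matches_known(b"some other key", known)
```

The check only helps if the previously known value reached the recipient over a channel the impersonator could not alter.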
Groove: How does it address the
threats in our model? (Cont’d)

Denial of service
  Central servers used only when necessary
Blocking of access to network
  Can work through gateway servers designed to tunnel through firewalls, etc.
Unreliability and transient availability

All communication is mirrored locally for all
participants
Malicious content
 Freeloading and inequity
