Privacy Issues in Peer-To-Peer Systems
Raj Dandage, Tim Gorton, Ngozika Nwaneri, Mark Tompkins
6.805-p2p@mit.edu
4/26/01

Agenda
  Introduction & Status Report
  Definition of peer-to-peer
  Privacy Concerns (Threat Model)
    What do we care about?
  Legal Issues affecting privacy on P2P systems
    What it is, what it isn’t, what it used to be, what it should do
    What does the law care about?
  A few examples of current P2P systems
    Analyze w.r.t. goals, privacy concerns, legal issues, etc.
  Recommendations, Synthesis, and Conclusion

Status Report: Goals
  Develop criteria for evaluating peer-to-peer applications and architectures with regard to technical, business, and public policy goals
  Identify different peer-to-peer applications and architectures
  Evaluate these applications and architectures in terms of the goals set forth and privacy issues
  Explore legal issues surrounding P2P architectures
  Develop recommendations for the modification and design of peer-to-peer systems in order to resolve privacy concerns and encourage the design of privacy-enhancing systems

What is P2P? What isn’t?
  Old-school “P2P”
    Usenet
    DNS
    WWW Hyperlinks
  Today’s P2P
    Leveraging a new Internet usage model
    Transient connectivity at the “fringes”

Peer-to-Peer Defined
  Peer-to-peer is NOT simply illegally sharing copyrighted material.
  Peer-to-peer computing is the sharing of computer resources and services by direct exchange.
  It is about decentralized networking applications.
  The “litmus test” for peer-to-peer: “does it allow for variable connectivity and temporary network addresses? does it give the nodes at the edges of the network significant autonomy?”
    Clay Shirky in Peer-to-Peer

Peer-to-Peer: Hybrid Systems
  Hybrid system (brokered peer-to-peer system) – uses a centralized server to connect two computers together before a direct exchange takes place.
  Repeater – someone who publicly shares files that they did not author; republishing someone else’s work.
  Metadata – the collection of information from various sources, related and managed in a central directory for the purpose of linkage and file sharing.

Privacy Concerns (Threat Models)
  Anonymity
    … of your identity
    … of your online activity
    … of your publications
  Authentication
  Access to your data
    data on your local machine
    data transmitted on the ‘net

Possible “Attackers”
  Malicious hacker
  Governments (court order, wiretapping)
  Employers
  ISP’s
  Operators of P2P systems (e.g. Napster)
  Another everyday user

Legal Issues affecting P2P privacy
  Arenas of Concern
    Copyright
    Libel
    Censorship (more political than legal)
  Who is liable/in danger?
    ISP’s?
    Service operators?
    Individual developers?
    End users?

Copyright
  Direct Infringement
  Contributory Infringement when end users do Bad Things
    Some act of direct infringement by someone else
    Defendant “knew or should have known” of infringement
    Defendant “materially contributed” to infringement
  Vicarious Infringement (Napster)
    Some act of direct infringement by someone else
    Defendant had the “right or ability to control” the infringer
    Defendant derived a “direct financial benefit” from the infringement
    (Napster has no business model.)

Digital Millennium Copyright Act of 1998 (DMCA)
  Prohibits “circumvent[ing] a technological measure that effectively controls access to a work protected under this title”
  Exempts “service providers” from copyright liability if:
    they block copyrighted material after they are notified by a copyright holder,
    they identify an infringing user to a copyright holder upon being issued a subpoena, and
    they don’t interfere with “standard technical measures” used to protect or identify copyrighted material

Who are “service providers”?
  “an entity offering the transmission, routing, or providing of connections for digital online communications, between or among points specified by a user, of material of the user's choosing, without modification to the content of the material as sent or received.” sec. 512(k)(1)
  Also “provider of online services or network access, or the operator of facilities therefor”
  ISP’s, P2P system operators… end users?

Libel: CDA
  CDA immunizes providers and users of “interactive computer services” from being treated as speakers or publishers of information provided by a 3rd party
  “‘interactive computer service’ means any information service, system, or access software provider that provides or enables computer access by multiple users to a computer server, including specifically a service or system that provides access to the Internet and such systems operated or services offered by libraries or educational institutions.”
  so… your computer might be a “server”

Censorship
  Subverting censorship by authoritarian governments through anonymous publication is a stated goal of several P2P systems
  Examples of authoritarian governments:
    Australian law would make supplying R-rated material illegal
    US courts have ruled that the DMCA makes supplying the DeCSS code, or linking to a site that supplies the DeCSS code, illegal
    Naturally, there are others…

Who’s in legal trouble?
  P2P system operators
  ISP’s
    Users’ copyright violations: ISP’s must disable access when notified by copyright holder
  P2P system developers
    Must disable access when notified of copyright infringement, may serve as a circumvention of a TPM as per DMCA
    DMCA: they may produce TPM circumvention technology
  P2P users
    They’re often doing Bad Things.
    But what if they’re just forwarding content, perhaps unknowingly?
    Libel? Copyright? Targeted by authoritarian regimes?

Example P2P Systems
  Possible threats to privacy and usability
  Example P2P systems/protocols:
    What is it?
    How does it work?
    What are its business and public policy goals?
    How does it address the threats in our model?

Possible Privacy Threats to P2P Systems
  Monitoring of transactions
    Tracking systems placed on network
    Monitoring of data at or going through a node
  Manipulation of transactions
    Forgery of data
    Filtration of transaction information
  Impersonation and misrepresentation
  Identification of individuals or nodes
  Legal action
  Social pressure and external action

Possible Usability Threats to P2P Systems
  Denial of service
  Unreliability and transient availability of resources
  Blocking of access to network resources
  Malicious content
  Firewalls
  NATs
  Viruses
  Freeloading and inequitable use of resources

Example P2P Applications and Networks
  Napster
  Gnutella (BearShare)
  SETI@home
  Freenet (Espra)
  FreeHaven
  Mojo Nation
  Jabber / AOL Instant Messenger
  Groove.net

Napster: What is it?
  “The largest, most diverse online community of music lovers in history.”
  A file transfer system for music lovers to search for and trade mp3’s
  Also features:
    user hotlist
    chatrooms
    instant messaging

Napster: How does it work?
  “hybrid” P2P architecture
  centralized server takes all file requests, searches dynamically updated database
  server brokers connections between clients for decentralized downloads

Napster: Original Business and Public Policy Goals
  create an easy way to search for and share music for free over the internet
  take advantage of latent disk space on edges of internet
  avoid copyright issues by having each user responsible for their own content

Napster: Current Business and Public Policy Goals
  Avoid lawsuits!
    Metallica
    Filename filtering
    Monthly fee?
  Get musicians on their side
    “empower yourself!”
  Get activists on their side
    Napster Action Network

Napster: How does it address the threats in our model?
  Monitoring of transactions, identifying individuals
    Tracking programs
    Users can log usernames/files downloaded from them
    Possible to search entire shared file directory of a user (hotlist)
  Impersonation and misrepresentation
    Only one username per program – cannot change

Napster: How does it address the threats in our model? (cont’d)
  Legal action
  Denial of Service Attack
    Very vulnerable, as we have seen
    Would prevent searches, but not file transfers
  Malicious Content
    Everything is mp3 format

Gnutella: What is it?
  A protocol, not an actual program
  Completely decentralized architecture – “pure” P2P
  Used for file transfer
  Open source, so many other programs have built off of it
    BearShare
    LimeWire
    GnuFrog

Gnutella: How does it work?
  Works like the real world (gossip, word-of-mouth)
  Makes initial connection to other hosts in cache (ping)
  Broadcasts, propagates queries to these hosts
  Responses travel back along same path
  Connects directly to transfer files

Gnutella: How does it work? (cont’d)

Gnutella: Business and Public Policy Goals
  “internet on top of the internet”
  Decentralization
    New real-time search engine model
    No single point of failure
  Open source code
    Allows for new innovations, freelance application development

Gnutella: How does it address the threats in our model?
  Monitoring of Transactions, Identification
    Tracking programs
    Users can see requests passed through their node, but not original sender
    Users can log IP’s of nodes with whom they transfer files
    Zeropaid.com’s Wall of Shame
  Legal Action
    Who can copyright holders realistically sue?

Gnutella: How does it address the threats in our model?
(cont’d)
  Denial of Service Attacks
  Unreliability of resources
  Malicious content
  Finding initial group of peers
  Mandragore scare
  Know what you’re downloading
  Trust who you’re downloading from
  Freeloading
    Increases the length of search requests
    Some software, like LimeWire, allows users to set “preferences” for nodes that are also sharing material

Gnutella: Scalability Issues and Bandwidth Inequity
  Clip2 Reflectors – “super peers”

Gnutella: Scalability Issues and Bandwidth Inequity (cont’d)
  BearShare v. 3.0.0 Alpha
  3 modes
    Client (low bandwidth)
    Server/Defender (high bandwidth)
    Peer (normal)
  Centralizes system somewhat, provides targets, but increases efficiency

Copyright Violation Trackers on Napster and Gnutella
  Copyright Agent
    Roy Orbison fans beware!
  Media Tracker
    Masquerades as a user
    Logs IP’s, ISP’s, files
    Operated from outside US, so not subject to US privacy laws

Monitoring of Transactions on Napster and Gnutella (cont’d)
  Screenshot of Media-Tracker

SETI@home: What is it?
  Allows PC owners to help in the search for extraterrestrial intelligence
  Free screensaver, analyzes radio telescope data when PC is idle

SETI@home: How does it work?
  Not “pure” P2P
  Central server sends data to hosts
  Hosts compute FFT’s on data, send results back to server
  No inter-host communication
  Example of how processing power can be shared among computers

SETI@home: What are its business and public policy goals?
  Find more aliens in less time
  Create a community of extraterrestrial enthusiasts using a participatory medium
  Other possible applications for distributed computing
    Code breaking
    Genetic analysis

SETI@home: How does it address the threats in our model?
  Manipulation of Transactions
    Doctored versions
      Trying to find better ways to compute FFT’s
      No open source code
    Doctored result files
      Encryption, checksums

SETI@home: How does it address the threats in our model?
(cont’d)
  Identification of individuals or nodes
  Denial of Service
  Unreliability of resources
    Redundant data units distributed
  Malicious content
    Downloads data, not executables

Freenet: What is it?
  Distributed, decentralized, anonymous publishing system
  Like one enormous, shared hard drive

Freenet: How does it work?
  Every piece of data has a key
    Need to know key to access data
    No effective search mechanism yet
  Key search: uses a depth-first search along nodes
    If a node does not have a key, it directs the request to the node with the “closest” key
    Unique ID’s, routing data back, nodes cache data along the way
    More scalable, efficient than broadcast – routes you closer each hop

Freenet: How does it work? (cont’d)
  Every node allocates space to be used by network
  Cannot update files
  Sends key request w/ unique ID
  InsertRequest
    Checks if data already exists
  DataRequest
    If next node contains key, returns data along same path
    If not, finds the “closest key”, forwards to that node

Freenet: How does it work? (cont’d)
  Key/data stack model

Freenet: What are its business and public policy goals?
  Prevent censorship of documents
  Provide anonymity of users
  Plausible deniability for node operators
  Must trace back requests through every node in path
  Remove any single point of control
  Keep most requested data, not most “acceptable” data

Freenet: How does it address the threats in our model?
  Monitoring of transactions
  Manipulation of transactions
    Hard unless you have control of many nodes
    Attacker cannot forge data or update it
    Every node checks key for validity of document while it is being forwarded back
  Impersonation and misrepresentation
    No way to know where data comes from anyway
  Identification of individuals or nodes
  Legal action
    Plausible deniability for requests
  Raj’s pictures

FreeHaven: What is it?
  Network that allows users to publish documents
  Provides anonymity, server accountability, and equitability of resource distribution

FreeHaven: How does it work?
  Distributed network of servers
  Servers communicate through anonymous channels, such as reply blocks sent via remailers
  Data enters and propagates through the network through the process of trading
  Files are divided into pieces and distributed among servers, only a subset of which are needed to reconstruct the file
  All data is encrypted and signed before transfer or storage

FreeHaven: What are its business and public policy goals?
  Business goals
    To be used in conjunction with services such as FreeHaven to provide long-term, popularity-independent data storage
  Public policy goals
    Anonymity of author, publisher, reader, document, server, and query
    System accountability (as opposed to user accountability)
    Equity of resource distribution

FreeHaven: How does it address the threats in our model?
  Monitoring of transactions
  Manipulation of transactions
    All FreeHaven traffic is encrypted in transit and in storage
    Document requests are forwarded through the system via anonymous re-mailers
    All data segments are signed
    Only a subset of the segments are required to reconstruct the data
  Impersonation and misrepresentation

FreeHaven: How does it address the threats in our model? (cont’d)
  Identification of individuals or nodes
    Author/publisher anonymity through trading
    Server anonymity through pseudonyms and anonymous communication via re-mailer reply blocks
  Legal action, social pressure, external action
    No central authority to be held accountable
    “Plausible deniability”: server does not know what data it is storing or what is being requested
    Only a subset of the servers must be available to reconstruct the data
    Data cannot be revoked from the network

FreeHaven: How does it address the threats in our model?
(cont’d)
  Denial of service, unreliability of resources
    Only a subset of the servers must be available to reconstruct data
    Accountability mechanisms for servers
  Blocking of access to network resources
  Malicious content
  Freeloading and inequitable resource use
    Must donate space to publish data

Mojo Nation: What is it?
  Distributed, micro-payment based publishing/resource distribution system
  Resource consumers and providers make “capitalist” exchanges of resources (storage space, computation)

Mojo Nation: How does it work?
  Content trackers keep list of content pieces and addresses of nodes that have them
  Query different nodes until you have all of the parts needed to reconstruct the file

Mojo Nation: What are its business and public policy goals?
  Business goals
  Public policy goals

Mojo Nation: How does it address the threats in our model?
  Monitoring of transactions
  Manipulation of transactions
  Impersonation and misrepresentation
  Identification of individuals
    May be addressed in future by payment for “hops” over a number of nodes, but not currently addressed
  Legal action
    “Plausible deniability” because server does not have enough of a document to reconstruct it

Jabber/AIM: What are they?
  Instant messaging platforms
  Jabber provides universal connectivity to other IM services, including AIM, ICQ, MSN Messenger
  Jabber designed as a protocol to allow for person-to-person as well as app-to-app communication

Jabber/AIM: How do they work?
  AIM
    Client/server: almost all data relayed through AOL servers
  Jabber
    Distributed system of servers, each presiding over a namespace
    When a server receives a message, it will forward it to its peers if the recipient is not in its namespace
    Communicate via XML or proprietary protocols where necessary

Jabber/AIM: What are their business and public policy goals?
  AIM
    Business goals
      Large scale IM solution, centralized
      Supported by advertisements
    Public policy goals
  Jabber
    Business goals
      Open source, open structure for naming, presence, and "roster" (buddy list) information
      Allow users to have one client for multiple IM protocols
    Public policy goals

Jabber/AIM: How do they address the threats in our model?
  Monitoring of transactions
    Data generally sent in clear text through (possibly) untrusted servers
    Jabber’s XML structure allows for security for certain apps using encryption and vCard, but this is not supported in the standard
  Manipulation of transactions
  Impersonation and misrepresentation
    There have been several cases of ID theft and password fraud on AIM
    Jabber allows for dialback to prevent spoofing
  Identification of individuals or nodes

Jabber/AIM: How do they address the threats in our model? (cont’d)
  Legal action, social pressure, denial of service
    AIM servers all centralized
    Jabber servers distributed, each presides over separate namespace
  Blocking of access to resources
  Unreliability of resources
  Malicious content

Groove: What is it?
  “Shared space” for real-time collaboration
  Chat, IM, whiteboard, group web browsing, calendar, discussion board, integration with other applications

Groove: How does it work?
  End-user application connects directly with peers, but can use gateway servers if necessary
  All data in XML format
  Different modes of operation to provide different levels of anonymity of participants

Groove: What are its business and public policy goals?
  Business goals
  Public policy goals

Groove: How does it address the threats in our model?
  Monitoring of transactions
  Manipulation of transactions
    All data is signed so it cannot be manipulated
  Impersonation and misrepresentation
    All data is encrypted in transit and in storage
    Key distribution system uses SDSI-type attributes
    All invitation messages are signed and sent with signer’s public key
    Recipient can compute “fingerprint” from public key and check it against previously known value
  Identification, legal action, etc.

Groove: How does it address the threats in our model? (cont’d)
  Denial of service
  Blocking of access to network
    Central servers used only when necessary
    Can work through gateway servers designed to tunnel through firewalls, etc.
  Unreliability and transient availability
    All communication is mirrored locally for all participants
  Malicious content
  Freeloading and inequity
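
Sketch: fingerprint check
  The Groove slides on impersonation note that a recipient can compute a “fingerprint” from a signer’s public key and check it against a previously known value. A minimal Python sketch of that idea follows. It is an illustration only: the slides do not say which hash Groove uses, so SHA-256 is an assumption, and the function names and the sample key bytes are hypothetical.

```python
import hashlib
import hmac


def fingerprint(public_key_bytes: bytes) -> str:
    """Return a hex fingerprint of a public key.

    SHA-256 is an assumption made for this sketch; the slides do not
    name the hash function that Groove actually uses.
    """
    return hashlib.sha256(public_key_bytes).hexdigest()


def matches_known_fingerprint(public_key_bytes: bytes, known_fingerprint: str) -> bool:
    """Check a received key against a fingerprint learned out of band."""
    computed = fingerprint(public_key_bytes)
    # compare_digest runs in time independent of where the strings differ
    return hmac.compare_digest(computed, known_fingerprint.lower())


if __name__ == "__main__":
    # Hypothetical key material, standing in for the key sent with an invitation
    received_key = b"example-public-key-bytes"
    known = fingerprint(received_key)  # value the recipient already trusts
    print(matches_known_fingerprint(received_key, known))
    print(matches_known_fingerprint(b"some-other-key", known))
```

  The check only helps if the known fingerprint reached the recipient over a channel the attacker does not control, for example read aloud over the phone; otherwise an impersonator could substitute both the key and the fingerprint.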