Client/Server Distributed Systems
240-322, Semester 1, 2005-2006
3. Peer-to-Peer Technologies (P2P)
• Objectives
– introduce P2P, discuss some current systems, highlight key issues
240-322 Cli/Serv.: P2P/3

Contents
1. What is P2P (briefly)?
2. The Early Days of the Internet
3. The Internet Now
4. Reader-Centric P2P
5. Publisher-Centric P2P
6. P2P Meme Map
7. P2P Issues
8. More Information

1. What is P2P (briefly)?
• Peer-to-peer (P2P) allows everyone on the Internet to share their resources with others.
• P2P is a class of applications that uses resources from the 'edges' of the Internet
– e.g. storage space, CPU time, files
• Some P2P applications:
– Napster, Gnutella, FreeNet, FreeHaven

Client/Server vs. P2P
[Figure: a client/server network (one server with its clients) next to a P2P network]

2. The Early Days of the Internet
• Many users think that P2P ideas are new.
• In fact, the Internet was originally designed to be P2P
– but the Web and browsers changed it into a client/server model during the 1990s

The Arpanet (late 1960s)
• Arpanet's aim was to share computing resources around the main USA universities and Government installations.
• 'Killer' applications of the 1970s-1980s:
– FTP, telnet, e-mail, chat
– mostly client/server but...
– usage patterns were symmetric
  • most machines were both clients and servers

Usenet (since 1979)
• A decentralized file sharing model for distributing news.
[Figure: a network of interconnected news servers; each news server has its own readers (and senders), labelled rs]
• No central authority
– authority is localised in each news server
• But, an unofficial Usenet backbone has developed:
– these are servers which store more newsgroups, have faster processing, better connectivity, etc.
– server inequalities cause a hierarchy to appear
• News topic hierarchy (the ordering of the newsgroups) is controlled by users in the news.admin newsgroup
– but new users find it hard to influence decisions
• There is an "anything goes" alt.* newsgroup hierarchy.
• Main problem: lots of useless news items

DNS (Domain Name System)
Since 1983
• DNS allows host names to be mapped to IP addresses
– used by almost all network applications
• Names are organised into a hierarchy:
– psu.ac.th, ait.ac.th, foo.org.th
• The name hierarchy has led to a mixed P2P and hierarchical server model
[Figure: name server hierarchy. The th name server is at the top, the ac and org name servers are below it, and the psu, ait and foo name servers are below those; lookups and requests pass between the servers and the rs nodes attached to them]
• Each name server deals with part of the namespace and passes other requests on.
• Each name server does caching to reduce network load.
• A hierarchy makes search easier.
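The delegation and caching described above can be sketched in a few lines of Python. This is a toy model, not real DNS: the NameServer class, its method names and the address are invented for illustration, and each server simply forwards a request down the hierarchy (real resolvers are more elaborate).

```python
class NameServer:
    """Toy name server: owns one zone and delegates the rest (hypothetical model)."""

    def __init__(self, zone):
        self.zone = zone          # e.g. "th", "ac.th", "psu.ac.th"
        self.records = {}         # host name -> address, for names in this zone
        self.children = {}        # child zone -> NameServer
        self.cache = {}           # answers learned from other servers
        self.forwarded = 0        # how many requests were passed on

    def add_child(self, child):
        self.children[child.zone] = child

    def lookup(self, name):
        if name in self.records:              # this server's part of the namespace
            return self.records[name]
        if name in self.cache:                # answered earlier: no network traffic
            return self.cache[name]
        for zone, child in self.children.items():
            if name == zone or name.endswith("." + zone):
                self.forwarded += 1           # pass the request down the hierarchy
                answer = child.lookup(name)
                if answer is not None:
                    self.cache[name] = answer
                return answer
        return None                           # unknown name


# Build th -> ac.th -> psu.ac.th; the address is a documentation-only example.
th = NameServer("th")
ac = NameServer("ac.th")
psu = NameServer("psu.ac.th")
th.add_child(ac)
ac.add_child(psu)
psu.records["www.psu.ac.th"] = "192.0.2.10"

first = th.lookup("www.psu.ac.th")    # forwarded down to the psu server
second = th.lookup("www.psu.ac.th")   # served from the th server's cache
```

The second lookup never leaves the th server, which is the network load saving that caching is meant to give.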

3. The Internet Now
• Most people use a browser to surf the Web
– the Web encourages a client/server model
  • request a page, get it
• Most users do not run Web servers
– hard to set up
– many ISPs do not allow them
• The present Web makes it hard for an ordinary user to publish (serve) Web pages
– dynamic addresses
– firewalls
– asymmetric bandwidth (e.g. cable modems)
• Today's Web in summary:
– easy to read, difficult to publish

Accountability
• Many of the restrictions on users (e.g. firewalls) are quite recent
– they started appearing in the mid 1990s
• The reason is lack of accountability
– an Internet user can send spam, attack machines, etc.
  • due to the 'poor' design of the Internet protocol
  • it assumes that users are responsible

P2P Aims
• P2P has a political and social component
– it aims to allow everyone to share resources
– this is quite different from today's Web where business/government/university servers present information, and ordinary users read it
• There are many technological, political, and social problems to be dealt with.

4. Reader-Centric P2P
• Reader-centric P2P systems distribute content (information) by anyone, for anyone to read.
• Example systems:
– Napster, Gnutella, FreeNet

4.1. Napster
• Napster used to allow users to publish music files which other users could download for free
– publishing is not the same as authoring
• Napster is a hybrid of P2P and client/server, since a Napster server stores who is logged onto the system and details about the files they are publishing.
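
Using Napster
[Figure: John, Bob, Carol and Ted each run a Napster client; all of them talk to the Napster server]
0. each client logs in and uploads details of its files to the Napster server
1. John's client sends a request for "Yesterday" to the server
2. the server sends back the matching info.
3. John's client requests the music file from a client that has it
4. that client sends the file
Details held by the Napster server:
  Hey Jude/Beatles. John
  Yesterday/Beatles. Bob
  Sgt. Pepper/Beatles. Carol
  Yesterday/Beatles. Ted
  :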
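The numbered steps above can be sketched as a central index plus direct peer-to-peer transfer. This is a minimal illustration only: NapsterServer, Peer and their methods are hypothetical names, not Napster's actual protocol, and the file contents are placeholder bytes.

```python
from collections import defaultdict


class NapsterServer:
    """Toy central index: it knows who shares what, but stores no files."""

    def __init__(self):
        self.index = defaultdict(set)   # song title -> users publishing it

    def login(self, user, titles):
        # Step 0: a client logs in and uploads details of its files.
        for title in titles:
            self.index[title].add(user)

    def logout(self, user):
        for owners in self.index.values():
            owners.discard(user)

    def search(self, title):
        # Steps 1-2: request a title, get back the matching info.
        return sorted(self.index.get(title, ()))


class Peer:
    def __init__(self, name, files):
        self.name = name
        self.files = dict(files)        # title -> file contents

    def send_file(self, title):
        # Step 4: the file comes from a peer, not from the server.
        return self.files[title]


peers = {
    "John": Peer("John", {"Hey Jude": b"hey-jude-bytes"}),
    "Bob": Peer("Bob", {"Yesterday": b"yesterday-bytes"}),
    "Carol": Peer("Carol", {"Sgt. Pepper": b"sgt-pepper-bytes"}),
    "Ted": Peer("Ted", {"Yesterday": b"yesterday-bytes"}),
}
server = NapsterServer()
for peer in peers.values():
    server.login(peer.name, peer.files)

owners = server.search("Yesterday")                 # ['Bob', 'Ted']
data = peers[owners[0]].send_file("Yesterday")      # step 3: ask Bob directly
peers["John"].files["Yesterday"] = data
server.login("John", ["Yesterday"])                 # John now republishes it
```

Only the index is centralised; the music itself moves between peers, which is why the design counts as a hybrid.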

Features
• Downloading a file makes it available from a new machine (your machine)
– decentralizes file storage
– increases redundancy (good for reliability)
– reduces search time for nearby users
• Napster can be 'attacked' easily
– music lawyers sued the Napster server owners, and closed it down (changed it)
• The peers (clients) are not equal
– some do not publish (they are freeloaders)
– some clients are recognized as being 'better'
  • e.g. more songs, better quality recordings
  • people choose those clients first
• Client inequality means business opportunities
– e.g. for the music industry

4.2. Gnutella
• A network of shared file stores (servents) that can be searched.
[Figure: my servent searches by broadcasting to the other servents; info. comes back as a reply]

Features
• No central server (authority)
– so much harder to ‘switch off’ than Napster
• Illustrates how to access dynamic, heterogeneous file systems
• Queries can be interpreted differently by each servent
– results can be anything
• Gnutella servents have been ported to i-mode mobile phones in Japan
– servents are meant to run on anything
• InfraSearch
– a prototype search engine for Gnutella
– not fully developed as yet
– uses a broadcast model

Broadcast Model Details
• Broadcasting is done over TCP connections between servents
– can avoid some firewall problems
– message delivery is best effort
  • TCP itself is reliable on each connection, but a servent will discard packets if the network is too loaded
• Rebroadcasting/looping is avoided by each packet having a unique ID
– a servent does not transmit a packet with the same ID more than once
• Packets have a TTL (Time-To-Live) value of 7 to get rid of old stuff
– TTL is the number of hops (machine-to-machine transfers) a packet can make before ‘dying’
• A servent replies to the node that sent it the packet
– answers are routed back along the transmission path of the query
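The broadcast rules above (unique packet IDs, a TTL, replies routed back along the query path) can be sketched as follows. This is a simplified, synchronous model: the Servent class and its method names are hypothetical and do not follow the real Gnutella wire format.

```python
import itertools


class Servent:
    """Toy Gnutella-style node (hypothetical model, not the wire protocol)."""

    _ids = itertools.count(1)

    def __init__(self, name, files=()):
        self.name = name
        self.files = set(files)
        self.neighbours = []
        self.seen = {}        # query ID -> neighbour the query arrived from
        self.hits = []        # answers that reached this servent's own queries

    def connect(self, other):
        self.neighbours.append(other)
        other.neighbours.append(self)

    def search(self, filename, ttl=7):
        query_id = next(Servent._ids)          # unique ID for this query
        self.seen[query_id] = None             # None marks "I am the origin"
        for n in self.neighbours:
            n.receive_query(query_id, filename, ttl, sender=self)
        return query_id

    def receive_query(self, query_id, filename, ttl, sender):
        if query_id in self.seen:              # duplicate: drop it, no re-send
            return
        self.seen[query_id] = sender           # remember the way back
        if filename in self.files:
            sender.receive_hit(query_id, self.name)
        if ttl - 1 > 0:                        # each hop uses up one unit of TTL
            for n in self.neighbours:
                if n is not sender:
                    n.receive_query(query_id, filename, ttl - 1, sender=self)

    def receive_hit(self, query_id, owner):
        back = self.seen[query_id]
        if back is None:
            self.hits.append(owner)            # reached the original searcher
        else:
            back.receive_hit(query_id, owner)  # route back along the query path


a, b, c = Servent("a"), Servent("b"), Servent("c")
d = Servent("d", {"song.mp3"})
a.connect(b)
b.connect(c)
c.connect(a)          # a cycle, to exercise the duplicate check
c.connect(d)
a.search("song.mp3")  # d's answer travels back to a via the query path
```

Without the seen table the query would circulate round the a-b-c cycle until its TTL ran out; with it, each servent handles a given query once.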

Network Shape
• A new servent will search for connections to nodes with similar bandwidths or higher
– this will cause the ‘shape’ of the Gnutella network to change over time
– a backbone topology (shape) will develop with high bandwidth nodes in the center, surrounded by slower nodes
• This dynamic behaviour is not implemented in all servents.

Problems
• Node overloading
– many servents are hardwired to connect to the same nodes
  • these nodes can easily become overloaded
  • these nodes can be attacked to affect the Gnutella network
– many servents cannot be reconfigured to look for other nodes
• Too much broadcasting
– some servents use broadcasting when direct node-to-node communication is possible
– degrades the network
• Once a servent has found a node with the file it wants, the file is downloaded using ordinary HTTP
– no security or anonymity

4.3. FreeNet
• FreeNet supports disk space sharing
– it creates a geographically distributed collection of hard drives
[Figure: a node searches by key; the reply comes back and new links are created]

Features
• Documents are encrypted, so the owners of the hard drives do not (easily) know what they are storing
– prevents document censorship
– provides owner deniability
  • e.g. if they are sued
• An encrypted file comes with a unique key which is used by search tools
• A search returns links to the answer nodes, so a search node collects new connections to the network over time.
• Often requested documents are cached (for a certain time) by the search nodes
– increases network reliability
– decreases search time in the future for that doc.
• Nodes only have a certain amount of space
– less popular files are deleted to make way for the caching of new ones
– any unpopular files can be deleted, even ones put there by the node owner
– may mean that FreeNet will end up storing music and pornography while more serious information disappears!
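The space rule above can be sketched as a fixed-size store that evicts its least requested document. This is an illustration only: NodeStore is a hypothetical class, and using a raw request count as the measure of popularity is an assumption, not a description of FreeNet's real datastore.

```python
class NodeStore:
    """Toy fixed-size store that drops the least requested document first."""

    def __init__(self, capacity):
        self.capacity = capacity      # maximum number of documents held
        self.docs = {}                # key -> encrypted document bytes
        self.requests = {}            # key -> how often it has been requested

    def get(self, key):
        if key in self.docs:
            self.requests[key] += 1
            return self.docs[key]
        return None

    def put(self, key, data):
        if key not in self.docs and len(self.docs) >= self.capacity:
            victim = min(self.docs, key=lambda k: self.requests[k])
            del self.docs[victim]     # unpopular files go, whoever inserted them
            del self.requests[victim]
        self.docs[key] = data
        self.requests.setdefault(key, 0)


store = NodeStore(capacity=2)
store.put("report", b"serious information")   # put there by the node owner
store.put("song", b"popular music")
for _ in range(3):
    store.get("song")                         # the song is requested often
store.put("video", b"new arrival")            # no room: "report" is evicted
```

The owner's own report disappears because nobody asked for it, which is exactly the worry raised in the last bullet.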

Key Formats
• FreeNet has a complex range of different keys for documents:
– keys using a hash function on the data
– keys using data keywords (metadata)
– keys using public/private key encryption
• These are likely to change/evolve as FreeNet is developed.
• Keys are very important to FreeNet
– they allow document contents to be hidden
– they allow more efficient search than the Gnutella broadcast model
– they allow fake documents to be detected
• FreeNet search engines are still being developed
– early versions use a broadcast model like the one in Gnutella
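A key computed by hashing the data is what makes fake documents detectable: whatever comes back from a search must hash to the key that was asked for. The sketch below shows only that check. The function names are hypothetical, and SHA-256 is an assumed choice of hash function, not necessarily the one FreeNet uses.

```python
import hashlib


def content_hash_key(data: bytes) -> str:
    """Derive a key from the document bytes alone (SHA-256 is an assumption)."""
    return hashlib.sha256(data).hexdigest()


def is_genuine(key: str, data: bytes) -> bool:
    """Accept a document only if it hashes back to the key it was requested under."""
    return content_hash_key(data) == key


original = b"serious information"
key = content_hash_key(original)

genuine = is_genuine(key, original)             # the real document passes
forged = is_genuine(key, b"something else")     # a substitute does not
```

Because the key depends only on the bytes, any node can run this check without trusting the node that supplied the document.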

4.4. Gnutella and FreeNet Compared
• Availability of files:
– Gnutella nodes do not delete files
– the caching behaviour of FreeNet means no guarantees about what will be stored on a node
• Node control
– Gnutella nodes only contain what their owners put there
– FreeNet nodes may end up containing anything
• Anonymity and Deniability
– Gnutella does not hide document contents
– FreeNet encrypts documents
• Scalability
– the caching and dynamic connectivity of FreeNet nodes mean it will probably scale better than Gnutella
– Gnutella’s broadcast search model will not scale

4.5. MojoNation
• A distributed file sharing system, but searches use a complex micropayment system
– a searcher must ‘compensate’ the node containing the content it wants
  • this might be in the form of digital money or some other resource such as disk space or CPU cycles

Uses of Micropayments
• For supporting P2P business applications.
• Micropayments can solve many hacker problems:
– spam, distributed denial of service, freeloaders

Problems with Micropayments
• If a node is based on a slow machine, its network link is slow, and/or it does not contain much space then:
– no one will use the node, and
– the node has nothing to give in compensation when it uses other nodes (except money)
• One solution is for a network of machines to become a single MojoNation node.

5. Publisher-Centric P2P
• Publisher-centric systems concentrate on anonymously preserving information
– derived from the ‘Eternity Service’ idea
– examples: FreeHaven, Publius
• Most systems do not process queries quickly.
• Most systems assume a fixed network topology (shape).

5.1. FreeHaven
• A system of distributed anonymous storage.
• Anonymity for everything!
– authors, publishers, readers, servers, documents
• Documents can only be deleted/changed by their publishers, not by the servers
• FreeHaven does allow servers to be dynamically added/removed.
• Uses complex reputation and accountability systems
– to detect if documents are fakes/rubbish
– to prevent over-publication
– the complexity is because author and publisher anonymity must be maintained

5.2. Publius
• A Web-based publishing system that resists censorship and tampering.
• A file is replicated among many servers
– combats distributed denial of service attacks
• Files are encrypted, but come with a ‘share’
– each copy of the document has its own unique share
• A document search returns one copy of the encrypted doc and several shares
– the shares are combined to create a key which is used to decrypt the file
– not all the shares for a document are needed to create the key
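One standard way to build a key that can be recovered from some, but not all, of its shares is Shamir's threshold secret sharing. The slides do not say which scheme is used, so treat that choice as an assumption; the function names, the field prime and the 5-share/3-needed split below are illustrative only. The code needs Python 3.8 or later for the modular inverse via pow().

```python
import secrets

PRIME = 2**127 - 1   # a Mersenne prime; the key must be smaller than this


def split_key(key, n_shares, threshold):
    """Split key into n_shares points; any `threshold` of them recover it."""
    # Random polynomial of degree threshold-1 whose constant term is the key.
    coeffs = [key] + [secrets.randbelow(PRIME) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n_shares + 1):
        y = 0
        for c in reversed(coeffs):          # Horner evaluation of the polynomial
            y = (y * x + c) % PRIME
        shares.append((x, y))
    return shares


def combine_shares(shares):
    """Lagrange interpolation at x = 0 recovers the constant term (the key)."""
    key = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % PRIME
                den = (den * (xi - xj)) % PRIME
        key = (key + yi * num * pow(den, -1, PRIME)) % PRIME
    return key


key = secrets.randbelow(PRIME)                     # stands in for the document key
shares = split_key(key, n_shares=5, threshold=3)   # one share per server copy
recovered = combine_shares(shares[:3])             # any three shares are enough
also_recovered = combine_shares([shares[4], shares[1], shares[3]])
```

With fewer shares than the threshold, interpolation yields an unrelated number, so a few cooperating servers cannot rebuild the key, while the loss of a few servers does not lose it.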

6. P2P Meme Map
[Figure: the P2P meme map]
Central box:
Strategic Positioning: an Internet OS
User Positioning: make a more capable computer
Core Competencies:
* metadata management
* seamless connectivity and communication
* self-organizing systems, zero administration
* security
Surrounding labels: file sharing and caching; networked devices; open source; email routing; IP routing; instant messaging; distributed computation; client and server; simple joining; decoupling from machine; allow unreliability; user power; P2P is more fundamental; we create communities; use edge resources; decentralized
Legend: projects, actions, apps that define P2P; ideas supported by P2P

7. Some P2P Issues
• Decentralization
• Metadata
• Accountability
• Trust

7.1. Decentralization
• Most models are a hybrid with some hierarchy and/or central servers
– e.g. Napster, DNS, ICQ
• ICQ (from 1996) allows direct client-to-client communication where possible, but has a server as a fallback.

7.2. Metadata
• Metadata is “information (data) about data”
– e.g. the column headings of a database
– e.g. Napster’s metadata is the artist and song names used for searching for music files

Why is Metadata Useful?
• Metadata about a resource (e.g. about a Web page, a music file, a video) gives extra information about the resource
– explains the resource in a clear way
• Metadata can be used by search engines to search faster, and give more accurate results.
• Metadata can be used for information addressing and more clever routing
– e.g. use country information about a music file to direct a request to better servers

Metadata in the Web
• HTML added support for metadata late in 1997:
– the <meta> tags
– not widely used, can be misused
– meant to contain information such as a description (keywords) about the file, the creator name, date, etc.
• XML
– allows the creation of new tags which more accurately reflect the meaning of the data
  • e.g. <author>, <publisher>, <owner>, ...
• RDF (Resource Description Framework)
– talks about the properties of resources
– e.g. the meaning of a link can be made more accurate: was_written_by, is_interesting
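As a small illustration of why purpose-built tags help searching, the sketch below reads a record that uses the <author>, <publisher> and <owner> tags mentioned above and matches it field by field. The <song> wrapper, the <title> tag, the values and the two helper functions are all invented for this example; only Python's standard xml.etree.ElementTree module is used.

```python
import xml.etree.ElementTree as ET

# Hypothetical record: the <song> wrapper, <title> and all values are invented;
# <author>, <publisher> and <owner> are the tag names from the slide.
RECORD = """
<song>
  <title>Yesterday</title>
  <author>Beatles</author>
  <publisher>Bob</publisher>
  <owner>Bob</owner>
</song>
"""


def metadata(xml_text):
    """Return {tag: text} for the direct children of the root element."""
    root = ET.fromstring(xml_text)
    return {child.tag: (child.text or "").strip() for child in root}


def matches(xml_text, **wanted):
    """True if every requested field has exactly the wanted value."""
    fields = metadata(xml_text)
    return all(fields.get(tag) == value for tag, value in wanted.items())


fields = metadata(RECORD)
found = matches(RECORD, title="Yesterday", author="Beatles")
```

A query can now ask for author equal to "Beatles" rather than hoping the word appears somewhere in a file name.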

7.3. Accountability
• “The Tragedy of the Commons”
– a commonly owned resource will be overused until it degrades, because its users put self-interest first
• One solution:
– give ownership of the resource to its users so that caring for the resource prolongs the users’ income

Accountability Problems in P2P
• Peers are often anonymous / hard to track / transient
– so there is no reason for them to look after the resources they use
• Assigning IDs to users to encourage responsibility often fails
– e.g. pseudospoofing
• Pseudospoofing
– a person creates many fake IDs in the system (e.g. at eBay)
  • each fake ID is used to give a high positive rating to the other fake IDs
  • then one of the IDs is used to steal
– also used to get extra free disk space on GeoCities Web sites
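The weakness can be shown with a toy rating board that treats every ID as an independent person. NaiveReputation, the ID names and the 1-to-5 score scale are all hypothetical; no real site's rating rules are being modelled.

```python
from collections import defaultdict


class NaiveReputation:
    """Toy rating board that trusts every ID equally (the weakness shown here)."""

    def __init__(self):
        self.ratings = defaultdict(list)      # rated ID -> [(rater ID, score)]

    def rate(self, rater, target, score):
        if rater != target:                   # self-rating is the only thing blocked
            self.ratings[target].append((rater, score))

    def score(self, target):
        scores = [s for _, s in self.ratings[target]]
        return sum(scores) / len(scores) if scores else 0.0


board = NaiveReputation()
board.rate("real_buyer", "honest_seller", 4)  # one genuine rating out of 5

ring = [f"id{i}" for i in range(5)]           # all five IDs belong to one person
for rater in ring:
    for target in ring:
        board.rate(rater, target, 5)          # the fake IDs praise each other

honest = board.score("honest_seller")         # built from real feedback
faked = board.score("id0")                    # built entirely from the ring
```

Every ID in the ring ends up with a better score than the honest seller, even though no real transaction took place.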

Some Solutions
• A micropayments model
– based on digital cash or compensation of resources
– e.g. MojoNation
• A reputation system
– relatively new idea
– based on encrypted ‘signatures’ and third party verifiers
• Tolerate a certain amount of bad behaviour
– use mirroring of resources, and other forms of redundancy

7.4. Trust (between peers/servers)
• Trust increases based on the reputation of peers and servers
– how to implement reputation?
• Trust increases when fewer people are involved in the transaction
– e.g. buying directly without a middleman
• Trust increases when the environment contains less risk
– e.g. applet execution inside a JVM sandbox

Trust Issues in Censorship-resistant Publication Systems
Risk                      Solution
logging your requests     use secure channels
server-altered content    use multiple servers
fake updating of doc.     encryption / multiple copies
multiple fake docs.       impose pub. limits / micropayments
legal deletion            multiple copies

8. More Information
• Peer-to-Peer: Harnessing the Power of Disruptive Technologies
Andy Oram (ed.)
O'Reilly, 2001
(in our library)
– very good non-technical overview
– contains chapters on the main projects (e.g. Gnutella, FreeNet) and main issues
– the P2P meme map is explained in chapter 3
• O'Reilly's P2P Web site:
http://www.openp2p.com/
– articles about P2P
• Google's long list of P2P links (~150):
http://directory.google.com/Top/Computers/Software/Internet/Clients/File_Sharing/