Here. - iLab!

advertisement
A look at Peer-to-Peer File Sharing with
Gnutella
Prof. Ellis Horowitz
November 25, 2002
Copyright 2002 Ellis Horowitz
Outline
•
•
•
•
P2P file sharing clients
Gnutella protocol
Gnutella network properties
Gnutella protocol issues
– topolgy mismatch
– scalability
– free riding
– query types
– anonymity
– security
• Conclusions
Copyright 2002 Ellis Horowitz
Peer-to-Peer File Sharing is all about the trading of
copyrighted music and videos without paying anything to
the authors
query
music
category
Kazaa
Native
Windows
Application
banner
ad
3 million users online
Copyright 2002 Ellis Horowitz
sharing 4 PetaBytes of data
Kazaa Survives By Legal Manuvering
•
•
•
•
•
•
•
•
March 2001, Kazaa is founded by two Dutchmen, Niklas
Zennstrom and Janus Friis in a company called Computer
Empowerment
The software is based upon their FastTrack P2P Stack, a
proprietary algorithm for peer-to-peer communication
Kazaa licenses FastTrack to Morpheus and Grokster
Oct. 2001 MPAA and RIAA sue Kazaa, Morpheus and Grokster
Nov. 2001, Consumer Empowerment is sued in the Netherlands
by the Dutch music publishing body, Buma/Stemra. The court
orders KaZaA to take steps to prevent its users from
violating copyrights or else pay a heavy fine.
Jan. 2002, Zennstrom&Friis sell Kazaa software and website
to Sharman Networks, based in Vanuatu, an island in the
Pacific, but operating out of Australia
Feb. 2002, Kazaa cuts off Morpheus clients from FastTrack
April 2002, Sharman Networks agrees to let Brilliant
Digital bundle their own stealth P2P application called
AltNet within KaZaA. This network would be remotely
switched on, allowing KaZaA users to trade Brilliant
Digital content throughout FastTrack
Copyright 2002 Ellis Horowitz
Morpheus File Sharing Software
shopping, web browser
a Java
application
Search Power
searches
over
multiple
categories,
metadata
banner
ad
behind a firewall
Copyright 2002 Ellis Horowitz
Morpheus adopts the
Jtella version of Gnutella
There are many Gnutella Clients
See http://gnutella.wego.com/
Copyright 2002 Ellis Horowitz
Gnutella History
•
Originally conceived of by Justin Frankel, 21 year old founder
of Nullsoft
• March 2000, Nullsoft posts Gnutella to the web
• A day later AOL removes Gnutella at the behest of Time Warner
• The Gnutella protocol version 0.4
http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf
and version 0.6
http://rfcgnutella.sourceforge.net/Proposals/Ultrapeer/Ultrapeers.htm
• there are multiple open source implementations at
http://sourceforge.net/
including:
– Jtella
– Gnucleus
• Software released under the Lesser Gnu Public License (LGPL)
• the Gnutella protocol has been widely analyzed
Copyright 2002 Ellis Horowitz
Gnutella Protocol Messages
• Broadcast Messages
– Ping: initiating message (“I’m here”)
– Query: search pattern and TTL (time-to-live)
• Back-Propagated Messages
– Pong: reply to a ping, contains information
about the peer
– Query response: contains information about
the computer that has the needed file
• Node-to-Node Messages
– GET: return the requested file
– PUSH: push the file to me
Copyright 2002 Ellis Horowitz
Gnutella search mechanism
Steps:
• Node 2 initiates search for file A
7
1
A
4
2
6
3
5
Copyright 2002 Ellis Horowitz
Gnutella Search Mechanism
A
Steps:
• Node 2 initiates search for file A
• Sends message to all neighbors
7
1
4
2
3
A
6
A
5
Copyright 2002 Ellis Horowitz
Gnutella Search Mechanism
A
A
Steps:
• Node 2 initiates search for file A
• Sends message to all neighbors
• Neighbors forward message
7
1
4
2
6
3
A
A
5
Copyright 2002 Ellis Horowitz
Gnutella Search Mechanism
A:7
A
7
1
4
2
6
3
A:5
A
A
Steps:
• Node 2 initiates search for file A
• Sends message to all neighbors
• Neighbors forward message
• Nodes that have file A initiate a
reply message
5
Copyright 2002 Ellis Horowitz
Gnutella Search Mechanism
7
1
4
2
3
A:7
A:5
A 6
A
Steps:
• Node 2 initiates search for file A
• Sends message to all neighbors
• Neighbors forward message
• Nodes that have file A initiate a
reply message
• Query reply message is backpropagated
5
Copyright 2002 Ellis Horowitz
Gnutella Search Mechanism
7
1
A:7
2
4
A:5
6
3
Steps:
• Node 2 initiates search for file A
• Sends message to all neighbors
• Neighbors forward message
• Nodes that have file A initiate a
reply message
• Query reply message is backpropagated
5
Copyright 2002 Ellis Horowitz
Gnutella Search Mechanism
download A
1
7
4
2
6
3
5
Steps:
• Node 2 initiates search for file A
• Sends message to all neighbors
• Neighbors forward message
• Nodes that have file A initiate a
reply message
• Query reply message is backpropagated
• File download
• Note: file transfer between
clients behind firewalls is not
possible; if only one client, X, is
behind a firewall, Y can request
that X push the file to Y
Copyright 2002 Ellis Horowitz
Other Gnutella Issues
• GUID:
Short for Global Unique Identifier, a randomized
string that is used to uniquely identify a host or
message on the Gnutella Network. This prevents
duplicate messages from being sent on the network.
• GWebCache:
a distributed system for helping servents connect to
the Gnutella network, thus solving the
"bootstrapping" problem. Servents query any of
several hundred GWebCache servers to find the
addresses of other servents. GWebCache servers are
typically web servers running a special module.
• Host Catcher:
Pong responses allow servents to keep track of active
gnutella hosts
• On most servents, the default port for Gnutella is
6346
Copyright 2002 Ellis Horowitz
Network growth statistics
Number of nodes in the largest
network component ('000)
50
Gnutella Network Growth
.
40
30
20
Growth Factors
 DSL and cable modem nodes grew
substantially
 Multiple client implementations
became available
10
05/12/01
05/16/01
05/22/01
05/24/01
05/29/01
02/27/01
03/01/01
03/05/01
03/09/01
03/13/01
03/16/01
03/19/01
03/22/01
03/24/01
11/20/00
11/21/00
11/25/00
11/28/00
-
 There was significant growth in the Gnutella network in 2001
 5,000 nodes on February 2001,
 10,000 nodes on March 19, 2001 Statistics due to Matei Ripeanu, see
http://people.cs.uchicago.edu/~matei/
 20,000 nodes on May 12, 2001
PAPERS/gnutella-rc.pdf
 40,000 nodes on May 29, 2001
Copyright 2002 Ellis Horowitz
Limewire Count of Gnutella Hosts in 2002
Green graph
represents
unique hosts
Copyright 2002 Ellis Horowitz
Growth invariants (1): avg. node connectivity
Number of links ('000)
 3.4 links per node on average
200
150
100
50
0
0
10000
20000
30000
40000
50000
Number of nodes
Copyright 2002 Ellis Horowitz
graph due to Matei Ripeanu
Growth invariants (2): network diameter
Percent of node pairs (%)
 Node-to-node distance maintains similar distribution
 Average node-to-node distance grew 25% while the network
grew 50 times over 6 months
50%
40%
30%
20%
10%
0%
1
2
3
4
5 6 7 8 9 10 11 12
Node-to-node shortest path (hops)
Copyright 2002 Ellis Horowitz
graph due to Matei Ripeanu
Is Gnutella a power-law network?
Power-law networks: the number of links per node
follows a power-law distribution
Num. of nodes (log scale)
10000
Examples:
 the Internet,
 in/out links to/from
HTML pages,
 citation network,
 US power grid
November 2000
1000
100
10
1
1
10
Number of links (log scale)
100
graph due to Matei Ripean
Implications: High tolerance to random node failure but low
reliability when facingCopyright
of an2002
‘intelligent’
Ellis Horowitz adversary
Total Generated Traffic
Ripeanu has determined that Gnutella traffic totals 1Gbps
(or 330TB/month)!
– Compare to 15,000TB/month in US Internet backbone
(Dec. 2000)
– this estimate excludes actual file transfers
Reasoning:
 QUERY and PING messages are flooded. They form more than
90% of generated traffic
 predominant TTL=7
 >95% of nodes are less than 7 hops away
 measured traffic at each link about 6kbs
 network with 50k nodes and 170k links
Statistics due to Matei Ripeanu
Copyright 2002 Ellis Horowitz
Mapping between Gnutella Network and
Internet Infrastructure
A
B
F
D
E
C
G
H
Perfect Mapping
Copyright 2002 Ellis Horowitz
Mismatch between Gnutella Network and
Internet Infrastructure
A
B
F
D
C
E
G
H
• Inefficient mapping
• Link D-E needs to support six times higher
traffic.
Copyright 2002 Ellis Horowitz
Topology mismatch
The overlay network topology doesn’t
match the underlying Internet
infrastructure topology!
 40% of all nodes are in the 10 largest
Autonomous Systems (AS)
 Only 2-4% of all TCP connections link nodes
within the same AS
 Largely ‘random wiring’
• Most Gnutella generated traffic crosses AS
border, making the traffic more expensive
• May cause ISPs to change their pricing scheme
Copyright 2002 Ellis Horowitz
Scalability
• Whenever a node receives a message, (ping/query)
it sends copies out to all of its other
connections.
• existing mechanisms to reduce traffic:
– TTL counter
– Cache information about messages they
received, so that they don't forward
duplicated messages.
Copyright 2002 Ellis Horowitz
Free Riding on Gnutella
•
•
•
•
•
70% of Gnutella users
share no files
90% of users answer no
queries
Those who have files to
share may limit number of
connections or upload
speed, resulting in a high
download failure rate.
If only a few individuals
contribute to the public
good, these few peers
effectively act as
centralized servers.
see Adar and Huberman at
http://www2.cs.cmu.edu/~kunwadee/res
earch/p2p/gnutella.html
Copyright 2002 Ellis Horowitz
Free Riding on Gnutella
More than 25% of
Gnutella clients share
no files; 75% share
100 files or less
Conclusion: Gnutella
has a high percentage
of free riders
* Statistics due to
S. Gribble
Copyright 2002 Ellis Horowitz
Anonymity
• Gnutella provides for anonymity by masking the
identity of the peer that generated a query.
• However, IP addresses are revealed at various
points in its operation: HITS packets includes
the URL for each file, revealing the IP
addresses
• Clients claim that they have no control, but..
– they support bootstrapping
– they may control message flow
– they may control metadata searches
– they may control program updates
Copyright 2002 Ellis Horowitz
Query Expressiveness
• Format of query not standardized
• No standard format or matching semantics for
the QUERY string. Its interpretation is
completely determined by each node that
receives it.
• String literal vs. regular expression
• Directory name, filename, or file contents
• Malicious users may even return files unrelated
to the query
Copyright 2002 Ellis Horowitz
Gnutella Queries
• "The popularity of Gnutella queries and its
implications on scalability" Kunwadee Sripanidkulchai,
see http://www2.cs.cmu.edu/~kunwadee/research/p2p/gnutella.html
Examining over 5 million queries
Copyright 2002 Ellis Horowitz
Security
• Recently there have been P2P viruses and worms
constructed
– the Benjamin virus uses Kazaa to spread itself,
see
http://www.viruslist.com/eng/viruslist.html?id=497
90
• Kazaa now includes virus checking software that is
applied before upload/after download
• There have been several Gnutella worms:
Gnutella.worm, VBS/GWV.a, VBS_GNUTELWORM, VBS.Gnut.A,
VBS/Gnu
• A Gnutella worm spreads by making a copy of itself in
the Gnutella program directory, then making that
directory available for sharing files on the Gnutella
network.
Copyright 2002 Ellis Horowitz
Conclusions
 Gnutella is a self-organizing, large-scale, P2P
application that produces an overlay network on top
of the Internet; it appears to work
 Growth is hindered by the volume of generated
traffic and inefficient resource use
 since there is no central authority the open source
community must commit to making any changes
 Suggested changes have been made by
– Peer-to-Peer Architecture Case Study: Gnutella
Network, by Matei Ripeanu
– Improving Gnutella Protocol: Protocol Analysis and
Research Proposals by Igor Ivkovic
Copyright 2002 Ellis Horowitz
Legal Questions
• Do US courts have jurisdiction over P2P
companies?
• Do P2P companies really contribute to copyright
infringement, cite: Sony BetaMax case?
• Do P2P companies affect file sharing?
• If Kazaa, Grokster and Morpheus are stopped,
will that stop file sharing or copyright
infringement?
Copyright 2002 Ellis Horowitz
Some References
•
•
[1] Eytan Adar and Bernardo A. Huberman, Free Riding on
Gnutella http://www.firstmonday.dk/issues/issue5_10/adar/
[2] Igor Ivkovic, Improving Gnutella Protocol: Protocol
Analysis And Research Proposals
http://www9.limewire.com/download/ivkovic_paper.pdf
•
[3] Jordan Ritter, Why Gnutella Can't Scale. No, Really.
http://www.monkey.org/~dugsong/mirror/gnutella.html
•
[4] Matei Ripeanu, Peer-to-Peer Architecture Case Study:
Gnutella network.
http://www.cs.uchicago.edu/%7Ematei/PAPERS/gnutella-rc.pdf
•
[5] The Gnutella Protocol Specification v0.4
http://www9.limewire.com/developer/gnutella_protocol_0.4.pdf
Copyright 2002 Ellis Horowitz
Download