p2p nsf tutorial

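High-Level Symposium on "Challenges Facing Computer Science"
Topological Research on Next-Generation Peer-to-Peer Network Systems
- Why P2P
- What is wrong with first-generation unstructured P2P systems
- What are the benefits of next-generation structured P2P systems
- Topology models of next-generation P2P systems
- Research problems of next-generation P2P systems
Guihai Chen, State Key Laboratory for Novel Software Technology, Nanjing University
December 27, 2003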
Outline: Why P2P, 1st Generation, 2nd Generation, Generic Model, Problems, Conclusion
What is a P2P Network: one version
(M. Ripeanu, A. Iamnitchi, and I. Foster, “Mapping the Gnutella Network”, IEEE Internet Computing, No. 1, 2002)
[Dynamic operability] P2P applications must keep
operating transparently although hosts join and leave the
network frequently.
[Performance and scalability] P2P applications exhibit
what economists call the “network effect” in which a
network’s value to an individual user scales with the total
number of participants.
[Reliability] External attacks should not cause significant
data or performance loss.
[Anonymity] The application should protect the privacy of
people seeking or providing sensitive information.
What is a P2P Network: my version
[Equality] All peers assume an equal role.
[Non-Centralized] No centralized server in the space.
[Robust] Highly robust, resilient, and self-organizing.
[Zero Hardware Cost] No further investment in hardware or bandwidth.
[A hot topic] But huge investment in research, e.g., IRIS got $12M.
How Did it Start?

A killer application: Napster
- Free music over the Internet

Key idea: share the storage and bandwidth of
individual (home) users
Napster: Example
[Figure: machines m1 to m6 hold files A to F respectively. A peer sends the query "E?" to the central index, the index answers "m5", and the peer then fetches E directly from m5.]
Napster: History

- 5/99: Shawn Fanning (freshman, Northeastern U.) founds the Napster online music service
- 12/99: first lawsuit
- 3/00: 25% of UWisc traffic is Napster
- 2000: est. 60M users
- 2/01: US Circuit Court of Appeals: Napster knew users were violating copyright laws
- 7/01: # simultaneous online users: Napster 160K, Gnutella 40K
- Now: trying to come back: http://www.napster.com
Napster: problems

centralized server:
- single logical point of failure
- can load balance among servers using DNS rotation
- potential for congestion
- Napster “in control” (freedom is an illusion)
no security:
- passwords in plain text
- no authentication
- no anonymity
Gnutella



Distribute file location and decentralize lookup.
Idea: multicast the request
How to find a file:
- Send request to all neighbors
- Neighbors recursively multicast the request
- Eventually a machine that has the file receives the request,
and it sends back the answer

Advantages:
- Totally decentralized, highly robust

Disadvantages:
- Not scalable; the entire network can be swamped with
requests (to alleviate this problem, each request has a TTL)
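The flooding search with a TTL described above can be sketched as follows. This is a minimal simulation, not the Gnutella wire protocol: the function name `flood_query`, the dictionary-based topology, and the link between m5 and m6 are all assumptions made for illustration.

```python
from collections import deque

def flood_query(neighbors, holders, start, wanted, ttl):
    """Flood a query for `wanted` from `start`; return (hits, messages_sent)."""
    seen = {start}                       # peers that already handled this query
    queue = deque([(start, None, ttl)])  # (peer, peer it came from, remaining TTL)
    hits, messages = [], 0
    while queue:
        peer, sender, remaining = queue.popleft()
        if wanted in holders.get(peer, ()):
            hits.append(peer)            # the real reply travels back along the query path
        if remaining == 0:
            continue                     # TTL exhausted: stop forwarding
        for nxt in neighbors.get(peer, ()):
            if nxt == sender:
                continue                 # never send the query back where it came from
            messages += 1                # each forwarded copy is one message
            if nxt not in seen:          # a peer that saw the query before drops it
                seen.add(nxt)
                queue.append((nxt, peer, remaining - 1))
    return hits, messages

# Hypothetical topology: m1's neighbors are m2 and m3; m3's are m1, m4 and m5.
neighbors = {
    "m1": ["m2", "m3"],
    "m2": ["m1"],
    "m3": ["m1", "m4", "m5"],
    "m4": ["m3"],
    "m5": ["m3", "m6"],
    "m6": ["m5"],
}
holders = {"m1": {"A"}, "m2": {"B"}, "m3": {"C"},
           "m4": {"D"}, "m5": {"E"}, "m6": {"F"}}

print(flood_query(neighbors, holders, "m1", "E", ttl=2))  # (['m5'], 4)
print(flood_query(neighbors, holders, "m1", "E", ttl=1))  # ([], 2)
```

With a TTL of 1 the query never reaches m5, which shows the cost of the TTL fix: it bounds traffic but can also hide files that exist in the network.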
Gnutella: Example

Assume: m1’s neighbors are m2 and m3; m3’s
neighbors are m4 and m5;…
[Figure: machines m1 to m6 hold files A to F respectively. m1 sends the query "E?" to m2 and m3; m3 forwards it to m4 and m5; m5, which holds E, sends E back.]
Gnutella: problems
- Not scalable: the entire network can be swamped with
requests (to alleviate this problem, each request has a TTL)
- Not anonymous: The person you are getting the file from
knows who you are.
- Not anymore than it’s non-centralized.
- What we care about:
How much traffic does one query generate?
How many hosts can it support at once?
What is the latency associated with querying?
Is there a bottleneck?
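For the first question, a rough upper bound on the traffic of one flooded query can be worked out by assuming every peer has the same number of neighbors and no duplicate is ever suppressed: the first hop sends `degree` messages and each later hop multiplies the frontier by `degree - 1`. The function name and the sample values below are illustrative assumptions, not figures from the slides.

```python
def flood_message_bound(degree, ttl):
    """Upper bound on messages for one query: sum of degree * (degree - 1)**i for i < ttl."""
    total, frontier = 0, degree
    for _ in range(ttl):
        total += frontier          # messages sent at this hop
        frontier *= degree - 1     # each receiver forwards to all neighbors but the sender
    return total

print(flood_message_bound(degree=4, ttl=7))  # 4372
print(flood_message_bound(degree=3, ttl=2))  # 9
```

The bound grows exponentially in the TTL, which is the quantitative form of "not scalable".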
New Solutions to the Location Problem

Overlay Networks:
- applications, running at various sites
- create “logical” links (e.g., TCP or UDP connections) pairwise between each
other
- each logical link: multiple physical links, routing defined by native Internet
routing


Goal: scalability, resilience, security.
Abstraction: a distributed hash-table data structure + routing table
- key = hash(data); key = hash(IP)
- data = lookup(key)
- Note: data can be anything: a data object, document, file, pointer to a file…
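The abstraction above can be sketched in a few lines. This is only an interface illustration under stated assumptions: `ident`, `ToyDHT`, the 16-bit identifier size, and hashing the data's name rather than its content are all choices made here, and a local dictionary stands in for the overlay that a real system would route through.

```python
import hashlib

M = 16  # identifier bits; an illustrative choice, deployed systems use far more

def ident(text, bits=M):
    """Map a string (a file name or an IP address) to a `bits`-bit identifier via SHA-1."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") % (1 << bits)

class ToyDHT:
    """put/lookup interface only; a local dict stands in for the overlay network."""

    def __init__(self):
        self._table = {}

    def put(self, name, data):
        key = ident(name)          # key = hash(data), here applied to the data's name
        self._table[key] = data
        return key

    def lookup(self, key):
        return self._table.get(key)  # data = lookup(key)

dht = ToyDHT()
key = dht.put("song.mp3", b"audio bytes")
node_id = ident("192.0.2.7")         # key = hash(IP): nodes share the same identifier space
assert dht.lookup(key) == b"audio bytes"
```

Placing data keys and node identifiers in one space is what lets each proposal below define "the node responsible for a key" purely in terms of distance in that space.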
Proposals
- CAN (ACIRI/Berkeley)
- Chord (MIT)
- Pastry (Rice)
- Tapestry (Berkeley)
- Koorde (MIT)
- Viceroy (Weizmann)
- Cycloid (Nanjing University)
Overlay Networks: Consistent Hashing
David Karger, Eric Lehman, Tom Leighton, Matthew Levine, Daniel
Lewin, Rina Panigrahy, "Consistent Hashing and Random Trees:
Distributed Caching Protocols for Relieving Hot Spots on the
World Wide Web", ACM Symposium on Theory of Computing, 1997.
SHA-1: http://www.w3.org/PICS/DSig/SHA1_1_0.html
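A minimal sketch of the consistent hashing idea follows, assuming SHA-1 identifiers truncated to 32 bits and a sorted list as the ring. The class `HashRing`, its methods, and the node names are hypothetical; the cited paper also covers replication and virtual nodes, which are left out here.

```python
import bisect
import hashlib

def sha1_id(text, bits=32):
    """Top `bits` bits of the SHA-1 digest, used as a position on the ring."""
    digest = hashlib.sha1(text.encode("utf-8")).digest()
    return int.from_bytes(digest, "big") >> (160 - bits)

class HashRing:
    def __init__(self, nodes=()):
        self._ids = []     # sorted node identifiers
        self._names = {}   # identifier -> node name
        for node in nodes:
            self.add(node)

    def add(self, node):
        nid = sha1_id(node)
        if nid not in self._names:
            bisect.insort(self._ids, nid)
            self._names[nid] = node

    def owner(self, key):
        """The first node at or clockwise after the key's identifier."""
        i = bisect.bisect_left(self._ids, sha1_id(key))
        return self._names[self._ids[i % len(self._ids)]]

ring = HashRing(["10.0.0.1", "10.0.0.2", "10.0.0.3"])
keys = [f"file-{i}" for i in range(1000)]
before = {k: ring.owner(k) for k in keys}
ring.add("10.0.0.4")
moved = [k for k in keys if ring.owner(k) != before[k]]
# Every key that changed owner moved to the new node; no other keys are reshuffled.
assert all(ring.owner(k) == "10.0.0.4" for k in moved)
```

The final assertion is the property that matters for a P2P overlay: a join disturbs only the keys the new node takes over, so membership changes stay cheap.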
Overlay Networks: Typical Systems 1
Ring: Chord [MIT]
- Persons: Dabek, Kaashoek, Stoica
- Applications: CFS
- Key space: 1-dimensional cycle
- Space-time complexity: space O(log N), time O(log N)
- Data distribution: each node holds a segment of data keys between its predecessor and itself.
- Data location: lookup(k) -> successor(k)
- Routing table: successor set + O(log N) fingers

Mesh: CAN [Berkeley]
- Persons: Ratnasamy, Shenker, Stoica (formerly at MIT)
- Key space: 2- or d-dimensional torus
- Space-time complexity: space O(d), time O(d * N^(1/d))
- Data distribution: each node holds the zone of data keys in which it resides.
- Data location: lookup(k) -> region(k)
- Routing table: O(d) neighbors

Hypercube: Pastry [Rice], Tapestry [Berkeley]
- Persons: Druschel, Rowstron
- Applications: PAST, SCRIBE, OceanStore
- Key space: 1-dimensional cycle
- Space-time complexity: space O(log N), time O(log N)
- Data distribution: each node holds the segment of data keys that are numerically closest to it.
- Data location: lookup(k) -> nearest(k)
- Routing table: O(|L|) leaf set + O(|M|) proximity set + O(log N) neighbors
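To make the ring entry concrete, here is a sketch of how lookup(k) -> successor(k) can be resolved with O(log N) fingers, in the style of Chord. It is a single-process simulation under assumptions: a 6-bit identifier space, a fixed ten-node ring chosen for illustration, and global knowledge used only to build the finger tables. The function names are hypothetical.

```python
M = 6  # identifier bits, so the ring has 2**M positions (illustrative choice)

def in_interval(x, a, b):
    """True if x lies in the ring interval (a, b]; a == b means the whole ring."""
    if a < b:
        return a < x <= b
    return x > a or x <= b

def successor(node_ids, ident):
    """First node at or clockwise after ident; node_ids must be sorted."""
    for n in node_ids:
        if n >= ident:
            return n
    return node_ids[0]  # wrap around the ring

def finger_table(node_ids, n):
    """Finger i points at successor(n + 2**i); finger 0 is n's immediate successor."""
    return [successor(node_ids, (n + 2**i) % 2**M) for i in range(M)]

def lookup(node_ids, start, key):
    """Route a query for `key` from `start`; return (owner, nodes visited)."""
    fingers = {n: finger_table(node_ids, n) for n in node_ids}
    node, path = start, [start]
    while not in_interval(key, node, fingers[node][0]):
        # Forward to the farthest finger lying strictly between node and key.
        nxt = next((f for f in reversed(fingers[node])
                    if f != key and in_interval(f, node, key)), None)
        if nxt is None:
            break  # defensive; finger 0 always qualifies when the loop body runs
        node = nxt
        path.append(node)
    return fingers[node][0], path

nodes = [1, 8, 14, 21, 32, 38, 42, 48, 51, 56]
print(lookup(nodes, 8, 54))  # (56, [8, 42, 51])
```

Each forwarding step at least halves the remaining clockwise distance to the key, which is where the O(log N) time bound in the table comes from.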
Overlay Networks: Typical Systems 2
DeBruijn: Koorde [MIT], ODRI [Texas A&M]
- Persons: Kaashoek, Karger (Koorde); Loguinov, Kumar, Rai (ODRI)
- Applications: ???
- Key space: 1-dimensional cycle
- Space-time complexity: O(d), O(log N)
- Data distribution: each node holds a segment of data keys between its predecessor and itself.
- Data location: lookup(k) -> successor(k)
- Routing table: successor set + O(d) fingers

Butterfly: Viceroy [Weizmann]
- Persons: Malkhi, Naor, Ratajczak
- Applications: ???
- Key space: 1-dimensional cycle
- Space-time complexity: O(d), O(log N)
- Data distribution: each node holds the segment of keys that are numerically closest to it.
- Data location: lookup(k) -> nearest(k)
- Routing table: 7 neighbors

CCC: Cycloid [NJU, Wayne]
- Persons: Guihai Chen, Chengzhong Xu
- Applications: ???
- Key space: 2-dimensional cycle
- Space-time complexity: O(d), O(log N)
- Data distribution: each node holds the segment of data keys that are numerically closest to it.
- Data location: lookup(k) -> nearest(k)
- Routing table: 5 neighbors
Overlay Networks: a generic model
[Figure: A Generic Topological Model of P2P Systems. The overlay network consists of peer 1, peer 2, ..., peer n. Each peer contains a routing and locating algorithm, a routing table, data storage, and a data cache. The Internet is the supporting network underneath.]
Overlay Networks: criteria, issues or topics
Can one network be modified as a P2P overlay network?
• Ordered Key Space: a necessary measurement of distance.
• Convergent Routing Algorithm: arriving at the destination after a
fixed number of steps.
• Resilient Connection Pattern: nodes maintain continuous
connections to neighbors.
Factors affecting the performance of P2P systems
• Degree: the number of neighbors
• Routing length: the number of hops
• Fault tolerance: what fraction of nodes can fail
• Maintenance overhead: how many messages are passed to
maintain coherence
• Load balance: how evenly keys are distributed, and how often each
node works as an intermediate node for other routes.
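The first two factors pull against each other, and a short calculation makes the tradeoff visible. The sketch below uses the commonly quoted averages for two of the systems discussed earlier: about log2(N) fingers and (1/2) log2(N) hops for a Chord-style ring, and 2d neighbors with (d/4) * N^(1/d) hops for a d-dimensional CAN. The function names are hypothetical and the numbers are estimates, not measurements.

```python
import math

def chord_cost(n):
    """(degree, average hops) for a Chord-style ring of n nodes."""
    return math.log2(n), 0.5 * math.log2(n)

def can_cost(n, d):
    """(degree, average hops) for a d-dimensional CAN of n nodes."""
    return 2 * d, (d / 4) * n ** (1 / d)

n = 4096
print("Chord:", chord_cost(n))  # (12.0, 6.0)
for d in (2, 4, 6):
    degree, hops = can_cost(n, d)
    print(f"CAN d={d}: degree {degree}, about {hops:.1f} hops")
```

For a fixed network size, raising the CAN dimension buys shorter routes at the price of more neighbors to maintain, while the ring fixes both at a logarithmic level.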
P2P Issues
[Diagram: research areas surrounding P2P]
- Security and Protection: Trust, Anonymity, Reputation
- Business and Legal Issues: Business Models, Intellectual Property Rights
- Sociometry: Small World Phenomena, Power-Law Networks
- Distributed Databases: Query Decomposition, Query Distribution, Mediation
- Network Architecture and Design: Network Topology, Routing, Overlay Networks
- Intelligent Agents / Web-based Services: Matchmaking, Service Description
- Distributed Data Structures: Distributed Hash Tables, Scalable Distributed Data Structures
P2P Issues
1) What topologies can be used for P2P systems?
2) How to determine the dimension of overlay networks?
3) What is the tradeoff between degree and routing length?
4) Can fault tolerance be formulated quantitatively?
5) Is a more flexible hash function possible?
6) The proximity problem?
7) The big peer and small peer problem?
Conclusion

The next generation of the Internet is the Grid;

The next generation of the Grid is P2P;

The next generation of P2P is structured.