Peer-to-Peer Computing
CSC8530 – Dr. Prasad
Jon A. Preston
April 21, 2004
Agenda

• Overview of Peer-to-Peer Computing
• Parallel Downloading
• Peer-to-Peer Media Streaming
• References
• Collaborative Software Engineering

Peer-to-Peer Computing

• Autonomy from centralized servers
• Dynamic (peers added and removed frequently)
• File sharing (KaZaA – outpaces Web traffic; 3,000 terabytes, 3 million peers online)
• Communication (instant messaging)
• Computation (SETI@home)

Peer-to-Peer Computing (cont.)

• Decentralized data sharing
• Dynamic growth of system capacity
• Various data lookup/discovery schemes
  – Centralized directory servers (Napster)
  – Controlled request flooding (Gnutella)
  – Hierarchy with supernodes (KaZaA)
• Heterogeneous collection of peers
  – Need a way of encouraging peers to report their true outgoing bandwidth

Worldwide Computer (P2P Computation)

• “Moonlight” your computer
• Share/lease processor and storage
• Process others’ simulations, etc.
• Archive others’ files (even when their computers are off)
• Receive micropayments for services rendered
• The PC becomes a component of a worldwide computer
• “Internet-scale OS” – centralized structure
  – Must handle resource allocation, coordination, security/privacy, etc.

Parallel Downloading

• Potential for widespread use on P2P networks
• Past work shows parallel downloading (PD) achieves higher aggregate downloading throughput
• Shorter download times for clients

Communication in PD

• Client must determine which segments of the file to request from each server
• Alternative: “Tornado codes”
  – Servers keep sending until the client says “enough”
  – Requires less communication about how much, and which parts, of the file the client wants
  – Does require heavy buffering at the client (the entire file)

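The “keep sending until enough” idea can be illustrated with a toy model (this is not a real Tornado decoder; the function name and the stopping rule are simplifications I am assuming – real Tornado codes need only slightly more than the original number of symbols, not exact coverage):

```python
import random

def fountain_download(k, num_servers, seed=0):
    """Toy model of 'servers keep sending until the client says enough':
    each server emits symbols with random indices drawn from the k source
    symbols; the client stops as soon as it holds any k distinct indices.
    A stand-in for real Tornado-code decoding."""
    rng = random.Random(seed)
    received = set()
    sent = 0
    while len(received) < k:
        for _ in range(num_servers):     # one symbol per server per round
            received.add(rng.randrange(k))
            sent += 1
            if len(received) == k:
                break
    return sent                          # total symbols transmitted

# The client never negotiates which part of the file it wants --
# it simply signals "enough" once it can reconstruct the whole file.
overhead = fountain_download(k=100, num_servers=4)
```

Note the trade-off from the slide: no per-segment negotiation, but the client must buffer everything until the file is complete.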
Parallel vs. Sequential Download

• Parallel incurs non-trivial costs
  – Synchronization
  – Coordination
  – Encoding/decoding
• Adopt PD only if download performance improves significantly

Large-Scale Deployment of PD

• Koo et al. developed a model (May 2003) showing that SD is better than PD
  – Assumes Capacity_servers >> Capacity_clients
  – Homogeneous network
  – Analyzed average download time
  – Performance is similar, but SD requires less overhead

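The intuition behind the Koo et al. result can be sketched with simple arithmetic (the per-connection overhead figure below is a hypothetical number of my own, purely for illustration):

```python
def download_time(file_mb, client_mbps, n_connections, per_conn_overhead_s=0.5):
    """Toy model: when Capacity_servers >> Capacity_clients, the client
    link is the bottleneck, so the transfer itself takes file/client_rate
    regardless of how many parallel connections are opened; each extra
    connection only adds setup/coordination overhead (assumed 0.5 s)."""
    transfer = file_mb * 8 / client_mbps       # seconds on the client link
    return transfer + n_connections * per_conn_overhead_s

sd = download_time(100, 8, n_connections=1)    # sequential download
pd = download_time(100, 8, n_connections=4)    # parallel download
# Same transfer time either way, but PD pays more overhead: sd < pd.
```

This is exactly the slide’s point: under a client-side bottleneck, PD’s extra synchronization and coordination buy nothing.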
Peer-to-Peer Media Streaming

• Peer-to-peer file sharing
  – Peers act as both server and client
  – “Open-after-download”
• Media streaming
  – “Play-while-downloading”
  – A subset of peers “owns” a media file
  – These peers stream the media to requesting peers
  – Recipients become supplying peers themselves

Characteristics of P2P Media Streaming Systems

• Self-growing – requesting peers become supplying peers (total system capacity grows)
• Serverless – no peer has to act as a server (opening a large number of simultaneous client connections)
• Heterogeneous – peers contribute different outbound connection bandwidths
• Many-to-one – many supplying peers serve one real-time playing client (hard deadlines)

Two Problems

• Media data assignment
• Fast amplification

Media Data Assignment

• Given
  – A requesting peer
  – Multiple supplying peers
  – Heterogeneous outbound bandwidth among suppliers
• Determine
  – The subset of the media data to request from each supplier
Variable Buffer Delays

The buffer delay depends upon the ordering of which segments of the media file are obtained from each supplying peer.
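A greedy sketch of the assignment problem (this is my own simplification, not the exact algorithm from the Xu et al. paper): give each segment, in playback order, to the supplier that would finish sending it earliest, then compute the startup buffer delay needed so playback never stalls.

```python
def assign_segments(num_segments, supplier_bw, seg_size=1.0, play_rate=1.0):
    """Greedy sketch: assign each segment, in playback order, to the
    supplier that would complete it earliest given its existing load.
    Returns the assignment and the startup buffer delay needed so that
    playback (one segment per 1/play_rate seconds) never stalls."""
    finish = [0.0] * len(supplier_bw)      # when each supplier is next free
    arrival, assignment = [], []
    for _ in range(num_segments):
        i = min(range(len(supplier_bw)),
                key=lambda j: finish[j] + seg_size / supplier_bw[j])
        finish[i] += seg_size / supplier_bw[i]
        assignment.append(i)
        arrival.append(finish[i])
    # delay such that segment s, played at delay + s/play_rate, has arrived
    delay = max(t - s / play_rate for s, t in enumerate(arrival))
    return assignment, delay

# Two suppliers, one twice as fast: the faster peer carries more segments,
# and the segment ordering determines the required buffer delay.
assignment, delay = assign_segments(6, supplier_bw=[2.0, 1.0])
```

Changing which segments go to which peer changes `arrival`, and with it the buffer delay – the point of the slide above.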
Fast Amplification

• Differential selection algorithm
  – Favors higher-class (higher outbound bandwidth) peers
  – Ultimately benefits all requesting peers
  – Should not starve any lower-class peer
  – Enforced via a purely distributed algorithm
  – Probability of selection is proportional to the requesting peer’s promised outbound bandwidth

Variable Capacity Growth

Selection Algorithm

• Each supplying peer
  – Determines which requesting peer to serve
  – Maintains a probability vector – one entry per class of peers (classes defined by bandwidth)
  – Receives “reminders” from peers
• If a supplier (Ps) is busy, it can receive a reminder from a requesting peer (Pr)
• The reminder tells the supplier to remember the requesting peer (Pr) and not elevate peers in classes below Pr when the current service completes

Admission Probability Vector

• One entry per class-i set of peers
• If not busy, Ps grants the request of Pr with probability Pr[i], where i is the class of Pr
• If Ps is a class-k peer, Pr[i] is defined as follows
  – For i < k: Pr[i] = 1.0 (favored classes)
  – For i >= k: Pr[i] = 1/2^(i-k)
• If idle, elevate non-favored (and non-served) entries by a factor of 2 (i.e., Pr[i] = Pr[i] * 2)
• Use reminders to decide what happens after service completes (raise or not)

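The vector above is easy to make concrete (a minimal sketch; the function names are mine, and the “elevate” cap at 1.0 is an assumption the slides do not state explicitly):

```python
def initial_vector(k, num_classes):
    """Admission probability vector held by a class-k supplying peer:
    classes below k (higher bandwidth) are always admitted; class i >= k
    is admitted with probability 1/2^(i-k)."""
    return [1.0 if i < k else 1.0 / 2 ** (i - k) for i in range(num_classes)]

def elevate(vector, favored_or_served):
    """When the supplier sits idle, double every entry that is neither
    already certain nor excluded by a reminder, capping at 1.0 (assumed)."""
    return [min(1.0, p * 2) if i not in favored_or_served and p < 1.0 else p
            for i, p in enumerate(vector)]

v = initial_vector(k=1, num_classes=4)    # [1.0, 1.0, 0.5, 0.25]
v = elevate(v, favored_or_served=set())   # [1.0, 1.0, 1.0, 0.5]
```

A reminder from a class-j requester would place the classes below j in `favored_or_served`, so they are not elevated past it – the differential behavior the previous slide describes.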
Making a Request

• The requesting peer (Pr) knows its candidate supplying peers {Ps1, Ps2, …, Psn}
• Pr is admitted if it obtains permission from enough suppliers that their aggregated outbound bandwidth is sufficient to service the request
  – The requesting peer then computes the media data assignment
• If not admitted, Pr sends “reminders” to the busy supplying peers that favor it, then backs off exponentially
• When its request is finished, Pr becomes a supplying peer, increasing the overall system capacity

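The admission loop with exponential backoff can be sketched as follows (a toy model: reminders are not modelled, and each attempt simply reports the aggregate bandwidth granted by suppliers):

```python
def request_with_backoff(granted_bw, playback_rate, max_tries=8, base_delay=1.0):
    """Sketch of the requester's admission loop: each attempt calls
    granted_bw() for the aggregate outbound bandwidth the suppliers have
    granted; on failure the peer retries after an exponentially growing
    delay (during which, in the real protocol, reminders are in effect)."""
    delay, waited = base_delay, 0.0
    for _ in range(max_tries):
        if granted_bw() >= playback_rate:
            return True, waited          # admitted: enough bandwidth granted
        waited += delay
        delay *= 2                       # exponential backoff
    return False, waited

attempts = iter([0.0, 0.5, 1.2])         # bandwidth granted on each try
ok, waited = request_with_backoff(lambda: next(attempts), playback_rate=1.0)
# admitted on the third attempt, after backing off 1 s and then 2 s
```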
Differential Acceptance Results
Non-differential Acceptance Results
References

• S. Koo, C. Rosenberg, D. Xu, “Analysis of Parallel Downloading for Large File Distribution”, Proceedings of the IEEE International Workshop on Future Trends in Distributed Computing Systems (FTDCS 2003), San Juan, PR, May 2003.
• D. Xu, M. Hefeeda, S. Hambrusch, B. Bhargava, “On Peer-to-Peer Media Streaming”, Proceedings of the IEEE International Conference on Distributed Computing Systems (ICDCS 2002), Vienna, Austria, July 2002.
• M. Ripeanu, “Peer-to-Peer Architecture Case Study: Gnutella Network”, International Conference on Peer-to-Peer Computing, 2001.
• J. Kangasharju, K. W. Ross, D. Turner, “Adaptive Content Management in Structured P2P Communities”, 2002, http://cis.poly.edu/~ross/papers/AdaptiveContentManagement.pdf
• S. Androutsellis-Theotokis, “A Survey of Peer-to-Peer File Sharing Technologies”, whitepaper, Athens University of Economics and Business, Greece, 2002.

Collaborative Software Engineering

• Overview of Collaborative Computing
• Synchronous and Asynchronous
• Notification Algorithms
• Distributed Mutex
• Achieving “undo” and “redo”
• Transparencies vs. Awareness
• Distributed Software Engineering

Overview of Collaborative Computing

• Utilize computing to improve workflow and coordination/communication
  – Shared displays/applications
  – Online meetings
  – Collaborative development (configuration management)
  – Minimize the impact of physical distance
• Collaboratories
  – Emulate scientific labs

Synchronous and Asynchronous

• Synchronous
  – Same time, different place
  – ICQ, chat, etc.
  – Sessions can be stored
• Asynchronous
  – Different time, same or different place
  – Email, newsgroups, web forums
  – Sessions can be stored and replayed

Notification Algorithms

• Unicast
  – Significant bandwidth consumption
• Multicast
  – Network flooding
  – Latency is a potential issue
• Frequency
  – Synchronous implies a high frequency of change notifications
  – Asynchronous implies a low frequency of change notifications
• Granularity
  – Send differentials or the whole state?
  – How to incorporate new users (latecomers)?

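The granularity trade-off can be sketched concretely (a toy model over dictionaries; it ignores deleted keys, and the function names are mine): a whole-state notification ships the full document, a differential ships only what changed, and a latecomer needs the whole state once before differentials make sense.

```python
def diff_update(old, new):
    """Differential notification: only the keys whose values changed
    between the old and new shared state travel over the network."""
    return {k: v for k, v in new.items() if old.get(k) != v}

def apply_diff(state, diff):
    """A peer that already holds the old state reconstructs the new one
    by merging the differential in; a latecomer has no old state and
    must first be sent the whole state instead."""
    merged = dict(state)
    merged.update(diff)
    return merged

old = {"title": "P2P", "body": "draft"}
new = {"title": "P2P", "body": "final"}
d = diff_update(old, new)            # only the changed "body" entry travels
restored = apply_diff(old, d)
```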
Distributed Mutex

• Token-based
  – Only the process that holds the token can enter the critical section
  – Token-transmission algorithm (round-robin, or hold and wait for a request)
  – How does a process know where to request the token?
• Permission-based
  – A process sends a request to enter the CS to the other processes
  – The other processes get to “vote”
  – The process enters the CS only if it receives enough votes

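A minimal sketch of the token-based scheme with round-robin token transmission (a single-threaded simulation, not a real distributed implementation; the function name is mine):

```python
def token_ring(num_procs, requests):
    """Token-based mutex, round-robin variant: the token circulates
    around the ring, and a process enters its critical section only
    while holding the token. `requests` maps process id -> number of CS
    entries it still wants. Returns the order of CS entries."""
    pending = dict(requests)
    order = []
    holder = 0
    while any(pending.values()):
        if pending.get(holder, 0):
            order.append(holder)           # enter CS while holding the token
            pending[holder] -= 1
        holder = (holder + 1) % num_procs  # pass the token along the ring
    return order

# Three processes; process 0 wants the CS once, process 2 twice.
order = token_ring(3, {0: 1, 2: 2})
```

Mutual exclusion is trivially guaranteed because there is only one token; the open question from the slide (where to request the token in the hold-and-wait variant) is exactly what this round-robin version sidesteps at the cost of passing the token through idle processes.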
Achieving “undo” and “redo”

• Particularly important in collaborative systems
  – High level of “what if” inherent in the system
  – Others might adversely affect someone else’s work
• In OO-based systems, undo and redo are inverses of each other
• In text-based systems, insert and delete are inverses of each other
• In bitmap-based systems, undo and redo are not so easy
  – Save the entire image (too much space)
  – Save only the differential area (replay the sequence of actions to recreate state)

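The text-based case can be shown in a few lines (a minimal sketch; the operation encoding is my own): undoing an insert is deleting the same text at the same position, and vice versa.

```python
def apply_op(doc, op):
    """Apply an (kind, position, text) edit to a string document."""
    kind, pos, text = op
    if kind == "insert":
        return doc[:pos] + text + doc[pos:]
    return doc[:pos] + doc[pos + len(text):]       # kind == "delete"

def invert(op):
    """Insert and delete are inverses: swap the kind, keep pos and text."""
    kind, pos, text = op
    return ("delete" if kind == "insert" else "insert", pos, text)

doc = apply_op("hello world", ("insert", 5, " brave"))   # "hello brave world"
doc = apply_op(doc, invert(("insert", 5, " brave")))     # back to "hello world"
```

Bitmap operations have no such cheap inverse, which is why the slide falls back to saving images or replaying action sequences.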
Transparencies vs. Awareness

• Does the application know about the collaboration or not?
  – Transparencies
    • The communication layer sits on top of the application
    • Useful for sharing legacy systems
    • We have no access to the source (or cannot modify it)
    • Negative – no concurrency (one input/output at a time)
  – Aware applications
    • Collaboration is integrated into the application
    • Requires centralized execution with distributed I/O
    • Or requires a homogeneous architecture (the same client on each user’s machine)

Distributed Software Engineering

• Synchronous and asynchronous collaboration
• Provide a meta-view of others in the system
• Allow viewing of the entire current system
• Fine-grained source locking/check-out
• Provide a sandbox for developers to test/build local source
• How do we improve concurrency?

Handling Concurrent Development

• Split-combine (low level of concurrent development)
• Copy-merge (high level of concurrency, but problematic to merge)