p2p-unstructured - Computer Science and Engineering

advertisement

1

Overview of Peer-to-Peer Systems,

Grids, and Clouds (Part 1)

Unstructured P2P Systems

COP 6390.all

02/21/2011

Anda Iamnitchi

2

Google trends: Grid computing ( blue ), peer-to-peer ( red ), cloud computing ( orange )

3

What they have in common

Real instances of distributed systems

 Passed the test of deployment and usability

Large-scale distributed systems

Interesting phenomena at large scale

Scale: users, resources, geographical

Why Peer-to-Peer Systems?

Wide-spread user experience

Large-scale distributed application, unprecedented growth and popularity

 KaZaA – 389 millions downloads

(1M/week) one of the most popular applications ever!

Heavily researched in the last 10 years with results in:

User behavior characterization

Scalability

Novel problems: reputation, trust, incentives for fairness

Commercial impact eDonkey

FastTrack (Kazaa)

Gnutella

Cvernet

3,108,909

2,114,120

2,899,788

691,750

Filetopia 3,405

Number of users for file-sharing applications

(estimate www.slyck.com, Sept ‘06)

What Is a P2P System?

Node

Node

Internet

Node

Node Node

5

A distributed system architecture:

 No centralized control (debatable: Napster?)

 Nodes are symmetric in function (debatable: New Gnutella protocol?)

Large number of unreliable nodes

Initially identified with music file sharing

P2P Definition(s)

A number of definitions coexist:

 Def 1 : “A class of applications that takes advantage of resources — storage, cycles, content, human presence — available at the edges of the Internet.”

 Edges often turned off, without permanent IP addresses

 Def 2 : “A class of decentralized, self-organizing distributed systems, in which all or most communication is symmetric.”

6

Lots of other definitions that fit in between

The Promise of P2P Computing

High capacity through parallelism:

 Many disks

Many network connections

Many CPUs

Reliability:

 Many replicas:

 of data of network paths

Geographic distribution

Automatic configuration

Useful in public and proprietary settings

7

Popularity since 2004

Britney Spears 1.00

p2p 0.40

8

Normalized and compared to the popularity of Britney Spears as shown by Google Trends

Napster: History

 Program for sharing files over the Internet

 History:

 5/99: Shawn Fanning (freshman, Northeasten U.) founds Napster Online music service

12/99: first lawsuit

3/00: 25% UWisc traffic Napster

2000: est. 60M users

2/01 : US Circuit Court of Appeals:

9

Napster knew users violating copyright laws

 7/01: # simultaneous online users:

Napster 160K, Gnutella: 40K, Morpheus: 300K

Basic Primitives for File Sharing

10

Join: How do I begin participating?

Publish: How do I advertise my file(s)?

Search: How do I find a file?

Fetch: How do I retrieve a file?

Napster: How It Works

napster.com

11

• Client-Server: Use central server to locate files

• Peer-to-Peer: Download files directly from peers

12

Napster

1. File list is uploaded

(Join and Publish) napster.com

users

13

Napster

2. User requests search at server (Search).

Request and results napster.com

user

14

Napster

3.

User pings hosts that apparently have data.

Looks for best transfer rate.

ping napster.com

ping user

15

Napster

4.

User retrieves file

(Fetch) napster.com

Download file user

Lessons Learned from Napster

 Strengths: Decentralization of Storage

 Every node “pays” its participation by providing access to its resources

 physical resources (disk, network), knowledge (annotations), ownership (files)

Every participating node acts as both a client and a server (“servent”): P2P

Decentralization of cost and administration = avoiding resource bottlenecks

 Weaknesses: Centralization of Data Access Structures (Index)

 Server is single point of failure

Unique entity required for controlling the system = design bottleneck

Copying copyrighted material made Napster target of legal attack increasing degree of resource sharing and decentralization

Centralized

System

Decentralized

System

16

17

Gnutella: File-Sharing with No Central

Server

Gnutella: History

18

Developed in a 14 days “quick hack” by Nullsoft (winamp)

 Originally intended for “exchange of cooking recipes”

Evolution of Gnutella

 Published under GNU General Public License on the Nullsoft web server

Taken off after a couple of hours by AOL (owner of Nullsoft): too late

Gnutella protocol was reverse engineered from the original

Protocol published

Third-party clients were published and Gnutella started to spread

 Many iterations to fix poor initial design

High impact:

 Many versions implemented

Many different designs

Lots of research papers/ideas

19

Gnutella: Search in an Unstructured

Overlay

I have file A.

I have file A.

Reply

Flooding

Query

Where is file A?

Gnutella: Overview

Join: on startup, client contacts a few other nodes; these become its “neighbors”

 Initial list of contacts published as gnutellahosts.com:6346

Outside the Gnutella protocol specification

Default value for number of open connections (neighbors): C = 4

Publish: no need

Search:

Flooding: ask neighbors, who ask their neighbors, and so on...

Each forwarding of requests decreases a TTL. Default: TTL = 7

When/if found, reply to sender

Drop forwarding requests when TTL expires

One request leads to

2 *

TTL i

0

C * ( C

1 ) i 

26 , 240

Back-propagation in case of success (Why?) messages

Fetch: get the file directly from peer (HTTP)

Gnutella: Protocol Message Types

Type

Ping

Pong

Query

QueryHit

Push

Description

Announce availability and probe for other servents

Response to a ping

Search request

Returned by servents that have the requested file

File download requests for servents behind a firewall

Contained Information

None

IP address and port# of responding servent; number and total kb of files shared

Minimum network bandwidth of responding servent; search criteria

IP address, port# and network bandwidth of responding servent; number of results and result set

Servent identifier; index of requested file; IP address and port to send file to

22

What would you ask about a Gnutella network?

Gnutella: Tools for Network Exploration

Eavesdrop traffic - insert modified node into the network and log traffic.

Crawler - connect to active nodes and use the membership protocol to discover membership and topology.

23

24

Gnutella: Heterogeneity

All Peers Equal? (1)

1.5Mbps DSL 1.5Mbps DSL

56kbps Modem

1.5Mbps DSL

10Mbps LAN

1.5Mbps DSL

56kbps Modem

56kbps Modem

25

Gnutella Network Structure: Improvement

Gnutella Protocol 0.6

Two tier architectures of ultrapeers and leaves

Ultrapeers

Leaves

Data transfer (file download)

Control messages (search, join, etc)

26

Déjà vu?

Gnutella Protocol 0.6

Two tier architectures of ultrapeers and leaves

Ultrapeers

Leaves

Data transfer (file download)

Control messages (search, join, etc)

Gnutella: Free Riding

All Peers Equal? (2)

27

More than 25% of Gnutella clients share no files; 75% share 100 files or less

Conclusion: Gnutella has a high percentage of free riders

If only a few individuals contribute to the public good, these few peers effectively act as centralized servers.

Outcome:

Significant efforts in building incentive-based systems

BitTorrent?

Adar and Huberman (Aug ’00)

Flooding in Gnutella: Loops?

28

Seen request already

29

Improvements of Message Flooding

Expanding Ring

 start search with small TTL (e.g. TTL = 1)

 if no success iteratively increase TTL (e.g. TTL = TTL +2) k-Random Walkers

 forward query to one randomly chosen neighbor only, with large TTL

 start k random walkers random walker periodically checks with requester whether to continue

 Experiences (from simulation)

 adaptive TTL is useful

 message duplication should be avoided flooding should be controlled at fine granularity

30

Gnutella Topology (Mis)match?

31

Gnutella: Network Size?

Explosive growth in 2001, slowly shrinking thereafter

• High user interest

– Users tolerate high latency, low quality results

• Better resources

– DSL and cable modem nodes grew from 24% to 41% over first 6 months.

Is Gnutella a Power-Law Network?

Power-law networks: the number of links per node follows a power-law distribution N = L -k

10000

1000

100

10

November 2000

Examples of power-law networks:

– The Internet at AS level

– In/out links to/from HTML pages

– Airports

– US power grid

– Social networks

32

1

1 10

Number of links (log scale)

100

Implications : High tolerance to random node failure but low reliability when facing of an ‘intelligent’ adversary

Network Resilience

33

Partial Topology Random 30% die Targeted 4% die from Saroiu et al ., MMCN 2002

Is Gnutella a Power-Law Network?

(Later Data)

Later, larger networks display a bimodal distribution

Implications :

– High tolerance to random node failures preserved

– Increased reliability when facing an attack.

10000

May 2001

1000

100

From Ripeanu, Iamnitchi, Foster, 2002

34

10

1

1 10

Number of links (log scale)

100

35

Discussion on Unstructured P2P

Networks

Performance

 Search latency: low (graph properties)

 Message Bandwidth: high

 improvements through random walkers, but essentially the whole network needs to be explored

Storage cost: low (only local neighborhood)

Update cost: low (only local updates)

Resilience to failures good: multiple paths are explored and data is replicated

Qualitative Criteria

 search predicates: very flexible, any predicate is possible

 global knowledge: none required peer autonomy: high

36

BitTorrent

37

BitTorrent Components

Torrent File

 Metadata of file to be shared

Address of tracker

List of pieces and their checksums

Tracker

 Lists peers interested in the distribution of the file

Peers

 Clients interested in the distribution of the file

 Can be “seeds” or “leachers”

38

A BitTorrent Swarm

• A “seed” node has the file

• A “tracker” associated with the file

• A “.torrent” meta-file is built for the file: identifies the address of the tracker node

• The .torrent file is published on web

• File is split into fixed-size segments (e.g.,

256KB)

39

Choking Algorithm

Each connected peer is in one of two states

Choked: Download requests by a choked peer are ignored

Unchoked: Download requests by an unchoked peer are honored

Choking occurs at the peer level

Each peer has a certain number of unchoke slots

 4 regular unchoke slots (per BitTorrent standard)

 1 optimistic unchoke slot (per BitTorrent standard)

Choking Algorithm

 Peers unchoke connected peers with best service rate

 Service rate = rolling 20 second average of its upload bandwidth

Optimistically unchoking peers prevents a static set of unchoked peers

The choking algorithm runs every 10 seconds

Peers optimistically unchoked every 30 seconds

 New peers are 3 times more likely to be optimistically unchoked

40

Piece Selection

Random First Piece

 Piece download at random

 Algorithm used by new peers

Rarest Piece First

 Ensures > 1 distributed copies of a piece

Increases interest of connected peers

Increases scalability

Random Piece vs. Rarest Piece

 Rarest has probabilistically high download time

 New peers want to reduce download time but also increase their interest

41

BitTorrent: Overview

Join: nothing

 Just find that there is a community ready to host your tracker

Publish: Create tracker, upload .torrent metadata file

Search:

 For file: nothing

 the community is supposed to provide search tools

For segments: exchange segment IDs maps with other peers.

Fetch: exchange segments with other peers (HTTP)

42

Gnutella vs. BitTorrent: Discussion

Architecture

 Decentralization?

System properties

 Reliability?

Scalability?

Fairness?

Overheads?

Quality of Service

Search coverage for content?

Ability to download content fast?

Download