Adriana Iamnitchi (Anda) anda@cse.usf.edu
Contact Info
Email: anda@cse.usf.edu
Office: ENB 334
Office hours: Wednesdays, 10:45 – 1:00 and by appointment
Course page: http://www.csee.usf.edu/~anda/CIS6930.5
CIS6930.5: Federated Distributed Systems (Fall 2005)
Examples of Distributed Systems
– ATT web
– Gnutella network
– A sensor network
– The Internet
Definition (a version)
A distributed system is a collection of autonomous, programmable, failure-prone entities that are able to communicate through a communication medium that is unreliable.
– Entity=a process on a device (PC, PDA, mote)
– Communication Medium=Wired or wireless network
“Federated” – spanning multiple institutional or network (DNS) domains
Outline
Case study: SETI@home, Napster, Gnutella
Administrivia
SETI@home Operations
[Figure: SETI@home operations diagram. Component labels: data recorder, DLT tapes, splitters, WU storage, data server, screensavers, result queue, redundancy checking, science DB, RFI elimination, repeat detection, master DB, user DB, acct. queue, garbage collector, tape backup, tape archive, delete, CGI program, web page generator, web site]
SETI@home: How does it work?
Master-worker architecture
Fixed-rate data processing task
Low bandwidth/computation ratio
Independent parallelism
Error tolerance
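The master-worker pattern with independent work units and error tolerance can be sketched as below. This is a minimal illustration, not the SETI@home implementation: the `Master` class, the `replicas` and `quorum` parameters, and the hash-based `analyze` function are all assumptions made for the example. Each work unit is handed to several workers, and a result is accepted only when enough of them agree.

```python
import hashlib
from collections import Counter


def analyze(work_unit: bytes) -> str:
    """Stand-in for the analysis a worker runs on one work unit (assumption)."""
    return hashlib.sha256(work_unit).hexdigest()


class Master:
    """Hands out independent work units and accepts a result only when at
    least `quorum` workers returned the same value (redundancy checking)."""

    def __init__(self, work_units, replicas=3, quorum=2):
        self.units = dict(enumerate(work_units))
        self.replicas = replicas
        self.quorum = quorum
        self.assigned = {uid: set() for uid in self.units}  # workers given each unit
        self.results = {uid: [] for uid in self.units}      # values returned per unit

    def request_work(self, worker_id):
        """Return (unit id, data) for a unit this worker has not seen, or None."""
        for uid, workers in self.assigned.items():
            if len(workers) < self.replicas and worker_id not in workers:
                workers.add(worker_id)
                return uid, self.units[uid]
        return None

    def submit(self, uid, value):
        self.results[uid].append(value)

    def accepted(self):
        """Map unit id -> value, for units whose top value reached the quorum."""
        out = {}
        for uid, values in self.results.items():
            if values:
                value, votes = Counter(values).most_common(1)[0]
                if votes >= self.quorum:
                    out[uid] = value
        return out


def run_demo():
    master = Master([b"chunk-0", b"chunk-1"], replicas=3, quorum=2)
    # Two honest workers and one malfunctioning (or cheating) worker.
    workers = {"w1": analyze, "w2": analyze, "w3": lambda data: "bogus"}
    for worker_id, compute in workers.items():
        while (job := master.request_work(worker_id)) is not None:
            uid, data = job
            master.submit(uid, compute(data))
    return master.accepted()
```

Workers never talk to each other, which is what makes the parallelism independent; the cost of the redundancy is that every unit is computed several times.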
History and Statistics
Conceived 1995, launched April 1999
“scientific experiment that uses Internet-connected computers in the Search for Extraterrestrial Intelligence (SETI). You can participate by running a free program that downloads and analyzes radio telescope data.”
No ET signals yet, but other results
                            Total                   Last 24 Hours
Users                       5,361,313               4,391
Results received            1,779 million           5 million
Total CPU time              2.2 million years       3610.717 years
Average CPU time/work unit  10 hr 58 min 14.0 sec   6 hr 19 min 30.1 sec
(as of Wed Feb 23 07:04:51)
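As a rough consistency check on these statistics, total CPU time divided by the average CPU time per work unit should give approximately the number of results received. The sketch below does that arithmetic; the helper names and the 365.25-day year are assumptions made for the example, not part of the SETI@home statistics.

```python
HOURS_PER_YEAR = 365.25 * 24  # assumption: average year length


def hms_to_hours(hours, minutes, seconds):
    """Convert an 'hr min sec' duration to fractional hours."""
    return hours + minutes / 60 + seconds / 3600


def implied_results(cpu_years, avg_hours_per_unit):
    """Number of work units implied by total CPU time and average time per unit."""
    return cpu_years * HOURS_PER_YEAR / avg_hours_per_unit


# Last 24 hours: 3610.717 CPU-years at 6 hr 19 min 30.1 sec per work unit.
last_24h = implied_results(3610.717, hms_to_hours(6, 19, 30.1))

# Overall: 2.2 million CPU-years at 10 hr 58 min 14.0 sec per work unit.
total = implied_results(2.2e6, hms_to_hours(10, 58, 14.0))

print(f"implied results, last 24 hours: {last_24h:,.0f}")
print(f"implied results, total:         {total:,.0f}")
```

Both implied counts land close to the reported figures (about 5 million and 1,779 million); the small gap in the total is expected because 2.2 million years is a rounded number.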
Public-resource computing
Utilizes idle computing cycles over Internet
Other systems:
– Original: GIMPS, distributed.net
– Commercial: United Devices, Entropia, Porivo, Popular Power
– Academic, open-source
> Cosm, folding@home
None has the popularity of SETI@home!
ET
How to get and retain users (from David Anderson, the leader of the SETI@home project)
– Graphics are important (but monitors do burn in)
– Teams: users recruit other users
– Keep users informed
> Science news
> System management news
> Periodic project emails
Reward users:
– PDF certificates
– Milestone pages and emails
– Leader boards (overall, country, …)
Millions and millions of computers!
(Problems)
Server scalability
Dealing with excess CPU time
Cheating
Bad behavior:
– Team recruitment by spam
– Sale of accounts on eBay
Malfunctions
Network bandwidth costs money
SETI@home: Summary
Master-worker design
– Centralized solution
> Master=central point of control
> Single point of failure
> Performance bottleneck
Incentives for participation
– Sometimes these also create incentives for cheating
Massive (“embarrassing”) parallelism
Low bandwidth/computation ratio
Users do donate real resources: $1.5M / year consumed power
More information: http://setiathome.ssl.berkeley.edu
Outline
Case study: SETI@home, Napster, Gnutella
Administrivia
The File Location Problem
(Napster and Gnutella)
Where is file A?
Napster: How It Works
• Client-server: Use central server to locate files
• Download files directly from peers
Napster
1. File list is uploaded: users send their file lists to napster.com.
2. User requests search at server: request and results travel between the user and napster.com.
3. User pings hosts that apparently have data. Looks for best transfer rate.
4. User retrieves file from the chosen host.
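The four Napster steps (upload file list, search at the server, ping candidate hosts, retrieve the file from a peer) can be sketched as follows. This is a toy model under stated assumptions, not the Napster protocol: `CentralIndex`, `pick_best_peer`, `locate_and_fetch`, and the dictionaries standing in for measured transfer rates and peer storage are hypothetical names introduced for illustration.

```python
from collections import defaultdict


class CentralIndex:
    """Stand-in for the central server: it stores only file names and the
    peers that hold them, never the files themselves."""

    def __init__(self):
        self.holders = defaultdict(set)  # file name -> set of peers

    def upload_file_list(self, peer, filenames):
        """Step 1: a peer registers the files it is willing to share."""
        for name in filenames:
            self.holders[name].add(peer)

    def search(self, name):
        """Step 2: return the peers that apparently have the file."""
        return sorted(self.holders.get(name, ()))


def pick_best_peer(candidates, measured_rate):
    """Step 3: among reachable candidates, pick the best measured rate.
    `measured_rate` maps peer -> rate; missing or zero means unreachable."""
    reachable = [p for p in candidates if measured_rate.get(p, 0) > 0]
    return max(reachable, key=lambda p: measured_rate[p], default=None)


def locate_and_fetch(index, name, measured_rate, peer_files):
    """Steps 2 to 4: search centrally, then transfer directly from a peer."""
    best = pick_best_peer(index.search(name), measured_rate)
    if best is None:
        return None
    return best, peer_files[best][name]  # step 4: the server is not involved


# Demo data (all hypothetical).
index = CentralIndex()
index.upload_file_list("alice", ["a.mp3", "b.mp3"])
index.upload_file_list("bob", ["a.mp3"])
rates_kbps = {"alice": 56, "bob": 1500}
peer_files = {
    "alice": {"a.mp3": b"AAA", "b.mp3": b"BBB"},
    "bob": {"a.mp3": b"AAA"},
}
found = locate_and_fetch(index, "a.mp3", rates_kbps, peer_files)
```

Only the lookup is centralized; the file bytes move between peers, which keeps the server's load to index traffic but still leaves it as the single point of control.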
Napster: History
Program for sharing files over the Internet
History:
– 5/99: Shawn Fanning (freshman, Northeastern U.) founds Napster online music service
– 12/99: first lawsuit
– 3/00: 25% of UWisc traffic is Napster
– 2000: est. 60M users
– 2/01: US Circuit Court of Appeals: Napster knew users were violating copyright laws
– 7/01: # simultaneous online users: Napster: 160K, Gnutella: 40K, Morpheus: 300K
Napster: Summary
Centralized server:
– Client-server architecture
– Single logical point of failure
– Potential for congestion (bottleneck)
– Napster “in control” (freedom is an illusion)
No security:
– Passwords in plain text
– No authentication
– No anonymity
Outline
Public-resource computing
– Case study: SETI@home
Peer-to-peer systems
– Case study 1: Napster
– Case study 2: Gnutella
Discuss:
– Characteristics
– Impact
– Architecture
– Killer application
Gnutella: Search for Files with No Central Server
Ideas?
Where is file A?
Gnutella: Search
Flooding
– Query: “Where is file A?”
– Reply: “I have file A.” (sent by each peer that has the file)
Gnutella: History and Statistics
Gnutella history:
– 3/14/00: release by AOL, almost immediately withdrawn
– too late: 1,859,340 users on Gnutella on August 25, 2am
– many iterations to fix poor initial design
High impact:
– Versions implemented
– Different designs
– Lots of research papers/ideas
Network         Users
eDonkey2K       4,123,688
FastTrack       2,521,887
Gnutella        1,516,762
Overnet         1,146,880
DirectConnect   294,255
MP2P            251,137
(www.slyck.com, 06/24/’05)
What would you ask about Gnutella?
…
…
Gnutella: Heterogeneity
All Peers Equal? (1)
[Figure: peers with different access links: four on 1.5Mbps DSL, three on 56kbps modem, one on 10Mbps LAN]
Gnutella: Free Riding
All Peers Equal? (2)
More than 25% of Gnutella clients share no files; 75% share 100 files or less
Conclusion: Gnutella has a high percentage of free riders
If only a few individuals contribute to the public good, these few peers effectively act as centralized servers.
Adar and Huberman (Aug ’00)
Flooding in Gnutella: Loop Prevention
Seen request already
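Flooding with loop prevention can be sketched as a small simulation. This is an illustrative model, not the Gnutella wire protocol: the `flood_query` function, the demo topology, and the hop limit `ttl` (which these slides do not mention) are assumptions made for the example. Each peer forwards a query to every neighbor except the one it came from and drops any query it has already seen.

```python
from collections import deque


def flood_query(neighbors, files, origin, wanted, ttl=4):
    """Flood a query for `wanted` from `origin`.

    Returns (peers that have the file, messages sent, duplicates dropped).
    `seen` models the per-peer "seen request already" check for one query.
    """
    seen = {origin}
    hits = []
    messages = 0
    duplicates = 0
    frontier = deque([(origin, None, ttl)])  # (peer, peer it came from, hops left)
    while frontier:
        peer, sender, hops_left = frontier.popleft()
        if hops_left == 0:
            continue  # hop limit reached: do not forward further
        for nxt in neighbors[peer]:
            if nxt == sender:
                continue  # never send the query back where it came from
            messages += 1
            if nxt in seen:
                duplicates += 1  # seen request already: drop, do not forward
                continue
            seen.add(nxt)
            if wanted in files.get(nxt, ()):
                hits.append(nxt)  # this peer would send a reply
            frontier.append((nxt, peer, hops_left - 1))
    return hits, messages, duplicates


# Hypothetical five-peer overlay with a cycle (A-B-C) to trigger duplicates.
DEMO_NEIGHBORS = {
    "A": ["B", "C"],
    "B": ["A", "C", "D"],
    "C": ["A", "B", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
DEMO_FILES = {"D": {"fileA"}, "E": {"fileA"}}

hits, messages, duplicates = flood_query(DEMO_NEIGHBORS, DEMO_FILES, "A", "fileA", ttl=3)
```

Even in this tiny overlay a sizable share of the messages are duplicates that get dropped, which is the cost of flooding that the later summary lists as a problem.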
Gnutella Topology Mismatch
Gnutella Summary
Search by flooding
Self-configuring
Phenomena:
– Not all peers equal
– Free riding
Problems:
– Topology mismatch
– Duplicates due to flooding
Good source for technical info/open questions:
– http://www.limewire.com/index.jsp/tech_papers
Problems in Distributed Systems
…
Communication
– Routing [IP,BGP]
– Multicast [IP multicast, SRM, RMTP]
Post and retrieve [Usenet]
Search [Gnutella, Kazaa, etc., Google]
Storage [Databases]
Coordination [SETI@Home]
…
Challenges
…
Failures
Scale
Asynchrony
Security
Deployment
Adoption
…
Challenges (2)
…
Learn from usage
– Example 1: The Internet
– Example 2: Napster
Conflicting requirements:
– Light but adaptable?
– Light but data-consistent? (think transactions)
– … (other examples?)
… (other examples?)
Course Organization/Syllabus/etc.
Administrivia: Grading
Reviewing: 30%
Discussion leading: 15%
Project: 55%
– Aim high!
– Have fun!
Administrivia: Paper Reviewing (1)
Goals:
– Think about what you read
– Get used to writing paper reviews
Reviews due by midnight before class
Follow the form when relevant.
State the main contribution of the paper.
Critique the main contribution.
– Rate the significance of the paper on a scale of 5 (breakthrough), 4 (significant contribution), 3 (modest contribution), 2 (incremental contribution), 1 (no contribution or negative contribution). Explain your rating in a sentence or two.
Administrivia: Paper Reviewing (2)
Rate how convincing the methodology is.
Do the claims and conclusions follow from the experiments?
Are the assumptions realistic?
Are the experiments well designed?
Are there different experiments that would be more convincing?
Are there other alternatives the authors should have considered?
(And, of course, is the paper free of methodological errors?)
Administrivia: Paper Reviewing (3)
What is the most important limitation of the approach?
What are the three strongest and/or most interesting ideas in the paper?
What are the three most striking weaknesses in the paper?
Name three questions that you would like to ask the authors.
Detail an interesting extension to the work not mentioned in the future work section.
Optional comments on the paper that you’d like to see discussed in class.
Paper Reviewing (final)
Be professional in your writing
Keep an eye on the writing style:
– Clarity
– Beware of traps: learn to use them in writing and detect them in reading
– Detect (and stay away from) trivial claims.
E.g., 1st sentence in the Introduction:
“The tremendous/unprecedented/phenomenal growth/scale/ubiquity of the Internet…”
Administrivia: Discussion leading
Come prepared!
– Prepare discussion outline
– Prepare questions:
> “What if”s
> Unclear things
> …
– Similar ideas in different contexts
– Initiate short brainstorming sessions
Leaders do NOT need to submit paper reviews
Main goals:
– Keep discussion flowing
– Keep discussion relevant
– Engage everybody (I’ll have an eye on this, too)
Administrivia: Projects
Combine with your research if relevant to the class
Get approval from all instructors if you overlap final projects:
– Don’t sell the same piece of work twice
– You can get more than twice as many results with less than twice as much work
Aim high!
– Put in one extra month and get a publication out of it
– It is doable
Try ideas that you postponed out of fear: it’s just a class, not your PhD.
Administrivia: Project deadlines (tentative)
Sept. 15: 1-page project proposal
Oct. 11: 3-page literature survey
– Know relevant work in your problem area
– If implementation project, list tools, similar projects
Nov. 11: 5-page Midterm project due
– Have a clear image of what’s possible/doable
– Report preliminary results
Last class(es): In-class project presentation
– Demo, if appropriate
Dec. 16:
– 10-page write-up
Next Class (Wed, August 31)
Read the 4 chapters from the Grid book
Send brief summaries (lists of ideas/problems discussed, etc)
– Do not follow the reviewing form
– Be brief and efficient!
– Be BRIEF and EFFICIENT!
In-class discussion + some project ideas
Need discussion leader to team up with me for the class next week:
– The structure of networks (pick 2):
1. Small-world file sharing communities, Iamnitchi, Ripeanu, Foster. Infocom 2004.
2. On Power-Law Relationships of the Internet Topology, Faloutsos, Faloutsos, and Faloutsos. SIGCOMM 1999.
3. Mapping the Gnutella network, M. Ripeanu et al. IEEE Computing Journal 2002.
Questions?