Research Issues for Building and
Integrating Peer-based and Grid Systems
Xiaodong Zhang
National Science Foundation
This talk does not necessarily reflect NSF's official opinions.
Hardware Cost and Implications
1980: $400,000/MIPS (Cray-I)
1990: $250/MIPS (i860)
2002: $2/MIPS or less
• Storage is large and cheap.
• Information and computing are available everywhere.
• Major challenges:
- distributed resource management
- security and privacy
Impact on US Computer Exports
• Speed limits on computer exports
- Russia, China, India, and Middle East countries
- Millions of Theoretical Operations Per Second (MTOPS)
• Before 2001, MTOPS = 28,000
- less powerful than a cluster of ten 1.5 GHz/2-way PCs.
• 2001, MTOPS = 85,000
- less powerful than a cluster of ten 2.2 GHz/4-way PCs.
• 2002, MTOPS = 195,000
- less powerful than a cluster of ten 3 GHz/8-way PCs.
MTOPS Hardly Reflects Reality
• MTOPS views a computer as a high performance calculator.
- ignores the deep memory hierarchy,
- ignores the fast internal interconnections,
- ignores the power of clusters, and
- ignores resource sharing over the Internet.
• The Senate passed a bill to remove MTOPS on 9/6/01.
• Computing power is mainly determined by effective utilization of aggregated networked resources.
Client/Server based IT Infrastructure
• Services are provided by data/computing centers.
• Grid and Web search engines are server-based.
• Each server can be built from a distributed cluster.
• Inter- and intra-resource coordination.
• Services are guaranteed and trusted.
• Security is enforced within each server.
Current and Coming Internet Systems
• Rapidly growing Internet services are provided by an increasing number of peers.
• A variety of devices, ranging from a cell phone to a supercomputer center.
• Pervasive computing: access to information and services anytime and anywhere.
Client/Server Model is Being Challenged
No single server or search engine can sufficiently cover the increasing Web content.
• 2×10^18 bytes/year are generated on the Internet.
• But only 3×10^12 bytes/year are available to the public (0.00015%).
• Google searches only 1.3×10^8 Web pages.
(Source: Gong, IEEE Internet Computing, 2001)
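As a quick check of the quoted percentage, the short Python sketch below recomputes the publicly available share from the two byte counts on this slide. The variable and function names are illustrative and do not come from the talk.

```python
# Byte counts quoted on the slide (Gong, IEEE Internet Computing, 2001).
generated_per_year = 2e18   # bytes generated on the Internet per year
public_per_year = 3e12      # bytes available to the public per year


def public_share_percent(public_bytes, generated_bytes):
    """Return the publicly available share of generated data, in percent."""
    return 100.0 * public_bytes / generated_bytes


share = public_share_percent(public_per_year, generated_per_year)
print(f"{share:.5f}%")  # prints 0.00015%
```

The result agrees with the 0.00015% figure: roughly 1.5 bytes in every million generated are publicly reachable.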
Client/Server (continued)
The client/server model seriously limits utilization of available bandwidth and services.
• Popular servers and search engines become traffic bottlenecks.
• But high-speed networks connecting many clients sit idle.
• Computing cycles and information in clients are ignored.
A New Paradigm: Peer-oriented Systems
• Each peer is both a client (consumer) and a server (producer).
• Each peer is free to join and leave at any time.
• Huge peer diversity: service ability, storage space, networking speed, and service demand.
• A widely decentralized system that opens up both opportunities and new concerns.
Peer-oriented Systems
[Figure: three architectures]
Client/server: the server is a search engine/grid.
Pure P2P: e.g. Freenet & Gnutella.
Hybrid P2P: with a directory, e.g. Napster.
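To make the difference between the hybrid and pure peer-to-peer models concrete, here is a minimal Python sketch. It is a toy illustration under stated assumptions, not the actual Napster, Gnutella, or Freenet protocols: the five-peer overlay, the file names, and the functions `directory_lookup` and `flood_search` are all hypothetical. The hybrid case is modeled as one query to a central directory; the pure case is modeled as hop-limited (TTL) flooding to neighbors.

```python
from collections import deque

# Hypothetical overlay: peer -> neighbors, and peer -> files it holds.
NEIGHBORS = {
    "A": ["B", "C"],
    "B": ["A", "D"],
    "C": ["A", "D"],
    "D": ["B", "C", "E"],
    "E": ["D"],
}
FILES = {"A": set(), "B": {"x.mp3"}, "C": set(), "D": set(), "E": {"y.mp3"}}


def directory_lookup(directory, name):
    """Hybrid P2P: one query to a central directory returns the holders."""
    return sorted(directory.get(name, ()))


def flood_search(start, name, ttl):
    """Pure P2P: breadth-first flooding to neighbors, limited to ttl hops."""
    seen = {start}
    queue = deque([(start, 0)])
    holders, messages = [], 0
    while queue:
        peer, hops = queue.popleft()
        if name in FILES[peer]:
            holders.append(peer)
        if hops == ttl:
            continue  # hop limit reached: do not forward further
        for nxt in NEIGHBORS[peer]:
            messages += 1  # every forwarded query costs one message
            if nxt not in seen:
                seen.add(nxt)
                queue.append((nxt, hops + 1))
    return sorted(holders), messages


# Build the central directory by letting every peer register its files.
DIRECTORY = {}
for peer, names in FILES.items():
    for n in names:
        DIRECTORY.setdefault(n, set()).add(peer)

print(directory_lookup(DIRECTORY, "y.mp3"))
print(flood_search("A", "y.mp3", ttl=2))
print(flood_search("A", "y.mp3", ttl=3))
```

In this toy overlay the directory answers in a single request, while flooding finds the holder only when the hop limit is large enough and sends a message over every link it explores. That is one way to picture why uncoordinated peer traffic can grow quickly.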
Peer-oriented Applications
• File sharing: document sharing among peers with no or limited central control.
• Instant messaging (IM): immediate voice and file exchanges among peers.
• Distributed processing: one can widely utilize resources available on other remote peers.
Problem 1: Losing Security and Privacy
• Providing a conduit for malicious code and viruses.
• Providing loopholes for information leakage.
• Relaxing privacy protection by exposing peer identities.
Problem 2: Weak Resource Coordination
• Limited or no central control; peers mainly rely on self-organization.
• Lack of communication monitoring and scheduling causes unnecessary traffic jams.
• Lack of access and service coordination leads to unbalanced loads among peers.
Demanded Solution (1): Fast Peer Services
• Dynamically identifying and collecting trusted and guaranteed peers as backbones.
• Establishing adaptive self-organization and monitoring for resource coordination.
• Fast data and service searching in low-diameter regions.
(2): Allowing Distrustful Peers to Exist
• Ensure that peer interactions
- do not become intrusive (monitoring/scheduling)
- protect privacy (communication anonymity)
- are not used for denial-of-service attacks (security)
(3): Measurable Security Metrics
• Benchmarks for security measurement.
• Stochastic models for security analysis.
• Validating systems and quantifying degrees of security.
(4): Understanding the Trade-offs
• Analyzing the impact of centralized controls on performance and security.
• Quantifying the security loss and performance gain/loss from decentralization.
• Optimizing peer-oriented systems for individual and combined objectives:
- high performance, high security, or a balance of both,
- for a given performance objective, finding...
(5): Utilizing Existing Infrastructure
• Avoid establishing new standards and protocols.
• Avoid modifying commonly used, general-purpose software.
• Peer-oriented processing should be automatic, with little user involvement.
Application Differences: Grid & P2P
• Grid: provides a global problem-solving environment for large and critical scientific applications and professional collaborations, where each grid is a server.
• P2P: provides general and commercial information/computing services, where each peer can be both server and client.
Operation Differences: Grid & P2P
• Grid: direct access to computing, software, and data resources at remote, targeted sites. (Server-based)
• P2P: random access to available computing, software, and data resources without a specific target. (Client-based)
Different Participants: Grid & P2P
• Grid: pre-determined and registered clients and servers.
• P2P: clients and servers are neither distinguished nor registered, and can come and go as they choose.
Different QoS: Grid & P2P
• Grid: guaranteed and reliable services are required of each grid server.
• P2P: only partially reliable, because services from some peers are neither guaranteed nor trusted.
Security Differences: Grid & P2P
• Grid: authentication, authorization, and firewall protection for each grid.
• P2P: privacy, anonymity, authentication, authorization, and firewall protection for each peer are not guaranteed.
Different Controls: Grid & P2P
• Grid: centralized control plays an important role in resource monitoring/allocation and job scheduling.
• P2P: limited or no central control; peers mainly rely on self-organization.
Changing of NSF Sponsored High
Performance Computing Efforts
• 1986 to 1996:
5 independent supercomputer centers:
Illinois, San Diego, Pittsburgh, Cornell, and Princeton
Science & Technology Research Centers:
CRPC and Visualization & Graphics
• Missions:
- providing high performance resources
- developing new technologies
- advancing scientific discovery.
Changing of NSF Sponsored High
Performance Computing Efforts
• 1997 to 2002:
Two Partnerships for Advanced Computational Infrastructure (PACI)
NCSA at Illinois and NPACI at San Diego
leading 60+ institutions from 27 states.
• Missions:
- providing grid computing and data resources
- developing grid software tools
- applications on grids
- education outreach and training.
Changing of NSF Sponsored High
Performance Computing Efforts
• 2001 to 2004:
Distributed Terascale Facility (DTF)
4 DTF sites: NCSA, NPACI, Argonne, and Caltech
providing an aggregate of 14+ teraflops and 450+ terabytes.
• Tasks:
- NCSA: 6+ TFs & 240+ TBs, Linux cluster of Itaniums
- NPACI: 4+ TFs & 225+ TBs
- Argonne: 1+ TF IBM cluster, grid & viz. software
- Caltech: 86 TB on-line storage.
Large NSF Sponsored Grid Projects
• GIOD (Globally Interconnected Object Databases)
- global data storage and access for particle collider experiments
• GriPhyN (Grid Physics Network)
- building global grids for experimental physics studies
• iVDGL (International Virtual Data Grid Laboratory)
- grids for physics/astronomy experiments
- data-intensive science, US & EU collaboration
• NEES (Network for Earthquake Engineering Simulation)
- shifting from physical tests to simulation (20 grid sites)
Changing of NSF Sponsored High
Performance Computing Efforts
• 2003 to 2005:
Enhanced Distributed Terascale Facility
4 original DTF sites plus Pittsburgh SC.
• Tasks:
- Enhancing the existing DTF software and hardware.
- Testing large-scale applications.
- Widely connecting to users.
Merging P2P and Grid
• Envisioning 2005 and later:
Decentralized Distributed Terascale Facility
- many DTF sites, both large (servers) and small (peers).
- pervasive computing: application scope beyond SC.
• Tasks:
Merging with peer-oriented technology
- developing security & privacy protocols.
- coordinating heterogeneous Internet resources.
Future of Distributed Computing
• Grid infrastructure will provide reliable computing resources for large applications.
• Within a grid region, peer-oriented techniques will be integrated.
• The peer-oriented paradigm will play a major role in information retrieval.
• The demand for data access/transfer will be higher than the demand for cycles.
Suggestions To China Grid Projects
• Building Internet and system infrastructure.
- using open standards for resource sharing
- both grid and peer-oriented systems
• Joining and becoming a part of international efforts to learn and to contribute.
• Identifying key domestic applications.
Identifying Key Domestic Applications
• Distributed digital libraries
- management of natural/human resources.
- national security data archives.
- ...
• Large collaborative operations
- large distributed simulation
- collaborative design and manufacturing
- bioinformatics
- global information searching & processing, ...