Grid and Cloud Computing:
Real-life Instances of Distributed Systems
Adriana Iamnitchi
University of South Florida
anda@cse.usf.edu
http://www.cse.usf.edu/~anda
Grid: Definitions
Definition 1: Infrastructure that provides dependable,
consistent, pervasive, and inexpensive access to high-end
computational capabilities (1998)
Definition 2: A system that coordinates resources not
subject to centralized control, using open, general-purpose
protocols to deliver nontrivial Quality of Service (2002)
Grid: Resource-Sharing Environment
• Users:
– 1000s of users from 10s of institutions
– Well-established communities
• Resources:
– Computers, data, instruments, storage, applications
– Owned/administered by institutions
• Applications: data- and compute-intensive processing
• Approach: common infrastructure
The Globus Toolkit®
Includes slides borrowed freely from
The Globus team
How It Started
While helping to build/integrate a diverse range of distributed applications,
the same problems kept showing up over and over again.
– Too hard to keep track of authentication data (ID/password) across
institutions
– Too hard to monitor system and application status across institutions
– Too many ways to submit jobs
– Too many ways to store & access files and data
– Too many ways to keep track of data
– Too easy to leave “dangling” resources lying around (robustness)
Grid Architecture in a Nutshell
Forget homogeneity!
Grid Services vs. Web Services
• Web Services Resource Framework (WSRF), a specification developed by OASIS, specifies how to make web services stateful.
– Joint effort between Grid and Web Services communities
• OGSA: Open Grid Services Architecture
– Standardizes the common services used in grid applications (job management services, resource management services, security services, etc.) by specifying a set of standard interfaces for these services.
• Grid services: implement OGSA
Stateful vs. Stateless Services
[Figure: Stateless Service]
[Figure: Stateful Service]
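The difference can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the names add and CounterService, and the integer resource keys, are hypothetical and are not part of WSRF or the Globus Toolkit. A stateless operation depends only on its arguments, while a stateful service keeps a per-resource value between calls, and the client names the resource on each call.

```python
import itertools


def add(a, b):
    """Stateless operation: the result depends only on the arguments."""
    return a + b


class CounterService:
    """Stateful service (illustrative): each resource keeps its own value."""

    def __init__(self):
        self._resources = {}                 # resource key -> current value
        self._next_key = itertools.count(1)  # hypothetical key generator

    def create_resource(self):
        """Create a new resource and return the key the client must present."""
        key = next(self._next_key)
        self._resources[key] = 0
        return key

    def add(self, key, amount):
        """Update the named resource; the result depends on earlier calls."""
        self._resources[key] += amount
        return self._resources[key]

    def destroy_resource(self, key):
        """Release the resource so that it is not left dangling."""
        del self._resources[key]
```

Calling add(5, 5) twice gives the same answer both times; calling CounterService.add(key, 5) twice on the same key gives 5 and then 10, because the service remembers the resource's value.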
The Globus Toolkit
• The Globus Toolkit (GT) is a collection of solutions to problems that frequently come up when trying to build collaborative distributed applications.
• Not turnkey solutions, but building blocks and tools for application developers and system integrators.
– Some components (e.g., file transfer) go farther than others (e.g., remote job submission) toward end-user relevance.
• To date, the Toolkit has focused on simplifying heterogeneity for application developers.
• The goal has been to capitalize on and encourage use of existing standards (IETF, W3C, OASIS, GGF).
– The Toolkit also includes reference implementations of new/proposed standards in these organizations.
Globus Toolkit Components
[Figure: Globus Toolkit component map. Components are grouped into five areas and marked by toolkit generation (GT2, GT3, GT4) and as Web Services or non-WS components.]
– Security: WS Authentication Authorization; Pre-WS Authentication Authorization; Community Authorization Service; Delegation Service; Credential Management
– Data Management: GridFTP; Reliable File Transfer; Replica Location Service; OGSA-DAI [Tech Preview]
– Execution Management: Grid Resource Allocation Mgmt (WS GRAM); Grid Resource Allocation Mgmt (Pre-WS GRAM); Community Scheduler Framework [contribution]
– Information Services: Monitoring & Discovery System (MDS4); Monitoring & Discovery System (MDS2)
– Common Runtime: Java WS Core; C WS Core; Python WS Core [contribution]; C Common Libraries; XIO
How it Really Happens
[Figure: example Grid application. Boxes shown: Web Browser, Web Portal, Simulation Tool, Data Viewer Tool, Chat Tool, Registration Service, Credential Repository, Telepresence Monitor, Data Catalog, Certificate authority, Compute Server (x2), Camera (x2), Database service (x3). Layer annotations, from users down to resources:]
– Users work with client applications
– Application services organize VOs & enable access to other services
– Collective services aggregate &/or virtualize resources
– Resources implement standard access & management interfaces
How it Really Happens (without Globus)
[Figure: the same example Grid application built without Globus. Boxes shown: Web Browser, Web Portal, Simulation Tool, Data Viewer Tool, Chat Tool, Registration Service, Credential Repository, Telepresence Monitor, Data Catalog, Certificate authority, Compute Server (x2), Camera (x2), Database service (x3); several resource boxes carry labels A to E.]
Components by origin: Application Developer 10, Off the Shelf 12, Globus Toolkit 0, Grid Community 0
Layer annotations, from users down to resources:
– Users work with client applications
– Application services organize VOs & enable access to other services
– Collective services aggregate &/or virtualize resources
– Resources implement standard access & management interfaces
How it Really Happens (with Globus)
[Figure: the same example Grid application built with Globus. Boxes shown: Web Browser, Simulation Tool, Data Viewer Tool, CHEF, CHEF Chat Teamlet, MyProxy, Globus Index Service, Globus GRAM (x2), Globus DAI (x3), Globus MCS/RLS, Certificate Authority, Telepresence Monitor, Compute Server (x2), Camera (x2), Database service (x3).]
Components by origin: Application Developer 2, Off the Shelf 9, Globus Toolkit 4, Grid Community 4
Layer annotations, from users down to resources:
– Users work with client applications
– Application services organize VOs & enable access to other services
– Collective services aggregate &/or virtualize resources
– Resources implement standard access & management interfaces
Building a Grid (in Practice)
• Building a Grid system or application is currently an exercise in software integration.
– Define user requirements
– Derive system requirements or features
– Survey existing components
– Identify useful components
– Develop components to fit into the gaps
– Integrate the system
– Deploy and test the system
– Maintain the system during its operation
• This should be done iteratively, with many loops and eddies in the flow.
Relationships between Globus and Web Services
Globus Components: GridFTP
• A high-performance, secure data transfer service optimized for high-bandwidth wide-area networks
– FTP with extensions
– Uses basic Grid security (control and data channels)
– Multiple data channels for parallel transfers
– Partial file transfers
– Third-party (direct server-to-server) transfers
• GGF recommendation GFD.20
OSGCC 2008
Globus Primer: An Introduction to Globus Software
[Figure: two transfer modes]
– Basic Transfer: one control channel, several parallel data channels
– Third-party Transfer: control channels to each server, several parallel data channels between servers
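The idea behind parallel data channels and partial file transfers can be sketched as splitting a file into byte ranges and moving each range independently. The Python below is a minimal sketch under stated assumptions, not the GridFTP protocol: split_ranges and parallel_copy are hypothetical names, threads stand in for data channels, and an in-memory byte string stands in for the file.

```python
from concurrent.futures import ThreadPoolExecutor


def split_ranges(file_size, channels):
    """Divide [0, file_size) into contiguous (offset, length) ranges."""
    base, extra = divmod(file_size, channels)
    ranges, offset = [], 0
    for i in range(channels):
        # The first `extra` ranges get one additional byte each.
        length = base + (1 if i < extra else 0)
        if length:
            ranges.append((offset, length))
        offset += length
    return ranges


def parallel_copy(source, channels):
    """Copy `source` using one worker thread per byte range."""
    dest = bytearray(len(source))

    def copy_range(byte_range):
        offset, length = byte_range
        # Each worker writes only its own, non-overlapping slice.
        dest[offset:offset + length] = source[offset:offset + length]

    with ThreadPoolExecutor(max_workers=channels) as pool:
        list(pool.map(copy_range, split_ranges(len(source), channels)))
    return bytes(dest)
```

For a 10-byte file and 3 channels, split_ranges yields the ranges (0, 4), (4, 3), and (7, 3). Because every range is addressed by offset, the same bookkeeping also covers a partial transfer of a single range.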
Globus Components: Striped GridFTP
• GridFTP supports a striped (multi-node) configuration
– Establish control channel with one node
– Coordinate data channels on multiple nodes
– Allows use of many NICs in a single transfer
• Requires shared/parallel filesystem on all nodes
– On high-performance WANs, aggregate performance is limited by filesystem data rates
Globus Components: Reliable File Transfer
• A WSRF service for queuing file transfer requests
– Server-to-server transfers
– Checkpointing for restarts
– Database back-end for failovers
• Allows clients to request transfers and then “disappear”
– No need to manage the transfer
– Status monitoring available if desired
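Checkpointing for restarts can be illustrated with a small Python sketch. It rests on assumptions and is not the RFT implementation: transfer_with_checkpoints is a hypothetical function, a plain dict stands in for the database back-end, and fail_after exists only to simulate a failure partway through.

```python
def transfer_with_checkpoints(source, dest, checkpoint, chunk_size=4,
                              fail_after=None):
    """Copy `source` into `dest`, resuming from checkpoint["offset"].

    `checkpoint` stands in for persistent state: it is updated after
    every chunk, so a later call continues where the last one stopped.
    """
    chunks_sent = 0
    while checkpoint["offset"] < len(source):
        if fail_after is not None and chunks_sent >= fail_after:
            raise ConnectionError("simulated transfer failure")
        start = checkpoint["offset"]
        chunk = source[start:start + chunk_size]
        dest[start:start + len(chunk)] = chunk
        checkpoint["offset"] = start + len(chunk)  # record progress
        chunks_sent += 1
    return checkpoint["offset"]


if __name__ == "__main__":
    data = b"0123456789"
    target = bytearray(len(data))
    state = {"offset": 0}
    try:
        transfer_with_checkpoints(data, target, state, fail_after=1)
    except ConnectionError:
        pass  # the first attempt fails after one chunk
    transfer_with_checkpoints(data, target, state)  # resumes at the checkpoint
    print(bytes(target) == data)
```

The second call does not resend the chunk that was already delivered, which is why the requesting client does not need to stay around to manage the transfer.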
Globus Components: Replica Location Service
• A distributed system for tracking replicated data
– Consistent local state maintained in Local Replica Catalogs (LRCs)
– Collective state with relaxed consistency maintained in Replica Location Indices (RLIs)
• Performance features
– Soft state maintenance of RLI state
– Compression of state updates
– Membership and partitioning information maintenance
[Figure: three RLS deployment patterns]
– Simple Hierarchy: the most basic deployment of RLS
– Fully Connected: high availability of the data at all sites
– Tiered Hierarchy: for very large systems and/or very large collections
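The split between consistent local catalogs and a soft-state index can be sketched as follows. This is a simplified Python model under stated assumptions, not the RLS API: the class and method names are hypothetical, time is passed in explicitly as a number, and each index entry simply expires unless the catalog refreshes it.

```python
class LocalReplicaCatalog:
    """Authoritative mappings for one site: logical name -> physical copies."""

    def __init__(self, site):
        self.site = site
        self.mappings = {}

    def add(self, logical_name, physical_location):
        self.mappings.setdefault(logical_name, set()).add(physical_location)

    def lookup(self, logical_name):
        return set(self.mappings.get(logical_name, ()))


class ReplicaLocationIndex:
    """Soft-state index: remembers which sites hold a name until the TTL ends."""

    def __init__(self, ttl):
        self.ttl = ttl
        self.entries = {}  # site -> (expiry time, set of logical names)

    def update(self, catalog, now):
        """Periodic state update sent by a catalog; replaces its old entry."""
        self.entries[catalog.site] = (now + self.ttl, set(catalog.mappings))

    def sites_for(self, logical_name, now):
        """Sites whose unexpired state lists the name (may be stale)."""
        return sorted(
            site
            for site, (expires, names) in self.entries.items()
            if expires > now and logical_name in names
        )
```

A client would first ask the index which sites may hold a logical name and then query those sites' catalogs for the physical locations. If a catalog stops sending updates, its entry ages out of the index without any explicit cleanup, which is the point of soft state.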
From Grids to Cloud Computing?
From Grids to Cloud Computing
• Logical steps:
– Make the grids public
– Provide much simpler interfaces (and more limited control)
– Charge for usage of resources
  • Instead of relying on implicit incentives from science collaborations
  • Ideally, a “pay-as-you-go” rate
• In reality:
– Different history
  • Cloud computing as utility computing (1966 paper)
• However, the promise of cloud computing finds a great user base in science grids due to:
– Intense computations
– Huge storage needs
– Yet…
P2P vs. Grid vs. Cloud Computing: Google Trends
[Figure: Google Trends charts. Panels: All regions; US only; Last 12 months, world-wide. News items flagged on the chart:]
A Yahoo in 'cloud computing' research with HP-Intel, WA today - Jul 29 2008
B How Cloud Computing Is Changing The World, KMBC.com - Aug 4 2008
C 3Tera Brings Windows to Cloud Computing, Earthtimes (press release) - Oct 1 2008
D Infrastructure Cloud Computing, SYS-CON Media - Oct 28 2008
E Cloud Computing Expo: Cloud Reaches Washington, DC, SYS-CON Brasil (Assinatura) - Jan 26 2009
F Acumen Solutions First to Launch Cloud Computing Practice to Deliver Innovative Solutions to Government, Trading Markets (press release) - Feb 25 2009
What is Cloud Computing?
• Old idea: Software as a Service (SaaS)
– Def: delivering applications over the Internet
• Recently: “[Hardware, Infrastructure, Platform] as a service”
– Poorly defined, so we avoid all “X as a service”
• Utility Computing: pay-as-you-go computing
– Illusion of infinite resources
– No up-front cost
– Fine-grained billing (e.g., hourly)
• Cloud computing: a new term for the long-held dream of utility computing (first defined in 1966)
– Refers to both the applications delivered as services over the Internet and the hardware and software systems in the datacenters that provide those services.
Why Now?
• Experience with very large datacenters
– Unprecedented economies of scale
• Other factors
– Pervasive broadband Internet
– Fast x86 virtualization
– Pay-as-you-go billing model
– Standard software stack
Amazon S3 for Science Grids: a Viable Solution?
Joint work with
Mayur Palankar (USF)
Matei Ripeanu (UBC)
Simson Garfinkel (Harvard)
Overview
Science Grids
• Data-intensive scientific collaborations
• Produce, analyze, and archive huge volumes of data (PetaBytes)
– High data management and maintenance costs
– Files are often used by groups of users and not individually
Amazon Simple Storage Service (S3)
• Novel storage ‘utility’:
– Direct access to storage
• Self-defined performance targets:
– Scalable, infinite data durability, 99.99% availability, fast data access
• Pay-as-you-go pricing:
– $0.15/month/GB stored and $0.10-$0.17/GB transferred (keeps decreasing)
Is offloading data storage from an in-house mass storage system to S3 feasible and cost-effective?
The DØ Experiment
• High-energy physics collaboration
• Traces from January ’03 to March ’05 (27 months)
• 375 TB data, 5.2 PB transferred
• Shared data usage: no access control
• 561 users from 70+ institutions in 18 countries
• High intensity data usage: ~550Mbps sustained access rate in DZero
• 113,062 jobs running for 973,892 hours over the period of 27 months
Trace recording interval     01/2003 – 03/2005
Number of jobs               113,062
Hours of computation         973,892
Total storage volume         375 TB
Total data processed         5.2 PB
Average data access rate     273 GB/hour
Approach
• Characterize S3
– Does it live up to its own claims?
– Study since superseded by cloudstatus.com
• Toy scenario: evaluate a representative scientific application (DZero) in this context
– Estimate performance and costs
– Is the functionality provided adequate?
• Outline
– S3 architecture
– Toy scenario: S3-supported DZero: cost and functionality requirements
– Lessons/suggested improvements
Amazon S3 Architecture
• Two-level namespace
– Buckets (think directories)
  – Global names
  – Two goals: data organization and charging
– Data objects
  – Opaque object (max 5GB)
  – Metadata (attribute-value, up to 4KB)
• Functionality
– Simple put/get functionality
– Limited search functionality
– Objects are immutable, cannot be renamed
• Data access protocols
– SOAP
– REST
– BitTorrent
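The data model on this slide can be captured in a small in-memory Python sketch. It is a toy model under stated assumptions, not the S3 API: ToyStore and its methods are hypothetical, the limits are taken from the slide with binary units assumed, metadata size is measured here as the byte length of keys plus values, and prefix listing stands in for the limited search functionality.

```python
MAX_OBJECT_BYTES = 5 * 1024**3  # "max 5GB" from the slide; binary units assumed
MAX_METADATA_BYTES = 4 * 1024   # "up to 4KB" from the slide


class ToyStore:
    """Two-level namespace: globally named buckets that hold keyed objects."""

    def __init__(self):
        self.buckets = {}  # bucket name -> {key: (data, metadata)}

    def create_bucket(self, name):
        if name in self.buckets:
            raise ValueError("bucket names are global and must be unique")
        self.buckets[name] = {}

    def put(self, bucket, key, data, metadata=None):
        """Store a whole object; there is no partial update or rename."""
        metadata = dict(metadata or {})
        meta_size = sum(len(k.encode()) + len(v.encode())
                        for k, v in metadata.items())
        if len(data) > MAX_OBJECT_BYTES:
            raise ValueError("object too large")
        if meta_size > MAX_METADATA_BYTES:
            raise ValueError("metadata too large")
        self.buckets[bucket][key] = (bytes(data), metadata)

    def get(self, bucket, key):
        return self.buckets[bucket][key]

    def list_keys(self, bucket, prefix=""):
        """The only search offered here: list the keys that share a prefix."""
        return sorted(k for k in self.buckets[bucket] if k.startswith(prefix))
```

Changing one byte of an object means putting the whole object again, and a rename has to be emulated as a put under the new key followed by a delete. This is the kind of limitation that matters for large scientific files.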
Amazon S3 Functionality
• Amazon S3 is intentionally built with a minimal feature set.
• Write, read, and delete objects containing from 1 byte to 5 gigabytes of data each. The number of objects you can store is unlimited.
• Each object is stored in a bucket and retrieved via a unique, developer-assigned key.
• A bucket can be located in the United States or in Europe. All objects within the bucket will be stored in the bucket’s location, but the objects can be accessed from anywhere.
• Authentication mechanisms are provided to ensure that data is kept secure from unauthorized access. Objects can be made private or public, and rights can be granted to specific users.
• Uses standards-based REST and SOAP interfaces designed to work with any Internet-development toolkit.
• Built to be flexible so that protocol or functional layers can easily be added. Default download protocol is HTTP. A BitTorrent™ protocol interface is provided to lower costs for high-scale distribution. Additional interfaces will be added in the future.
• Reliability backed with the Amazon S3 Service Level Agreement.
Amazon S3 Architecture
• Security
– Identities
  – Assigned by S3 when initial contract is ‘signed’
– Authentication
  – Public/private key scheme
  – But private key is generated by Amazon!
– Access control
  – Access control lists (limited to 100 principals)
  – ACL attributes
    – FullControl
    – Read & Write (objects cannot be written)
    – ReadACL & WriteACL (for buckets or objects)
– Auditing (pseudo)
  – S3 can provide a log record
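The access-control scheme can be modeled with a short Python sketch. It is an illustration under stated assumptions rather than S3's implementation: the AccessControlList class is hypothetical, the permission names are the ones listed on the slide, and counting the owner toward the 100-principal limit is an assumption made here for simplicity.

```python
MAX_PRINCIPALS = 100  # limit quoted on the slide
PERMISSIONS = {"FullControl", "Read", "Write", "ReadACL", "WriteACL"}


class AccessControlList:
    """Per-bucket or per-object list of (principal, permission) grants."""

    def __init__(self, owner):
        # Assumption: the owner holds FullControl and counts as a principal.
        self.grants = {owner: {"FullControl"}}

    def grant(self, principal, permission):
        if permission not in PERMISSIONS:
            raise ValueError("unknown permission: " + permission)
        if principal not in self.grants and len(self.grants) >= MAX_PRINCIPALS:
            raise ValueError("ACL is limited to 100 principals")
        self.grants.setdefault(principal, set()).add(permission)

    def allows(self, principal, permission):
        held = self.grants.get(principal, set())
        return "FullControl" in held or permission in held
```

With one entry per principal and no groups, a collaboration of several hundred users cannot be described in a single ACL under this model.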
Access Performance via BitTorrent
Our question: does BitTorrent work the way it is supposed to?
Answer: Yes.
S3 Evaluation: Cost
• Scenario 1: All data stored at S3 and processed by DZero
– Storage cost: $691,000/year ($829,440 for S3-Europe)
– Transfer: $335,012/year
– → about $85,500/month in total
• Scenario 2: Reducing transfer costs:
– Caching:
  – 50TB cooperative cache: $43,888 per year in transfer costs (~10 times lower)
– BitTorrent and distributed replicas
– Use EC2: Replace transfer costs with $43,284/year
• Scenario 3: Reducing storage costs:
– Archive cold data
  – Lifetime of 30% of files < 24 hours, 40% < a week, 50% < a month
– Throw away derived data
  – Distinguish between raw and derived data
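The Scenario 1 figures can be checked with a few lines of arithmetic. The Python below is a sketch under stated assumptions: it uses the 375 TB volume and the $0.15/GB/month price quoted earlier in the deck, assumes 1 TB = 1024 GB (the choice that reproduces the slide's storage figure), and takes the yearly transfer cost directly from the slide rather than re-deriving it from tiered transfer prices. The Europe rate is not stated in the deck; it is only back-computed here from the $829,440 figure.

```python
GB_PER_TB = 1024  # assumption: binary units


def yearly_storage_cost(terabytes, usd_per_gb_month):
    """Cost of keeping `terabytes` stored for twelve months."""
    return terabytes * GB_PER_TB * usd_per_gb_month * 12


STORED_TB = 375                    # DZero storage volume from the trace table
TRANSFER_USD_PER_YEAR = 335_012    # taken from the slide as given

storage_usd_per_year = yearly_storage_cost(STORED_TB, 0.15)
monthly_total = (storage_usd_per_year + TRANSFER_USD_PER_YEAR) / 12

# Price per GB-month implied by the slide's S3-Europe figure.
implied_europe_rate = 829_440 / (STORED_TB * GB_PER_TB * 12)

if __name__ == "__main__":
    print(f"storage:  ${storage_usd_per_year:,.0f} per year")
    print(f"total:    ${monthly_total:,.0f} per month")
    print(f"S3-Europe implied rate: ${implied_europe_rate:.2f} per GB-month")
```

Under these assumptions, storage comes to about $691,200 per year, and storage plus transfer to roughly $85,500 per month, which agrees with the rounded figures on the slide.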
Key Idea: Unbundling Performance Characteristics
Problem: S3 is about an order of magnitude more expensive than in-house maintenance of resources!
• High availability, high durability, high performance are bundled at a single pricing point…
• … but some applications need only one or two of them
– A cache: availability and access performance
– A backup solution (e.g., for DZero): durability and availability
• Solution: SLAs that allow the user to specify their requirements and choose a pricing point.
Unbundling Performance Characteristics
Application class     Durability   Availability   High performance data access
Cache                 No           Depends        Yes
Long-term archival    Yes          No             No
Online production     No           Yes            Yes
Batch production      No           No             Yes
The resources needed to provide high performance data access, high data availability and long data durability are different
S3 Evaluation: Security
• Traditional risks with distributed storage are still a concern:
– Permanent data loss
– Temporary data unavailability (DoS)
– Loss of confidentiality
– Malicious or erroneous data modifications
– New risk: direct monetary loss
  – Magnified as there is no built-in solution to limit loss
• Security scheme’s big advantage: it’s simple
• … but has limitations
– Access control
  – Hard to use ACLs in large systems: needs at least groups (now available)
  – ACLs limited to 100 principals
– No support for fine-grained delegation
– Implicit trust between users and the S3 service
  – No support for non-repudiation
– No tools to limit risk
Suggested Improvements
• To lower costs: unbundle performance characteristics
– Availability, durability, and access time are bundled at a single pricing point
  • Some applications need only one or two of them
– Solution: SLAs that allow the user to specify their requirements and choose a pricing point
• To provide specific support for science collaborations
– Better security support for complex collaborations
– Additional functionality for better usability:
  – Metadata-based searches
  – Renaming and mutating objects
  – Relax hard-coded limitations: 100 buckets, 100 users in ACL, etc.
• Lesson for application integrators: Use application-level information to reduce costs
– Raw vs. derived data
– Exploit usage patterns: e.g., data gets cold.
• AMAZON S3 FOR SCIENCE GRIDS: A VIABLE SOLUTION?
– Not yet
– In addition, sociological issues