STORK & NeST: Making Data Placement a First Class Citizen in the Grid
Peter F. Couvares
(based on material from Tevfik Kosar, Nick LeRoy, and Jeff Weber)
Associate Researcher, Condor Team
Computer Sciences Department
University of Wisconsin-Madison
<condor-admin@cs.wisc.edu>
Need to move data around…
Reliable and Efficient Grid Data Placement
www.cs.wisc.edu/condor
While doing this…
Locate the data
Access heterogeneous resources
Recover from all kinds of failures
Allocate and de-allocate storage
Move the data
Clean up everything
All of these need to be done reliably and efficiently!
Stork
A scheduler for data placement activities in the Grid.
What Condor is for computational jobs, Stork is for data placement.
Stork's fundamental concept: "Make data placement a first class citizen in the Grid."
Outline
Introduction
The Concept
Stork Features
Big Picture
Conclusions
The Concept
Individual Jobs:
• Stage-in
• Execute the job
• Stage-out
The Concept
Individual Jobs:
• Allocate space for input & output data
• Stage-in
• Execute the job
• Release input space
• Stage-out
• Release output space
The Concept
Data Placement Jobs:
• Allocate space for input & output data
• Stage-in
• Release input space
• Stage-out
• Release output space
Computational Jobs:
• Execute the job
The Concept
DAGMan reads the DAG specification and dispatches computational jobs to the Condor job queue and data placement jobs to the Stork job queue:

DAG specification:
DATA A A.submit
DATA B B.submit
JOB  C C.submit
…
PARENT A CHILD B
PARENT B CHILD C
PARENT C CHILD D E
…
Why Stork?
Stork understands the characteristics and semantics of data placement jobs, so it can make smart scheduling decisions for reliable and efficient data placement.
Understanding Job Characteristics & Semantics
Job_type = transfer, reserve, release?
Source and destination hosts, files, protocols to use?
• Determine concurrency level
• Can select alternate protocols
• Can select alternate routes
• Can tune network parameters (TCP buffer size, I/O block size, # of parallel streams)
• …
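Such hints could appear directly in the job description. The sketch below is illustrative only: every attribute other than Type, Src_Url, and Dest_Url is a hypothetical name, not necessarily one of Stork's actual keywords.

```
[
  Type             = "Transfer";
  Src_Url          = "gsiftp://host-a.example.edu/data/x.dat";
  Dest_Url         = "nest://host-b.example.edu/data/x.dat";
  Alt_Protocols    = "ftp, http";   // hypothetical attribute
  TCP_Buffer_Size  = 1048576;       // hypothetical attribute
  Parallel_Streams = 4;             // hypothetical attribute
]
```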
Support for Heterogeneity
[Figure: protocol translation using the Stork memory buffer.]
Support for Heterogeneity
[Figure: protocol translation using the Stork disk cache.]
Flexible Job Representation
[
  Type     = "Transfer";
  Src_Url  = "srb://ghidorac.sdsc.edu/kosart.condor/x.dat";
  Dest_Url = "nest://turkey.cs.wisc.edu/kosart/x.dat";
  ……
]
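The same representation covers allocation and release jobs (Job_type = reserve or release). A sketch of a reserve job, with every attribute other than Type a hypothetical name:

```
[
  Type     = "Reserve";
  Host     = "turkey.cs.wisc.edu";   // hypothetical attribute
  Size     = "100MB";                // hypothetical attribute
  Duration = "2 hours";              // hypothetical attribute
]
```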
Failure Recovery and Efficient Resource Utilization
Fault tolerance
• Just submit a bunch of data placement jobs, and then go away…
Control the number of concurrent transfers from/to any storage system
• Prevents overloading
Space allocation and de-allocation
• Make sure space is available
Outline
Introduction
The Concept
Stork Features
Big Picture
Conclusions
Big Picture
[Figure series: the full system, built up step by step.]
• The USER hands JOB DESCRIPTIONS to a PLANNER, which consults a replica catalog (RLS / MCAT) to turn an Abstract DAG into a Concrete DAG.
• A WORKFLOW MANAGER (DAGMan) walks the concrete DAG, sending computational jobs to a COMPUTATION SCHEDULER (Condor / Condor-G) targeting GRID COMPUTE NODES, and data placement jobs to a DATA PLACEMENT SCHEDULER (Stork) targeting GRID STORAGE SYSTEMS.
• A POLICY ENFORCER (the MATCHMAKER) applies policy to both schedulers.
• Computational and data placement job log files, together with NETWORK MONITORING TOOLS and a DATA MINER, drive a FEEDBACK MECHANISM for run-time adaptation.
Conclusions
Regard data placement as individual jobs.
Treat computational and data placement jobs differently.
Introduce a specialized scheduler for data placement.
Provide end-to-end automation, fault tolerance, run-time adaptation, multilevel policy support, and reliable and efficient transfers.
Future work
Enhanced interaction between Stork and higher-level planners
• Better coordination of CPU and I/O
Interaction between multiple Stork servers and job delegation from one to another
Enhanced authentication mechanisms
More run-time adaptation
Related Publications
Tevfik Kosar and Miron Livny. "Stork: Making Data Placement a First Class Citizen in the Grid". In Proceedings of the 24th IEEE Int. Conference on Distributed Computing Systems (ICDCS 2004), Tokyo, Japan, March 2004.
George Kola, Tevfik Kosar and Miron Livny. "A Fully Automated Fault-tolerant System for Distributed Video Processing and Off-site Replication". To appear in Proceedings of the 14th ACM Int. Workshop on Network and Operating Systems Support for Digital Audio and Video (NOSSDAV 2004), Kinsale, Ireland, June 2004.
Tevfik Kosar, George Kola and Miron Livny. "A Framework for Self-optimizing, Fault-tolerant, High Performance Bulk Data Transfers in a Heterogeneous Grid Environment". In Proceedings of the 2nd Int. Symposium on Parallel and Distributed Computing (ISPDC 2003), Ljubljana, Slovenia, October 2003.
George Kola, Tevfik Kosar and Miron Livny. "Run-time Adaptation of Grid Data Placement Jobs". In Proceedings of the Int. Workshop on Adaptive Grid Middleware (AGridM 2003), New Orleans, LA, September 2003.
You don't have to FedEx your data anymore…
Stork delivers it for you!
For more information:
• Email: condor-admin@cs.wisc.edu
• http://www.cs.wisc.edu/condor/stork
NeST (Network Storage Technology)
A lightweight, portable storage manager for data placement activities on the Grid.
Allocation: NeST negotiates "mini storage contracts" between users and the server.
Multi-protocol: supports Chirp, GridFTP, NFS, and HTTP. Chirp is NeST's internal protocol.
Secure: GSI authentication.
Lightweight: configuration and installation can be performed in minutes, and do not require root.
Why storage allocations?
› Users need both temporary storage and long-term guaranteed storage.
› Administrators need a storage solution with configurable limits and policy.
› Administrators will benefit from NeST's automatic reclamation of expired storage allocations.
Storage allocations in NeST
› Lot – an abstraction for a storage allocation, with an associated handle
• The handle is used for all subsequent operations on the lot
› The client requests a lot of a specified size and duration; the server accepts or rejects the request.
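The accept/reject decision for a guaranteed lot can be sketched as a simple capacity check. This is an illustrative model only, not NeST's actual implementation; all names below are hypothetical.

```python
# Illustrative sketch of a lot request/accept decision (NOT NeST's actual code).
# A server with fixed capacity accepts a guaranteed-lot request only if the
# requested size fits in the currently unreserved space.

class LotServer:
    def __init__(self, capacity_mb):
        self.capacity_mb = capacity_mb
        self.reserved_mb = 0
        self.next_id = 0

    def request_lot(self, size_mb, duration_s):
        """Return a lot handle if the request fits, else None (reject)."""
        if self.reserved_mb + size_mb > self.capacity_mb:
            return None  # reject: not enough unreserved space
        self.reserved_mb += size_mb
        self.next_id += 1
        # The handle is what the client uses for all later lot operations.
        return f"lot-{self.next_id}"

server = LotServer(capacity_mb=100)
h1 = server.request_lot(size_mb=60, duration_s=3600)  # accepted
h2 = server.request_lot(size_mb=60, duration_s=3600)  # rejected: would exceed capacity
print(h1, h2)
```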
Lot types
› User / Group
• User: single user (the user controls the ACL)
• Group: shared use (all members control the ACL)
› Best effort / Guaranteed
• Best effort: the server may purge data if necessary. A good fit for derived data.
• Guaranteed: the server honors the requested duration.
› Hierarchical: lots within lots ("sublots")
Lot operations
› Create, Delete, Update
› MoveFile
• Moves files between lots
› AddUser, RemoveUser
• Lot-level authorization
• List of users allowed to request sub-lots
› Attach / Detach
• Performs NeST lot-to-path binding
Functionality: GT4 GridFTP
[Figure: a sample application drives globus-url-copy over GSI-FTP to a GridFTP Server, whose Disk Module writes to Disk Storage.]
Functionality: GridFTP + NeST
[Figure: a NeST client issues lot operations over chirp directly to the NeST Server, while globus-url-copy performs file transfers over GSI-FTP to the GridFTP Server; the GridFTP NeST Module forwards each transfer over chirp to the NeST Server's Chirp Handler, which manages Disk Storage.]
NeST with Stork
[Figure: Stork issues lot operations over chirp directly to the NeST Server, and file transfers over GSI-FTP to the GT4 GridFTP Server; the GridFTP NeST Module forwards each transfer over chirp to the NeST Server's Chirp Handler, which manages Disk Storage.]
NeST Sample Work DAG
[Figure: a five-node DAG: Allocate → Xfer In → Job → Xfer Out → Release. Stork runs the Allocate, Xfer In, Xfer Out, and Release data placement jobs; Condor-G runs the computational Job; all storage operations target NeST.]
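Using the DAG specification format from the earlier Stork slides, this work DAG could be written as the sketch below; the submit-file names are hypothetical.

```
DATA Allocate allocate.stork
DATA XferIn   xfer_in.stork
JOB  Work     work.submit
DATA XferOut  xfer_out.stork
DATA Release  release.stork
PARENT Allocate CHILD XferIn
PARENT XferIn   CHILD Work
PARENT Work     CHILD XferOut
PARENT XferOut  CHILD Release
```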
Connection Manager
› Used to control connections to NeST
› Allows connection reservations:
• Reserve a number of simultaneous connections
• Reserve total bytes of transfer
• Reservations have durations and expire
• Reservations are persistent
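The reservation behavior above can be sketched as follows. This is an illustrative model, not the actual Connection Manager code; all names are hypothetical.

```python
# Illustrative sketch of connection reservations with expiry (NOT NeST's
# actual Connection Manager). Each reservation holds a number of simultaneous
# connection slots until its expiration time; expired reservations no longer
# count against the limit.

class ConnectionManager:
    def __init__(self, max_connections):
        self.max_connections = max_connections
        self.reservations = {}  # name -> (slots, expires_at)

    def _active_slots(self, now):
        # Sum only reservations that have not yet expired.
        return sum(slots for slots, exp in self.reservations.values() if exp > now)

    def reserve(self, name, slots, duration_s, now):
        """Accept the reservation if it fits under the connection limit."""
        if self._active_slots(now) + slots > self.max_connections:
            return False  # reject: would exceed the connection limit
        self.reservations[name] = (slots, now + duration_s)
        return True

cm = ConnectionManager(max_connections=10)
ok1 = cm.reserve("transfer-a", slots=8, duration_s=60, now=0)   # accepted
ok2 = cm.reserve("transfer-b", slots=8, duration_s=60, now=0)   # rejected: 16 > 10
ok3 = cm.reserve("transfer-b", slots=8, duration_s=60, now=61)  # accepted: "transfer-a" expired
print(ok1, ok2, ok3)
```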
Release Status
› http://www.cs.wisc.edu/condor/nest
› v0.9.7 expected soon
• v0.9.7 Pre 2 released
• Just bug fixes; features are frozen
› v1.0 expected later this year
› Currently supports Linux; other operating systems will be supported in the future.
Roadmap
› Performance tests with Stork
› Continue hardening the code base
› Expand supported platforms
• Solaris & other UNIX variants
› Bundle with Condor
› Connection Manager
Questions?
› More information available at http://www.cs.wisc.edu/condor/nest