Reliable and Efficient Data Placement in a Grid Environment PhD Research Summary

Tevfik Kosar
IBM TJ Watson Research Center
June 22nd, 2004
Grid Computing
“Distributed computing across networks using open
standards supporting heterogeneous resources” - IBM
Motivations for Grid Computing
• Increase Capacity
• Improve Efficiency / Reduce Costs
• Reduce “Time to Results”
• Provide Reliability / Availability
• Support Heterogeneous systems
• Enable Collaborations …
Future of Grid
“Grid is hot because it's the right technology
for its time and within the next five years it
will be a de facto part and parcel of virtually
every major financial markets firm's
infrastructure.”
- Grid Computing in Financial Markets: Moving
Beyond Compute Intensive Applications, Tabb Group
Moving Beyond
Compute-Intensive Applications
“While the compute-intensive segment is
growing, the vast amount of new grid growth
will not come from compute-intensive solutions,
but from data and service grids whose
application we believe to be much wider than
traditional compute grids.”
- Grid Computing in Financial Markets: Moving
Beyond Compute Intensive Applications, Tabb Group
What about Science?
• Genomic information processing applications
• Biomedical Informatics Research Network (BIRN) applications
• Cosmology applications (MADCAP)
• Methods for modeling large molecular systems
• Coupled climate modeling applications
• Real-time observatories, applications, and data management (ROADNet)
Some Remarkable Numbers
Characteristics of four physics experiments targeted by GriPhyN:

Application   First Data   Data Volume (TB/yr)   User Community
SDSS          1999         10                    100s
LIGO          2002         250                   100s
ATLAS/CMS     2005         5,000                 1000s

Source: GriPhyN Proposal, 2000
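To put these yearly volumes in network terms, the short Python sketch below converts TB/yr into the average sustained rate needed just to keep up with data production. It assumes decimal units (1 TB = 10^12 bytes, 1 Mb = 10^6 bits) and a 365-day year, and it ignores retransfers and replication, so the results are lower bounds rather than real requirements.

```python
SECONDS_PER_YEAR = 365 * 24 * 3600  # assumption: 365-day year


def sustained_mbps(tb_per_year: float) -> float:
    """Average rate in Mb/s needed to move tb_per_year terabytes in one year.

    Assumes decimal units: 1 TB = 10**12 bytes and 1 Mb = 10**6 bits.
    """
    bits = tb_per_year * 10**12 * 8
    return bits / SECONDS_PER_YEAR / 10**6


# Yearly data volumes taken from the GriPhyN table.
volumes = {"SDSS": 10, "LIGO": 250, "ATLAS/CMS": 5000}

for name, tb in volumes.items():
    print(f"{name:10s} {tb:>6,} TB/yr -> {sustained_mbps(tb):8.1f} Mb/s")
```

Under these assumptions ATLAS/CMS alone would need more than 1 Gb/s of sustained throughput around the clock.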
Even More Remarkable…
“ ..the data volume of CMS is expected
to subsequently increase rapidly, so that
the accumulated data volume will reach
1 Exabyte (1 million Terabytes) by
around 2015.”
Source: PPDG Deliverables to CMS
Access to Remote Data
• Remote I/O
• Move application close to data
• Move data close to application
• Move both data and application
Remote I/O does not scale well for large data sets!
Storage sites do not always have sufficient computational power nearby!
Need to move data around
[Figure: data sets of terabyte (TB) and petabyte (PB) scale]
While doing this, we must:
• Locate the data
• Access heterogeneous resources
• Deal with all kinds of failures
• Allocate and de-allocate storage
• Move the data
• Clean up everything
All of these need to be done reliably and efficiently!
Goal
• Data placement is crucial in a Grid environment.
• Current approaches regard it as a side effect of computation.
• Data placement must be regarded as a first-class citizen in the Grid, just like computational jobs.
Approach
• Regard data placement activities as full-fledged jobs.
• Design and implement a system to reliably and efficiently schedule, execute, monitor, and manage them.
Outline
• Introduction
• Background
• The Concept
• Data Placement Subsystem
• Progress Made
• Contributions
• Future Work
Background
[Figure: layered view of I/O and scheduling, built up over several slides. Hardware level: CPU, memory, disk, bus, I/O processor, DMA controller. Operating systems level: CPU scheduler, I/O scheduler, I/O subsystem, I/O control system. Distributed systems level: batch schedulers and, in the final build, a data placement subsystem.]
The Concept
Individual jobs:
• Stage-in
• Execute the job
• Stage-out
Expanded into all of their steps, in order:
• Allocate space for input & output data
• Stage-in
• Execute the job
• Release input space
• Stage-out
• Release output space
Traditional Schedulers
• Not aware of the characteristics and semantics of data placement jobs

    Executable = /tmp/foo.exe
    Arguments  = a b c d

    Executable = globus-url-copy
    Arguments  = gsiftp://host1/f1 gsiftp://host2/f2

Any difference?
Understanding Job Characteristics & Semantics
• Job_type = transfer, reserve, release?
• Source and destination hosts, files, protocols to use?
With this information, a scheduler can:
• Determine the concurrency level
• Select alternate protocols
• Select alternate routes
• Tune network parameters (TCP buffer size, I/O block size, # of parallel streams)
• …
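Much of this information can be recovered from the job description itself. As a minimal sketch (not Stork's actual code), the Python below extracts the protocol, host, and path from source and destination URLs such as gsiftp://host1/f1, which is the raw material for the decisions listed above. The DataPlacementJob and Endpoint classes and their field names are hypothetical.

```python
from dataclasses import dataclass
from urllib.parse import urlsplit


@dataclass
class Endpoint:
    protocol: str
    host: str
    path: str


@dataclass
class DataPlacementJob:  # hypothetical representation, for illustration only
    job_type: str  # e.g. "transfer", "reserve", "release"
    src_url: str
    dest_url: str

    @staticmethod
    def _endpoint(url: str) -> Endpoint:
        parts = urlsplit(url)
        return Endpoint(parts.scheme, parts.hostname or "", parts.path)

    @property
    def source(self) -> Endpoint:
        return self._endpoint(self.src_url)

    @property
    def destination(self) -> Endpoint:
        return self._endpoint(self.dest_url)

    def storage_hosts(self) -> set:
        """Hosts whose concurrency limits this job would count against."""
        return {self.source.host, self.destination.host}


job = DataPlacementJob("transfer", "gsiftp://host1/f1", "gsiftp://host2/f2")
print(job.source)  # Endpoint(protocol='gsiftp', host='host1', path='/f1')
print(job.storage_hosts())
```

A traditional scheduler sees only an executable and an argument string; once the job is parsed like this, the scheduler can group jobs by host or protocol before dispatching them.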
Grouped by kind, these steps form two classes of jobs:
• Data placement jobs: allocate space for input & output data, stage-in, release input space, stage-out, release output space
• Computational jobs: execute the job
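A minimal sketch of this split, assuming each of the six steps is tagged with its kind so that data placement steps can be sent to a data placement scheduler and the computation to a computation scheduler. The step names and the Kind enum are illustrative only, not part of any real system.

```python
from enum import Enum


class Kind(Enum):
    DATA_PLACEMENT = "data placement"
    COMPUTATIONAL = "computational"


# The six steps of one job, in execution order (illustrative names).
STEPS = [
    ("allocate space for input & output data", Kind.DATA_PLACEMENT),
    ("stage-in", Kind.DATA_PLACEMENT),
    ("execute the job", Kind.COMPUTATIONAL),
    ("release input space", Kind.DATA_PLACEMENT),
    ("stage-out", Kind.DATA_PLACEMENT),
    ("release output space", Kind.DATA_PLACEMENT),
]


def split_by_kind(steps):
    """Group steps by kind, keeping their relative order within each group."""
    groups = {kind: [] for kind in Kind}
    for name, kind in steps:
        groups[kind].append(name)
    return groups


groups = split_by_kind(STEPS)
for kind, names in groups.items():
    print(f"{kind.value}: {', '.join(names)}")
```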
Data Placement Subsystem
[Figure: architecture of the data placement subsystem, built up over several slides. Components: user, job descriptions, planner, computation scheduler, data placement scheduler, C. job log files, D. job log files, resource broker/policy enforcer, data miner, feedback mechanism, network monitoring tools, compute nodes, storage systems. Inset plot: transfer time T (minutes) vs. probability (t < T).]
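The inset plot relates a transfer time T to the probability that a transfer finishes in less than T. As a sketch of how a data miner could derive such a curve from D. job log files, the Python below computes an empirical cumulative distribution from a list of observed transfer durations. The sample durations are made-up placeholders, not measurements from this work, and the log-file parsing step is not shown.

```python
from bisect import bisect_left


def prob_faster_than(durations, t):
    """Empirical P(duration < t) over the observed transfer durations."""
    ordered = sorted(durations)
    # bisect_left returns how many observations are strictly below t.
    return bisect_left(ordered, t) / len(ordered)


# Hypothetical transfer durations in minutes (placeholders, not real data).
observed = [3.9, 4.2, 4.4, 5.1, 5.6, 6.0, 7.2, 8.8, 11.5, 14.0]

for t in (5.0, 8.0, 16.0):
    print(f"P(t < {t:4.1f} min) = {prob_faster_than(observed, t):.2f}")
```

A scheduler could, for example, use such a curve to decide when a transfer has taken unusually long and may need to be restarted.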
[Figure: the data placement subsystem architecture, with the components implemented so far marked “Implemented”.]
Separation of Jobs
DAG specification:

    DaP A A.submit
    DaP B B.submit
    Job C C.submit
    …..
    Parent A child B
    Parent B child C
    Parent C child D, E
    …..

[Figure: the workflow manager (DAGMan) walks the DAG (nodes A to F) and sends data placement (DaP) jobs to the Stork job queue and computational jobs to the Condor job queue.]
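The DAG specification can be read mechanically: DaP lines name data placement jobs, Job lines name computational jobs, and Parent/child lines give dependencies. The Python sketch below parses a small, hypothetical, complete spec in that style (the original is elided), orders the nodes topologically with graphlib (Python 3.9+), and routes each node to a data placement queue or a computational queue. It only illustrates the separation; it is not how DAGMan, Condor, or Stork are implemented.

```python
from graphlib import TopologicalSorter

# Hypothetical, complete DAG specification in the style shown above.
SPEC = """
DaP A A.submit
DaP B B.submit
Job C C.submit
Job D D.submit
DaP E E.submit
Parent A child B
Parent B child C
Parent C child D, E
"""


def parse_dag(text):
    """Return ({node: kind}, {node: set_of_parents}) from the spec text."""
    kinds, parents = {}, {}
    for line in text.strip().splitlines():
        words = line.replace(",", " ").split()
        if words[0] in ("DaP", "Job"):
            kinds[words[1]] = words[0]
            parents.setdefault(words[1], set())
        elif words[0] == "Parent":
            split = words.index("child")
            for child in words[split + 1:]:
                parents.setdefault(child, set()).update(words[1:split])
    return kinds, parents


kinds, parents = parse_dag(SPEC)
dap_queue, compute_queue = [], []
# static_order() yields each node only after all of its parents. A real
# workflow manager would also wait for parents to *finish* before submitting.
for node in TopologicalSorter(parents).static_order():
    (dap_queue if kinds[node] == "DaP" else compute_queue).append(node)

print("DaP queue:    ", dap_queue)
print("Compute queue:", compute_queue)
```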
Stork: Data Placement Scheduler
• Most important component of the data placement subsystem.
• Understands the characteristics and semantics of data placement jobs.
• Can make smart scheduling decisions for reliable and efficient data placement.
Support for Heterogeneity
Protocol translation using either:
• a Stork memory buffer, or
• a Stork disk cache.
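Protocol translation through a memory buffer amounts to reading fixed-size blocks from a source that speaks one protocol and writing them to a destination that speaks another, without staging the whole file (the disk-cache variant writes to a local file in between). The Python sketch below shows only the buffering logic: the in-memory BytesIO objects stand in for real protocol clients, which are assumptions here and not shown, and the 1 MB block size is likewise an assumed value.

```python
import io

BLOCK_SIZE = 1024 * 1024  # assumed block size: 1 MB


def translate(src, dst, block_size=BLOCK_SIZE):
    """Copy src to dst through a single reusable memory buffer.

    src must support readinto() and dst must support write(). Only one
    block is held in memory at a time. Returns the number of bytes moved.
    """
    buf = bytearray(block_size)
    view = memoryview(buf)
    total = 0
    while True:
        n = src.readinto(buf)
        if not n:  # 0 (or None) means end of stream
            break
        dst.write(view[:n])
        total += n
    return total


# Stand-ins for protocol-specific readers and writers (hypothetical).
source = io.BytesIO(b"x" * (3 * BLOCK_SIZE + 123))
destination = io.BytesIO()
moved = translate(source, destination)
print(moved)  # 3145851
```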
Flexible Job Representation and Multilevel Policy Support

    [
      Type       = "Transfer";
      Src_Url    = "srb://ghidorac.sdsc.edu/kosart.condor/x.dat";
      Dest_Url   = "nest://turkey.cs.wisc.edu/kosart/x.dat";
      ……
      ……
      Max_Retry  = 10;
      Restart_in = "2 hours";
    ]
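Because the job description is a set of named attributes, a policy such as Max_Retry can be applied generically by the scheduler instead of being coded into each job. The Python sketch below parses a simplified subset of this attribute syntax (one `Name = value;` per line, quoted strings or integers) and retries a transfer up to Max_Retry times. It is not a full ClassAd parser. The parse_ad, to_seconds, and run_with_retries helpers are hypothetical, and so are two readings of the attributes: that Max_Retry counts retries after the first attempt, and that Restart_in is a duration string like "2 hours".

```python
import re

AD = '''
[
  Type       = "Transfer";
  Src_Url    = "srb://ghidorac.sdsc.edu/kosart.condor/x.dat";
  Dest_Url   = "nest://turkey.cs.wisc.edu/kosart/x.dat";
  Max_Retry  = 10;
  Restart_in = "2 hours";
]
'''

ATTR = re.compile(r'^\s*(\w+)\s*=\s*(.+?);\s*$')
UNITS = {"second": 1, "minute": 60, "hour": 3600}


def parse_ad(text):
    """Parse 'Name = value;' lines into a dict (simplified, not full ClassAds)."""
    ad = {}
    for line in text.splitlines():
        m = ATTR.match(line)
        if not m:
            continue  # brackets, blank lines, elided lines
        name, raw = m.groups()
        ad[name] = raw.strip('"') if raw.startswith('"') else int(raw)
    return ad


def to_seconds(duration):
    """Convert e.g. '2 hours' to 7200 (assumed duration format)."""
    amount, unit = duration.split()
    return int(amount) * UNITS[unit.rstrip("s")]


def run_with_retries(transfer, ad):
    """Call transfer() until it returns True; allow Max_Retry retries (assumed)."""
    for attempt in range(1 + ad.get("Max_Retry", 0)):
        if transfer():
            return attempt  # number of retries that were needed
    raise RuntimeError("transfer failed after all retries")


ad = parse_ad(AD)
print(ad["Type"], ad["Max_Retry"], to_seconds(ad["Restart_in"]))

outcomes = iter([False, False, True])  # simulated: fails twice, then succeeds
print(run_with_retries(lambda: next(outcomes), ad))
```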
Run-time Adaptation
• Dynamic protocol selection

    [
      dap_type      = "transfer";
      src_url       = "drouter://slic04.sdsc.edu/tmp/test.dat";
      dest_url      = "drouter://quest2.ncsa.uiuc.edu/tmp/test.dat";
      alt_protocols = "nest-nest, gsiftp-gsiftp";
    ]

    [
      dap_type = "transfer";
      src_url  = "any://slic04.sdsc.edu/tmp/test.dat";
      dest_url = "any://quest2.ncsa.uiuc.edu/tmp/test.dat";
    ]
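One way to read alt_protocols is as an ordered fallback list: try the protocol pair given in the URLs first, then each alternative in turn until one succeeds. The Python sketch below implements that reading. The try_transfer callback and the simulated "only GridFTP works" environment are hypothetical stand-ins for real transfer clients, and the any:// form, which leaves the whole choice to the scheduler, is not modeled.

```python
from urllib.parse import urlsplit, urlunsplit


def with_scheme(url, scheme):
    """Return url with its protocol (scheme) replaced."""
    return urlunsplit(urlsplit(url)._replace(scheme=scheme))


def candidate_pairs(src_url, dest_url, alt_protocols=""):
    """Primary (src, dest) protocol pair first, then alternatives in order."""
    pairs = [(urlsplit(src_url).scheme, urlsplit(dest_url).scheme)]
    for item in filter(None, (p.strip() for p in alt_protocols.split(","))):
        src_proto, dest_proto = item.split("-")
        pairs.append((src_proto, dest_proto))
    return pairs


def transfer_with_fallback(src_url, dest_url, alt_protocols, try_transfer):
    """Try each protocol pair until try_transfer(src, dest) returns True."""
    for src_proto, dest_proto in candidate_pairs(src_url, dest_url, alt_protocols):
        src = with_scheme(src_url, src_proto)
        dest = with_scheme(dest_url, dest_proto)
        if try_transfer(src, dest):
            return src, dest
    raise RuntimeError("all protocols failed")


def gsiftp_only(src, dest):
    """Simulated environment in which only GridFTP works (hypothetical)."""
    return src.startswith("gsiftp://") and dest.startswith("gsiftp://")


print(transfer_with_fallback(
    "drouter://slic04.sdsc.edu/tmp/test.dat",
    "drouter://quest2.ncsa.uiuc.edu/tmp/test.dat",
    "nest-nest, gsiftp-gsiftp",
    gsiftp_only,
))
```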
Run-time Adaptation -2
• Run-time protocol auto-tuning

    [
      link     = "slic04.sdsc.edu – quest2.ncsa.uiuc.edu";
      protocol = "gsiftp";
      bs       = 1024KB;  //block size
      tcp_bs   = 1024KB;  //TCP buffer size
      p        = 4;       //# of parallel streams
    ]
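A common rule of thumb for choosing tcp_bs is the bandwidth-delay product. This is a standard networking heuristic and not necessarily the tuning method Stork uses. A TCP window must hold bandwidth × round-trip time worth of data to keep the link full, and with p parallel streams each stream needs roughly 1/p of that. The Python sketch below computes it for a 100 Mb/s link. The 66.7 ms round-trip time comes from one link in the SRB-UniTree case study, on the assumption that the quoted 66.7 ms is the RTT.

```python
import math


def bdp_bytes(bandwidth_bps: float, rtt_s: float) -> float:
    """Bandwidth-delay product: bytes in flight needed to fill the link."""
    return bandwidth_bps * rtt_s / 8


def per_stream_buffer_kb(bandwidth_bps: float, rtt_s: float, streams: int) -> int:
    """Suggested TCP buffer per stream in KB (1 KB = 1024 bytes), rounded up."""
    return math.ceil(bdp_bytes(bandwidth_bps, rtt_s) / streams / 1024)


link_bps, rtt = 100e6, 0.0667  # 100 Mb/s, 66.7 ms round-trip time (assumed)
print(f"BDP: {bdp_bytes(link_bps, rtt) / 1024:.0f} KB")
for p in (1, 4):
    print(f"p = {p}: tcp_bs ~ {per_stream_buffer_kb(link_bps, rtt, p)} KB per stream")
```

A buffer smaller than this caps each stream at roughly buffer / RTT. A larger one mostly wastes memory.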
Failure Recovery and Efficient Resource Utilization
• Fault tolerance
  • Just submit a bunch of data placement jobs, and then go away.
• Control the number of concurrent transfers from/to any storage system
  • Prevents overloading
• Space allocation and de-allocation
  • Make sure space is available
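Limiting concurrent transfers per storage system can be sketched with one counting semaphore per host: a transfer must hold a slot on both its source and its destination host while it runs. The Python below is a minimal illustration under assumed values. The host names, the limit of 2, and the simulated transfer are all hypothetical. Acquiring the semaphores in sorted host order avoids deadlock between transfers that use the same two hosts in opposite directions.

```python
import threading
import time
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor

MAX_PER_HOST = 2  # assumed limit on concurrent transfers per storage system

_slots = {}                     # host -> BoundedSemaphore
_slots_lock = threading.Lock()  # guards lazy creation of semaphores
active = defaultdict(int)       # transfers currently running per host
peak = defaultdict(int)         # highest concurrency observed per host
stats_lock = threading.Lock()


def slot_for(host):
    """Return the semaphore for host, creating it on first use."""
    with _slots_lock:
        if host not in _slots:
            _slots[host] = threading.BoundedSemaphore(MAX_PER_HOST)
        return _slots[host]


def run_transfer(src_host, dest_host, seconds=0.05):
    """Run one (simulated) transfer once both hosts have a free slot."""
    hosts = sorted({src_host, dest_host})  # fixed order avoids deadlock
    sems = [slot_for(h) for h in hosts]
    for sem in sems:
        sem.acquire()
    with stats_lock:
        for h in hosts:
            active[h] += 1
            peak[h] = max(peak[h], active[h])
    try:
        time.sleep(seconds)  # stands in for the actual data movement
    finally:
        with stats_lock:
            for h in hosts:
                active[h] -= 1
        for sem in reversed(sems):
            sem.release()


# Hypothetical workload: many transfers touching the same storage systems.
jobs = [("storage1", "storage2")] * 6 + [("storage3", "storage1")] * 3
with ThreadPoolExecutor(max_workers=len(jobs)) as pool:
    list(pool.map(lambda job: run_transfer(*job), jobs))

print(dict(peak))  # no host ever exceeds MAX_PER_HOST
```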
Case Study -I
Dynamic Protocol Selection
Runtime Adaptation
Before Tuning:
• parallelism = 1
• block_size = 1 MB
• tcp_bs = 64 KB
After Tuning:
• parallelism = 4
• block_size = 1 MB
• tcp_bs = 256 KB
Case Study -II: SRB-UniTree Data Pipeline
• Transfer ~3 TB of DPOSS data from SRB @SDSC to UniTree @NCSA
• No common interface
• Network and storage limitations
• A data pipeline created with Stork
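[Figure: SRB-UniTree data pipeline. A management site at UW sends control flow to the SRB server at SDSC, the SDSC cache (20 GB disk space), the NCSA cache (20 GB disk space), and the UniTree server at NCSA; data flows from SRB through the two caches to UniTree, in steps labeled A to D. Link annotations: 1 Gb/s, 0.4 ms; 100 Mb/s, 0.6 ms; 100 Mb/s, 66.7 ms.]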
Comparing Pipelines

Configuration                  End-to-end rate
1 staging node                 40 Mb/s
2 staging nodes (not tuned)    25.6 Mb/s
2 staging nodes (tuned)        47.6 Mb/s
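To relate these rates to the ~3 TB transfer, the Python sketch below estimates how long moving 3 TB would take at each end-to-end rate. It assumes decimal units (1 TB = 10^12 bytes, 1 Mb/s = 10^6 bit/s) and a constant rate with no downtime. Runs interrupted by failures would take longer.

```python
DATA_BYTES = 3e12  # ~3 TB of DPOSS data (decimal terabytes assumed)

rates_mbps = {
    "1 staging node": 40.0,
    "2 staging nodes (not tuned)": 25.6,
    "2 staging nodes (tuned)": 47.6,
}


def days_to_transfer(num_bytes, mbps):
    """Days needed to move num_bytes at a constant rate of mbps megabits/s."""
    seconds = num_bytes * 8 / (mbps * 1e6)
    return seconds / 86400


for config, rate in rates_mbps.items():
    days = days_to_transfer(DATA_BYTES, rate)
    print(f"{config:30s} {rate:5.1f} Mb/s -> {days:5.1f} days")
```

Under these assumptions, tuning the two-node pipeline saves roughly five days compared with the untuned version.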
Failure Recovery
[Figure: failures the pipeline recovered from during the transfer: UniTree not responding; SDSC cache reboot & UW CS network outage; Diskrouter reconfigured and restarted; software problem.]
Profiling Data Transfer Protocols and Servers
• Get a better understanding of transfers
  • How is time spent at the kernel level during transfers?
• Profiled the GridFTP and NeST servers using “oprofile”, while varying different parameters
Percentage of CPU time spent in each component

GridFTP server (read: 6.5 MB/s, write: 7.8 MB/s)

Component            Read From GridFTP   Write To GridFTP
Idle                 15.9                44.5
Ethernet Driver      40.9                1.5
Interrupt Handling   10.3                4.9
Libc                 8.1                 16.8
Globus               2.7                 3.8
Oprofile             2.7                 5.1
IDE                  4.0                 0.3
File I/O             2.0                 3.8
Rest of Kernel       13.4                19.3

NeST server (percentage of server CPU time)

Component            Read From NeST   Write To NeST
Idle                 12.5             57.7
Ethernet Driver      44.2             1.0
Interrupt Handling   10.2             4.3
Libc                 10.4             12.6
NeST                 1.1              6.7
Oprofile             3.5              3.7
IDE                  3.0              0.3
File I/O             2.1              1.7
Rest of Kernel       12.9             12.0
Contributions
• Short term:
  • Provide a system for reliable and efficient data placement for use by the Grid community
  • Already deployed at:
    • NCSA, WCER, UW-HEP, NOAO, OSU
    • Projects: USCMS, DPOSS, SDSS, Blast
  • Soon to be deployed at:
    • CERN, ISI, SLAC, Caltech, LOCI, BMRB
    • Projects: CMS, BaBar, Quest, IBP
    • As part of the CERN package, it will be distributed to 70 institutions in 40 countries
Contributions
• Medium term:
  • Introduce a new concept to the distributed systems community: “Regard data placement as a first-class citizen”
  • Profiling work: better understanding of storage systems and transfer protocols
  • Characterization of data placement jobs
  • Provide and apply a set of policies for storage systems
Contributions
• Long term:
  • Serve as a basis for further research in the data placement area
  • Two of our papers are already being studied in classes:
    • e.g., the graduate-level “Scheduling in Distributed Systems” class at OSU
Future Work
• Get a better understanding of data placement jobs
  • Extend the profiling work
  • Study real workloads
  • Define the set of scheduling decisions specific to them, and apply them
Future Work - II
• Define a set of policies for storage systems, and apply them
  • Use the results of the profiling work
  • Consider user concerns
  • Prevent overloading, avoid failures
  • Provide efficient usage and load balancing
Future Work - III
• Collect useful information, interpret it, and feed it back to the scheduling system
  • Collect and interpret log files
  • Interact with network monitoring tools
  • Increase reliability and efficiency
  • Run-time adaptation
Future Work - IV
• Better coordination of computational and data resources
  • Study ways for the data placement scheduler to interact and integrate with higher-level planners and computational schedulers
  • More reliable and efficient data processing systems/pipelines
Conclusions
• Data placement is crucial in a distributed computing environment.
• Current approaches regard it as a side effect of computation.
• It must be regarded as a first-class citizen, just like computational jobs.
• Regard data placement activities as full-fledged jobs.
Conclusions
• Distinguish data placement jobs from computational jobs.
• Design and implement a data placement subsystem to reliably and efficiently schedule, execute, monitor, and manage them.
Thank you for listening.
Questions?