stdchk: A Checkpoint Storage System
for Desktop Grid Computing
Samer Al-Kiswany – UBC
Matei Ripeanu – UBC
Sudharshan S. Vazhkudai – ORNL
Abdullah Gharaibeh – UBC
The University of British Columbia
Oak Ridge National Laboratory
Checkpointing Introduction
Checkpointing is used for fault tolerance, debugging, and migration.
Typically, an application running for days on hundreds of nodes
(e.g., a desktop grid) saves checkpoint images periodically.
[Timeline: the application periodically writes checkpoint images (C) during execution.]
ICDCS ‘08
Deployment Scenario
The Challenge
Although checkpointing is necessary:
 It is pure overhead from a performance point of view; most of the time is spent writing to the storage system.
 It generates a high load on the storage system.
Requirement:
A high-performance, scalable, and reliable storage system optimized for checkpointing applications.
Challenge:
Low-cost, transparent support for checkpointing at the file-system level.
Checkpointing Workload Characteristics
 Write-intensive, bursty workload: e.g., a job running on hundreds of nodes periodically checkpoints 100s of GB of data.
 Write once, rarely read during application execution.
 Potentially high similarity between consecutive checkpoints.
 Application-specific checkpoint image life span: when is it safe to delete an image?
Why Checkpointing-Optimized Storage System?
 Optimizing for the checkpointing workload can bring valuable benefits:
 High throughput through specialization.
 Considerable storage-space and network-effort savings through transparent support for incremental checkpointing.
 Simplified data management by exploiting the particularities of checkpoint usage scenarios.
 Reduced load on a shared file system.
 Can be built atop scavenged resources – low cost.
stdchk
A checkpointing-optimized storage system built using scavenged resources.
Outline
 stdchk architecture
 stdchk features
 stdchk system evaluation
stdchk Architecture
Manager (metadata management)
Benefactors (storage nodes)
Client (FS interface)
stdchk Features
 High throughput for write operations
 Transparent support for incremental checkpointing
 Simplified data management
 High reliability through replication
 POSIX file system API – as a result, using stdchk requires no modifications to the application.
Optimized Write Operation Alternatives
Write procedure alternatives:
 Complete local write
 Incremental write
 Sliding window write
[Figure: in complete local write and incremental write, the stdchk FS interface on the compute node stages checkpoint data on the local disk; in sliding-window write, data is staged in a memory buffer instead.]
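The sliding-window variant can be sketched as follows. This is a minimal illustration, not the stdchk implementation; `WINDOW_SIZE` and `send_to_benefactor` are assumed names. The idea: the FS interface buffers application writes in a fixed-size memory window and ships each full window to a storage node, so data never stages on the local disk.

```python
WINDOW_SIZE = 4 * 1024 * 1024  # illustrative: 4 MB in-memory window

def sliding_window_write(chunks, send_to_benefactor):
    """Buffer application writes in a memory window; ship each full
    window to a storage node instead of staging data on local disk."""
    window = bytearray()
    shipped = 0
    for chunk in chunks:
        window.extend(chunk)
        while len(window) >= WINDOW_SIZE:
            send_to_benefactor(bytes(window[:WINDOW_SIZE]))
            del window[:WINDOW_SIZE]
            shipped += 1
    if window:  # flush the partial last window on file close
        send_to_benefactor(bytes(window))
        shipped += 1
    return shipped

# toy usage: ten 1 MB writes -> two full 4 MB windows plus a 2 MB flush
sent = []
n = sliding_window_write((b"x" * (1024 * 1024) for _ in range(10)), sent.append)
```

Because the window lives in memory, the network transfer overlaps the application's ongoing writes, which is consistent with the slide's point that this variant saturates the link.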
Write Operation Evaluation
Testbed:
28 machines, each with two 3.0 GHz Xeon processors, 1 GB RAM, and two 36.5 GB SCSI disks.
Achieved Storage Bandwidth
[Chart: average achieved storage bandwidth (MB/s) vs. stripe width (1, 2, 4, 8) over a 1 Gbps testbed, comparing complete local write, incremental write, sliding-window write, NFS, local I/O, and iperf. Sliding-window write achieves high bandwidth (110 MB/s) and saturates the 1 Gbps link.]
stdchk Features
 High throughput write operation
 Transparent incremental checkpointing
 Checkpointing optimized data management
 POSIX file system interface – no modifications required to the application
Transparent Incremental Checkpointing
Incremental checkpointing may bring valuable benefits:
 Lower network effort.
 Less storage space used.
But:
How much similarity is there between consecutive checkpoints?
How can we detect similarities between checkpoints, and is detection fast enough?
Similarity Detection Mechanism – Compare-by-Hash
[Figure: checkpoint T0 is divided into blocks X, Y, Z, and each block's hash is recorded. Checkpoint T1 divides into blocks W, Y, Z; comparing hashes shows that only the new block W must be stored, while Y and Z are shared with T0.]
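The mechanism can be sketched as follows: an illustrative toy, not the stdchk code (the tiny block size and the SHA-1 hash are assumptions). Each block of the new checkpoint is hashed, and only blocks whose hashes did not appear in the previous checkpoint are kept.

```python
import hashlib

BLOCK = 4  # illustrative tiny block size; a real system uses far larger blocks

def block_hashes(data, block=BLOCK):
    """Hash every fixed-size block of the buffer."""
    return [hashlib.sha1(data[i:i + block]).hexdigest()
            for i in range(0, len(data), block)]

def blocks_to_store(prev, curr, block=BLOCK):
    """Return the blocks of `curr` whose hashes are not present in `prev`."""
    seen = set(block_hashes(prev, block))
    return [curr[i:i + block] for i in range(0, len(curr), block)
            if hashlib.sha1(curr[i:i + block]).hexdigest() not in seen]

# T0 = X Y Z, T1 = W Y Z: only the new block W must be stored
t0 = b"XXXXYYYYZZZZ"
t1 = b"WWWWYYYYZZZZ"
new = blocks_to_store(t0, t1)
```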
Similarity Detection Mechanism
 How to divide the file into blocks?
 Fixed-size blocks + compare-by-hash (FsCH)
 Content-based blocks + compare-by-hash (CbCH)
FsCH Insertion Problem
[Figure: checkpoint i divides into fixed-size blocks B1–B5; in checkpoint i+1, inserted data (B6) shifts every subsequent block boundary, so the blocks after the insertion point no longer match checkpoint i.]
Result: lower similarity detection ratio.
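The insertion problem can be demonstrated with a toy fixed-size chunker (illustrative only; `similarity` and the 4-byte block size are assumptions, not stdchk's parameters). Appended data leaves earlier block boundaries aligned, while a single inserted byte shifts every boundary and defeats hash matching.

```python
import hashlib

def fixed_chunks(data, size=4):
    """Split the buffer into fixed-size blocks."""
    return [data[i:i + size] for i in range(0, len(data), size)]

def similarity(prev, curr, size=4):
    """Fraction of curr's fixed-size blocks whose hashes appear in prev."""
    seen = {hashlib.sha1(c).hexdigest() for c in fixed_chunks(prev, size)}
    cur = fixed_chunks(curr, size)
    hits = sum(hashlib.sha1(c).hexdigest() in seen for c in cur)
    return hits / len(cur)

ckpt_i = b"AAAABBBBCCCCDDDD"
appended = ckpt_i + b"EEEE"   # data appended at the end: old blocks still align
inserted = b"X" + ckpt_i      # one byte inserted at the front:
                              # every block boundary shifts, nothing matches
high = similarity(ckpt_i, appended)
low = similarity(ckpt_i, inserted)
```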
Content-based Compare-by-Hash (CbCH)
[Figure: CbCH hashes a sliding window of m bytes at every offset; an offset ends a block when the k selected bits of the hash are all zero. Because boundaries depend only on local content, a change (BX replacing B2 in checkpoint i+1) affects only the changed block, and B1, B3, B4 still match checkpoint i.]
Result: higher similarity detection ratio.
But: computationally intensive.
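Content-defined chunking along these lines can be sketched as below (a minimal illustration, not the stdchk implementation; the window size, bit count, and use of SHA-1 as the window hash are assumptions, and the slides' actual parameters are m = 20 B, k = 14 b). A block boundary is declared wherever the k low bits of the hash of the preceding m bytes are all zero, so boundaries move with the content rather than with byte offsets.

```python
import hashlib

M_BYTES = 4  # illustrative window; the slides use m = 20 bytes
K_BITS = 6   # illustrative boundary mask; the slides use k = 14 bits

def content_chunks(data, m=M_BYTES, k=K_BITS):
    """Split `data` at content-defined boundaries: an offset ends a chunk
    when the hash of the preceding m bytes has its k low bits all zero."""
    mask = (1 << k) - 1
    chunks, start = [], 0
    for off in range(m, len(data)):
        h = int.from_bytes(hashlib.sha1(data[off - m:off]).digest()[-4:], "big")
        if h & mask == 0:
            chunks.append(data[start:off])
            start = off
    chunks.append(data[start:])  # final (possibly partial) chunk
    return chunks

# Because boundaries depend only on the local window contents, chunk
# boundaries downstream of an insertion tend to reappear at shifted
# offsets, so later chunks of the two versions can still match.
a = bytes(range(256)) * 8
b = b"INSERTED" + a
shared = set(content_chunks(a)) & set(content_chunks(b))
```

The per-offset hashing is what makes CbCH computationally intensive; production systems use a cheap rolling hash (e.g., Rabin fingerprints) instead of recomputing a cryptographic hash at every offset.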
Evaluating Similarity Between Consecutive Checkpoints
The applications: BMS* and BLAST
Checkpointing interval: 1, 5, and 15 minutes

Type                          Number of checkpoints   Avg. checkpoint size
Application level             100                     2.4 MB
System level – BLCR           ~1200                   450 MB
Virtual machine level – Xen   ~400                    1 GB

* Checkpoints by Pratul Agarwal (ORNL)
Similarity Ratio and Detection Throughput
Techniques: FsCH (1 MB blocks) and CbCH (no overlap, m = 20 B, k = 14 b)
[Table: average rate of detected similarity and hashing throughput in MB/s (in brackets) for each heuristic, per application (BMS application-level; BLAST with BLCR and Xen checkpoints) and checkpointing interval (1, 5, and 15 min; 5 or 15 min for Xen). FsCH detects up to 23.4% similarity at roughly 110 MB/s; CbCH detects up to 82% (and 70% at the 15-minute interval) but at only ~26–28 MB/s; several configurations show no detectable similarity (0.0%).]
But: using the GPU, CbCH achieves over 190 MB/s throughput!
- StoreGPU: Exploiting Graphics Processing Units to Accelerate Distributed Storage Systems, S. Al-Kiswany, A. Gharaibeh, E. Santos-Neto, G. Yuan, M. Ripeanu, HPDC, 2008.
Compare-by-Hash Results
[Chart: achieved storage bandwidth (MB/s) vs. file-system-interface write buffer size (64, 128, 256 MB), with and without FsCH similarity detection. FsCH slightly degrades the achieved bandwidth, but reduces the storage space used and the network effort by 24%.]
Outline
 stdchk architecture
 stdchk features
 stdchk overall system evaluation
stdchk Scalability
[Chart: aggregate stdchk throughput (MB/s) over time (0–300 s), through a steady phase, a phase where nodes join, and a phase where nodes leave, reaching roughly 400–450 MB/s. stdchk sustains high loads while the number of nodes and the workload vary.]
7 clients: each client writes 100 files (100 MB each), 70 GB in total.
stdchk pool of 20 benefactor nodes.
Experiment with a Real Application
Application: BLAST
Execution time: > 5 days
Checkpointing interval: 30 s
Stripe width: 4 benefactors
Client machine: two 3.0 GHz Xeon processors, SCSI disks.

                           Local disk   stdchk    Improvement
Checkpointing time (s)     22,733       16,497    27.0%
Data size (TB)             3.55         1.14      69.0%
Total execution time (s)   462,141      455,894   1.3%
Summary
stdchk: a checkpointing-optimized storage system built using scavenged resources.
stdchk features:
 High-throughput write operations
 Considerable disk-space and network-effort savings
 Checkpointing-optimized data management
 Easy to adopt – implements a POSIX file system interface
 Inexpensive – built atop scavenged resources
Consequently, stdchk:
 Offloads the checkpointing workload from the shared file system.
 Speeds up checkpointing operations (reduces checkpointing overhead).
Thank you
netsyslab.ece.ubc.ca