A framework for implementing IO-bound maintenance applications Eno Thereska PARALLEL DATA LABORATORY

advertisement
A framework for implementing IO-bound
maintenance applications
Eno Thereska
PARALLEL DATA LABORATORY
Carnegie Mellon University
Disk maintenance applications
• Lots of disk maintenance apps
• data protection (backup)
• storage optimization (defrag, load bal.)
• caching (write-backs)
• Important for system robustness
• Background activities
• should eventually complete
• ideally without interfering with primary apps
http://www.pdl.cmu.edu/
2
Eno Thereska
November 2004
Current approaches
Client
workload
Backup
throughput MB/s
• Implement maintenance application as
foreground application
• competes for bandwidth or off-hours only
http://www.pdl.cmu.edu/
3
Eno Thereska
November 2004
Current approaches
Client
workload
Cache writeback
throughput MB/s
• Trickle maintenance activity periodically
• lost opportunities due to inadequate
scheduling decisions
http://www.pdl.cmu.edu/
4
Eno Thereska
November 2004
Real support for background applications
• Push maintenance activities in the
background
• priorities and explicit support for them
• APIs allow application expressiveness
• Storage subsystem does the scheduling
• using idle time, if there is any
• using otherwise-wasted rotational latency in a
busy system
http://www.pdl.cmu.edu/
5
Eno Thereska
November 2004
Outline
•
•
•
•
•
Motivation and overview
The freeblock subsystem
Background application interfaces
Example applications
Conclusions
http://www.pdl.cmu.edu/
6
Eno Thereska
November 2004
The freeblock subsystem
• Disk scheduling subsystem supporting new
APIs with explicit background requests
• Finds time for background activities
• by detecting idle time (short and long bursts)
• by utilizing otherwise-wasted rotational latency
in a busy system
http://www.pdl.cmu.edu/
7
Eno Thereska
November 2004
After reading blue sector
After BLUE read
http://www.pdl.cmu.edu/
8
Eno Thereska
November 2004
Red request scheduled next
After BLUE read
http://www.pdl.cmu.edu/
9
Eno Thereska
November 2004
Seek to Red’s track
After BLUE read
http://www.pdl.cmu.edu/
Seek for RED
10
Eno Thereska
November 2004
Wait for Red sector to reach head
After BLUE read
http://www.pdl.cmu.edu/
Seek for RED
Rotational latency
11
Eno Thereska
November 2004
Read Red sector
After BLUE read
http://www.pdl.cmu.edu/
Seek for RED
Rotational latency
12
After RED read
Eno Thereska
November 2004
Traditional service time components
After BLUE read
Seek for RED
Rotational latency
After RED read
• Rotational latency is wasted time
http://www.pdl.cmu.edu/
13
Eno Thereska
November 2004
Rotational latency gap utilization
After BLUE read
http://www.pdl.cmu.edu/
14
Eno Thereska
November 2004
Seek to Third track
After BLUE read
Seek to Third
SEEK
http://www.pdl.cmu.edu/
15
Eno Thereska
November 2004
Free transfer
After BLUE read
SEEK
http://www.pdl.cmu.edu/
Seek to Third
Free transfer
FREE TRANSFER
16
Eno Thereska
November 2004
Seek to Red’s track
After BLUE read
SEEK
http://www.pdl.cmu.edu/
Seek to Third
Free transfer
FREE TRANSFER
17
Seek to RED
SEEK
Eno Thereska
November 2004
Read Red sector
After BLUE read
SEEK
http://www.pdl.cmu.edu/
Seek to Third
Free transfer
FREE TRANSFER
18
Seek to RED
After RED read
SEEK
Eno Thereska
November 2004
Steady background I/O progress
40
from idle time
35
from rotational gaps
“Free” MB/s
30
25
20
15
10
5
0
0
10
20 30
40
50 60
70 80
90 100
% disk utilization by foreground (random 4KB) reads/writes
http://www.pdl.cmu.edu/
19
Eno Thereska
November 2004
The freeblock subsystem (cont…)
• Implemented in FreeBSD
• Efficient scheduling
• low CPU and memory utilizations
• Minimal impact on foreground workloads
• < 2%
• See refs for more details
http://www.pdl.cmu.edu/
20
Eno Thereska
November 2004
Outline
•
•
•
•
•
Motivation and overview
The freeblock subsystem
Background application interfaces
Example applications
Conclusions
http://www.pdl.cmu.edu/
21
Eno Thereska
November 2004
Application programming interface (API) goals
• Work exposed but done opportunistically
• all disk accesses are asynchronous
• Minimized memory-induced constraints
• late binding of memory buffers
• late locking of memory buffers
• “Block size” can be application-specific
• Support for speculative tasks
• Support for rate control
http://www.pdl.cmu.edu/
22
Eno Thereska
November 2004
API description: task registration
application
fb_read (addr_range, blksize,…)
fb_write (addr_range, blksize,…)
Background
Foreground
background scheduler
foreground scheduler
http://www.pdl.cmu.edu/
23
Eno Thereska
November 2004
API description: task completion
application
callback_fn (addr, buffer, flag, …)
Background
Foreground
background scheduler
foreground scheduler
http://www.pdl.cmu.edu/
24
Eno Thereska
November 2004
API description: late locking of buffers
application
buffer = getbuffer_fn (addr, …)
Background
Foreground
background scheduler
foreground scheduler
http://www.pdl.cmu.edu/
25
Eno Thereska
November 2004
API description: aborting/promoting tasks
application
fb_abort (addr_range, …)
fb_promote (addr_range, …)
Background
Foreground
background scheduler
foreground scheduler
http://www.pdl.cmu.edu/
26
Eno Thereska
November 2004
Complete API
http://www.pdl.cmu.edu/
27
Eno Thereska
November 2004
Designing disk maintenance applications
• APIs talk in terms of logical blocks (LBNs)
• Some applications need structured version
• as presented by file system or database
• Example consistency issues
• application wants to read file “foo”
• registers task for inode’s blocks
• by time blocks read, file may not exist anymore!
http://www.pdl.cmu.edu/
28
Eno Thereska
November 2004
Designing disk maintenance applications
• Application does not care about structure
• scrubbing, data migration, array reconstruction
• Coordinate with file system/database
• cache write-backs, LFS cleaner, index
generation
• Utilize snapshots
• backup, background fsck
http://www.pdl.cmu.edu/
29
Eno Thereska
November 2004
Outline
•
•
•
•
•
Motivation and overview
The freeblock subsystem
Background application interfaces
Example applications
Conclusions
http://www.pdl.cmu.edu/
30
Eno Thereska
November 2004
Example #1: Physical backup
• Backup done using snapshots
snapshot subsystem (in FS)
sys_fb_getrecord()
getblks()
sys_fb_read()
backup application
freeblock subsystem
http://www.pdl.cmu.edu/
31
Eno Thereska
November 2004
Example #1: Physical backup
• Experimental setup
•
•
•
•
•
18 GB Seagate Cheetah 36ES
FreeBSD in-kernel implementation
PIII with 384MB of RAM
3 benchmarks used: Synthetic, TPC-C, Postmark
snapshot includes 12GB of disk
• GOAL: read whole snapshot for free
http://www.pdl.cmu.edu/
32
Eno Thereska
November 2004
Backup completed for free
90
80
Backup time (mins)
70
60
50
< 2% impact on
foreground workload
40
30
20
10
0
Idle system
http://www.pdl.cmu.edu/
Synthetic
TPC-C
33
Postmark
Eno Thereska
November 2004
Example #2: Cache write-backs
• Must flush dirty buffers
• for space reclamation
• for persistence (if memory is not NVRAM)
• Simple cache manager extensions
•
•
•
•
fb_write(dirty_buffer,…)
getbuffer_fn(dirty_buffer,…)
fb_promote(dirty_buffer,…)
fb_abort(dirty_buffer,…)
http://www.pdl.cmu.edu/
34
when blocks become dirty
calling back to lock
when flush is forced
when block dies in cache
Eno Thereska
November 2004
Example #2: Cache write-backs
• Experimental setup
•
•
•
•
•
18 GB Seagate Cheetah 36ES
PIII with 384MB of RAM
controlled experiments with synthetic workload
benchmarks (same as used before) in FreeBSD
syncer daemon wakes up every 1 sec and
flushes entries that have been dirty > 30secs
• GOAL: write back dirty buffers for free
http://www.pdl.cmu.edu/
35
Eno Thereska
November 2004
% improvement in avg. resp. time
% dirty buffers cleaned for free
Foreground read:write has impact
100
80
60
40
20
0
0
1:2
1:1
2:1
read-write ratio
http://www.pdl.cmu.edu/
100
80
60
40
20
0
0
1:2
1:1
2:1
read-write ratio
36
Eno Thereska
November 2004
100
80
60
40
20
0
LRU+Syncer
http://www.pdl.cmu.edu/
LRU only
% improvement in avg. resp. time
% dirty buffers cleaned for free
95% of NVRAM cleaned for free
37
100
80
60
40
20
0
LRU+Syncer
LRU only
Eno Thereska
November 2004
50
40
30
20
10
0
Synthetic TPC-C Postmark
http://www.pdl.cmu.edu/
% improvement in app. throughput
% dirty buffers cleaned for free
10-20% improvement in overall perf.
38
50
40
30
20
10
0
Synthetic TPC-C Postmark
Eno Thereska
November 2004
Example #3: Layout reorganizer
• Layout reorganization improves
access latencies
• defragmentation is a type of reorganization
• typical example of background activity
• Our experiment:
• disk used is 18GB
• we want to defrag up to 20% of it
• goal: defrag for free
http://www.pdl.cmu.edu/
39
Eno Thereska
November 2004
Disk Layout Reorganized for Free!
600
Reorganization time (mins)
Random
500
Circular
400
Track Shuffle
300
200
100
0
1%
http://www.pdl.cmu.edu/
10%
20%
1%
8MB
Reorganizer buffer size (MB)
40
10%
64MB
20%
Eno Thereska
November 2004
Other maintenance applications
•
•
•
•
•
Virus scanner
LFS cleaner
Disk scrubber
Data mining
Data migration
http://www.pdl.cmu.edu/
41
Eno Thereska
November 2004
Summary
• Framework for building background
storage applications
• Asynchronous interfaces
• applications describe what they need
• storage subsystem satisfies their needs
• Works well for real applications
http://www.pdl.cmu.edu/Freeblock/
http://www.pdl.cmu.edu/
42
Eno Thereska
November 2004
Download