Using Similarity in Content and Access Patterns to Improve Space Efficiency and Performance in Storage Systems
Xing Lin
PhD Defense, July 22, 2015
Photos Stored in Facebook
• Number of photos uploaded every week: 1 billion (60 terabytes)
• Total number of photos stored by 2010: 260 billion (20 petabytes)
Reference: "Finding a needle in Haystack: Facebook's photo storage", USENIX OSDI '10
Amazon S3
• 5 objects for each star in the galaxy
• Assuming an average object size of 100 KB, the total data size is 200 PB
How Much Data Does This Plane Generate per Flight?
BOEING 787
Exponential Increase of Digital Data
Efficient Storage Solutions
Data Reduction Techniques
• Compression: find redundant strings and replace them with compact encodings
  – ~2x reduction
  – LZ ([Ziv and Lempel 1977]), lz4, gzip, bzip2, 7z, xz, …
• Deduplication: find duplicate chunks and store only the unique ones
  – ~10x reduction for backups
  – Venti ([Quinlan FAST02]), Data Domain File System ([Zhu FAST08]), iDedup ([Srinivasan FAST12]), …
Limitations
• Compression: searches for redundancy at the string level
  – Does not scale to detecting redundant strings across a large range
• Deduplication: interleaves metadata with data
  – Frequent metadata changes introduce many unnecessary unique chunks
Proposed Solutions
• Migratory Compression: detect similarity at the block level to group similar blocks (Chap. 2, FAST14)
• Deduplication:
  – Separate metadata from data, storing the same type of data together (Chap. 3, HotStorage15)
  – Use deduplication for efficient disk image storage (Chap. 4, TridentCom15)
• Differential IO Scheduling: schedule the same type of IO requests together for predictable and efficient performance (Chap. 5, HotCloud12)
Thesis Statement
Similarity in content and access patterns can be utilized to improve space efficiency, by storing similar data together, and performance predictability and efficiency, by scheduling similar IO requests to the same hard drive.
Outline
• Introduction
• Migratory Compression
• Improving deduplication by separating metadata from data
• Using deduplication for efficient disk image deployment
• Performance predictability and efficiency for cloud storage systems
• Conclusion
Background on Compression
• Compression: finds redundant strings within a window size and encodes them with more compact structures
• Metrics (see the sketch below)
  – Compression Factor (CF) = original size / compressed size
  – Throughput = original size / (de)compression time

Compressor | Max window size | Techniques
gzip       | 64 KB           | LZ; Huffman coding
bzip2      | 900 KB          | Run-length encoding; Burrows-Wheeler Transform; Huffman coding
7z         | 1 GB            | Variant of LZ77; Markov chain-based range coder
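To make the two metrics concrete, here is a small Python sketch (not part of the slides) that measures CF and compression throughput with the standard-library codecs closest to the table above; `input.bin` is a placeholder for any large test file, and the absolute numbers depend on the data and the machine.

```python
import bz2, gzip, lzma, time

def measure(name, compress, data):
    """Report Compression Factor and throughput for one compressor."""
    start = time.time()
    compressed = compress(data)
    elapsed = time.time() - start
    cf = len(data) / len(compressed)        # CF = original size / compressed size
    tput = len(data) / elapsed / 2**20      # MB of original data per second
    print(f"{name}: CF = {cf:.2f}x, throughput = {tput:.1f} MB/s")

data = open("input.bin", "rb").read()       # placeholder: any large test file
measure("gzip (LZ77 + Huffman)", gzip.compress, data)
measure("bzip2 (RLE + BWT + Huffman)", bz2.compress, data)
measure("xz/LZMA (7z-style)", lzma.compress, data)
```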
Background on Compression
[Figure: Example Throughput vs. CF; compression throughput (MB/s) vs. compression factor (X) for gzip, bzip2, and 7z.]
The larger the window, the better the compression, but the slower the compressor. Fundamental reason: finding redundancy at the string level does not scale to large windows.
Migratory Compression
• Problem: finding redundancy at the string level does not scale to large windows
• Key idea: group blocks by similarity at the block level, enabling standard compressors to find repeated strings within small windows
• Migratory Compression (MC): coarse-grained reorganization that groups similar blocks to improve compressibility
  – A generic pre-processing stage for standard compressors
  – In many cases, improves both compressibility and throughput
  – Effective for improving compression in archival storage
Migratory Compression
• Compress a single, large file (mzip)
  – Traditional compressors are unable to exploit redundancy across a large range of data (e.g., many GB)
[Figure: a multi-GB file contains similar blocks A, A', A'' and B, B' scattered far apart, so a standard compressor's window never sees them together. mzip first transforms the file so that similar blocks are adjacent (A A' A'' B B' C …) and then compresses the reorganized file with a standard compressor.]
mzip Example
• The original file contains blocks A, B, C, A', D, B' at positions 0-5.
• A similarity detector groups similar blocks, producing a migrate recipe (0 3 1 5 2 4) and a restore recipe (0 2 4 1 5 3); see the sketch below.
• Migrate reorders the file into A, A', B, B', C, D; Restore uses the restore recipe to recover the original block order.
[Figure: original file, similarity detector, migrate and restore recipes, and the reorganized file.]
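As a rough sketch of how a migrate/restore recipe like the one above can be produced and applied (my own illustration, not the dissertation's mzip code; it assumes the file length is a multiple of the block size and uses a placeholder similarity key instead of the super-features described on the next slide):

```python
import gzip

BLOCK = 8 * 1024  # illustrative block size

def similarity_key(block):
    # Placeholder: a real implementation would use super-features (next slide).
    return block[:64]

def mzip(data):
    """Group similar blocks, then compress; returns (compressed, restore_recipe)."""
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    # Migrate recipe: reorganized[j] = original[migrate[j]], with similar blocks adjacent.
    migrate = sorted(range(len(blocks)), key=lambda i: (similarity_key(blocks[i]), i))
    # Restore recipe: original[i] = reorganized[restore[i]].
    restore = [0] * len(blocks)
    for new_pos, old_pos in enumerate(migrate):
        restore[old_pos] = new_pos
    reorganized = b"".join(blocks[i] for i in migrate)
    return gzip.compress(reorganized), restore

def unmzip(compressed, restore):
    data = gzip.decompress(compressed)
    blocks = [data[i:i + BLOCK] for i in range(0, len(data), BLOCK)]
    return b"".join(blocks[restore[i]] for i in range(len(blocks)))
```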
Detect Similar Blocks
• Similarity feature: hashing ([Broder 1997])
• For each chunk, compute maximal hash values over its contents (Maximal Value 1-4); these serve as similarity features, and chunks sharing them are similar.
• Features are grouped into super-features ([Shilane FAST12]); a match on any super-feature is good enough (a sketch follows).
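A hedged sketch of the feature computation described on this slide: slide a small window over the chunk, keep the maximal hash value seen under several derived hash functions, and group the resulting features into super-features. The window size, feature count, grouping, and hash mixing below are illustrative choices, not the parameters used in the dissertation.

```python
import hashlib

WINDOW = 32       # bytes per rolling window (illustrative)
FEATURES = 4      # "Maximal Value 1" ... "Maximal Value 4"
PER_SUPER = 2     # features combined into each super-feature

def features(chunk):
    """One maximal hash value per derived hash function, over all windows in the chunk."""
    maxima = [0] * FEATURES
    for i in range(len(chunk) - WINDOW + 1):
        base = int.from_bytes(hashlib.sha1(chunk[i:i + WINDOW]).digest()[:8], "big")
        for k in range(FEATURES):
            # Derive distinct hash functions by mixing with per-function constants.
            h = (base * (2 * k + 3) + k) & 0xFFFFFFFFFFFFFFFF
            if h > maxima[k]:
                maxima[k] = h
    return maxima

def super_features(chunk):
    f = features(chunk)
    return [tuple(f[i:i + PER_SUPER]) for i in range(0, FEATURES, PER_SUPER)]

def similar(a, b):
    """A match on ANY super-feature is good enough to call two chunks similar."""
    return any(x == y for x, y in zip(super_features(a), super_features(b)))
```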
Efficient Data Reorganization
• Problem: reorganization requires lots of random IOs
• Hard drives: do not perform well under random IO
• SSDs: provide good random IO performance
• Multi-pass: helps considerably for hard drives
  – converts random IOs into multiple sequential scans of the input file (see the sketch below)
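One way to picture the multi-pass option (a sketch under my own simplifications, not the dissertation's implementation): rather than seeking to blocks in reorganized order, scan the input once per output range and buffer only the blocks destined for that range, so a hard drive sees forward sweeps instead of random seeks.

```python
def reorganize_multipass(read_block, write_block, migrate, mem_blocks):
    """
    read_block(i)        -> returns the i-th block of the input file
    write_block(j, data) -> writes the block at output position j
    migrate[j]           -> input position of the block that goes to output position j
    mem_blocks           -> how many output blocks fit in memory per pass
    """
    n = len(migrate)
    for start in range(0, n, mem_blocks):            # one pass per output range
        end = min(start + mem_blocks, n)
        wanted = {migrate[j]: j for j in range(start, end)}
        buffered = {}
        for i in range(n):                           # reads proceed in increasing offset
            if i in wanted:                          # order: a forward sweep of the disk
                buffered[wanted[i]] = read_block(i)
        for j in range(start, end):                  # sequential write of this range
            write_block(j, buffered[j])
```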
Evaluation
• How much can compression be improved?
  – Compression Factor (CF): original size / compressed size
• What is the complexity (runtime overhead)?
  – Compression throughput: original size / runtime
• More in the paper
  – How does SSD or HDD affect data reorganization performance? For HDD, does multi-pass help?
  – How does MC perform compared with delta compression?
  – What are the configuration options for MC?
Migratory Compression - Evaluation
Dataset: backup of workstation1 (engineering desktop)
[Figure: compression throughput (MB/s) vs. compression factor (X) for gzip, bzip2, 7z, and rzip and their MC variants gz(mc), bz(mc), 7z(mc), rz(mc). rzip: intra-file dedupe, bzip2 on the remainder.]
Takeaway: MC improves both CF and compression throughput, through deduplication and re-organization.
Compression Factor vs. Compression Throughput
Dataset: Exchange2
[Figure: compression throughput (MB/s) vs. compression factor (X) for gzip, bzip2, 7z, rzip and their MC variants.]
Takeaway: for this dataset, MC improves CF but slightly reduces compression throughput.
Migratory Compression - Summary
• More results in the paper
  – Decompression performance
  – Default vs. maximal compression
  – Memory vs. SSD vs. hard drives
  – Application of MC to archival storage
• Migratory Compression
  – Improves existing compressors, in both compressibility and, frequently, runtime
  – Redraws the performance-compression curve!

Migratory Compression: Coarse-grained Data Reordering to Improve Compressibility
Xing Lin, Guanlin Lu, Fred Douglis, Philip Shilane, and Grant Wallace
FAST '14: Proceedings of the 12th USENIX Conference on File and Storage Technologies, Feb. 2014
Outline
• Introduction
• Migratory Compression
• Improving deduplication by separating metadata from data
• Using deduplication for efficient disk image deployment
• Performance predictability and efficiency for cloud storage systems
• Conclusion
Deduplication
Idea: identify duplicate data blocks and store a single copy (a sketch follows).
[Figure: input v1.0 consists of chunks A B C D E F G H and input v2.0 of A B C D E F' G H; on disk only A-H and F' are stored, eliminating nearly all of v2.0.]
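A minimal deduplicating store in the spirit of this slide (purely illustrative: fixed 4 KB chunks and an in-memory index): each chunk is fingerprinted, only unseen fingerprints are stored, and each version keeps a recipe of fingerprints.

```python
import hashlib

CHUNK = 4096
store = {}                                  # fingerprint -> chunk bytes (stored once)

def dedup_write(data):
    """Store one version; return its recipe (the list of chunk fingerprints)."""
    recipe = []
    for i in range(0, len(data), CHUNK):
        chunk = data[i:i + CHUNK]
        fp = hashlib.sha1(chunk).hexdigest()
        store.setdefault(fp, chunk)         # duplicates are not stored again
        recipe.append(fp)
    return recipe

def dedup_read(recipe):
    return b"".join(store[fp] for fp in recipe)

# Two versions that differ in a single chunk share all the rest.
v1 = bytes(i % 251 for i in range(8 * CHUNK))
v2 = v1[:3 * CHUNK] + b"x" * CHUNK + v1[4 * CHUNK:]
r1, r2 = dedup_write(v1), dedup_write(v2)
print(len(store), "unique chunks stored for", len(r1) + len(r2), "chunks written")  # 9 for 16
```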
Expected Deduplication
What if we have many versions?
Dedup ratio = original_size / post_dedup_size
[Figure: with successive versions v1.0, v2.0, v3.0, …, the expected dedup ratio grows to roughly 2x, 3x, and so on.]
Great (Deduplication) Expectations
• In reality
  – 308 versions of Linux source code: 2x
  – Other examples of awful deduplication
• Contributions
  – Identify and categorize barriers to deduplication
  – Solutions
    • EMC Data Domain (industrial experience)
    • GNU tar (academic research)
Metadata Impacts Deduplication
• Case 1: metadata changes
  – The input is an aggregate of many small user files, interleaved with file metadata
  – Examples: GNU tar, EMC NetWorker, Oracle RMAN backup
  – Videos suffer from a similar problem ([Dewakar HotStorage15])
[Figure: files File1-File5 with interleaved file metadata; when only the metadata changes between v1.0 and v2.0, the resulting chunks still differ.]
Metadata Impacts Deduplication
• Case 2: metadata location changes
  – The input is encoded in (fixed-size) blocks, and metadata is inserted for each block
  – Data insertion/deletion leads to metadata shifts
  – Example: tape format
[Figure: between v1.0 and v2.0, the locations of the block markers shift, leading to different chunks.]
Solution: Separate Metadata from Data
• Three approaches:
  – Recommended: design deduplication-friendly formats
    • Case study: EMC NetWorker
  – Transparent: application-level post-processing
    • Case study: GNU tar
  – Last resort: format-aware deduplication
    • Case studies: 1) virtual tape library (VTL), 2) Oracle RMAN backup
Application-level Post-processing
• tar (tape archive)
  – Collects files into one archive file
  – Used for file system archiving, source code distribution, …
• GNU tar format
  – A sequence of entries, one per file
  – For each file: a file header and data blocks
  – Header block: file path, size, modification time
[Figure: File 1 and File 2, each stored as a header block followed by data blocks.]
Metadata Changes with GNU tar
SHA1s:
linux-2.6.39.3/README a735c31cef6d19d56de6824131527fdce04ead47
linux-2.6.39.4/README a735c31cef6d19d56de6824131527fdce04ead47
linux-2.6.39.3/README -rw-rw-r-- 1 root root 17525 Jul 9 2011
linux-2.6.39.4/README -rw-rw-r-- 1 root root 17525 Aug 3 2011
The README content is identical across the two releases (same SHA1), but its modification time differs, so the tar header block is modified in version 2. As a result, Chunk1 and Chunk2 in version 2 are different.
[Figure: File 1 and File 2 with header and data blocks; version 2 carries a modified header block, so Chunk1 and Chunk2 differ while Chunk3 still deduplicates.]
Result: only 2x deduplication for 308 versions of Linux.
Migratory tar (mtar)
[Figure: tar stores each file as a header block followed by data blocks. Migrate transforms the tar stream into an mtar stream in which the header blocks are grouped together, preceded by the number of header blocks (2 in the example); Restore converts the mtar stream back into the original tar. A sketch of the split follows.]
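The sketch below (my own, not the dissertation's mtar tool) conveys the flavor of the transform: walk the archive in 512-byte blocks, separate header blocks from data blocks so metadata and data land in different regions, and keep enough information to re-interleave them. It assumes plain ustar-style octal size fields and ignores tar's trailing zero padding.

```python
BLK = 512

def split_tar(tar_bytes):
    """Separate a tar stream into (header_blocks, data_blocks) lists, one entry per file."""
    headers, data = [], []
    off = 0
    while off + BLK <= len(tar_bytes):
        header = tar_bytes[off:off + BLK]
        off += BLK
        if header == b"\0" * BLK:                               # end-of-archive marker
            break
        headers.append(header)
        size = int(header[124:136].strip(b"\0 ") or b"0", 8)    # octal size field
        nblocks = (size + BLK - 1) // BLK
        data.append(tar_bytes[off:off + nblocks * BLK])
        off += nblocks * BLK
    return headers, data

def join_tar(headers, data):
    """Re-interleave header and data blocks to rebuild the original tar stream."""
    out = b"".join(h + d for h, d in zip(headers, data))
    return out + b"\0" * (2 * BLK)                              # end-of-archive blocks
```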
mtar - Evaluation
• 9 GNU software packages and the Linux kernel source code
• Many versions: 13 to 308
[Figure: deduplication ratio (X) for tar vs. mtar across the datasets, with per-dataset improvements of 1.1x to 5.3x.]
Improvements: 1) across all datasets; 2) huge: 1.5x-5.3x
More in the Paper
• Design deduplication-friendly formats
  – Case study: EMC NetWorker
• Application-level post-processing
  – Case study: GNU tar
• Format-aware deduplication
  – Case studies: 1) virtual tape library, 2) Oracle RMAN backup
Summary
• Metadata impacts deduplication
  – Metadata changes more frequently than data, introducing many unnecessary unique chunks
• Solution: separate metadata from data
  – Up to 5x improvement in deduplication

Metadata Considered Harmful … to Deduplication
Xing Lin, Fred Douglis, Jim Li, Xudong Li, Robert Ricci, Stephen Smaldone, and Grant Wallace
HotStorage '15: 7th USENIX Workshop on Hot Topics in Storage and File Systems
Outline
• Introduction
• Migratory Compression
• Improving deduplication by separating metadata from data
• Using deduplication for efficient disk image deployment
• Performance predictability and efficiency for cloud storage systems
• Conclusion
Introduction
• Cloud providers and network testbeds maintain a large number of operating system images (disk images)
  – Amazon Web Services (AWS): 37,000+ images
  – Emulab: 1,020+
• Disk image: a snapshot of a disk's content
  – Several to hundreds of GBs
  – Similarity across disk images ([Jin, SYSTOR12])
Requirement: efficient image storage
What is Image Deployment?
The process of distributing and installing a disk image on multiple compute nodes
Requirement: efficient image transmission and installation
[Figure: an image server distributing an image to a set of compute nodes.]
Existing Systems: Frisbee and Venti
• Frisbee: an efficient image deployment system ([Hibler ATC03])
• Venti: a deduplicating storage system ([Quinlan FAST02])
[Figure: image server → switch → compute nodes. The image server stores compressed data (compression is slow), transfers the compressed data over a limited-bandwidth network, and each node installs the image at full disk bandwidth.]
Integrate Frisbee with Deduplicating Storage
Frisbee provides efficient image deployment; a deduplicating storage system provides efficient image storage.
How can we achieve efficient image deployment and efficient image storage simultaneously by integrating these two systems?
Requirements for the Integration

Requirement                                  | Solution
Use compression                              | Compress each chunk
Use the filesystem to skip unallocated data  | Aligned Fixed-size Chunking
Image installation at full disk bandwidth    | Careful selection of chunk size
Independently installable Frisbee chunks     | Pre-computation of Frisbee chunk headers
Use Compression - Store Frisbee Images
[Figure: the Frisbee image creation tool compresses the disk into a Frisbee image; the image is partitioned into chunks and stored in Venti; at deployment, the chunks are fetched, distributed, and deployed to the disk.]
• Problem: compressed data does not deduplicate well
• ✔ efficient image deployment, ✗ efficient image storage
Use Compression - Store Raw Chunks
[Figure: the disk is partitioned into raw chunks and stored in Venti; at deployment, the chunks are fetched and compressed on the fly by the Frisbee image creation tool before distribution.]
• Problem: Frisbee compression will become the bottleneck
• ✗ efficient image deployment, ✔ efficient image storage
Use Compression - Store Compressed Chunks
[Figure: the disk is partitioned into chunks, each chunk is compressed, and the compressed chunks are stored in Venti; at deployment, the compressed chunks are fetched and distributed directly.]
• No compression in the image deployment path
• Compressing two identical chunks yields the same compressed chunk, so deduplication still works (see the sketch below)
• ✔ efficient image deployment, ✔ efficient image storage
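A sketch of the "store compressed chunks" option (illustrative only; the chunk size and codec settings are my assumptions, not Frisbee's or Venti's): each chunk is compressed with fixed, deterministic settings before being fingerprinted, so identical raw chunks still collapse to a single stored copy, and deployment can ship the stored bytes without recompressing.

```python
import hashlib, zlib

CHUNK = 64 * 1024            # illustrative chunk size
store = {}                   # fingerprint -> compressed chunk

def put_image(raw):
    """Store one disk image as compressed, deduplicated chunks; return its recipe."""
    recipe = []
    for i in range(0, len(raw), CHUNK):
        # Fixed settings: identical chunks always compress to identical bytes,
        # so deduplication still works on the compressed form.
        comp = zlib.compress(raw[i:i + CHUNK], 6)
        fp = hashlib.sha1(comp).hexdigest()
        store.setdefault(fp, comp)
        recipe.append(fp)
    return recipe

def deploy(recipe):
    """The server only concatenates stored chunks; clients decompress while installing."""
    return b"".join(store[fp] for fp in recipe)
```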
Efficient Image Storage
• Dataset: 430 Linux images
  – Filesystem size: 21 TB (2 TB allocated)

Format          | Size (GB)
Frisbee images  | 233
Venti, 32 KB    | 75

Result: 3x reduction
Efficient Image Deployment
[Figure: end-to-end image deployment time (seconds) vs. number of clients (1, 8, 16) for baseline Frisbee and Venti-based Frisbee.]
Takeaway: similar performance in image deployment.
More in the Paper
• Aligned Fixed-size Chunking (AFC)
• Image retrieval time as a function of chunk size
  – Smaller chunk size: better deduplication
  – Larger chunk size: better image deployment performance
• Image deployment performance with staging nodes
Summary
• Compress each chunk
• Aligned Fixed-size Chunking
• Careful selection of chunk size
• Pre-computation of Frisbee chunk headers
✔ efficient image deployment, ✔ efficient image storage

Using Deduplicating Storage for Efficient Disk Image Deployment
Xing Lin, Mike Hibler, Eric Eide, and Robert Ricci
TridentCom '15: 10th International Conference on Testbeds and Research Infrastructures for the Development of Networks & Communities
Outline
• Introduction
• Migratory Compression
• Improving deduplication by separating metadata from data
• Using deduplication for efficient disk image deployment
• Performance predictability and efficiency for cloud storage systems
• Conclusion
Introduction
• Cloud computing is becoming popular
  – Amazon Web Services (AWS), Microsoft Azure, …
• Virtualization enables efficient resource sharing
  – Multiplexing and consolidation; economies of scale
• Sharing introduces interference
  – Random and sequential workloads [Gulati VPACT '09]
  – Read and write workloads [Ahmad WWC '03]
• Target environment: a large number of mixed workloads
Random Workloads' Impact on Disk Bandwidth
Workloads: all sequential, vs. 1 sequential plus an increasing number of random workloads
High disk bandwidth = high disk utilization
[Figure: aggregate bandwidth (MB/s) vs. number of co-located workloads (1-17) for the all-sequential mix (SR) and the mix with random workloads (RR); once random workloads are added, aggregate bandwidth collapses to about 10 MB/s.]
Takeaway: random workloads are harmful to disk bandwidth.
Differential I/O Scheduling (DIOS)
• Opportunity: replication is commonly used in cloud storage systems
  – Google File System, Ceph, Sheepdog, etc.
• Idea: dedicate each disk to serving one type of request
• System implications (tradeoff)
  – High bandwidth for disks serving sequential requests
  – Lower performance for random workloads
Architecture
[Figure: two disks each hold replicas of File(a) (blocks 1, 2, 3) and File(b) (blocks A, B, C); Disk1 holds the primary replicas and Disk2 the secondary replicas. Random accesses are directed to one disk and sequential accesses to the other, so the sequential disk keeps its high bandwidth. A routing sketch follows.]
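A toy dispatcher capturing the idea in the figure (my own sketch, not the modified Ceph code): classify each stream's request as sequential or random by whether it continues from the previous offset, then send sequential streams to the disk holding one replica and random streams to the disk holding the other.

```python
class Dispatcher:
    def __init__(self):
        self.last_end = {}      # stream id -> end offset of that stream's previous request

    def classify(self, stream, offset, length):
        sequential = self.last_end.get(stream) == offset   # contiguous with the last I/O?
        self.last_end[stream] = offset + length
        return "sequential" if sequential else "random"

    def route(self, stream, offset, length, replicas):
        """replicas = {'primary': disk1, 'secondary': disk2}; both hold the block."""
        kind = self.classify(stream, offset, length)
        # Dedicate one disk per request type so sequential streams keep their bandwidth.
        return replicas["primary"] if kind == "sequential" else replicas["secondary"]
```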
Implementation
• Ceph: a distributed storage system that implements replication
  – Uses randomness when selecting replicas
• DIOS: modified Ceph with deterministic replica selection and request-type-aware scheduling
Evaluation - Sequential Workloads
Workload: 20 sequential workloads, mixed with different numbers of random workloads. Total data disks: 6
[Figure: aggregate bandwidth (MB/s) for Ceph vs. DIOS with 0, 10, 20, and 40 random workloads (RR); DIOS delivers 2.5x-6x higher bandwidth when random workloads are present.]
DIOS: consistent performance and higher bandwidth (disk utilization).
Evaluation - Random Workloads
[Figure: aggregate IOPS for Ceph vs. DIOS with 10, 20, and 40 random workloads (RR).]
With DIOS, half of the disks serve the random workloads, while the other half maintain high disk utilization for the sequential workloads.
Summary
• Dedicating a disk to serving a single type of read request is more efficient and leads to more predictable performance.
• Future directions
  – Write workloads
  – Extension to 3-way replication (load balance)

Towards Fair Sharing of Block Storage in a Multi-tenant Cloud
Xing Lin, Yun Mao, Feifei Li, and Robert Ricci
HotCloud '12: 4th USENIX Workshop on Hot Topics in Cloud Computing, June 2012
Outline
• Introduction
• Migratory Compression
• Improving deduplication by separating metadata from data
• Using deduplication for efficient disk image deployment
• Performance predictability and efficiency for cloud storage systems
• Conclusion
Conclusion
• Problems
  – Limitations in traditional compression and deduplication
  – Performance interference in cloud storage
• Use similarity in content and access patterns
  – Migratory Compression: find similarity at the block level and reorganize
  – Deduplication: store the same type of data blocks together
  – Differential IO scheduling: schedule the same type of requests to the same disk
Thesis Statement
Similarity in content and access patterns can be utilized to improve space efficiency, by storing similar data together, and performance predictability and efficiency, by scheduling similar IO requests to the same hard drive.
Acknowledgements
• Advisor, for always supporting my work
  – Robert Ricci (HotCloud12, HotStorage15, TridentCom15)
• Committee members, for providing feedback
  – Rajeev Balasubramonian (WDDD11)
  – Fred Douglis (FAST14, HotStorage15)
  – Feifei Li (HotCloud12)
  – Jacobus Van der Merwe (SOCC15)
• Flux members
  – Eric Eide, Mike Hibler, David Johnson, Gary Wong, …
• EMC/Data Domain colleagues
  – Philip Shilane, Stephen Smaldone, Grant Wallace, …
• Family and friends, for their kind support
Thank You!

Backup Slides
VF - Related Work
• Deduplication analysis for virtual machine images [Jin SYSTOR '09]
  – Benefit: 70-80% of blocks are duplicates
  – Only analysis
• LiveDFS: a deduplicating filesystem for storing disk images for OpenStack [Ng Middleware '11]
  – Focuses on spatial locality and metadata prefetching
  – Scales linearly and only 4 VMs; no compression
• Emustore: an early attempt at this project [Pullakandam, Master's Thesis, 2011]
  – Compression during image deployment
  – Used regular fixed-size chunking
Disk Image Deployment Pipeline
• Compression factor is 3.18x for this image
• Available network bandwidth: 500 Mb/s (60 MB/s)

Pipeline stage                             | Effective throughput, uncompressed data (MB/s)
Read compressed image from server disk     | 480 (150)
Transfer compressed image across network   | 172.84 (54.26)
Decompress compressed image                | 149.10
Write image to disk                        | 71.07
Compression                                | 29.68

(Values in parentheses are the corresponding rates for the compressed data itself; a worked bound follows.)
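Reading the per-stage numbers as effective throughputs of uncompressed data, the end-to-end deployment rate of a pipelined system is bounded by its slowest stage (a back-of-the-envelope reading of this table, not a result from the dissertation):

```latex
\text{deploy rate} \le \min(480,\ 172.84,\ 149.10,\ 71.07) = 71.07\ \text{MB/s}
```

If the server instead had to compress on the fly, the 29.68 MB/s compression stage would become the new minimum, which is consistent with the earlier observation that Frisbee compression becomes the bottleneck when raw chunks are stored.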
Chunk Boundary Shift Problem
[Figure: Image A contains blocks 1 2 3 4 5 6 7 8 …; in Image A', the leading blocks 1 and 2 are replaced by a, b, c, shifting every later block in the data stream. With regular fixed-size (4-block) chunking, the chunk boundaries of A' no longer line up with those of A, so the chunks containing the unchanged blocks 3-8 no longer deduplicate.]
Aligned Fixed-size Chunking (AFC)
[Figure: Image A (blocks 1 2 3 4 5 6 7 8 … at disk block IDs 0-9) and Image A' (blocks a b c 3 4 5 6 7 8 …). AFC places chunk boundaries at fixed disk offsets (every 4 blocks), so the unchanged disk regions of A', such as the chunks holding 3 4 5 6 and 7 8 …, are byte-identical to those of A and deduplicate despite the modified blocks at the front. A sketch follows.]
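A sketch of aligned fixed-size chunking as I read the figure (boundaries at fixed multiples of the chunk size in disk-offset space, with unallocated chunks skipped); the chunk size and the `read` and `is_allocated` helpers are illustrative assumptions, not the dissertation's interfaces.

```python
CHUNK = 4 * 4096             # e.g., 4 blocks of 4 KB, matching the figure's granularity

def afc_chunks(read, disk_size, is_allocated):
    """
    read(offset, length)         -> bytes of the disk image at that offset
    is_allocated(offset, length) -> True if the filesystem marks any of the range as used
    Chunk boundaries sit at fixed disk offsets, so an unchanged disk region yields a
    byte-identical chunk in every image version; no boundary shift occurs.
    """
    for off in range(0, disk_size, CHUNK):
        length = min(CHUNK, disk_size - off)
        if is_allocated(off, length):
            yield off, read(off, length)
```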
AFC - Evaluation
[Figure: chunking time (s) vs. deduplication ratio (X) for Variable-disk, Variable-stream, Fixed-disk, Fixed-stream, and AFC chunking schemes.]