SILT: A Memory-Efficient, High-Performance Key-Value Store
Hyeontaek Lim, Bin Fan, David G. Andersen
Michael Kaminsky†
Carnegie Mellon University
†Intel Labs
2011-10-24
Key-Value Store

[Diagram: clients send PUT(key, value), value = GET(key), and DELETE(key) requests to a key-value store cluster]

• E-commerce (Amazon)
• Web server acceleration (Memcached)
• Data deduplication indexes
• Photo storage (Facebook)
• Many projects have examined flash memory-based key-value stores
– Faster than disk, cheaper than DRAM
• This talk introduces SILT, which uses drastically less memory than previous systems while retaining high performance.
Flash Must be Used Carefully

Random reads / sec: 48,000 – Fast, but not THAT fast
$ / GB: 1.83 – Space is precious

Another long-standing problem: random writes are slow and bad for flash life (wearout)
DRAM Must be Used Efficiently

DRAM is used to index (locate) items on flash.
Assume 1 TB of data on flash and 4 bytes of DRAM per key-value pair (previous state of the art):

• 32 B (data deduplication chunk) => 125 GB of index
• 168 B (tweet) => 24 GB of index
• 1 KB (small image) => 4 GB of index

[Chart: index size (GB, log scale) vs. key-value pair size (bytes, log scale)]
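The index sizes above follow from simple arithmetic; a quick sketch that reproduces them (taking 1 TB as 10^12 bytes):

```python
# Back-of-the-envelope DRAM index size for 1 TB of flash data,
# assuming 4 bytes of in-memory index per key-value pair.
FLASH_BYTES = 10**12          # 1 TB of key-value data on flash
INDEX_BYTES_PER_ENTRY = 4     # previous state of the art

for name, pair_size in [("dedup chunk", 32), ("tweet", 168), ("small image", 1024)]:
    entries = FLASH_BYTES / pair_size
    index_gb = entries * INDEX_BYTES_PER_ENTRY / 10**9
    print(f"{name:12s} ({pair_size:5d} B/pair): {index_gb:6.1f} GB of DRAM index")
```

Running this prints roughly 125 GB, 24 GB, and 4 GB, matching the slide.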
Three Metrics to Minimize
Memory overhead = Index size per entry
• Ideally 0 (no memory overhead)
Read amplification = Flash reads per query
• Limits query throughput
• Ideally 1 (no wasted flash reads)
Write amplification = Flash writes per entry
• Limits insert throughput
• Also reduces flash life expectancy
• Must be small enough for flash to last a few years
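To make these definitions concrete, here is a minimal sketch (function and variable names are illustrative, not SILT's API) that derives the three metrics from counters a store could track, using SILT's reported figures as example inputs:

```python
def silt_metrics(index_bytes, entries, flash_reads, queries,
                 flash_bytes_written, bytes_inserted):
    """Compute the three metrics defined on this slide from raw counters."""
    return {
        "memory_overhead_B_per_entry": index_bytes / entries,         # ideally ~0
        "read_amplification": flash_reads / queries,                  # ideally 1
        "write_amplification": flash_bytes_written / bytes_inserted,  # small => longer flash life
    }

# Example with SILT's reported values (0.7 B/entry, 1.01 reads/query, 5.4 writes/insert);
# the entry and query counts are illustrative.
print(silt_metrics(index_bytes=0.7e9, entries=1e9,
                   flash_reads=1.01e6, queries=1e6,
                   flash_bytes_written=5.4e9, bytes_inserted=1e9))
```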
Landscape: Where We Were

[Chart: read amplification (flash reads/query, 0–6) vs. memory overhead (bytes/entry, 0–12); SkimpyStash, HashCache, BufferHash, FlashStore, and FAWN-DS all sit well away from the origin, and a "?" marks the unclaimed region near it]
Seesaw Game?

[Diagram: a seesaw with memory efficiency on one side (SkimpyStash) and high performance on the other (FAWN-DS, FlashStore, HashCache, BufferHash)]

How can we improve?
Solution Preview: (1) Three Stores with (2) New Index Data Structures

• Queries look up stores in sequence (from new to old)
• Inserts only go to the Log
• Data are moved in the background

[Diagram: in memory, the SILT Sorted Index (memory efficient), SILT Filter, and SILT Log Index (write friendly), each indexing its corresponding store on flash]
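A minimal sketch of this control flow, assuming per-store objects with get/put methods (the class and method names are illustrative, not SILT's actual API):

```python
class MultiStore:
    """Sketch of SILT's store chain: LogStore (newest), HashStores (middle),
    SortedStore (oldest)."""

    def __init__(self, log_store, hash_stores, sorted_store):
        self.log_store = log_store        # write-friendly, higher memory per entry
        self.hash_stores = hash_stores    # read-only, filter-indexed, ordered new to old
        self.sorted_store = sorted_store  # read-only, most memory-efficient index

    def get(self, key):
        # Probe stores from new to old so the freshest value wins.
        for store in [self.log_store, *self.hash_stores, self.sorted_store]:
            value = store.get(key)
            if value is not None:
                return value
        return None

    def put(self, key, value):
        # Inserts only touch the log; background tasks later convert a full
        # LogStore into a HashStore and merge HashStores into the SortedStore.
        self.log_store.put(key, value)
```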
LogStore: No Control over Data Layout

Naive hashtable: 48+ B/entry → SILT Log Index: 6.5+ B/entry
Still need pointers: size ≥ log N bits/entry

[Diagram: the in-memory SILT Log Index points into an on-flash log; inserted entries are appended at the newer end]

Memory overhead: 6.5+ bytes/entry
Write amplification: 1
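A sketch of the LogStore layout, using a plain Python dict in place of SILT's partial-key cuckoo index (shown later) to make the pointer cost visible: the in-memory index must keep an offset into the on-flash log for every entry, which is where the ≥ log N bits/entry bound comes from.

```python
import struct

class LogStoreSketch:
    """Append-only on-flash log plus an in-memory key -> offset index.
    (SILT replaces the dict with a partial-key cuckoo hash table to reach
    ~6.5 bytes/entry; this sketch only shows the data layout.)"""

    def __init__(self, path):
        self.log = open(path, "ab+")
        self.index = {}               # key -> byte offset in the log

    def put(self, key: bytes, value: bytes):
        offset = self.log.seek(0, 2)  # append at the end: write amplification ~1
        header = struct.pack("<II", len(key), len(value))
        self.log.write(header + key + value)
        self.index[key] = offset

    def get(self, key: bytes):
        offset = self.index.get(key)
        if offset is None:
            return None
        self.log.seek(offset)
        klen, vlen = struct.unpack("<II", self.log.read(8))
        stored_key = self.log.read(klen)
        return self.log.read(vlen) if stored_key == key else None
```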
SortedStore: Space-Optimized Layout

[Diagram: the in-memory SILT Sorted Index (0.4 B/entry) indexes an on-flash sorted array]

Need to perform bulk-insert to amortize cost

Memory overhead: 0.4 bytes/entry
Write amplification: High
Combining SortedStore and LogStore

[Diagram: the LogStore (SILT Log Index + on-flash log) is merged into the SortedStore (SILT Sorted Index + on-flash sorted array)]
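The merge is essentially the merge phase of merge sort: a sketch, assuming both inputs are iterables of (hashed_key, value) pairs already sorted by key, with the newer store's entries shadowing older ones with the same key.

```python
import heapq

def merge_into_sorted_store(old_sorted_entries, new_entries_sorted):
    """Sketch of merging an older on-flash sorted array with newer entries.
    The output is produced in key order, so it can be written to flash as
    one large sequential bulk write (the merge itself still rewrites old
    data, which is why its write amplification is high)."""
    merged = []
    # Tag each source so that, for equal keys, the newer entry sorts first.
    streams = [((k, 1, v) for k, v in old_sorted_entries),
               ((k, 0, v) for k, v in new_entries_sorted)]
    last_key = object()
    for key, _, value in heapq.merge(*streams):
        if key != last_key:          # first occurrence of a key is the newest value
            merged.append((key, value))
            last_key = key
    return merged
```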
Achieving both Low Memory Overhead and Low Write Amplification

SortedStore
• Low memory overhead
• High write amplification

LogStore
• High memory overhead
• Low write amplification

Combining the two, we can achieve simultaneously:
• Write amplification = 5.4 → 3-year flash life
• Memory overhead = 1.3 B/entry
• With “HashStores”, memory overhead = 0.7 B/entry! (see paper)
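How a write amplification of 5.4 translates into flash life depends on drive capacity, erase endurance, and the incoming insert rate, none of which appear on this slide; the sketch below uses assumed values for all three just to show the arithmetic.

```python
# Illustrative only: drive capacity, endurance, and insert rate below are
# assumptions, not numbers from the talk. The point is the arithmetic that
# turns write amplification into an expected flash lifetime.
drive_bytes         = 256 * 10**9   # assumed drive capacity
pe_cycles           = 10_000        # assumed program/erase endurance per cell
write_amplification = 5.4           # from the slide
insert_rate_bytes   = 5 * 10**6     # assumed inserts: ~5 MB/s (~5 K 1-KB entries/s)

total_flash_writes  = drive_bytes * pe_cycles                  # bytes the drive can absorb
user_bytes_writable = total_flash_writes / write_amplification
lifetime_years      = user_bytes_writable / insert_rate_bytes / (365 * 24 * 3600)
print(f"{lifetime_years:.1f} years")   # ~3 years under these assumptions
```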
SILT’s Design (Recap)

[Diagram: inserts are appended to the LogStore's on-flash log (indexed by the SILT Log Index); a full LogStore is converted into a HashStore (on-flash hashtable, indexed by the SILT Filter); HashStores are merged into the SortedStore (on-flash sorted array, indexed by the SILT Sorted Index)]

Memory overhead: 0.7 bytes/entry
Read amplification: 1.01
Write amplification: 5.4
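A sketch of the Conversion step named in the diagram (the helper and its arguments are hypothetical): a full LogStore's entries are rewritten to flash in the order of their in-memory hash-index slots, so per-entry offsets can be dropped and the index replaced by the compact SILT Filter.

```python
def convert_log_to_hash_store(log_entries, index_slot_of):
    """Sketch of LogStore -> HashStore conversion: rewrite the entries on
    flash in "hash order", i.e., the order of their in-memory index slots.
    Afterwards slot i of the index corresponds to entry i on flash, so the
    index no longer needs offsets and can shrink to a small filter
    (~2.2 bytes/entry in SILT).
    `log_entries` is a list of (key, value) pairs from the on-flash log and
    `index_slot_of` maps key -> hash-index slot (both names hypothetical)."""
    hash_ordered = sorted(log_entries, key=lambda kv: index_slot_of[kv[0]])
    # A real implementation would stream `hash_ordered` sequentially into a
    # new on-flash HashStore file; this sketch just returns the ordering.
    return hash_ordered
```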
Review of New Index Data Structures in SILT

SILT Sorted Index
• Entropy-coded tries
• For SortedStore
• Highly compressed (0.4 B/entry)

SILT Filter & Log Index
• Partial-key cuckoo hashing
• For HashStore & LogStore
• Compact (2.2 & 6.5 B/entry)
• Very fast (> 1.8 M lookups/sec)
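A sketch of partial-key cuckoo hashing under simplifying assumptions (single-slot buckets, a toy table size, SHA-1 for the two hash functions): each in-memory slot stores only a short tag plus a log offset, and because the tag identifies the entry's other candidate bucket, an entry can be displaced without re-reading its full key from flash.

```python
import hashlib

class PartialKeyCuckooIndex:
    """Sketch of a partial-key cuckoo hash index for a LogStore: slots hold
    (tag, offset), where the tag is the entry's *other* candidate bucket."""

    def __init__(self, num_buckets=1 << 16, max_kicks=128):
        self.n = num_buckets
        self.max_kicks = max_kicks
        self.slots = [None] * self.n          # each slot: (tag, offset) or None

    def _buckets(self, key: bytes):
        digest = hashlib.sha1(key).digest()
        b1 = int.from_bytes(digest[:4], "little") % self.n
        b2 = int.from_bytes(digest[4:8], "little") % self.n
        return b1, b2

    def insert(self, key: bytes, offset: int) -> bool:
        b1, b2 = self._buckets(key)
        for bucket, tag in ((b1, b2), (b2, b1)):   # try both candidate buckets
            if self.slots[bucket] is None:
                self.slots[bucket] = (tag, offset)
                return True
        bucket, tag = b1, b2                       # both full: displace, cuckoo-style
        for _ in range(self.max_kicks):
            victim_tag, victim_offset = self.slots[bucket]
            self.slots[bucket] = (tag, offset)
            # The victim's tag is its alternate bucket; its new tag is where it came from.
            bucket, tag, offset = victim_tag, bucket, victim_offset
            if self.slots[bucket] is None:
                self.slots[bucket] = (tag, offset)
                return True
        return False   # table too full: SILT would freeze this LogStore and start a new one

    def lookup(self, key: bytes):
        """Return candidate log offsets; the caller verifies the full key
        against the on-flash log, since different keys can share buckets and tags."""
        b1, b2 = self._buckets(key)
        candidates = []
        if self.slots[b1] is not None and self.slots[b1][0] == b2:
            candidates.append(self.slots[b1][1])
        if self.slots[b2] is not None and self.slots[b2][0] == b1:
            candidates.append(self.slots[b2][1])
        return candidates
```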
Compression in Entropy-Coded Tries

[Diagram: a binary trie built over hashed keys, with its leaves colored red or blue]

• Hashed keys (bits are random)
• # of red (or blue) leaves ~ Binomial(# of all leaves, 0.5)
• Entropy coding (Huffman coding and more)

(More details of the new indexing schemes in the paper)
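A sketch of the idea, under assumptions: toy 16-bit hashed keys, an explicit in-memory trie, and the entropy-coding step omitted. Each internal node stores only the number of keys in its left (0-bit) subtrie; that is enough to compute any key's position in the on-flash sorted array, and because hashed keys are uniformly random, each count follows Binomial(n, 0.5), which is what makes Huffman-style coding so effective.

```python
KEY_BITS = 16   # toy key width for the sketch

def build_trie(keys, depth=0):
    """Build a trie over sorted hashed keys. Each internal node records only
    how many keys fall into its left (0-bit) subtrie; the keys themselves are
    not stored. The entropy-coding step that compresses these counts to
    ~0.4 bytes/entry is omitted here."""
    if len(keys) <= 1:
        return None
    bit = KEY_BITS - 1 - depth
    left_count = sum(1 for k in keys if not (k >> bit) & 1)   # keys sorted, 0-bits first
    return (left_count,
            build_trie(keys[:left_count], depth + 1),
            build_trie(keys[left_count:], depth + 1))

def rank(trie, key, depth=0):
    """Index of `key` in the on-flash sorted array, using only the counts."""
    if trie is None:
        return 0
    left_count, left, right = trie
    bit = KEY_BITS - 1 - depth
    if (key >> bit) & 1:
        return left_count + rank(right, key, depth + 1)
    return rank(left, key, depth + 1)

# Example: 8 hashed keys; rank() recovers each key's position without storing keys.
keys = sorted([0x1F2A, 0x3C41, 0x5AD2, 0x7E01, 0x9B37, 0xB004, 0xD6E8, 0xF11C])
trie = build_trie(keys)
assert all(rank(trie, k) == i for i, k in enumerate(keys))
```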
Landscape: Where We Are

[Chart: read amplification (flash reads/query, 0–6) vs. memory overhead (bytes/entry, 0–12); SILT now sits near the origin, while SkimpyStash, HashCache, BufferHash, FlashStore, and FAWN-DS occupy the rest of the space]
Evaluation

1. Various combinations of indexing schemes
2. Background operations (merge/conversion)
3. Query latency

Experiment Setup
CPU: 2.80 GHz (4 cores)
Flash drive: SATA, 256 GB (48 K random 1024-byte reads/sec)
Workload size: 20-byte key, 1000-byte value, ≥ 50 M keys
Query pattern: Uniformly distributed (worst case for SILT)
LogStore Alone: Too Much Memory
Workload: 90% GET (50-100 M keys) + 10% PUT (50 M keys)
LogStore + SortedStore: Still Too Much Memory
Workload: 90% GET (50-100 M keys) + 10% PUT (50 M keys)
Full SILT: Very Memory Efficient
Workload: 90% GET (50-100 M keys) + 10% PUT (50 M keys)
Small Impact from Background Operations
Workload: 90% GET (~100 M keys) + 10% PUT

[Graph: GET throughput over time stays roughly between 33 K and 40 K queries/sec; bursty drops ("Oops!") are caused by TRIM issued by the ext4 filesystem]
Low Query Latency
Workload: 100% GET (100 M keys)
Latency measured at the best-throughput setting (16 I/O threads):
Median = 330 μs
99.9th percentile = 1510 μs

[Graph: query latency vs. # of I/O threads]
Conclusion

• SILT provides a memory-efficient, high-performance key-value store
– Multi-store approach
– Entropy-coded tries
– Partial-key cuckoo hashing
• Full source code is available
– https://github.com/silt/silt
Thanks!