1
MAINTAINING LARGE AND FAST
STREAMING INDEXES ON FLASH
Aditya Akella, UW-Madison
First GENI Measurement Workshop
Joint work with Ashok Anand, Steven Kappes (UW-Madison)
and Suman Nath (MSR)
Memory & storage technologies
2

Question:
What is the role of emerging memory/storage
technologies in supporting current and future
measurements and applications?

This talk:
Role of flash memory in supporting
applications/measurements that need large streaming
indexes; Improving current apps and enabling future apps
Streaming stores and indexes
3

Motivating apps/scenarios
• Caching, content-based networks (DOT), WAN optimization, deduplication
• Large-scale & fine-grained measurements
   ◦ E.g., IP Mon: compute per-packet queuing delays
   ◦ Fast correlations across large collections of NetFlow records

Index features
• Streaming: data stored in a streaming fashion; maintain an online index for fast access
   ◦ Expire old data, update the index constantly
• Large size: data store ~ several TB, index ~ 100s of GB
• Need for speed (fast reads and writes)
   ◦ Impacts usefulness of caching applications and timeliness of fine-grained TE
Index workload
4


Key aspects
• Index lookups and writes are random
• Equal mix of reads and writes
• New data replaces some old data → fast, constant expiry

Index data structures
• Tree-like (B-tree) and log structures are not suitable
   ◦ Slow lookup (e.g., O(log n) complexity in trees)
   ◦ Poor support for flexible, fast garbage collection
• Hash tables are ideal…
• … but current options for large streaming hash tables are not optimal
Current options for >100GB Hashtables
5



• DRAM: large DRAMs are expensive and can get very hot
• Disk: inexpensive, but too slow
• Flash provides a good balance of cost, performance, and power efficiency
   ◦ Bigger and more energy-efficient than DRAM
   ◦ Comparable to disk in price
   ◦ More than two orders of magnitude faster than disk, if used carefully
• But… appropriate data structures are needed to maximize flash effectiveness and overcome its inefficiencies
Flash properties
6

Flash chips
• Layout: a large number of blocks (128KB), each containing multiple pages (2KB)
• Read/write granularity: page; erase granularity: block
• Read page: 50us; write page: 400us; block erase: 1ms
• Cheap: any read (including random), sequential writes
• Expensive: random writes/overwrites, sub-block deletion
   ◦ Requires moving valid pages out of the block to be erased

SSDs: a disk-like interface to flash
• Sequential/random read, sequential write: 80us
• Random write: 8ms

• Flash is good for hashtable lookups
• Insertions are hard → small random overwrites
• Expiration is hard → small random deletes
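This read/write asymmetry is what motivates batching: if inserts fill a page in memory and write it out sequentially, rather than issuing one random write per entry, the per-key cost collapses. A back-of-envelope check in Python, using the SSD latencies quoted above; the entry size is an illustrative assumption:

```python
# Amortized insert cost: one sequential page write spread over all the
# entries in that page, vs. one random write per entry.
# 80us sequential write and 8ms random write come from the SSD numbers
# above; page and entry sizes below are illustrative assumptions.
PAGE_BYTES = 2048          # one flash page
ENTRY_BYTES = 16           # assumed size of a hash entry (key + value)
ENTRIES_PER_PAGE = PAGE_BYTES // ENTRY_BYTES        # 128 entries

naive_us_per_insert = 8000                           # one random write each
batched_us_per_insert = 80 / ENTRIES_PER_PAGE        # one page write, amortized

print(naive_us_per_insert / batched_us_per_insert)   # 12800.0
```

Even with generous assumptions, batching wins by roughly four orders of magnitude per key, which is why BufferHash never updates flash in place.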
BufferHash data structure
7

• Batch expensive operations – random writes and deletes – on flash
   ◦ Maintain a hierarchy of small hashtables
   ◦ Maintain upper levels in DRAM
• Efficient insertion
   ◦ Accumulate random updates in memory
   ◦ Flush accumulated updates to the lower level on flash (at the granularity of a flash page)
• Efficient deletion
   ◦ Delete in batch (at flash block granularity)
   ◦ Amortizes deletion cost
Handling small random updates
8
1. Buffer small random updates in DRAM as small hashtables (buffers)
2. When a hashtable is full, write it to flash without modifying existing data
• Each “super table” is a collection of small hashtables → different “incarnations” over time of the same buffer
• How to search them? Use (bit-sliced) Bloom filters
[Figure: a hash key is split into K bits (HT index, selecting one of 2^k buffers) and N bits (HT key within that table); the buffers live in DRAM, and flushed incarnations form a “super table” on flash, summarized by a bit-sliced Bloom filter]
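A minimal Python sketch of this write path, with flash simulated as a list of immutable pages; the buffer capacity, hash function, and Bloom filter size are illustrative assumptions, not the paper's parameters:

```python
# Sketch of BufferHash's write path: a small in-DRAM hashtable (buffer)
# absorbs random updates; when full, it is written to flash as one
# immutable page-sized incarnation, together with a Bloom filter
# summarizing the keys it holds. All sizes here are toy values.
import hashlib

BUFFER_CAPACITY = 4          # entries per in-memory buffer (tiny, for demo)
BLOOM_BITS = 64

def bloom_positions(key, k=3):
    """k bit positions for key, derived from a cryptographic hash."""
    h = hashlib.sha1(key.encode()).digest()
    return [h[i] % BLOOM_BITS for i in range(k)]

class SuperTable:
    """One DRAM buffer plus its flushed incarnations on (simulated) flash."""
    def __init__(self):
        self.buffer = {}          # DRAM hashtable
        self.flash_pages = []     # each element models one flash page
        self.blooms = []          # one Bloom filter (int bitmask) per page

    def insert(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= BUFFER_CAPACITY:
            self.flush()

    def flush(self):
        """Write the full buffer to flash as a new incarnation."""
        bloom = 0
        for key in self.buffer:
            for b in bloom_positions(key):
                bloom |= 1 << b
        self.flash_pages.append(dict(self.buffer))  # one sequential page write
        self.blooms.append(bloom)
        self.buffer = {}          # existing flash data is never modified

st = SuperTable()
for i in range(5):
    st.insert(f"k{i}", i)
# the buffer filled once: one flushed incarnation, one key left in DRAM
print(len(st.flash_pages), len(st.buffer))   # 1 1
```

Note that `flush` only appends: flash never sees an in-place overwrite, which is exactly the cheap-write pattern the previous slide calls for.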
Lookup
9




• Let key = <k1, k2>
• Check the k1'th hashtable in memory for the key k2
• If not found, use the Bloom filters to decide which hashtable h of the k1'th supertable may contain the key k2
• Read and check that hashtable (e.g., in the h'th page of the k1'th block of flash)
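The steps above, restricted to one supertable, can be sketched in Python; here a supertable is just a list of page-sized dicts with one Bloom filter each, and all names and parameters are illustrative:

```python
# Sketch of the lookup path: check the DRAM buffer first, then use the
# per-incarnation Bloom filters to decide which flash pages to read.
# Scanning newest-first means the freshest value for a key wins.
import hashlib

BLOOM_BITS = 64

def bloom_positions(key, k=3):
    h = hashlib.sha1(key.encode()).digest()
    return [h[i] % BLOOM_BITS for i in range(k)]

def bloom_of(keys):
    bloom = 0
    for key in keys:
        for b in bloom_positions(key):
            bloom |= 1 << b
    return bloom

def lookup(key, buffer, flash_pages, blooms):
    if key in buffer:                               # DRAM hit: no flash I/O
        return buffer[key]
    bits = bloom_positions(key)
    # newest incarnation first; read a page only if its filter matches
    for page, bloom in zip(reversed(flash_pages), reversed(blooms)):
        if all(bloom & (1 << b) for b in bits):     # possible hit
            if key in page:                         # one flash page read
                return page[key]
    return None                                     # definite miss

pages = [{"a": 1}, {"a": 2, "b": 3}]                # two incarnations on flash
blooms = [bloom_of(p) for p in pages]
print(lookup("a", {}, pages, blooms))               # 2 (newest wins)
print(lookup("zzz", {}, pages, blooms))             # None
```

A Bloom filter false positive only costs one wasted page read; it never returns a wrong value, because the page itself is always checked.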
Expiry of hash entries
10

• A supertable is a collection of hashtables
   ◦ Expire the oldest hashtable from a supertable
• Option 1: use a flash block as a circular queue
   ◦ Supertable = flash block, hashtable = flash page
   ◦ Delete the oldest hashtable incarnation (page) and replace it with a new one
   ◦ If a flash block has p pages, the supertable holds the p latest hashtables
   ◦ Problem: a page cannot be deleted independently without erasing the whole block (which requires copying the other pages)
Handle expiry of hash entries
11


• Interleave pages from different supertables when writing to flash or the SSD
• Instead of storing each supertable's incarnations contiguously in its own block (1 2 3 4 | 1 2 3 4 | 1 2 3 4 | 1 2 3 4), place the same incarnation from every supertable together in one block (1 1 1 1 | 2 2 2 2 | 3 3 3 3 | 4 4 4 4)
• Advantage: batch deletion of the multiple oldest incarnations
• Other flexible expiration policies can also be supported
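The two layouts can be contrasted with a toy address-mapping sketch; the geometry (four supertables, one page per supertable per block) is an illustrative assumption:

```python
# Two ways to map (supertable, incarnation) to a (block, page) address.
# With interleaving, one block holds the same incarnation from every
# supertable, so expiring the oldest incarnation everywhere is a single
# block erase instead of a page-copy dance in every block.
NUM_SUPERTABLES = 4          # also pages per block in this toy geometry

def contiguous_location(supertable, incarnation):
    """Naive layout: one supertable's incarnations share one block."""
    return (supertable, incarnation)            # (block, page)

def interleaved_location(supertable, incarnation):
    """Interleaved layout: one incarnation's pages share one block."""
    return (incarnation, supertable)            # (block, page)

# Expiring incarnation 0 of every supertable:
naive = {contiguous_location(s, 0)[0] for s in range(NUM_SUPERTABLES)}
interleaved = {interleaved_location(s, 0)[0] for s in range(NUM_SUPERTABLES)}
print(naive)         # {0, 1, 2, 3}: every block is touched
print(interleaved)   # {0}: one block erase expires it everywhere
```

The mapping is just a transpose, but it converts the expensive sub-block deletion pattern into the one deletion flash does cheaply: a whole-block erase.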
Insertion
12



• Key = <k1, k2>
• Insert into the k1'th in-memory hashtable, using k2 as the key
• If that hashtable is full:
   ◦ Expire the tail hashtable in the k1'th supertable (this expires the oldest incarnation from all supertables)
   ◦ Copy the k1'th hashtable from memory to the head of the k1'th supertable
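Putting the pieces together, a toy Python sketch of the insertion path; all counts are illustrative, and a fixed-length `deque` stands in for the per-supertable circular queue of incarnations:

```python
# Sketch of the full insertion path: the key's prefix picks a supertable,
# the suffix indexes within it, and when a buffer fills, its contents are
# flushed to the head of that supertable while the tail (oldest)
# incarnation is expired. All sizes here are toy values.
from collections import deque

NUM_SUPERTABLES = 4
BUFFER_CAPACITY = 2
INCARNATIONS = 3             # flash pages retained per supertable

buffers = [dict() for _ in range(NUM_SUPERTABLES)]
# deque(maxlen=...) drops the oldest element on append, modeling
# "expire the tail hashtable, write the new one at the head"
flash = [deque(maxlen=INCARNATIONS) for _ in range(NUM_SUPERTABLES)]

def insert(key, value):
    k1 = hash(key) % NUM_SUPERTABLES          # which supertable
    buffers[k1][key] = value                  # k2 = key within that table
    if len(buffers[k1]) >= BUFFER_CAPACITY:
        flash[k1].append(dict(buffers[k1]))   # flush; tail auto-expired
        buffers[k1].clear()

for i in range(40):
    insert(f"key{i}", i)

# invariants: buffers stay small, flash keeps at most INCARNATIONS pages
print(all(len(q) <= INCARNATIONS for q in flash),
      all(len(b) < BUFFER_CAPACITY for b in buffers))   # True True
```

Old entries thus expire as a side effect of normal insertion, with no random deletes issued to flash at all.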
Benchmarks
13


• Prototyped BufferHash on two SSDs and a hard drive
   ◦ 99th-percentile read and write latencies under 0.1ms
   ◦ Two orders of magnitude better than disks, at roughly similar cost
• Built a WAN accelerator that is 3X better than current designs
• Theoretical results on tuning BufferHash parameters
   ◦ Low Bloom filter false positives; low lookup and deletion costs on average
   ◦ Optimal buffer size
Conclusion
14
• Many emerging apps and important measurement problems need fast streaming indexes with constant read/write/eviction
• Flash provides a good hardware platform for maintaining such indexes
• BufferHash helps maximize flash effectiveness and overcome its inefficiencies
• Open issues:
   ◦ Role of flash in other measurement problems/architectures?
   ◦ Role of other emerging memory/storage technologies (e.g., PCM)?
   ◦ How to leverage persistence?