1
MAINTAINING LARGE AND FAST
STREAMING INDEXES ON FLASH
Aditya Akella, UW-Madison
First GENI Measurement Workshop
Joint work with Ashok Anand, Steven Kappes (UW-Madison)
and Suman Nath (MSR)
Memory & storage technologies
2

Question:
What is the role of emerging memory/storage
technologies in supporting current and future
measurements and applications?

This talk:
Role of flash memory in supporting
applications/measurements that need large streaming
indexes; Improving current apps and enabling future apps
Streaming stores and indexes
3

Motivating apps/scenarios
• Caching, content-based networks (DOT), WAN optimization, deduplication
• Large-scale & fine-grained measurements
   ◦ E.g., IP Mon: compute per-packet queuing delays
   ◦ Fast correlations across large collections of NetFlow records

Index features
• Streaming: data stored in a streaming fashion; maintain an online index for fast access
   ◦ Expire old data, update the index constantly
• Large size: data store ~ several TB, index ~ 100s of GB
• Need for speed (fast reads and writes)
   ◦ Impacts usefulness of caching applications and timeliness of fine-grained TE
Index workload
4


Key aspects
• Index lookups and writes are random
• Equal mix of reads and writes
• New data replaces some old data → fast, constant expiry

Index data structures
• Tree-like (B-tree) and log structures are not suitable
   ◦ Slow lookup (e.g., O(log n) complexity in trees)
   ◦ Poor support for flexible, fast garbage collection
• Hash tables are ideal…
• … but current options for large streaming hash tables are not optimal
Current options for >100GB Hashtables
5



• DRAM: large DRAMs are expensive and can get very hot
• Disk: inexpensive, but too slow
• Flash provides a good balance of cost, performance, and power efficiency
   ◦ Bigger and more energy-efficient than DRAM
   ◦ Comparable to disk in price
   ◦ More than two orders of magnitude faster than disk, if used carefully
• But… appropriate data structures are needed to maximize flash effectiveness and overcome its inefficiencies
Flash properties
6

Flash chips
• Layout: a large number of blocks (128KB), each containing multiple pages (2KB)
• Read/write granularity: page; erase granularity: block
• Read page: 50us; write page: 400us; block erase: 1ms
• Cheap: any read (including random), sequential writes
• Expensive: random writes/overwrites, sub-block deletion
   ◦ Requires moving valid pages out of the block to be erased

SSDs: a disk-like interface to flash
• Sequential/random read, sequential write: 80us
• Random write: 8ms

• Flash is good for hashtable lookups
• Insertions are hard → small random overwrites
• Expiration is hard → small random deletes
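This read/write asymmetry is what motivates batching: if inserts fill a page in memory and write it out sequentially, rather than issuing one random write per entry, the per-key cost collapses. A back-of-envelope check in Python, using the SSD latencies quoted above; the entry size is an illustrative assumption:

```python
# Amortized insert cost: one sequential page write spread over all the
# entries in that page, vs. one random write per entry.
# 80us sequential write and 8ms random write come from the SSD numbers
# above; page and entry sizes below are illustrative assumptions.
PAGE_BYTES = 2048          # one flash page
ENTRY_BYTES = 16           # assumed size of a hash entry (key + value)
ENTRIES_PER_PAGE = PAGE_BYTES // ENTRY_BYTES        # 128 entries

naive_us_per_insert = 8000                           # one random write each
batched_us_per_insert = 80 / ENTRIES_PER_PAGE        # one page write, amortized

print(naive_us_per_insert / batched_us_per_insert)   # 12800.0
```

Even with generous assumptions, batching wins by roughly four orders of magnitude per key, which is why BufferHash never updates flash in place.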
BufferHash data structure
7

• Batch expensive operations – random writes and deletes – on flash
   ◦ Maintain a hierarchy of small hashtables
   ◦ Maintain upper levels in DRAM
• Efficient insertion
   ◦ Accumulate random updates in memory
   ◦ Flush accumulated updates to the lower level on flash (at the granularity of a flash page)
• Efficient deletion
   ◦ Delete in batch (at flash block granularity)
   ◦ Amortizes deletion cost
Handling small random updates
8
1. Buffer small random updates in DRAM as small hashtables (buffers)
2. When a hashtable is full, write it to flash without modifying existing data
• Each “super table” is a collection of small hashtables → different “incarnations” over time of the same buffer
• How to search them? Use (bit-sliced) Bloom filters
[Figure: a hash key is split into K bits (HT index, selecting one of 2^k buffers) and N bits (HT key within that table); the buffers live in DRAM, and flushed incarnations form a “super table” on flash, summarized by a bit-sliced Bloom filter]
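A minimal Python sketch of this write path, with flash simulated as a list of immutable pages; the buffer capacity, hash function, and Bloom filter size are illustrative assumptions, not the paper's parameters:

```python
# Sketch of BufferHash's write path: a small in-DRAM hashtable (buffer)
# absorbs random updates; when full, it is written to flash as one
# immutable page-sized incarnation, together with a Bloom filter
# summarizing the keys it holds. All sizes here are toy values.
import hashlib

BUFFER_CAPACITY = 4          # entries per in-memory buffer (tiny, for demo)
BLOOM_BITS = 64

def bloom_positions(key, k=3):
    """k bit positions for key, derived from a cryptographic hash."""
    h = hashlib.sha1(key.encode()).digest()
    return [h[i] % BLOOM_BITS for i in range(k)]

class SuperTable:
    """One DRAM buffer plus its flushed incarnations on (simulated) flash."""
    def __init__(self):
        self.buffer = {}          # DRAM hashtable
        self.flash_pages = []     # each element models one flash page
        self.blooms = []          # one Bloom filter (int bitmask) per page

    def insert(self, key, value):
        self.buffer[key] = value
        if len(self.buffer) >= BUFFER_CAPACITY:
            self.flush()

    def flush(self):
        """Write the full buffer to flash as a new incarnation."""
        bloom = 0
        for key in self.buffer:
            for b in bloom_positions(key):
                bloom |= 1 << b
        self.flash_pages.append(dict(self.buffer))  # one sequential page write
        self.blooms.append(bloom)
        self.buffer = {}          # existing flash data is never modified

st = SuperTable()
for i in range(5):
    st.insert(f"k{i}", i)
# the buffer filled once: one flushed incarnation, one key left in DRAM
print(len(st.flash_pages), len(st.buffer))   # 1 1
```

Note that `flush` only appends: flash never sees an in-place overwrite, which is exactly the cheap-write pattern the previous slide calls for.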
Lookup
9




• Let key = <k1, k2>
• Check the k1'th hashtable in memory for the key k2
• If not found, use the Bloom filters to decide which hashtable h of the k1'th supertable may contain the key k2
• Read and check that hashtable (e.g., in the h'th page of the k1'th block of flash)
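The steps above, restricted to one supertable, can be sketched in Python; here a supertable is just a list of page-sized dicts with one Bloom filter each, and all names and parameters are illustrative:

```python
# Sketch of the lookup path: check the DRAM buffer first, then use the
# per-incarnation Bloom filters to decide which flash pages to read.
# Scanning newest-first means the freshest value for a key wins.
import hashlib

BLOOM_BITS = 64

def bloom_positions(key, k=3):
    h = hashlib.sha1(key.encode()).digest()
    return [h[i] % BLOOM_BITS for i in range(k)]

def bloom_of(keys):
    bloom = 0
    for key in keys:
        for b in bloom_positions(key):
            bloom |= 1 << b
    return bloom

def lookup(key, buffer, flash_pages, blooms):
    if key in buffer:                               # DRAM hit: no flash I/O
        return buffer[key]
    bits = bloom_positions(key)
    # newest incarnation first; read a page only if its filter matches
    for page, bloom in zip(reversed(flash_pages), reversed(blooms)):
        if all(bloom & (1 << b) for b in bits):     # possible hit
            if key in page:                         # one flash page read
                return page[key]
    return None                                     # definite miss

pages = [{"a": 1}, {"a": 2, "b": 3}]                # two incarnations on flash
blooms = [bloom_of(p) for p in pages]
print(lookup("a", {}, pages, blooms))               # 2 (newest wins)
print(lookup("zzz", {}, pages, blooms))             # None
```

A Bloom filter false positive only costs one wasted page read; it never returns a wrong value, because the page itself is always checked.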
Expiry of hash entries
10

• A supertable is a collection of hashtables
   ◦ Expire the oldest hashtable from a supertable
• Option 1: use a flash block as a circular queue
   ◦ Supertable = flash block, hashtable = flash page
   ◦ Delete the oldest hashtable incarnation (page) and replace it with a new one
   ◦ If a flash block has p pages, the supertable holds the p latest hashtables
   ◦ Problem: a page cannot be deleted independently without erasing the whole block (which requires copying the other pages)
Handle expiry of hash entries
11


• Interleave pages from different supertables when writing to flash or the SSD
• Instead of storing each supertable's incarnations contiguously in its own block (1 2 3 4 | 1 2 3 4 | 1 2 3 4 | 1 2 3 4), place the same incarnation from every supertable together in one block (1 1 1 1 | 2 2 2 2 | 3 3 3 3 | 4 4 4 4)
• Advantage: batch deletion of the multiple oldest incarnations
• Other flexible expiration policies can also be supported
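The two layouts can be contrasted with a toy address-mapping sketch; the geometry (four supertables, one page per supertable per block) is an illustrative assumption:

```python
# Two ways to map (supertable, incarnation) to a (block, page) address.
# With interleaving, one block holds the same incarnation from every
# supertable, so expiring the oldest incarnation everywhere is a single
# block erase instead of a page-copy dance in every block.
NUM_SUPERTABLES = 4          # also pages per block in this toy geometry

def contiguous_location(supertable, incarnation):
    """Naive layout: one supertable's incarnations share one block."""
    return (supertable, incarnation)            # (block, page)

def interleaved_location(supertable, incarnation):
    """Interleaved layout: one incarnation's pages share one block."""
    return (incarnation, supertable)            # (block, page)

# Expiring incarnation 0 of every supertable:
naive = {contiguous_location(s, 0)[0] for s in range(NUM_SUPERTABLES)}
interleaved = {interleaved_location(s, 0)[0] for s in range(NUM_SUPERTABLES)}
print(naive)         # {0, 1, 2, 3}: every block is touched
print(interleaved)   # {0}: one block erase expires it everywhere
```

The mapping is just a transpose, but it converts the expensive sub-block deletion pattern into the one deletion flash does cheaply: a whole-block erase.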
Insertion
12



• Key = <k1, k2>
• Insert into the k1'th in-memory hashtable, using k2 as the key
• If that hashtable is full:
   ◦ Expire the tail hashtable in the k1'th supertable (this expires the oldest incarnation from all supertables)
   ◦ Copy the k1'th hashtable from memory to the head of the k1'th supertable
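Putting the pieces together, a toy Python sketch of the insertion path; all counts are illustrative, and a fixed-length `deque` stands in for the per-supertable circular queue of incarnations:

```python
# Sketch of the full insertion path: the key's prefix picks a supertable,
# the suffix indexes within it, and when a buffer fills, its contents are
# flushed to the head of that supertable while the tail (oldest)
# incarnation is expired. All sizes here are toy values.
from collections import deque

NUM_SUPERTABLES = 4
BUFFER_CAPACITY = 2
INCARNATIONS = 3             # flash pages retained per supertable

buffers = [dict() for _ in range(NUM_SUPERTABLES)]
# deque(maxlen=...) drops the oldest element on append, modeling
# "expire the tail hashtable, write the new one at the head"
flash = [deque(maxlen=INCARNATIONS) for _ in range(NUM_SUPERTABLES)]

def insert(key, value):
    k1 = hash(key) % NUM_SUPERTABLES          # which supertable
    buffers[k1][key] = value                  # k2 = key within that table
    if len(buffers[k1]) >= BUFFER_CAPACITY:
        flash[k1].append(dict(buffers[k1]))   # flush; tail auto-expired
        buffers[k1].clear()

for i in range(40):
    insert(f"key{i}", i)

# invariants: buffers stay small, flash keeps at most INCARNATIONS pages
print(all(len(q) <= INCARNATIONS for q in flash),
      all(len(b) < BUFFER_CAPACITY for b in buffers))   # True True
```

Old entries thus expire as a side effect of normal insertion, with no random deletes issued to flash at all.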
Benchmarks
13


• Prototyped BufferHash on two SSDs and a hard drive
   ◦ 99th-percentile read and write latencies under 0.1ms
   ◦ Two orders of magnitude better than disks, at roughly similar cost
• Built a WAN accelerator that is 3X better than current designs
• Theoretical results on tuning BufferHash parameters
   ◦ Low Bloom filter false positives; low lookup and deletion costs on average
   ◦ Optimal buffer size
Conclusion
14
• Many emerging apps and important measurement problems need fast streaming indexes with constant read/write/eviction
• Flash provides a good hardware platform for maintaining such indexes
• BufferHash helps maximize flash effectiveness and overcome its inefficiencies
• Open issues:
   ◦ Role of flash in other measurement problems/architectures?
   ◦ Role of other emerging memory/storage technologies (e.g., PCM)?
   ◦ How to leverage persistence?