Hash in a Flash: Hash Tables for Solid State Devices
Tyler Clemons*, Charu Aggarwal†, S M Faisal*, Shirish Tatikonda‡, Srinivasan Parthasarathy*
*The Ohio State University, Columbus, Ohio
‡IBM Almaden Research Center, San Jose, California
†IBM T.J. Watson Center, Yorktown Heights, New York
11/30/2007

Motivation and Introduction
- Data is growing at a fast pace: scientific data, Twitter, Facebook, Wikipedia, the WWW.
- Traditional data mining (DM) and information retrieval (IR) algorithms require random, out-of-core data access.
- Data is often too large to fit in memory, so frequent random disk access is expected.

Motivation and Introduction (2)
- Traditional hard disk drives (HDDs) can keep pace with storage requirements, but NOT with random access workloads.
  - Their moving parts are a physical limitation.
  - Moving parts also contribute to rising energy consumption.
- Flash devices have emerged as an alternative:
  - No moving parts
  - Faster random access
  - Lower energy usage
- But they have several drawbacks…

Flash Devices
- Limited lifetime
  - Flash supports only a limited number of rewrites, known as erasures or cleans.
  - Erasures also impact response time.
  - Erasures are incurred at the block level.
  - Blocks consist of pages.
    - Pages (4 KB to 8 KB) are the smallest I/O unit.
- Poor random write performance
  - Incurs many erasures and lowers lifetime.
- Efficient sequential write performance
  - Lowers erasures and increases lifetime.

On Flash Devices, DM, and IR
- Flash devices provide fast random read access, which many IR and DM algorithms and data structures rely on.
- Hash tables are common in both DM and IR.
  - They are useful for associating keys with values.
  - Counting hash tables associate keys with a frequency; they appear in many algorithms that track word frequencies.
  - We examine one such algorithm common to both DM and IR: TF-IDF.
- Hash tables exhibit random access for both writes and reads.
  - Random writes are an issue for flash devices.

Hash Tables for Flash Devices must:
- Reduce erasures/cleans and reduce random writes to the SSD
  - Batch updates
- Maintain reasonable query times
- The data structure must not incur unreasonable disk overhead, nor should it require an unreasonable amount of memory.

Our approach
- Our approach makes two key contributions:
  - We optimize our designs for a counting hash table, which previous approaches have not done (A. Anand ’10), (D. Andersen ’09), (B. Debnath ’10), (D. Zeinalipour-Yazti ’05).
  - The primary hash table resides on the flash device, whereas many designs use the SSD as a cache for the HDD (D. Andersen ’09), (B. Debnath ’10).
- We anticipate data sets with high random access and throughput requirements.

Hash Tables for Flash Devices must:
- Reduce erasures/cleans and reduce random writes to the SSD
  - Batch updates
  - Create an in-memory structure
  - Target semi-random (block-level) updates
- Maintain reasonable query times
- The data structure must not incur unreasonable disk overhead
  - Carefully index keys on disk
- Nor should it require an unreasonable amount of memory
  - The memory requirement is at most a fixed parameter

Memory Bounded (MB) Buffering
- Updates are quickly combined in memory.
- Updates are hashed into a bucket in RAM.
- When a RAM bucket is full, its updates are batched to the corresponding disk buckets.
- If the disk buckets are full, the overflow region is invoked.

Memory Bounded (MB) Buffering
- Two-way hash
- On-disk closed hash table
  - Hashes at the page level
  - Updates at the block level
  - Linear probing for collisions
- In-memory open hash table
  - Hashes at the block level
  - Combines updates
  - Flushes with the merge() operation
- Overflow segment
  - Holds the closed hash table's excess

Can we improve MB?
- MB reduces the number of write operations to the flash device.
  - Updates are batched only when the memory buffer is full.
  - Updates are semi-random.
- Query times are reasonable.
  - (Key, value) changes are maintained in memory, and searching the memory buffer is fast.
  - SSD random access and linear probing are relatively fast (see paper).
  - Pages are prefetched.
- MB has disadvantages:
  - Sequential page-level operations are preferred (fewer block updates).
  - MB is limited by the amount of available memory.
    - Think large disk datasets.
    - Updates may be numerous.

Introduce an On-Disk Buffer
- Batch updates from memory to disk are page-level.
  - Reduces expensive block-level writes (time and cleans).
  - Increases sequential writes.
- Increases buffering capability.
  - Reduces expensive non-semi-random block updates.
  - May decrease cleans.
- The search space increases during queries.
  - This is incurred only when inserting and reading concurrently.
  - However, spending less time on erasures will decrease latency.

On-Disk Buffering
- Change Segment (CS)
  - A sequential log structure (sequential writes)
- stage() operation
  - Flushes memory to the CS
  - Fast page-level operations
- merge() operation
  - Invoked when the CS is full
  - Combines the CS with the Data Segment
  - Less frequent than stage()
- What is the structure of the CS?

Change Segment Structure v1
- Buckets are assigned specific Change Segment buckets.
- Change Segment buckets are shared by multiple RAM buffer buckets.

Memory Disk Bounded Buffer (MDB)
- Associates one CS block with k data blocks.
  - Semi-random writes
  - Only full CS blocks are merged with merge().
- Frequently updated blocks may incur numerous (k-1) merge() operations.
- Queries incur an additional block read.
  - The CS block is packed with unwanted data.

Change Segment Structure v2
- As buckets are flushed, they are written sequentially to the Change Segment, one page at a time.

MDB-L
- No partitions in the CS
  - Allows frequently updated blocks to have maximum space
- merge() all blocks when the CS is full
  - Potentially expensive
  - Very infrequent
- Queries are supported by pointers
  - As blocks are staged onto the CS, their pages are recorded for later retrieval.
  - Prefetch

Expectations
- MB will incur more cleans than MDB or MDB-L.
  - Its block-level updates are frequent.
- MDB and MDB-L will incur slightly higher query times.
  - The additional merge() operation will incur a block erasure of the CS.
- MDB and MDB-L will have superior I/O performance.
  - Most operations are page-level.
  - Fewer erasures lower latency.

Experimental Setup (Application)
- TF-IDF: Term Frequency-Inverse Document Frequency
  - A word's importance is highest when it is frequent in a document but infrequent across the collection.
  - Requires a counting hash table
  - Useful in many data mining and IR applications (document classification and search)

Experimental Setup (Data Sets)
- 100,000 random Wikipedia articles
  - 136M keywords
  - 9.7M entries
- MemeTracker (Aug 2009 dump)
  - 402M total entries
  - 17M unique

Experimental Setup (Method)
- 1M random queries were issued during the insertion phase.
  - 10 random workloads; queried keys need not be in the table.
- We measure query performance, I/O time, and cleans.
- We used three SSD configurations: one Single Level Cell (SLC) and two Multi Level Cell (MLC) configurations.
  - MLC is more popular: cheaper per GB, but with a shorter lifetime.
  - SLC has a lower internal error rate and faster response times.
  - See the paper for the specific configurations.
- DiskSim and the Microsoft SSD plugin
  - Used for benchmarking and fine-tuning our SSD

Results (Average Query Time)
- Varying the in-memory buffer size, as a percentage of the data segment, reduces the average query time only by fractions of a second.
- This suggests that the majority of the query time is incurred by the disk.

Results (Average Query Time)
- Varying the on-disk buffer size, as a percentage of the data segment, substantially decreases the average query time for MDB-L.
- This reduction is seen in both datasets.
- MDB requires block reads in the CS.

Results (Average Query Time)
- Using the Wiki dataset, we compared SLC with MLC.
- We observe consistent performance.

Results (Average I/O)
- In this experiment, we set the in-memory buffer to 5% and the CS to 12.5% of the primary hash table size.
- Simulation time is highest for MB because of its block erasures (next slide).
- MDB-L is faster than MDB because it performs more page-level operations.

Results (Cleans/Erasures)
- Cleans are extremely low for both MDB and MDB-L relative to MB.
- This is caused by their sequential page-level operations.
- Queries are affected by cleans because the SSD must allocate resources to cleaning and moving data.

Discussion and Conclusion
- Flash devices are gaining popularity.
  - Low latency, high random read performance, low energy
  - Limited lifetime, poor random write performance
- Hash tables are useful data structures in many data mining and IR algorithms.
  - They exhibit random write patterns, which are challenging for flash devices.
- We have demonstrated that a proper hash table for flash devices will have:
  - An in-memory buffer for batched memory-to-disk updates
  - An on-disk data buffer with page-level operations

Future work
- Our current designs rely on hash functions that use the mod operator.
  - Extendible hashing
- Checkpoint methods for crash recovery
- Examine on a real SSD
  - DiskSim is great for fine-tuning and examining statistics.

Questions?
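The two-level design summarized in the conclusion (an in-memory buffer that combines counting updates, plus an on-disk Change Segment written one page at a time and folded into the data segment by merge()) can be illustrated with a toy model. This is a minimal Python sketch under stated assumptions, not the authors' implementation. Flash is simulated with Python lists and dicts, so no erasures or timings are modeled. A plain dict per bucket stands in for the on-flash closed hash table with linear probing. The class name MDBLSketch and its parameters (num_buckets, ram_bucket_capacity, page_capacity, cs_capacity_pages) are hypothetical. Only the operation names stage() and merge() come from the slides.

```python
import zlib


class MDBLSketch:
    """Toy model of an MDB-L-style counting hash table (hypothetical names).

    Assumption: flash is simulated with Python lists and dicts; page writes,
    block erasures, and timings are not modeled.
    """

    def __init__(self, num_buckets=4, ram_bucket_capacity=3,
                 page_capacity=3, cs_capacity_pages=4):
        self.num_buckets = num_buckets
        self.ram_bucket_capacity = ram_bucket_capacity  # distinct keys per RAM bucket
        self.page_capacity = page_capacity              # (key, delta) pairs per CS page
        self.cs_capacity_pages = cs_capacity_pages      # CS size that triggers merge()
        # In-memory buffer: pending count deltas, one dict per bucket.
        self.ram = [{} for _ in range(num_buckets)]
        # Change Segment: append-only log of pages, each a list of (key, delta).
        self.cs = []
        # bucket -> indices of the CS pages holding that bucket's staged updates.
        self.cs_pointers = {}
        # Data segment: a dict per bucket stands in for the on-flash closed
        # hash table (page-level hashing with linear probing in the slides).
        self.data = [{} for _ in range(num_buckets)]
        self.stage_calls = 0
        self.merge_calls = 0

    def _bucket(self, key):
        # Deterministic mod-based hash of a string key.
        return zlib.crc32(key.encode("utf-8")) % self.num_buckets

    def insert(self, key, count=1):
        b = self._bucket(key)
        bucket = self.ram[b]
        # Combine updates in memory: a repeated key only bumps its delta.
        bucket[key] = bucket.get(key, 0) + count
        if len(bucket) >= self.ram_bucket_capacity:
            self.stage(b)

    def stage(self, b):
        """Flush RAM bucket b to the CS as whole pages appended to the log."""
        items = list(self.ram[b].items())
        self.ram[b].clear()
        for i in range(0, len(items), self.page_capacity):
            if len(self.cs) >= self.cs_capacity_pages:
                self.merge()  # CS is full: fold it into the data segment first
            self.cs.append(items[i:i + self.page_capacity])
            # Record where this page landed so queries can find it later.
            self.cs_pointers.setdefault(b, []).append(len(self.cs) - 1)
        self.stage_calls += 1

    def merge(self):
        """Fold every CS page into the data segment, then empty the CS."""
        for b, pages in self.cs_pointers.items():
            for p in pages:
                for key, delta in self.cs[p]:
                    self.data[b][key] = self.data[b].get(key, 0) + delta
        self.cs.clear()
        self.cs_pointers.clear()
        self.merge_calls += 1

    def query(self, key):
        """Count = data segment + staged CS pages + pending RAM delta."""
        b = self._bucket(key)
        total = self.data[b].get(key, 0)
        for p in self.cs_pointers.get(b, ()):
            total += sum(delta for k, delta in self.cs[p] if k == key)
        return total + self.ram[b].get(key, 0)


if __name__ == "__main__":
    table = MDBLSketch()
    for word in "the cat sat on the mat the end".split():
        table.insert(word)
    print(table.query("the"))  # 3
    print(table.query("dog"))  # 0
```

A query has to consult three places (the data segment, the bucket's staged CS pages found through the recorded pointers, and the pending RAM delta). This is the larger query search space the slides mention for queries issued during insertion. In exchange, stage() only ever appends whole pages to the log.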