Phase Change Memory Aware Data Management and Application
Jiangtao Wang

Outline
• Introduction
• Integrating PCM into the Memory Hierarchy
  − PCM for main memory
  − PCM for auxiliary memory
• Conclusion

Phase change memory
• An emerging memory technology
• Like memory (DRAM):
  − DRAM-like read/write speeds, byte-addressable
  − Lower idle power
• Like storage (SSD & HDD):
  − Non-volatile
  − High capacity (high density)

Phase change memory
                        DRAM          PCM            NAND Flash
  Page size             64B           64B            2KB
  Page read latency     20-50ns       ~50ns          ~25us
  Page write latency    20-50ns       ~1us           ~500us
  Endurance             ∞             10^6-10^8      10^4-10^5
  Idle power            ~100mW/GB     ~1mW/GB        1-10mW/GB
  Density               1x            2-4x           4x
• Cons:
  − Asymmetric read/write latency
  − Limited write endurance

Phase change memory
[Figure: read and write latencies of DRAM, PCM, flash, and HDD on a log scale from ~10ns to ~10ms; PCM reads are close to DRAM, while PCM writes are much slower]

Outline
• Introduction
• Integrating PCM into the Memory Hierarchy
  − PCM for main memory
  − PCM for auxiliary memory
• Conclusion

Integrating PCM into the Memory Hierarchy
• PCM for main memory
  − Replacing DRAM with PCM to achieve larger main memory capacity
• PCM for auxiliary memory
  − PCM as a write buffer for the HDD/SSD disk: buffering dirty pages to minimize disk write I/Os
  − PCM as secondary storage: storing log records

PCM for main memory  [ISCA’09] [ICCD’11] [DAC’09] [CIDR’11]
[Figure: three organizations, each with CPU, L1/L2 cache, memory controller, phase change memory, and an HDD/SSD disk:
 (a) PCM-only main memory
 (b) DRAM as a cache in front of PCM
 (c) DRAM as a write buffer in front of PCM]

PCM for main memory: Challenges with PCM
• Major disadvantage: writes
  − Compared to reads, PCM writes incur higher energy consumption, higher latency, and limited endurance
      Read latency     20~50ns
      Write latency    ~1us
      Read energy      1 J/GB
      Write energy     6 J/GB
      Endurance        10^6~10^8
• Reducing PCM writes is an important goal of data management on PCM!

PCM for main memory: Optimization on PCM writes  [ISCAS’07] [ISCA’09] [MICRO’09]
• Optimization: data comparison write
• Goal: write only the modified bits rather than the entire cache line
• Approach: read-compare-write
[Diagram: the controller reads the old line from PCM, compares it with the new line in the CPU cache, and programs only the bits that differ]
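To make the read-compare-write idea above concrete, here is a minimal sketch in Python. All names (PCMSimulator, CACHE_LINE_WORDS, ...) are hypothetical, and the comparison is done at word rather than bit granularity; real hardware performs this in the memory controller or the PCM chip itself. The sketch only shows how many PCM word writes are avoided when a single word of a cache line changes.

```python
# Sketch of data comparison write (read-compare-write), assuming PCM is
# modeled as a list of cache lines, each a list of 64-bit words.
# Hypothetical names; word granularity instead of bit granularity.

CACHE_LINE_WORDS = 8  # 64-byte line = 8 x 8-byte words


class PCMSimulator:
    def __init__(self, num_lines):
        self.lines = [[0] * CACHE_LINE_WORDS for _ in range(num_lines)]
        self.word_writes = 0   # words actually programmed into PCM cells

    def naive_write(self, line_id, new_line):
        """Write the entire cache line, regardless of what changed."""
        self.lines[line_id] = list(new_line)
        self.word_writes += CACHE_LINE_WORDS

    def data_comparison_write(self, line_id, new_line):
        """Read-compare-write: program only the words that differ."""
        old_line = self.lines[line_id]           # read the old line
        for i, (old, new) in enumerate(zip(old_line, new_line)):
            if old != new:                       # compare
                old_line[i] = new                # write only modified words
                self.word_writes += 1


if __name__ == "__main__":
    pcm = PCMSimulator(num_lines=1)
    pcm.naive_write(0, [1, 2, 3, 4, 5, 6, 7, 8])
    before = pcm.word_writes
    # The CPU modifies only one word of the line.
    pcm.data_comparison_write(0, [1, 2, 3, 99, 5, 6, 7, 8])
    print("words programmed by the update:", pcm.word_writes - before)  # 1, not 8
```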
PCM for main memory: PCM-friendly algorithms
Rethinking Database Algorithms for Phase Change Memory (CIDR 2011)
• Motivation: choose PCM-friendly database algorithms and data structures to reduce the number of writes

PCM-friendly DB algorithms
• Prior design goals for DRAM
  − Low computational complexity
  − Good CPU cache performance
  − Power efficiency (more recently)
• New goals for PCM
  − Minimize PCM writes: lower wear, energy, and latency
  − Finer-grained access granularity: bits, words, cache lines
• Two core database techniques
  − B+-Tree index
  − Hash joins

PCM-friendly DB algorithms: B+-Tree Index
• B+-Tree
  − Records at leaf nodes
  − High fan-out
  − Widely used in file systems and databases
• For PCM
  − Insertions and deletions into sorted nodes incur many write operations
  − Example: inserting key 3 into a sorted leaf holding {2, 4, 7, 8, 9} shifts four keys and four pointers, writes the new key and pointer, and updates the count: 11 writes in total
  − For a node with K sorted keys and K pointers, an insertion writes about 2(K/2) + 1 = K + 1 words on average

PCM-friendly DB algorithms: B+-Tree Index
• PCM-friendly B+-Tree variants
  − Unsorted: all non-leaf and leaf nodes unsorted
  − Unsorted leaf: sorted non-leaf nodes, unsorted leaf nodes
  − Unsorted leaf with bitmap: sorted non-leaf nodes, unsorted leaf nodes with a slot bitmap
[Diagram: an unsorted leaf node holds a count (num = 5), keys {8, 2, 9, 4, 7} in arbitrary order, and their pointers; the bitmap variant replaces the count with a bitmap (e.g., 1011 1010) marking which slots are occupied]

PCM-friendly DB algorithms: B+-Tree Index
• Unsorted leaf
  − Insert or delete incurs 3 writes
  − Example: deleting key 2 from num = 5, keys {8, 2, 9, 4, 7} moves the last key and its pointer into the hole and decrements the count: keys {8, 7, 9, 4}, num = 4
• Unsorted leaf with bitmap
  − Insert incurs 3 writes; delete incurs 1 write
  − Example: deleting key 2 only clears its bit in the bitmap (1011 1010 → 1001 1010); keys and pointers are left untouched

Experimental evaluation: B+-Tree Index
• Simulation platform
  − Cycle-accurate x86-64 simulator: PTLSim
  − Extended the simulator with PCM support
  − Modeled data comparison write
  − CPU cache of 8MB; B+-Tree with 50 million entries, 75% full, 1GB
• Three workloads
  − Inserting 500K random keys
  − Deleting 500K random keys
  − Searching 500K random keys

Experimental evaluation: B+-Tree Index
• Node size of 8 cache lines; 50 million entries, 75% full
[Figure: total wear (bits modified), energy (mJ), and execution time (cycles) for the insert, delete, and search workloads]
• Unsorted schemes achieve the best performance
  − For insert-intensive workloads: unsorted leaf
  − For insert- and delete-intensive workloads: unsorted leaf with bitmap
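The write counts above can be illustrated with a small Python sketch. Class and field names are hypothetical, and a real node would be a fixed-size region of PCM rather than Python lists; the point is only to reproduce the 3-write insert and 1-write delete of the unsorted leaf with a bitmap.

```python
# Sketch of an unsorted leaf node with a slot bitmap, in the spirit of the
# PCM-friendly B+-tree of the CIDR 2011 paper. Hypothetical names.

class UnsortedLeafWithBitmap:
    def __init__(self, capacity=8):
        self.bitmap = [0] * capacity      # one validity bit per slot
        self.keys = [None] * capacity
        self.ptrs = [None] * capacity
        self.pcm_writes = 0               # words written to PCM

    def insert(self, key, ptr):
        slot = self.bitmap.index(0)       # first free slot
        self.keys[slot] = key; self.pcm_writes += 1   # write key
        self.ptrs[slot] = ptr; self.pcm_writes += 1   # write pointer
        self.bitmap[slot] = 1; self.pcm_writes += 1   # set bitmap bit
        # total: 3 writes, and no shifting of existing entries

    def delete(self, key):
        slot = next(i for i, b in enumerate(self.bitmap)
                    if b and self.keys[i] == key)
        self.bitmap[slot] = 0; self.pcm_writes += 1   # clear bitmap bit
        # total: 1 write; the stale key and pointer are simply ignored


if __name__ == "__main__":
    leaf = UnsortedLeafWithBitmap()
    for k in (8, 2, 9, 4, 7):
        leaf.insert(k, ptr=None)
    before = leaf.pcm_writes
    leaf.delete(2)
    print("writes for delete:", leaf.pcm_writes - before)  # 1
```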
PCM-friendly DB algorithms: Hash Joins
• Two representative algorithms
  − Simple Hash Join
  − Cache Partitioning
• Simple Hash Join
  [Diagram: the build phase constructs a hash table on R; the probe phase scans S against it]
  − Problem: too many cache misses
    The hash table built over R exceeds the CPU cache size, and with small record sizes the random probes dominate the cost

PCM-friendly DB algorithms: Hash Joins
• Cache Partitioning
  [Diagram: the partition phase splits R and S into cache-sized partition pairs (R1, S1) ... (R4, S4); the join phase then joins each pair]
  − Problem: too many writes! Partitioning physically copies the records of both relations

PCM-friendly DB algorithms: Hash Joins
• Virtual Partitioning (the PCM-friendly algorithm)
  − Partition phase: partition R and S only virtually; each virtual partition R'i / S'i stores record IDs instead of copies of the records
  [Diagram: virtual partitioning of R and S into lists of record IDs (R'1, S'1) ... (R'4, S'4)]

PCM-friendly DB algorithms: Hash Joins
• Virtual Partitioning (the PCM-friendly algorithm)
  − Join phase: for each pair (R'i, S'i), build a hash table over the R records referenced by R'i and probe it with the S records referenced by S'i
• Good CPU cache performance
• Fewer PCM writes

Experimental evaluation: Hash Join
• Relations R and S reside in main memory (PCM)
• R (50MB) joins S (100MB), with 2 matches per R record
• Record size varied from 20B to 100B
[Figure: total wear, PCM energy (mJ), and execution time (cycles) as the record size varies from 20B to 100B]

PCM for auxiliary memory  [DAC’09] [CIKM’11] [TCDE’10] [VLDB’11]
[Figure: two organizations, each with CPU, L1/L2 cache, memory controller, and DRAM:
 (a) PCM as a write buffer between DRAM and the SSD/HDD
 (b) PCM as secondary storage alongside the HDD/SSD disk]

PCM for auxiliary memory
• PCM as a write buffer for the HDD/SSD disk
  − PCMLogging: Reducing Transaction Logging Overhead with PCM (CIKM 2011)
• PCM as secondary storage
  − Accelerating In-Page Logging with Non-Volatile Memory (TCDE 2010)
  − IPL-P: In-Page Logging with PCRAM (VLDB 2011 demo)

PCM for auxiliary memory
• PCM as a write buffer for the HDD/SSD disk
  PCMLogging: Reducing Transaction Logging Overhead with PCM (CIKM 2011)
• Motivation: buffer dirty pages and transaction logs on PCM to minimize disk I/Os

PCMLogging: two schemes
• PCMBasic
  [Diagram: DRAM holds dirty pages and the write log; PCM is split into a buffer pool for dirty pages and a log pool for log records in front of the disk]
  − Cons: data redundancy; space management on PCM
• PCMLogging
  − Eliminates explicit logs (REDO and UNDO log records)
  − Integrates implicit logging into the buffered updates (shadow pages)

PCMLogging: Overview
• DRAM
  − Mapping Table (MT): maps each logical page to its physical page on PCM
• PCM
  − Page format: page content plus metadata (XID, PID)
  − FreePageBitmap
  − ActiveTxList
[Diagram: dirty pages flow from DRAM to PCM and from PCM to disk, with per-page metadata kept on PCM]

PCMLogging: Operation
• Two additional data structures in main memory support undo
  − Transaction Table (TT): records all in-progress transactions and their corresponding dirty pages in DRAM and PCM
  − Dirty Page Table (DPT): keeps track of the previous version of each PCM page "overwritten" by an in-progress transaction

PCMLogging: Flushing dirty pages to PCM
• Add the XID to the ActiveTxList before writing a dirty page to PCM
• If page P already exists on PCM, do not overwrite it; instead create an out-of-place copy P'
  − Example: transaction T3 updates page P5

PCMLogging: Commit and Abort
• Commit
  − Flush all of the transaction's dirty pages
  − Modify metadata
• Abort
  − Discard the transaction's dirty pages and restore the previous data
  − Modify metadata

PCMLogging: Tuple-based Buffering
• In the PCM
  − Manage the buffer slots in units of tuples
  − To manage free space, employ a slotted directory instead of a bitmap
• In the DRAM
  − The Mapping Table still tracks dirty pages, but maintains the mappings for the buffered tuples of each dirty page
• Merge tuples with the corresponding page on disk
  − On a read/write request
  − When moving committed tuples from PCM to the external disk
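The flush/commit/abort protocol sketched on the preceding slides can be illustrated in Python. This is a sketch under simplifying assumptions, not the paper's implementation: names are hypothetical, PCM is a plain dictionary of frames, only one transaction dirties a given page at a time, and recovery and the FreePageBitmap are omitted.

```python
# Sketch of PCMLogging-style shadow-page buffering (CIKM 2011 idea).
# Hypothetical names and data structures; durability of individual
# PCM writes is assumed rather than enforced.

class PCM:
    def __init__(self):
        self.active_tx_list = set()   # XIDs with updates buffered on PCM
        self.frames = {}              # frame_id -> (PID, XID, content)
        self.next_frame = 0

    def alloc_frame(self, pid, xid, content):
        fid = self.next_frame
        self.next_frame += 1
        self.frames[fid] = (pid, xid, content)
        return fid


class PCMLogging:
    def __init__(self):
        self.pcm = PCM()
        self.mapping_table = {}       # DRAM: PID -> current frame on PCM
        self.tx_table = {}            # DRAM TT: XID -> PIDs it dirtied
        self.dirty_page_table = {}    # DRAM DPT: (XID, PID) -> previous frame

    def flush_dirty_page(self, xid, pid, content):
        # 1. Register the transaction in the ActiveTxList first.
        self.pcm.active_tx_list.add(xid)
        # 2. Never overwrite an existing version: write out of place.
        already_dirty = pid in self.tx_table.get(xid, set())
        prev = self.mapping_table.get(pid)
        fid = self.pcm.alloc_frame(pid, xid, content)
        if prev is not None:
            if already_dirty:
                self.pcm.frames.pop(prev, None)              # same tx's older shadow
            else:
                self.dirty_page_table[(xid, pid)] = prev     # remember previous version
        self.mapping_table[pid] = fid
        self.tx_table.setdefault(xid, set()).add(pid)

    def commit(self, xid):
        # Shadow versions become current; superseded versions can be reclaimed.
        for pid in self.tx_table.pop(xid, set()):
            old = self.dirty_page_table.pop((xid, pid), None)
            if old is not None:
                self.pcm.frames.pop(old, None)
        self.pcm.active_tx_list.discard(xid)

    def abort(self, xid):
        # Discard the transaction's pages and restore the previous versions.
        for pid in self.tx_table.pop(xid, set()):
            cur = self.mapping_table.pop(pid, None)
            if cur is not None:
                self.pcm.frames.pop(cur, None)
            prev = self.dirty_page_table.pop((xid, pid), None)
            if prev is not None:
                self.mapping_table[pid] = prev
        self.pcm.active_tx_list.discard(xid)


if __name__ == "__main__":
    db = PCMLogging()
    db.flush_dirty_page(xid="T3", pid="P5", content="v1")  # slide example: T3 updates P5
    db.flush_dirty_page(xid="T3", pid="P5", content="v2")  # second flush: out of place again
    db.commit("T3")
    print(db.mapping_table, db.pcm.active_tx_list)
```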
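In the same spirit, a minimal sketch of the tuple-based buffering variant (again with hypothetical names; the slotted directory from the slide is reduced to a plain dictionary): committed tuples are buffered in PCM slots, a DRAM-side index records which buffered tuples belong to which logical page, and the tuples are merged into the page image when it is written back to disk.

```python
# Sketch of tuple-granularity buffering: PCM buffers individual committed
# tuples, which are merged into the on-disk page when it is written back.

class TupleBufferedPCM:
    def __init__(self):
        self.slots = {}            # PCM: slot_id -> (page_id, tuple_id, value)
        self.next_slot = 0
        self.page_index = {}       # DRAM: page_id -> set of slot_ids

    def buffer_tuple(self, page_id, tuple_id, value):
        """Place one committed tuple update into a free PCM slot."""
        slot = self.next_slot
        self.next_slot += 1
        self.slots[slot] = (page_id, tuple_id, value)
        self.page_index.setdefault(page_id, set()).add(slot)

    def merge_into_disk_page(self, page_id, disk_page):
        """Apply buffered tuples to the page image and free their slots."""
        merged = dict(disk_page)   # disk_page: tuple_id -> value
        for slot in self.page_index.pop(page_id, set()):
            _, tuple_id, value = self.slots.pop(slot)
            merged[tuple_id] = value
        return merged


if __name__ == "__main__":
    buf = TupleBufferedPCM()
    buf.buffer_tuple(page_id=7, tuple_id=3, value="new-v3")
    buf.buffer_tuple(page_id=7, tuple_id=5, value="new-v5")
    print(buf.merge_into_disk_page(7, {1: "v1", 3: "v3", 5: "v5"}))
```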
Experimental evaluation: PCMLogging
• Simulator based on DiskSim
• TPC-C benchmark
• 64MB of DRAM
• Tuple-based buffering (PL = PCMLogging)

PCM for auxiliary memory
• PCM as secondary storage
  − Accelerating In-Page Logging with Non-Volatile Memory (TCDE 2010)
  − IPL-P: In-Page Logging with PCRAM (VLDB 2011 demo)
• Motivation: the IPL scheme with PCRAM can improve the performance of flash memory database systems by storing frequent log records in PCRAM

Design of Flash-Based DBMS: An In-Page Logging Approach (SIGMOD 2007)
In-Page Logging
• Introduction
  − Updating a single record may invalidate the current flash page
  − Sequential logging approaches incur expensive merge operations
  − IPL co-locates a data page and its log records in the same physical block

Design of Flash-Based DBMS: An In-Page Logging Approach (SIGMOD 2007)
In-Page Logging
[Diagram: on an update, the database buffer keeps an in-memory data page (8KB) and an in-memory log sector (512B); each 128KB flash block holds 15 data pages plus an 8KB log region of 16 sectors (512B each)]

In-Page Logging
[Diagram: when a block's log region fills up, its data pages and log records are merged into a new block]

In-Page Logging
• Cons (on flash)
  − The unit for writing log records is a sector (512B)
  − Only SLC-type NAND flash supports partial programming
  − The amount of log records for a page is usually small

In-Page Logging
• Pros (with PCRAM)
  − Log records can be flushed at a finer granularity
  − Low latency when flushing log records
  − PCRAM is faster than flash memory for small reads
  − Either SLC or MLC flash memory can be used with the IPL policy

Experimental evaluation
Accelerating In-Page Logging with Non-Volatile Memory (TCDE 2010)
• Trace-driven simulation
• An IPL module implemented in the B+-tree based Berkeley DB
• Workload: inserting/searching a million key-value records
• In-memory log sector of 128B or 512B

Experimental evaluation
IPL-P: In-Page Logging with PCRAM (VLDB 2011 demo)
• Hardware platform
  − PCRAM (512MB, access granularity 128B)
  − Intel X25-M SSD (USB interface)
• Workload
  − Inserting/searching/updating a million key-value records
  − B+-tree based Berkeley DB
  − Page size: 8KB

Outline
• Introduction
• Integrating PCM into the Memory Hierarchy
  − PCM for main memory
  − PCM for auxiliary memory
• Conclusion

Conclusion
• PCM is expected to play an important role in the memory hierarchy
• It is important to consider the read/write asymmetry of PCM when designing PCM-friendly algorithms
• Integrating PCM into a hybrid memory hierarchy may be more practical
• If PCM is used as main memory, some system software (e.g., main memory database systems) has to be revised to address PCM-specific challenges

Thank You!