Dynamic Page Migration Algorithm for DRAM/PCM Hybrid Memory System

Rachata Ausavarungnirun, Tao Yang, Thomas Tzou
{rausavar, taoy, ttzou}@ece.cmu.edu

1. Motivation

As technologies continue to scale, the power consumed by main memory is becoming increasingly significant. Recent studies show that, due to subthreshold leakage, DRAM scaling below 40nm may not be sustainable [6]. Several proposals use non-volatile memory as an alternative to DRAM. Among these, Phase Change Memory (PCM) is seen as one of the most promising candidates due to its low read power consumption. However, compared to DRAM, PCM has a longer write latency and higher write power consumption. In addition, it suffers from limited write endurance: the PCM wears out after 10^9 to 10^12 memory writes [3]. To address these problems, some projects propose a DRAM/PCM hybrid memory. Several issues remain open, however, such as how the memory should be managed and when data migration should be invoked. We will examine the tradeoffs of these algorithms in this paper.

2. Previous Work

Past research has shown that a main memory consisting only of PCM cannot achieve performance comparable to DRAM [1][5]. In a hybrid architecture, there are several ways to utilize PCM. Qureshi et al. [2] proposed using PCM as backing storage for the DRAM. They evaluated the performance and power impact of various PCM-based hybrid memory architectures. They also showed that a better management algorithm can significantly improve the write endurance of the PCM. With those results in mind, we will explore better algorithms to further improve energy efficiency and reliability with minimum performance overhead.

Another hybrid memory architecture, PDRAM, was proposed by Dhiman et al. [3], in which the PCM is used in parallel with DRAM. They demonstrated that such an organization can achieve significant energy savings with a negligible performance hit.
They used a static memory allocation algorithm in which a new page is always allocated from the DRAM. However, we believe a dynamic allocation policy can achieve better energy efficiency.

Previous studies mostly focused on the tradeoffs between various organizations of hybrid memory systems. We will instead focus on the policy/algorithm used to manage such a system, given a hybrid architecture. If the size of the DRAM is scaled properly relative to the PCM, we believe that a dynamic energy- and wear-aware policy for migrating data between PCM and DRAM can further reduce power consumption and maintain reasonable reliability of the PCM cells without significantly degrading performance.

3. Plan

Our goals for this project can be divided into three major components:

1) Investigate the PCM/DRAM hybrid architecture and choose a reasonable organization for DRAM/PCM. Our plan is to use the PCM and DRAM in parallel, as proposed in PDRAM, with a data line connection between PCM and DRAM for data migration purposes.
2) Determine the DRAM:PCM ratio that minimizes DRAM power consumption while not suffering from the PCM latency.
3) Investigate the page migration policy for energy efficiency and PCM wear leveling. This is elaborated in Section 4.2.

Date         Deliverable
10/13/2010   Choose benchmarks, bring up the simulator infrastructure, and collect data using just DRAM
             Extend the DRAM model to model PCM
11/1/2010    Implement and evaluate the baseline and threshold-based algorithms
             Analyze the results and propose new algorithms
11/19/2010   Evaluate the proposed algorithms
             Fine-tune the architecture/policy accordingly
11/29/2010   Tentative end date (presentation)

Figure 1: Project schedule

As a fallback plan, if we cannot come up with effective algorithms to evaluate, we will switch our focus to the various organizations of hybrid memory.

4. Methodology

4.1 BLESS with DRAM/PCM and power estimation

The main component that BLESS lacks for our application is DRAM and PCM power simulation.
Our initial task is to extend BLESS to simulate DRAM operating power, leakage, and workload. Afterwards, we will create a PCM model using the DRAM model as a base. Our PCM model will have higher latency and will use a counter that tracks the number of writes per cell. Finally, an MMU will handle our page allocation/migration policy between DRAM and PCM by simulating the mapping from the virtual addresses issued by the traces to physical addresses in the simulation.

4.2 Page allocation/migration

This will be the focus of our research. We will first evaluate the baseline allocation policies (allocating entirely from DRAM and allocating entirely from PCM). We will then look into possible threshold-based page migration policies driven by the number of reads and writes to each page. For example, a page that is not latency critical is a good candidate for PCM, whereas a page that is written frequently is better suited to DRAM. Once we have developed the threshold-based policy, we will use a prediction-based policy to further optimize the page migration algorithm. Finally, time permitting, we will look into the impact of the granularity of our migration policy, for example migrating lines rather than pages.

4.3 Wear Leveling

We will first incorporate wear leveling into the page migration algorithm. Once the controller issues a page move command, it will keep coarse-grained track of which parts of the PCM have been written the least, in order to distribute the PCM write counts throughout the entire memory. Once this phase is over, we will look into line granularity and apply the Start-Gap algorithm discussed in [4] to further distribute the write counts inside the page.

4.4 Benchmarking

We will use Simics to generate traces for a benchmark program. In terms of benchmarks, we are currently looking at programs in SPEC, primarily memory-intensive applications.
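To make the threshold-based policy of Section 4.2 and the coarse-grained wear leveling of Section 4.3 more concrete, the following is a minimal Python sketch, not the implementation planned for BLESS. The names `PageStats`, `target_location`, `least_worn_frame`, and `plan_migrations`, the per-epoch counters, and the threshold values are all hypothetical and chosen only for illustration. Write-heavy pages are sent to DRAM, rarely accessed pages are sent to PCM, and a page entering PCM is placed in the free frame with the fewest recorded writes.

```python
from dataclasses import dataclass

DRAM, PCM = "DRAM", "PCM"


@dataclass
class PageStats:
    """Per-page access counters for one epoch (hypothetical bookkeeping)."""
    location: str
    reads: int = 0
    writes: int = 0


def target_location(page, write_threshold=8, access_threshold=4):
    """Decide where a page should live after this epoch.

    Thresholds are illustrative assumptions, not tuned values.
    """
    if page.writes >= write_threshold:
        return DRAM  # frequently written: avoid PCM write cost and wear
    if page.reads + page.writes <= access_threshold:
        return PCM   # rarely accessed: not latency critical
    return page.location  # otherwise leave the page where it is


def least_worn_frame(frame_writes, free_frames):
    """Pick the free PCM frame with the fewest writes (ties: lowest id)."""
    return min(free_frames, key=lambda f: (frame_writes[f], f))


def plan_migrations(pages, frame_writes, free_frames):
    """Return (page_id, destination, pcm_frame_or_None) moves for one epoch."""
    free = set(free_frames)
    moves = []
    for page_id, page in sorted(pages.items()):
        dest = target_location(page)
        if dest == page.location:
            continue
        if dest == PCM:
            if not free:
                continue  # no free PCM frame: keep the page in DRAM
            frame = least_worn_frame(frame_writes, free)
            free.remove(frame)
            moves.append((page_id, PCM, frame))
        else:
            moves.append((page_id, DRAM, None))
    return moves
```

In this sketch the decision uses raw counts from a single epoch; a prediction-based policy would replace `target_location` with a function of access history, leaving the frame selection unchanged.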
We will evaluate the algorithms against pure-DRAM and pure-PCM memories of the same size. Our main metrics will be power consumption and performance. In addition, we will use expected lifetime to model the reliability of PCM. That is, we will keep track of the write traffic to each page and use that information to derive the lifetime.

5. References

[1] Benjamin C. Lee, Engin Ipek, Onur Mutlu, Doug Burger, “Architecting phase change memory as a scalable DRAM alternative,” Proceedings of the 36th Annual International Symposium on Computer Architecture, June 20-24, 2009, Austin, TX, USA.
[2] Moinuddin K. Qureshi, Vijayalakshmi Srinivasan, Jude A. Rivers, “Scalable high performance main memory system using phase-change memory technology,” Proceedings of the 36th Annual International Symposium on Computer Architecture, June 20-24, 2009, Austin, TX, USA.
[3] G. Dhiman, R. Ayoub, and T. Rosing, “PDRAM: a hybrid PRAM and DRAM main memory system,” in DAC ’09: Proceedings of the 46th Annual Design Automation Conference. New York, NY, USA: ACM, 2009, pp. 664–669.
[4] Moinuddin K. Qureshi, John Karidis, Michele Franceschini, et al., “Enhancing lifetime and security of PCM-based main memory with start-gap wear leveling,” Proceedings of the 42nd Annual International Symposium on Microarchitecture, 2009.
[5] Benjamin C. Lee, Ping Zhou, Jun Yang, Youtao Zhang, Bo Zhao, Engin Ipek, Onur Mutlu, and Doug Burger, “Phase Change Technology and the Future of Main Memory,” IEEE Micro, Special Issue: Micro's Top Picks from 2009 Computer Architecture Conferences (MICRO TOP PICKS), Vol. 30, No. 1, pp. 60-70, January/February 2010.
[6] Semiconductor Industry Association, “Process integration, devices & structures,” International Technology Roadmap for Semiconductors, 2007.
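The lifetime estimate described in Section 4.4 (deriving expected lifetime from per-page write traffic) could be sketched as follows. This is an illustrative model under stated assumptions, not the estimator that will be used: the function name `expected_lifetime_seconds` is hypothetical, the write rate observed in the trace is assumed to persist indefinitely, every page write is assumed to wear the whole page, the memory is assumed to fail when its most-written page reaches the endurance limit, and the default endurance of 10^9 writes is the lower bound quoted in Section 1.

```python
def expected_lifetime_seconds(page_writes, sim_seconds, endurance=1e9):
    """Estimate the time until the most-written PCM page wears out.

    page_writes: mapping from PCM page id to writes observed in the trace.
    sim_seconds: simulated time covered by the trace, in seconds.
    endurance:   writes a page tolerates before failing (assumed value).
    """
    if sim_seconds <= 0:
        raise ValueError("sim_seconds must be positive")
    hottest = max(page_writes.values(), default=0)
    if hottest == 0:
        return float("inf")  # no PCM writes observed, so no wear
    writes_per_second = hottest / sim_seconds
    return endurance / writes_per_second
```

Because the estimate is driven by the hottest page, a policy that spreads writes evenly raises the reported lifetime even when total write traffic is unchanged, which is the effect wear leveling aims for.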