Hybrid Memory in Multi-Core Architectures

HanBin Yoon (hanbinyoon@cmu.edu)
Justin Meza (meza@cmu.edu)
Rachael Harding (rharding@andrew.cmu.edu)

1 Motivation

Today's main memory relies entirely on DRAM. Strides in process technology have continued to enable DRAM to scale to smaller design rules (hence larger capacities) and be more efficient. However, experts predict DRAM scalability to be approaching its limit due to its charge-based nature of representing information [6]. Modern systems continue to demand larger amounts of memory, partly owing to their propensity towards chip multiprocessing. Supplying this need with DRAM exclusively will quickly become expensive in terms of both power and cost.

Phase-change memory (PCM) offers a competitive alternative to DRAM. Recent research on PCM has enabled its performance and power characteristics to become comparable to those of DRAM [2, 3]. Furthermore, PCM is projected to scale to smaller design rules than DRAM, due to PCM's resistive nature of representing information [6]. However, enabling PCM has its challenges, such as lower performance and higher power consumption compared to DRAM [1, 7]. Furthermore, PCM requires wear-leveling [4].

PCM technology can be used in conjunction with DRAM to enable high-performance, energy-efficient main memory that multi-core architectures demand in large capacities. We propose a method of combining the strengths of DRAM and PCM that improves performance, quality of service, and energy-efficiency over existing proposals. Specifically, we plan to dynamically partition the amount of DRAM and PCM assigned to processes in a multi-core environment based on their workload characteristics.

2 Related Work

Prior work has taken a multi-core-agnostic approach to integrating DRAM and PCM for improved energy-efficiency [1]. Though the study addresses some of the issues of incorporating DRAM and PCM, the approach suffers from several shortcomings:

1. It is not scalable: it introduces a PCM page map stored in SRAM in the memory controller which is tightly coupled to the amount of PCM memory being used. It does not scale with varying capacities of PCM memory without hardware alterations.

2. Its hardware use is wasteful: it requires space to store the full PCM page mapping table in SRAM, even though the table may be sparse (for 4GB of PCM with 8KB page size, 4MB of SRAM is required).

3. It is unrealistic: the study's restrictive evaluation only covers a single-core system, while current trends in processor development tend towards multi-core systems that exhibit memory contention between multiple applications.

Furthermore, the study does not propose a clear algorithm for DRAM-PCM page allocation and replacement policy.

Other recent research has focused on improving the quality of service (QoS) in entirely PCM-based main memory systems based on predetermined application priorities [7]. While maintaining QoS in a multi-core environment is important to ensure fairness and prevent starvation, we believe that future systems will likely incorporate a mixture of DRAM and PCM due to performance and reliability concerns. Such systems will require innovative techniques to enable QoS.

In [2, 3], Lee et al. propose a wear-leveling technique and introduce multiple row buffers in order to improve PCM's performance and reliability. Their changes to PCM architecture also improve its power characteristics to make it a more desirable alternative to DRAM. This work is orthogonal to our proposed research and could be used as an underlying hardware implementation of PCM.

Qureshi et al. developed a wear-leveling algorithm for PCM that exhibits low memory overhead [4]. However, if an attack intended to wear out PCM were to write repeatedly to the same line, "Region Based Start-Gap (RBSG)" would only distribute the wear over a single start-gap region. Such an approach does not utilize the full endurance capacity of the PCM. Our own approach may extend the existing operating system page table to perform wear-leveling more evenly over the entire PCM.

3 Approach

This research aims to develop and evaluate algorithms and hardware support for hybrid DRAM-PCM systems that provide an effective main memory solution for multi-core architectures. Our anticipated contributions (and implementation ideas) include:

1. Evaluating applications to determine optimal page placement in a hybrid memory system to effectively provide QoS based on application characteristics. We modify the utility-based partitioning approach in [5] to determine per-thread memory access patterns. We aim to speed up all the threads in a multi-core system by striking a balance amongst the memory requirements of threads that have large and small working sets.

2. Developing a dynamic memory mapping algorithm and page allocation and replacement policies for a hybrid memory system. In addition to using application characteristics to determine data placement, we will also identify frequently-accessed, frequently-written "hot" pages and improve performance and energy-efficiency by locating them in DRAM.

3. Developing a novel wear-leveling method for a hybrid memory system. This may be accomplished by extending the operating system's virtual memory infrastructure to ensure page writes are performed uniformly across PCM.

4. Proposing software and hardware changes to implement the above design goals. For example, adapting operating system page tables and the TLB to address pages stored in PCM.

5. Evaluating the performance and energy-efficiency of our architecture. This includes a sensitivity study of DRAM-to-PCM ratios to find an optimal provisioning of DRAM and PCM.

4 Methodology and Roadmap

We will test our memory-mapping algorithm and hardware improvements on the BLeSS simulator modified to include PCM. We will evaluate our architecture's performance, QoS, and energy-efficiency using benchmarks from SPEC CPU2006. We will derive a power model based on the data listed in [1].

Milestone 1. Consider possible page migration policies, wear-leveling algorithms, and quality of service algorithms. Modify the BLeSS simulator, adding support for PCM and page tables.

Milestone 2. Simulate SPEC CPU2006 workload traces under a modified system architecture and analyze the intermediate results.

Milestone 3. Refine our mechanisms based on simulation results. Perform a sensitivity study, varying the ratio of system DRAM-to-PCM and observing the resulting system characteristics. Begin drafting the paper.

References

[1] G. Dhiman, R. Ayoub, and T. Rosing. PDRAM: a hybrid PRAM and DRAM main memory system. In DAC '09: Proceedings of the 46th Annual Design Automation Conference, pages 664–669, New York, NY, USA, 2009. ACM.

[2] B. C. Lee, E. Ipek, O. Mutlu, and D. Burger. Architecting phase change memory as a scalable DRAM alternative. In ISCA '09: Proceedings of the 36th annual international symposium on Computer architecture, pages 2–13, New York, NY, USA, 2009. ACM.

[3] B. C. Lee, P. Zhou, J. Yang, Y. Zhang, B. Zhao, E. Ipek, O. Mutlu, and D. Burger. Phase-change technology and the future of main memory. IEEE Micro, 30(1):143–143, 2010.

[4] M. K. Qureshi, J. Karidis, M. Franceschini, V. Srinivasan, L. Lastras, and B. Abali. Enhancing lifetime and security of PCM-based main memory with start-gap wear leveling. In MICRO 42: Proceedings of the 42nd Annual IEEE/ACM International Symposium on Microarchitecture, pages 14–23, New York, NY, USA, 2009. ACM.

[5] M. K. Qureshi and Y. N. Patt. Utility-based cache partitioning: A low-overhead, high-performance, runtime mechanism to partition shared caches. In MICRO 39: Proceedings of the 39th Annual IEEE/ACM International Symposium on Microarchitecture, pages 423–432, Washington, DC, USA, 2006. IEEE Computer Society.

[6] Semiconductor Industry Association. International Technology Roadmap for Semiconductors: Process Integration, Devices, and Structures. 2007.

[7] P. Zhou, Y. Du, Y. Zhang, and J. Yang. Fine-grained QoS scheduling for PCM-based main memory systems. pages 1–12, Apr. 2010.
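As a check on the SRAM figure quoted in Section 2 for the page map of [1] (4MB of SRAM for 4GB of PCM with 8KB pages), the arithmetic can be sketched in Python. This is only an illustration under stated assumptions: the function names are hypothetical, binary units are assumed (1GB = 2^30 bytes), and the 8-byte entry size is inferred from the quoted numbers rather than stated in the text.

```python
# Back-of-the-envelope sizing of a flat PCM page map held in SRAM.
# Assumptions (not from the proposal): binary units and a flat table
# with one fixed-size entry per PCM page.

KIB = 1024
MIB = 1024 ** 2
GIB = 1024 ** 3


def page_map_entries(pcm_bytes: int, page_bytes: int) -> int:
    """Number of entries in a flat map with one entry per PCM page."""
    return pcm_bytes // page_bytes


def implied_entry_bytes(sram_bytes: int, entries: int) -> float:
    """Entry size implied by a given SRAM budget and entry count."""
    return sram_bytes / entries


def sram_for_page_map(pcm_bytes: int, page_bytes: int, entry_bytes: int) -> int:
    """SRAM needed for the flat map; grows linearly with PCM capacity."""
    return page_map_entries(pcm_bytes, page_bytes) * entry_bytes


if __name__ == "__main__":
    entries = page_map_entries(4 * GIB, 8 * KIB)
    print(entries)                                # 524288 pages
    print(implied_entry_bytes(4 * MIB, entries))  # 8.0 bytes per entry
    # Doubling PCM capacity doubles the table under the same assumptions.
    print(sram_for_page_map(8 * GIB, 8 * KIB, 8) // MIB)  # 8 (MiB)
```

Under these assumptions the table holds 524,288 entries of 8 bytes each, and its size tracks PCM capacity one-for-one, which is the coupling that shortcoming 1 in Section 2 objects to.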