Protection and Utilization in Shared Cache Through Rationing∗

Raj Parihar, Jacob Brock†, Chen Ding†, Michael C. Huang
Dept. of Electrical & Computer Engineering, †Dept. of Computer Science
University of Rochester, Rochester, NY 14627, USA
{parihar@ece., jbrock@cs., cding@cs., michael.huang@}rochester.edu

∗This work is supported by NSF CAREER award CCF-0747324, CCF-1116104, an IBM Fellowship, and by NSFC under grant 61328201.

ABSTRACT
A shared cache is generally optimized for overall throughput, fairness, or both. Increasingly in shared environments, e.g., compute clouds, users are unrelated to one another. In such circumstances, an overall gain in throughput does not justify an individual loss. This paper explores a new strategy for conservative sharing, which protects the cache occupancy of individual programs but still enables full cache sharing whenever there is unused space.

Keywords
Cache management, Protection, Rationing

1. INTRODUCTION
We present a new hardware-based mechanism called cache rationing. In cache rationing, each core/program is assigned a portion of the shared cache as its ration. Hardware support protects each program's ration from eviction by a peer program, but allows any program to utilize unused cache space in another's ration. Accounting requires modest hardware support: an access bit in each cache line tracks whether that line has been used recently, and each core tracks its cache usage with a ration counter.

The rationing support also enables hardware-software collaborative caching [6]. In particular, we can add a single "hint bit" to mark a special memory load or store. A special operation marks the cached data as "evict-me" by clearing the access bit upon the special access. This interface allows collaborative caching and can further improve the performance of a rationed cache.

2. CACHE RATIONING
A ration for a CPU core/program is the guaranteed effective size of space in the shared cache that is allocated to that core/program. The ration can be specified by software, e.g., through privileged instructions. Here we show how cache rationing is designed to meet its objectives of protection and utilization at the same time.

Ration Counter. To implement the rationing mechanism, we store a ration counter for each core. Each cache line stores a reference to the ration counter of its owner. The ration counter holds two integer values: its owner's ration size, and its owner's cache usage (the number of blocks in the cache that the owner has loaded). The maintenance logic is shown in the pseudo-code below. There are three cases: a cache load (upon a miss), an eviction, and a normal access (a hit). At a cache load, we set the owner record to point to the ration counter of the loader and increment that counter. At an eviction, we decrement the owner's counter. The code for a normal access accounts for data sharing by assigning ownership of the accessed block to its most recent user.

    fetch_at_miss( blk, p )
      blk.owner's_counter = p.ration_counter
      blk.owner's_counter ++

    evict( blk )
      blk.owner's_counter --

    access_at_hit( blk, p )
      if blk.owner's_counter != p.ration_counter
        blk.owner's_counter --
        blk.owner's_counter = p.ration_counter
        blk.owner's_counter ++

Access Bit. The second hardware extension is the access bit for detecting an unused ration. Each block has an access bit, which is set whenever the block is referenced. Periodically, all access bits are reset. A rationed cache block is deemed unused if either it is not owned or its access bit is zero.
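To make the bookkeeping concrete, the following C++ sketch models the ration counter and the access bit in a software cache model. It is an illustration of the accounting described above, not the paper's hardware implementation; the type and function names (RationCounter, CacheBlock) are ours.

    #include <cstdint>

    // Illustrative software model of the per-core accounting described above.
    struct RationCounter {
        uint32_t ration = 0;   // owner's ration size, in blocks
        uint32_t usage  = 0;   // blocks in the cache that the owner loaded
    };

    struct CacheBlock {
        RationCounter* owner = nullptr;  // nullptr: the block is unowned
        bool accessed = false;           // access bit, reset periodically
    };

    // Cache load upon a miss: the loader p becomes the owner.
    void fetch_at_miss(CacheBlock& blk, RationCounter& p) {
        blk.owner = &p;
        blk.owner->usage++;
        blk.accessed = true;
    }

    // Eviction: the owner gives up one block of usage.
    void evict(CacheBlock& blk) {
        if (blk.owner != nullptr) blk.owner->usage--;
        blk.owner = nullptr;
        blk.accessed = false;
    }

    // Hit: ownership is transferred to the most recent user.
    void access_at_hit(CacheBlock& blk, RationCounter& p) {
        if (blk.owner != &p) {
            if (blk.owner != nullptr) blk.owner->usage--;
            blk.owner = &p;
            blk.owner->usage++;
        }
        blk.accessed = true;
    }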
Rationing Control. On a miss to a completely occupied cache, the following algorithm chooses which block to evict:

    miss( blk, p1 )
      if exists repl s.t. !repl.owned or !repl.accessed
        replace repl with blk
      elsif p1.at_or_over_ration?
        replace LRU block of p1
      else  # some p2 is over its ration
        replace LRU block of p2
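For illustration, the victim-selection policy can be sketched in C++ as below, mirroring the types from the earlier accounting sketch (redeclared so the fragment is self-contained). The per-set scope, the last_use field standing in for LRU order, and the handling of the "p2 over ration" case are our assumptions, not details specified by the paper.

    #include <cstdint>
    #include <vector>

    // Minimal types mirroring the earlier accounting sketch, plus a
    // last_use field as a stand-in for LRU ordering.
    struct RationCounter { uint32_t ration = 0; uint32_t usage = 0; };
    struct CacheBlock {
        RationCounter* owner = nullptr;
        bool accessed = false;
        uint64_t last_use = 0;
    };

    // Choose the victim for a miss by core p1 within one (full) cache set,
    // following the pseudo-code above. Assumes the set is non-empty and,
    // in the rationed cases, contains at least one eligible block.
    CacheBlock* choose_victim(std::vector<CacheBlock>& set, RationCounter& p1) {
        // Case 1: an unused block exists (unowned, or access bit is clear).
        for (CacheBlock& b : set)
            if (b.owner == nullptr || !b.accessed) return &b;

        // Case 2: p1 is at or over its ration -> evict p1's own LRU block.
        // Case 3: otherwise some core p2 is over its ration -> evict the
        //         LRU block among blocks owned by over-ration cores.
        bool p1_at_or_over = (p1.usage >= p1.ration);
        CacheBlock* victim = nullptr;
        for (CacheBlock& b : set) {
            bool eligible = p1_at_or_over
                                ? (b.owner == &p1)
                                : (b.owner->usage > b.owner->ration);
            if (eligible && (victim == nullptr || b.last_use < victim->last_use))
                victim = &b;
        }
        return victim;
    }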
2.1 Example Illustration and Comparison
Rationing has two goals: cache resource protection and cache resource utilization. Figure 1 shows an example of each. Assume we have two cores, each accessing a separate set of data, and an evenly rationed cache with two blocks for each core. Figure 1 shows the access trace for each core on the left-hand side. It also shows the contents of the access bits (one for each block) and the ration counters (one for each core). In the interleaved execution, if two requests arrive at the same time, we arbitrarily assume that the cache sees the request from core 1 first. The two examples demonstrate that cache rationing can combine the advantages of cache partitioning and sharing while avoiding their problems.

Figure 1: Top: Rationing performs as well as partitioning and better than sharing because it protects core 1 against interference by core 2. Bottom: Rationing performs as well as sharing and better than partitioning because core 2 is allowed to utilize core 1's ration. Non-compulsory misses are shaded.

Protection. The top example in Figure 1 shows resource protection. In this case, core 1 uses 2 blocks, which it can hold entirely within its ration. In free-for-all sharing, data from core 2 can evict data used by core 1. In contrast, the partitioned cache and the rationed cache do not permit core 2 to intrude on the ration of core 1. Due to this lack of protection, the free-for-all policy causes the most misses. Partitioning and rationing perform equally well by providing resource protection, but their mechanisms differ: as the next example shows, rationing permits sharing.

Utilization. The bottom example in Figure 1 shows cache utilization. If core 1 uses just 1 block and core 2 uses 3, fixed partitioning under-utilizes the space partitioned for core 1. Free-for-all sharing and rationing, in contrast, can fully utilize the 4 cache blocks for the 4 program blocks.

Figure 2 compares partitioning, free-for-all sharing, and rationing for eon co-run with the SPEC 2000 applications. Rationing never slows down eon and often achieves a good speedup for the co-running program. In contrast, partitioning never achieves any speedup, and free-for-all sharing speeds up programs at the cost of slowing down eon. Detailed results are included in the technical report [4].

Figure 2: SPEC 2000 benchmarks co-run with the eon benchmark (2 cores, 1 MB shared L2 cache; IPC normalized to a solo run with a 512 KB cache; bar groups for Partitioning (Communist), Free-for-All (Capitalist), and Rationing). The first bar in each pair represents eon. While free-for-all sharing maximizes utilization, it punishes eon by allowing its peer to hog the cache.

2.2 Interaction with Other Optimizations
In this section, we discuss how easily rationing integrates with other cache designs. A number of processors provide special load/store instructions that a program can use to influence hardware cache management [2, 5, 6, 7]. Such cache hints can be used to mark accesses whose data will have no chance of reuse before eviction (this can be done using compiler analysis [1, 6]). Taking the hints, the hardware can choose not to cache those data. Regardless of what the software does, it needs a hardware instruction to mark a data block and tell the hardware to replace it before other blocks. Such an instruction can be readily supported by cache rationing: the access bit is zeroed when a hint bit indicates that a block should not be kept in cache.

As an example, consider two cores sharing a 4-block cache. Let the access traces be xyzxyz... for one core and abcabc... for the other. With equal rationing, neither core has enough cache to obtain any reuse. However, with cache hints, the software can free up cache space by zeroing every other access bit. In this case, the non-compulsory miss ratio can be shown to be reduced from 1 to 1/2. In [3], it is shown that a hint-based solution can achieve optimal caching.

3. SUMMARY
This paper presents rationing for shared-cache management, which protects the ration of each program while at the same time finding and utilizing unused ration among the co-run programs. The new support can be added on top of an existing cache architecture with minimal additional hardware. When an application does not use all of its ration, rationing achieves good utilization, similar to free-for-all cache sharing. When a program exerts strong interference, rationing provides good protection, similar to cache partitioning. In addition, rationing provides an integrated design for cache sharing and software-hardware collaboration.

4. REFERENCES
[1] K. Beyls and E. H. D'Hollander. Generating cache hints for improved program efficiency. Journal of Systems Architecture, 51(4):223–250, 2005.
[2] J. Brock, X. Gu, B. Bao, and C. Ding. Pacman: Program-assisted cache management. In Proceedings of ISMM, 2013.
[3] X. Gu, T. Bai, Y. Gao, C. Zhang, R. Archambault, and C. Ding. P-OPT: Program-directed optimal cache management. In Proceedings of the LCPC Workshop, pages 217–231, 2008.
[4] R. Parihar, J. Brock, C. Ding, and M. C. Huang. Protection, utilization and collaboration in shared cache through rationing. Technical Report TR-995, University of Rochester, Nov. 2013.
[5] S. Rus, R. Ashok, and D. X. Li. Automated locality optimization based on the reuse distance of string operations. In Proceedings of CGO, pages 181–190, 2011.
[6] Z. Wang, K. S. McKinley, A. L. Rosenberg, and C. C. Weems. Using the compiler to improve cache replacement decisions. In Proceedings of PACT, 2002.
[7] X. Yang, S. M. Blackburn, D. Frampton, J. B. Sartor, and K. S. McKinley. Why nothing matters: The impact of zeroing. In Proceedings of OOPSLA, pages 307–324, 2011.