Protection and Utilization in Shared Cache Through Rationing∗

Raj Parihar, Jacob Brock†, Chen Ding†, Michael C. Huang
Dept. of Electrical & Computer Engineering, †Dept. of Computer Science
University of Rochester, Rochester, NY 14627, USA
{parihar@ece., jbrock@cs., cding@cs., michael.huang@}rochester.edu

∗This work is supported by NSF CAREER award CCF-0747324, CCF-1116104, an IBM Fellowship, and by NSFC under the grant 61328201.
ABSTRACT
Shared cache is generally optimized for overall throughput,
fairness, or both. Increasingly in shared environments, e.g.,
compute clouds, users are unrelated to one another. In such
circumstances, an overall gain in throughput does not justify
an individual loss. This paper explores a new strategy for
conservative sharing, which protects the cache occupancy
for individual programs, but still enables full cache sharing
whenever there is unused space.
Keywords
Cache management, Protection, Rationing
1. INTRODUCTION
We present a new hardware-based mechanism called cache
rationing. In cache rationing, each core/program is assigned
a portion of the shared cache as its ration. Hardware support
protects each program’s ration from eviction by a peer program, but allows any program to utilize unused cache space
in another's ration. Accounting requires modest hardware support: an access bit in each cache line tracks whether that line has been used recently, and a per-core ration counter tracks each core's cache usage.
The rationing support also enables hardware-software collaborative caching [6]. In particular, we can add a single "hint bit" to indicate special memory loads and stores. A special operation marks the cached data as "evict-me" by clearing the access bit upon the special access. This interface allows collaborative caching and can further improve the performance of the rationed cache.
2. CACHE RATIONING
A ration for a CPU core/program is the effective amount of shared-cache space guaranteed to that core/program. The ration can be specified by software, e.g., through privileged instructions. Here we show how cache rationing is designed to meet its objectives of protection and utilization at the same time.
Ration Counter.
To implement our rationing mechanism, we require the
storage of a ration counter for each core. Each cache line
needs to store a reference to the ration counter of its owner.
The ration counter stores two integer values: its owner’s ration size, and its owner’s cache usage (the number of blocks
in the cache that it loaded).
The maintenance logic is shown in the pseudo-code as follows. There are three cases: a cache load (upon a miss), an
eviction, and a normal access (a hit). At a cache load, we set
the owner record to point to the ration counter of the loader
and increment it. At an eviction, we decrement the owner’s
ration counter. The code for a normal access accounts for
data sharing by assigning ownership of the accessed block
to its most recent user.
fetch_at_miss( blk, p )
    blk.owner's_counter = p.ration_counter
    blk.owner's_counter ++

evict( blk )
    blk.owner's_counter --

access_at_hit( blk, p )
    if blk.owner's_counter != p.ration_counter
        blk.owner's_counter --
        blk.owner's_counter = p.ration_counter
        blk.owner's_counter ++
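To make the accounting concrete, the following is a minimal C sketch of the counter updates, assuming a simple software model of the cache; the type and function names (RationCounter, CacheBlock, fetch_at_miss, evict, access_at_hit) follow the pseudo-code but are otherwise illustrative, not the hardware design itself.

#include <stdbool.h>
#include <stddef.h>

typedef struct {
    int ration;   /* blocks guaranteed to this core                   */
    int usage;    /* blocks in the cache currently owned by this core */
} RationCounter;

typedef struct {
    RationCounter *owner;  /* ration counter of the owning core (NULL if unowned) */
    bool accessed;         /* access bit: set on reference, cleared periodically  */
    int  lru_age;          /* LRU age within the set (larger = older); maintained
                              elsewhere in the model, used by the victim sketch   */
} CacheBlock;

/* Cache load on a miss: the loader becomes the owner. */
static void fetch_at_miss(CacheBlock *blk, RationCounter *p) {
    blk->owner = p;
    blk->owner->usage++;
}

/* Eviction: return the block to its owner's budget. */
static void evict(CacheBlock *blk) {
    if (blk->owner != NULL)
        blk->owner->usage--;
    blk->owner = NULL;
}

/* Hit: set the access bit and re-attribute shared data to its most recent user. */
static void access_at_hit(CacheBlock *blk, RationCounter *p) {
    blk->accessed = true;
    if (blk->owner != p) {
        blk->owner->usage--;
        blk->owner = p;
        blk->owner->usage++;
    }
}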
Access Bit.
The second hardware extension is the access bit for detecting an unused ration. Each block has an access bit, which is
set whenever the block is referenced. Periodically, all access
bits are reset. A rationed cache block is deemed unused if
either it is not owned or the access bit is zero.
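Continuing the same software model (and reusing its CacheBlock type), the sketch below illustrates the access-bit bookkeeping; the reset trigger and helper names are illustrative.

/* Set on every reference to the block. */
static void touch(CacheBlock *blk) {
    blk->accessed = true;
}

/* Invoked periodically by the model (e.g., on a fixed cycle interval). */
static void reset_access_bits(CacheBlock *blocks, int n) {
    for (int i = 0; i < n; i++)
        blocks[i].accessed = false;
}

/* A rationed block counts as unused if it is unowned or not recently used. */
static bool is_unused(const CacheBlock *blk) {
    return blk->owner == NULL || !blk->accessed;
}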
Rationing Control.
On a miss to a completely occupied cache, the following
algorithm chooses which block to evict:
miss( blk, p1 )
    if repl exists s.t. !repl.owned or !repl.accessed
        replace repl with blk
    elsif p1.at_or_over_ration?
        replace LRU block of p1
    else  # some p2 over ration
        replace LRU block of p2
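Continuing the C sketch from the previous subsections (reusing CacheBlock, RationCounter, and is_unused), the following illustrates the victim-selection logic for one fully occupied set; SET_WAYS, at_or_over_ration, and find_lru are assumed helper names, and LRU ages are taken to be maintained elsewhere in the model.

#define SET_WAYS 8

static bool at_or_over_ration(const RationCounter *p) {
    return p->usage >= p->ration;
}

/* LRU block in the set owned by `owner`; if owner is NULL, the LRU block
   owned by any core that is over its ration. All blocks are owned here,
   since unowned blocks are taken in step 1 of choose_victim().
   Returns -1 if no matching block (edge case not handled in this sketch). */
static int find_lru(CacheBlock set[], const RationCounter *owner) {
    int victim = -1;
    for (int i = 0; i < SET_WAYS; i++) {
        bool match = (owner != NULL) ? (set[i].owner == owner)
                                     : at_or_over_ration(set[i].owner);
        if (match && (victim < 0 || set[i].lru_age > set[victim].lru_age))
            victim = i;
    }
    return victim;
}

static int choose_victim(CacheBlock set[], RationCounter *p1) {
    /* 1. Prefer a block that is unowned or whose access bit is clear. */
    for (int i = 0; i < SET_WAYS; i++)
        if (is_unused(&set[i]))
            return i;

    /* 2. If p1 is at or over its ration, it evicts its own LRU block. */
    if (at_or_over_ration(p1))
        return find_lru(set, p1);

    /* 3. Otherwise some peer must be over its ration; evict that peer's
          LRU block, reclaiming p1's protected space. */
    return find_lru(set, NULL);
}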
2.1 Example Illustration and Comparison
Rationing has two goals: cache resource protection and cache resource utilization. In Figure 1, we show an example for each. Assume we have two cores, each accessing a separate set of data, and an evenly rationed cache with two blocks for each core.

Figure 1: Top: Rationing performs as well as partitioning and better than sharing because it protects core 1 against interference by core 2. Bottom: Rationing performs as well as sharing and better than partitioning because core 2 is allowed to utilize core 1's ration. Non-compulsory misses are shaded.

Figure 2: SPEC 2000 benchmarks co-run with the eon benchmark (2 cores, 1 MB L2 cache; IPC normalized to a solo run with a 512 KB cache; bars grouped by policy: Partitioning "Communist", Free-for-All "Capitalist", and Rationing, over 26 benchmarks). The first bar in each pair represents eon. While free-for-all sharing maximizes utilization, it punishes eon by allowing its peer to hog the cache.
Figure 1 shows the access trace for each core on the left
hand side. It also shows the content of access bits (one
for each block) and ration counters (one for each core). In
the interleaved execution, if two requests come at the same
time, we arbitrarily assume that the cache sees the request
from core 1 first. The two examples demonstrate that cache
rationing can combine the advantages of cache partitioning
and sharing while avoiding their problems.
Protection.
The top example in Figure 1 shows resource protection.
In this case, core 1 uses 2 blocks, which it can hold entirely
within its ration. In free-for-all sharing, data from core 2
can evict data used by core 1. In contrast, the partitioned
cache and the rationed cache do not permit core 2 to intrude
on the ration of core 1. Due to the lack of protection, the
free-for-all policy causes the most misses. Partitioning and
rationing perform equally well by providing resource protection. However, the mechanisms are different. As the next
example shows, rationing permits sharing.
Utilization.
The bottom example in Figure 1 shows cache utilization.
If core 1 uses just 1 block, and core 2 uses 3, fixed partitioning would under-utilize the space partitioned for core
1. Free-for-all sharing and rationing, in contrast, can fully
utilize the 4 cache blocks for the 4 program blocks. In Figure 2, we show the comparison of partitioning, free-for-all
and rationing for eon co-run with SPEC 2000 applications.
It is evident that rationing never slows down eon and often achieves a good speedup for the co-running program. In contrast, partitioning never achieves any speedup, and free-for-all speeds up programs at the cost of slowing down eon.
Detailed results are included in the technical report [4].
2.2 Interaction with Other Optimizations
In this section, we discuss how easily rationing can be integrated with other cache designs. A number of processors provide special load/store instructions that a program can use to influence hardware cache management [2, 5, 6, 7]. Such cache hints can be used to mark accesses whose data will have no chance of reuse before eviction (this can be done using compiler analysis [1, 6]). Taking the hints, the hardware can choose not to cache those data.

Regardless of what the software does, it needs the hardware instruction in order to mark a data block and tell the hardware to replace it before replacing other blocks. Such an instruction can be readily supported by cache rationing by zeroing the access bit when a hint bit suggests that a block should not be kept in cache.

As an example, consider two cores sharing a 4-block cache. Let the access traces be xyzxyz... for one core and abcabc... for the other. With equal rationing, neither core has enough cache to obtain any reuse. However, with cache hints, the software can free up cache space by zeroing every other access bit. In this case, the non-compulsory miss ratio can be shown to be reduced from 1 to 1/2. In [3], it is shown that a hint-based solution can achieve optimal caching.
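As a rough sketch of how such a hint could be layered on the rationing model sketched above: a hinted access performs the normal accounting and then clears the access bit, so the block becomes a preferred victim in choose_victim(). The function name access_with_hint and its boolean argument are illustrative; the ISA encoding of the hint bit is not modeled here.

/* Hinted access: normal accounting, but mark the block "evict-me". */
static void access_with_hint(CacheBlock *blk, RationCounter *p, bool evict_me) {
    access_at_hit(blk, p);      /* usual ownership and access-bit update */
    if (evict_me)
        blk->accessed = false;  /* preferred victim on the next miss     */
}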
3. SUMMARY
This paper presents rationing for shared-cache management, which protects the ration of each program while at the same time finding and utilizing unused ration among the co-run programs. The new support can be added on top of an existing cache architecture with minimal additional hardware.
When an application does not use all its ration, rationing
achieves good utilization similar to free-for-all cache sharing. When a program exerts strong interference, rationing
provides good protection similar to cache partitioning. In
addition, rationing provides an integrated design for cache
sharing and software-hardware collaboration.
4. REFERENCES
[1] K. Beyls and E. H. D’Hollander. Generating cache hints for
improved program efficiency. Journal of Systems Architecture,
51(4):223–250, 2005.
[2] J. Brock, X. Gu, B. Bao, and C. Ding. Pacman:
Program-assisted cache management. In Proceedings of ISMM,
2013.
[3] X. Gu, T. Bai, Y. Gao, C. Zhang, R. Archambault, and C. Ding.
P-OPT: Program-directed optimal cache management. In
Proceedings of the LCPC Workshop, pages 217–231, 2008.
[4] R. Parihar, J. Brock, C. Ding, and M. C. Huang. Protection,
utilization and collaboration in shared cache through rationing.
Technical Report TR-995, University of Rochester, Nov 2013.
[5] S. Rus, R. Ashok, and D. X. Li. Automated locality
optimization based on the reuse distance of string operations. In
Proceedings of CGO, pages 181–190, 2011.
[6] Z. Wang, K. S. McKinley, A. L. Rosenberg, and C. C. Weems.
Using the compiler to improve cache replacement decisions. In
Proceedings of PACT, Charlottesville, Virginia, 2002.
[7] X. Yang, S. M. Blackburn, D. Frampton, J. B. Sartor, and K. S.
McKinley. Why nothing matters: the impact of zeroing. In
Proceedings of OOPSLA, pages 307–324, 2011.