A Few Subtle Insights About UCP
Moinuddin K. Qureshi
Work on UCP done while at: [affiliation logo]

First Things First
I thank Xing and Rajeev for:
1. Validating that UCP (based on misses) works
2. Re-validating that UCP (based on IPC) is slightly better than one based on misses: ~1%-3%
As mentioned, this is not the 1st (or 2nd, or 3rd, or 4th …) paper to provide this insight

Critique 1: UCP(MPKI) ≠ UCP
Consider two apps, A and B, with identical miss rate curves
[Figure: miss rate curve shared by A and B; x-axis: Num Ways in 4-way Cache]
UCP(MPKI) gives 2 ways to both A and B
A and B both access the cache once per 100 instructions; cache hit: 1 cycle, memory: 100 cycles
A has 99 integer ops (1 cycle each): CPI_A = (99 + 1 + MissRatePerc) / 100
B has 99 FP ops (10 cycles each): CPI_B = (990 + 1 + MissRatePerc) / 100
[Figure: curves for A and B; x-axis: Num Ways in 4-way Cache]
UCP(MPKC) gives 4 ways to A: IPC_best, WS_best
UCP(MICRO'06) optimizes performance more than UCP(MPKI)

Critique 2: Dynamic can beat Static Optima

Critique 3: Not all Misses are Created Equal
[Figure: plot with axes MPKI and CPI]
Problem with Linear CPI Model of Xing

UCP: The last 4.5 years …
Things I would have liked to see in literature:
1. Non-Integer Way Partition
2. Utility Based Cache Insertion
3. Prefetch Aware Cache Partition

Extension 1: Probabilistic Way Partition
Common criticism of way partitioning: we can only allocate an integer number of ways
A simple way to avoid this is Probabilistic Way Partition.
Say you want to allocate 3.5 ways to application A
Then on a cache miss, consult a random number generator
If Randval > 50% of Randmax, then A gets 4 ways, else 3 ways
On average, A will end up getting 3.5 ways in the cache
Can go finer: say we want to allocate 4.125 ways to B (5 ways with probability 1/8, else 4 ways)

Extension 2: Utility Based Cache Insertion
One can achieve the effect of partitioning by intelligent insertion
In a 16-way cache, a given application A can insert at 16 locations
If N applications share the cache, the decision space is 16^N
An efficient hardware scheme that obtains the best decision in this decision space will outperform both UCP and TADIP

Extension 3: Prefetch Aware Partitioning
How does one do partitioning under prefetching?
For applications whose dataset is prefetchable, we may not want to give cache space (even if it has high utility)
In fact, sometimes it is a win-win to give more cache to irregular apps, as it leaves more bandwidth available for prefetching
What is the right way to extend UCP to prefetches?

Summary
UCP: Partitioning based on misses works (simple)
Several works have shown that UCP based on IPC works slightly better
There are several extensions of UCP still unexplored
-- Let me know if you are interested in exploring
Questions/comments: moinqureshi@gmail.com
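Returning to Extension 1: the probabilistic way partition idea can be sketched in software as follows. This is only an illustrative model, not the hardware mechanism itself. The function names `ways_for_this_miss` and `average_quota` are hypothetical, and Python's `random` module stands in for the hardware random number generator. On each miss the quota is rounded up with probability equal to the fractional part of the target, so the long-run average approaches the target.

```python
import math
import random


def ways_for_this_miss(target_ways: float, rng: random.Random) -> int:
    """Pick an integer way quota for a single cache miss.

    Assumption: the quota is re-drawn on every miss, as the slide suggests.
    """
    base = math.floor(target_ways)
    frac = target_ways - base  # e.g. 0.5 for 3.5 ways, 0.125 for 4.125 ways
    # Round up with probability `frac`, otherwise round down.
    return base + 1 if rng.random() < frac else base


def average_quota(target_ways: float, misses: int, seed: int = 0) -> float:
    """Average quota observed over `misses` simulated cache misses."""
    rng = random.Random(seed)
    total = sum(ways_for_this_miss(target_ways, rng) for _ in range(misses))
    return total / misses


if __name__ == "__main__":
    for target in (3.5, 4.125):
        print(target, average_quota(target, misses=100_000))
```

Each individual miss still sees an integer quota (3 or 4 ways for a 3.5-way target); only the average over many misses is fractional.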