Dueling Segmented LRU Replacement Algorithm
Hongliang Gao, Chris Wilkerson

The Basic Ideas
• Auxiliary Directory:
  – Evaluates “dueling” replacement algorithms.
• Segmented LRU list:
  – Reference bit protects lines with good locality.
  – Aging / Random Promotion.
• Adaptive Bypass:
  – Protect cache contents by bypassing the cache completely.

Dueling Replacement Algos
[Figure: auxiliary directory alongside the tag array (Set0 to Set7), feeding a saturating counter.]
• 32 sets sampled (static)
• 2 policies evaluated in each sampled set.
• 16-bit mini-tags
• Counter updated when policies differ.

Review of Segmented LRU
[Figure: SLRU line state stored with the tag: a reference bit, plus 4 LRU bits per line that track LRU position.]
• Reference bit is marked when a line is referenced.
• Replace any non-referenced lines first.
• Replace global LRU if all lines are referenced.

SLRU Features
• Random Promotion
  – Reference bit is marked when referenced or when randomly promoted.
  – E.g., 1/32 of newly allocated lines may randomly be selected for promotion.
• Aging
  – Reference bits can be cleared as well as set.
  – Line allocations cause the reference bit of the LRU line to be cleared.

Adaptive Bypass
• Misses result in allocation or bypass.
[Figure: a data structure mapped onto the cache. w/o Bypass: thrashing on the 4th way. w/ Bypass: no thrashing.]
• Bypass based on a random probability.
  – E.g., 1, 1/2, 1/4, … 1/4096.
  – Probability is doubled/halved according to the success of previous bypasses.

SLRU w/ Adaptive Bypassing
[Figure: SLRU reference bits (0 1 0 1), a 16-bit partial tag for the “out-of-cache” competitor, and a 4-bit pointer for the “in-cache” competitor.]
• De-allocated line tracked by partial tag.
• Allocated line tracked by 4-bit pointer.
• Valid Bit
• Virtual Bypass Bit
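The adaptive bypass idea in the slides can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the authors' implementation. The class and method names (`AdaptiveBypass`, `record_bypass`, `on_reference`) are hypothetical, the initial probability of 1/4 is an arbitrary placeholder, and the probability is simply clamped to the range 1/4096 to 1 from the slides' example. The direction of each update follows the deck's backup slide: a later reference to the bypassed line lowers the probability, and a later reference to the incumbent line raises it. The virtual bypass bit is not modeled.

```python
import random
from fractions import Fraction


class AdaptiveBypass:
    """Bypass probability that is doubled or halved based on past bypasses."""

    MAX_P = Fraction(1)
    MIN_P = Fraction(1, 4096)  # smallest value in the slides' example range

    def __init__(self, initial=Fraction(1, 4), rng=None):
        self.p = initial                 # arbitrary starting point (assumption)
        self.rng = rng or random.Random(0)
        self.valid = False               # valid bit for the tracked pair
        self.bypassed_tag = None         # 16-bit partial tag, "out-of-cache" competitor
        self.incumbent_way = None        # 4-bit pointer, "in-cache" competitor

    def should_bypass(self):
        """Decide on a miss whether to bypass instead of allocating."""
        return self.rng.random() < self.p

    def record_bypass(self, missing_tag, victim_way):
        """Remember the bypassed line and the line it would have replaced."""
        self.bypassed_tag = missing_tag & 0xFFFF
        self.incumbent_way = victim_way
        self.valid = True

    def on_reference(self, tag, hit_way):
        """Update the probability; hit_way is None when the access missed."""
        if not self.valid:
            return
        if hit_way is not None and hit_way == self.incumbent_way:
            # The incumbent was reused first: the bypass was a good call.
            self.p = min(self.p * 2, self.MAX_P)
            self.valid = False
        elif hit_way is None and (tag & 0xFFFF) == self.bypassed_tag:
            # The bypassed line came back first: the bypass was a mistake.
            self.p = max(self.p / 2, self.MIN_P)
            self.valid = False
```

A cache model would call `should_bypass()` on each miss, call `record_bypass()` when it does bypass, and call `on_reference()` on every later access to the set. Using `Fraction` keeps the power-of-two probabilities exact; hardware would more likely store just the exponent.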
[Figure: Frequency of Bypass. Series: P0_BYPASSED, P1_BYPASSED. Y-axis: % bypass, 0% to 100%. X-axis: SPEC CPU2006 benchmarks (429.mcf, 450.soplex, 483.xalancbmk, 471.omnetpp, 482.sphinx3, 447.dealII, 434.zeusmp, 401.bzip2, 403.gcc, 473.astar, 400.perlbench, 481.wrf, 464.h264ref, 456.hmmer, 445.gobmk) plus a mean.]

[Figure: DSB impact on MPKI vs TLRU. Series: MPKI for true LRU, MPKI w/ DSB, % reduction. X-axis: the same benchmarks plus gmean.]

[Figure: Speedup vs TLRU. Series: DSB, NoAge, NoByp. Y-axis: speedup, 0 to 1.8.]

BACKUP

SLRU w/ Adaptive Bypassing
[Figure: SLRU reference bits, a 16-bit partial tag for the “out-of-cache” competitor, and a 4-bit pointer for the “in-cache” competitor.]
• Bypass
  – Bypassed line tracked by partial tag.
  – Incumbent line tracked by 4-bit pointer.
• Subsequent reference to the bypassed line reduces bypass probability.
• Subsequent reference to the incumbent increases bypass probability.

Configurations

Parameter                                              CONFIG 1   CONFIG 2   CONFIG 3
Enable bypassing for policy0                           True       True       True
Enable bypassing for policy1                           False      True       True
Random promotion probability for policy0               0          0          0
Random promotion probability for policy1               0          0          16
Aging for policy0                                      0          0          0
Aging for policy1                                      1          1          1
Virtual bypassing probability                          16         8          8
Initial bypassing probability                          64         64         8
Second minimum bypassing probability (minimum is 0)    1/256      1/4096     1/4096

Config2
[Figure: tag array (Set0 to Set7) annotated with the structures below.]
• 2 Policies: the auxiliary directory collects statistics on replacement policy performance and updates a policy selector counter.
• SLRU: 1 reference bit indicates whether each line is in the reference or non-reference list.
• 4 LRU bits per line track LRU position.
• Tracking bypass: valid bits, a 16-bit partial tag for the “out-of-cache” competitor, and a 4-bit pointer for the “in-cache” competitor.
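To make the interplay of SLRU and policy dueling concrete, here is a minimal Python sketch, offered as an illustration rather than the authors' design. All names (`SLRUSet`, `PolicySelector`, `SampledSet`) are hypothetical. Several details are assumptions the slides do not settle: 16 ways (consistent with 4 LRU bits per line), a 10-bit selector counter, choosing the least recently used line among the non-referenced ones as the victim, applying aging after the victim has been removed, and counting upward in favor of policy1. The two dueling policies differ only in aging (off for policy0, on for policy1), and bypassing is left out.

```python
import random


class SLRUSet:
    """One cache set: an LRU stack plus one reference bit per line."""

    def __init__(self, ways=16, promote_prob=0.0, aging=True, rng=None):
        self.ways = ways
        self.promote_prob = promote_prob  # e.g. 1/32 in the slides' example
        self.aging = aging
        self.rng = rng or random.Random(0)
        self.stack = []   # tags ordered from MRU (index 0) to LRU (last)
        self.ref = {}     # tag -> reference bit

    def _victim(self):
        # Replace non-referenced lines first (LRU-most one: an assumption).
        for tag in reversed(self.stack):
            if not self.ref[tag]:
                return tag
        return self.stack[-1]  # all lines referenced: replace the global LRU

    def access(self, tag):
        """Return True on a hit; on a miss, allocate the line and return False."""
        if tag in self.ref:
            self.stack.remove(tag)
            self.stack.insert(0, tag)
            self.ref[tag] = True          # reference bit marked on reference
            return True
        if len(self.stack) == self.ways:
            victim = self._victim()
            self.stack.remove(victim)
            del self.ref[victim]
        if self.aging and self.stack:
            # Aging: an allocation clears the reference bit of the LRU line.
            self.ref[self.stack[-1]] = False
        self.stack.insert(0, tag)
        # Random promotion: a new line may start with its bit already set.
        self.ref[tag] = self.rng.random() < self.promote_prob
        return False


class PolicySelector:
    """Saturating counter updated only when the two policies disagree."""

    def __init__(self, bits=10):  # counter width is an assumption
        self.max = (1 << bits) - 1
        self.mid = (self.max + 1) // 2
        self.value = self.mid

    def update(self, hit0, hit1):
        if hit0 and not hit1:
            self.value = max(self.value - 1, 0)
        elif hit1 and not hit0:
            self.value = min(self.value + 1, self.max)

    def winner(self):
        return 1 if self.value >= self.mid else 0


class SampledSet:
    """Auxiliary-directory entry: both policies run on 16-bit mini-tags."""

    def __init__(self, selector, ways=16):
        self.selector = selector
        self.p0 = SLRUSet(ways, aging=False)
        self.p1 = SLRUSet(ways, aging=True)

    def access(self, tag):
        mini = tag & 0xFFFF
        self.selector.update(self.p0.access(mini), self.p1.access(mini))
```

In a full model, each of the 32 sampled sets would own a `SampledSet` sharing one `PolicySelector`, and every set in the real cache would follow `selector.winner()`. Which policy wins depends entirely on the access pattern, so a short hand-made trace says nothing about the results reported in the slides.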