Inter-Core Cooperative TLB Prefetchers for Chip Multiprocessors (6/7)

Inter-Core Cooperative
TLB Prefetchers for
Chip Multiprocessors
Abhishek Bhattacharjee and Margaret Martonosi
Department of Electrical Engineering
Princeton University
ASPLOS’10
TLB management
• Hardware-managed TLB
– No need for expensive interrupts
– Pipeline remains largely unaffected
– The OS cannot employ an alternative page table design
• Software-managed TLB
– Data structure design is flexible since the OS controls the
page table walk
– The miss handler is itself made up of instructions
• It may itself miss in the instruction cache.
– Data cache may be polluted by the page table walk
Multiprocessor TLB miss
• CMP maintains per-core instruction and data
TLBs.
• Significant similarities
exist in TLB miss patterns
among multiple cores.
Predictable TLB Miss Pattern
• Inter-core Shared (ICS) TLB Misses
– A miss whose translation was already accessed by a previous miss on
any of the other cores, with the same virtual page, physical page,
context ID, and page size
– Leader-Follower prefetching
• Inter-core Predictable Stride (ICPS) TLB Misses
– A miss has a stride of S if its virtual page V+S differs by S from the
virtual page V of the preceding matching miss
• Core 0 TLB miss virtual pages: 3, 4, 6, 7
• Core 1 TLB miss virtual pages: 7, 8, 10, 11
– Distances between successive misses on each core are 1, 2, 1
• Although the cores are missing on different virtual pages, they both
have the same distance pattern in their misses
– Distance-based cross-core prefetching
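The shared distance pattern in the example above can be checked with a few lines of Python. This is only an illustrative sketch: the helper name `miss_distances` is hypothetical and not part of the slides.

```python
def miss_distances(pages):
    """Return the differences between successive TLB-miss virtual pages."""
    return [b - a for a, b in zip(pages, pages[1:])]

# Miss sequences taken from the slide's example.
core0_misses = [3, 4, 6, 7]
core1_misses = [7, 8, 10, 11]

core0_distances = miss_distances(core0_misses)
core1_distances = miss_distances(core1_misses)

print(core0_distances)                     # [1, 2, 1]
print(core1_distances)                     # [1, 2, 1]
print(core0_distances == core1_distances)  # True
```

The two cores never miss on the same page except page 7, yet their distance sequences are identical, which is the property distance-based prefetching relies on.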
Leader-Follower Prefetching
• If a core (the leader) TLB misses on a particular virtual
page entry, other cores (the followers) will also
typically TLB miss on the same virtual page eventually
• Push the translation for that virtual page to the followers
• Insert it not directly into the TLB, but into a
small, separate Prefetch Buffer (PB).
– A bad prefetch is wasteful because it goes unused.
– A prefetch can also be harmful if it evicts existing PB
entries too early
Leader-Follower Prefetching
• Case 1
– D-TLB miss / PB hit on core 0
• Remove the entry from core 0’s PB
• Add the entry to its TLB
• Case 2
– D-TLB miss / PB miss on core 1
• Translation is located and refilled into the D-TLB
• Prefetched(pushed) into PBs of the other cores
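The two cases above can be sketched as a small Python model. Everything here is an assumption made for illustration: the `Core` class, the `access` function, the unbounded dictionary standing in for the TLB, the FIFO replacement in the PB, and the dictionary used as a page table are not taken from the slides.

```python
from collections import OrderedDict

class Core:
    def __init__(self, core_id, pb_capacity=16):
        self.core_id = core_id
        self.tlb = {}              # virtual page -> physical page (unbounded, for simplicity)
        self.pb = OrderedDict()    # prefetch buffer; FIFO replacement is an assumption
        self.pb_capacity = pb_capacity

    def pb_insert(self, vpage, ppage):
        """Accept a pushed translation unless this core already holds it."""
        if vpage in self.tlb or vpage in self.pb:
            return
        if len(self.pb) >= self.pb_capacity:
            self.pb.popitem(last=False)   # evict the oldest PB entry
        self.pb[vpage] = ppage


def access(cores, core_id, vpage, page_table):
    """Simulate one data access and return 'tlb_hit', 'pb_hit', or 'miss'."""
    core = cores[core_id]
    if vpage in core.tlb:
        return "tlb_hit"
    if vpage in core.pb:
        # Case 1: D-TLB miss, PB hit. Move the entry from the PB into the TLB.
        core.tlb[vpage] = core.pb.pop(vpage)
        return "pb_hit"
    # Case 2: D-TLB miss, PB miss. Refill the TLB, then push to the other cores' PBs.
    ppage = page_table[vpage]
    core.tlb[vpage] = ppage
    for other in cores:
        if other.core_id != core_id:
            other.pb_insert(vpage, ppage)
    return "miss"


page_table = {v: v + 100 for v in range(8)}   # hypothetical translations
cores = [Core(i) for i in range(2)]
print(access(cores, 1, 5, page_table))  # miss    (core 1 leads, pushes page 5 to core 0)
print(access(cores, 0, 5, page_table))  # pb_hit  (core 0 follows, finds page 5 in its PB)
print(access(cores, 0, 5, page_table))  # tlb_hit
```

Keeping pushed entries in a separate buffer means a prefetch that is never used cannot displace a live TLB entry; it only competes with other prefetches.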
Leader-Follower Prefetching
• Prefetch a translation into all the follower cores
every time a TLB and PB miss occurs on the
leader core
– This approach may be over-aggressive
• Confidence estimation
– 2-bit saturating counters
• Core 0 has counters for cores 1 to N-1
• If the B-bit confidence counter is greater than or equal to 2^(B-1),
prefetch to the follower
Leader-Follower Prefetching
• Case 1
– PB hit on core 0; the PB entry is inserted into the D-TLB
– Identify the initiating core (core 1)
– Increment core 1’s confidence counter corresponding to core 0
• Case 2
– D-TLB miss / PB miss on core 1
– Check whether the confidence counter ≥ 2^(B-1)
– If core 1’s counter corresponding to core 0 reaches this value, push the
translation into core 0’s PB
• Case 3
– A PB entry is evicted from core N-1 without being used
– Send a bad-prefetch message to the core that initiated this entry (core 1)
– Core 1’s counter corresponding to core N-1 is decremented
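The three cases can be sketched as a table of saturating counters. This is a minimal illustration that reads the slide's threshold as 2^(B-1), that is, half the counter range. The class and method names are hypothetical, and starting every counter at the threshold is an assumption, not something the slides specify.

```python
class ConfidenceCounters:
    """One B-bit saturating counter per (leader, follower) pair."""

    def __init__(self, num_cores, bits=2):
        self.max_value = (1 << bits) - 1   # 3 for 2-bit counters
        self.threshold = 1 << (bits - 1)   # 2^(B-1), i.e. 2 for 2-bit counters
        # counters[leader][follower]; the initial value is an assumption.
        self.counters = [[self.threshold] * num_cores for _ in range(num_cores)]

    def should_prefetch(self, leader, follower):
        # Case 2: on a leader miss, push only if confidence reaches the threshold.
        return self.counters[leader][follower] >= self.threshold

    def on_pb_hit(self, leader, follower):
        # Case 1: the follower used a pushed entry, so increment (saturating at max).
        current = self.counters[leader][follower]
        self.counters[leader][follower] = min(current + 1, self.max_value)

    def on_bad_prefetch(self, leader, follower):
        # Case 3: the entry was evicted unused, so decrement (saturating at 0).
        current = self.counters[leader][follower]
        self.counters[leader][follower] = max(current - 1, 0)


conf = ConfidenceCounters(num_cores=4, bits=2)
print(conf.should_prefetch(1, 3))  # True  (counter starts at 2)
conf.on_bad_prefetch(1, 3)
print(conf.should_prefetch(1, 3))  # False (counter is now 1)
conf.on_pb_hit(1, 3)
print(conf.should_prefetch(1, 3))  # True  (counter is back to 2)
```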
Distance-Based Cross-Core Prefetching
• Although the cores are missing on different
virtual pages, they can both have the same
distance pattern in their misses
• Record repetitive distance-pairs to find the
next predicted distance and hence the next
virtual pages.
– Find the stride patterns
Distance-Based Cross-Core Prefetching
• 1. On a PB miss, calculate the current distance (current TLB miss
virtual page - last TLB miss virtual page)
• 2. Look up the distance table (DT) using the current distance
and the last distance
• 3. The DT returns predicted future distances from the stored
distance-pairs
– (1,2), (2,1)……
• 4. The predicted distances are used to calculate the
corresponding virtual pages, which are inserted into the PB
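The four steps can be sketched in Python as follows. The sketch assumes one interpretation of the slide: the current distance indexes the table to obtain predictions, and the last distance is used to record that it was followed by the current distance, so a stored pair (1,2) means "a distance of 1 was followed by a distance of 2". All class and function names are hypothetical, and the table is unbounded for simplicity.

```python
from collections import defaultdict

class DistanceTable:
    """Table shared by all cores: distance -> distances observed right after it."""

    def __init__(self):
        self.next_distances = defaultdict(list)

    def update(self, last_distance, current_distance):
        # Store the distance-pair (last_distance, current_distance) once.
        nexts = self.next_distances[last_distance]
        if current_distance not in nexts:
            nexts.append(current_distance)

    def predict(self, current_distance):
        return list(self.next_distances.get(current_distance, []))


class CoreState:
    """Per-core history needed to compute distances."""

    def __init__(self):
        self.last_page = None
        self.last_distance = None


def on_pb_miss(core, table, vpage):
    """Return the virtual pages to prefetch into this core's PB."""
    predicted_pages = []
    if core.last_page is not None:
        distance = vpage - core.last_page                    # step 1
        predicted = table.predict(distance)                  # steps 2 and 3
        predicted_pages = [vpage + d for d in predicted]     # step 4
        if core.last_distance is not None:
            table.update(core.last_distance, distance)       # learn the new pair
        core.last_distance = distance
    core.last_page = vpage
    return predicted_pages


table = DistanceTable()
core0, core1 = CoreState(), CoreState()

# Replay the miss sequences from the earlier slide; core 0 trains the table.
print([on_pb_miss(core0, table, p) for p in [3, 4, 6, 7]])
# [[], [], [], [9]]
print([on_pb_miss(core1, table, p) for p in [7, 8, 10, 11]])
# [[], [10], [11], [13]]
```

Core 1 never touched pages 3 to 6, but because the table is shared it gets correct predictions for pages 10 and 11 from the distance-pairs that core 0 recorded. The driver simply replays every listed page as a miss; in a real system the predicted pages would hit in the PB instead.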
Result
16 entries in PB,
Average 46%