Cache Coherence Ross Daly Chan Kim

Cache Coherence
“Can we do a better job of supporting cache coherence?”
Ross Daly
Chan Kim
Definition of CC
• “For any given memory location, at any given moment
in time, there is either a single core that may write it (and
that may also read it) or some number of cores that may
read it.”
• “Data-Value Invariant: the value of a memory location
at the start of an epoch is the same as the value of the
memory location at the end of its last read-write epoch”
- D. J. Sorin, M. D. Hill, and D. A. Wood. A Primer on
Memory Consistency and Cache Coherence, volume 6 of
Synthesis Lectures on Computer Architecture. Morgan &
Claypool Publishers, May 2011.
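The first quoted invariant (single writer or multiple readers) can be checked mechanically. The sketch below is only an illustration: `LineState` and `satisfies_swmr` are hypothetical names, and the model tracks permissions for a single memory location at a single moment.

```python
from dataclasses import dataclass, field
from typing import Optional, Set


@dataclass
class LineState:
    """Permissions held for one memory location (hypothetical model)."""
    writer: Optional[int] = None                   # core with read-write permission, if any
    readers: Set[int] = field(default_factory=set)  # cores with read-only permission


def satisfies_swmr(state: LineState) -> bool:
    """True if there is either one writer and no other readers, or readers only."""
    if state.writer is None:
        return True  # any number of read-only cores is allowed
    # A writer may also read, but no other core may hold a copy.
    return state.readers <= {state.writer}
```

For example, three concurrent readers satisfy the invariant, while a writer on core 3 together with a reader on core 1 does not.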
Goals
• Improve cache coherence performance on
multi-core/many-core systems.
• Scale the number of cores to increase
performance.
• Scale the number of cores without increasing
cache coherence complexity.
Xpoint Cache
• Motivation:
Xpoint: Architecture(2D)
Typical bus based Architecture
Xpoint Architecture
Xpoint: Architecture(3D)
Xpoint: Results
• 29x speedup for a 32-core system
• 45x speedup for a 64-core system
• 2.1x improvement over a 64-core conventional bus
Increasing the Effectiveness of Directory Caches by Deactivating Coherence for
Private Memory Blocks: Motivation
• Keeping track of all the blocks in the directory entails
huge storage requirements.
• A directory cache requires less storage, but it
suffers from directory cache misses.
• Most of the accessed blocks (about 75% on avg.)
are private.
Increasing the Effectiveness of Directory Caches by Deactivating Coherence for
Private Memory Blocks: Private vs. Shared blocks
• Coarse-grain strategy (page granularity)
• The OS detects when a private page must become shared.
• Every newly loaded page starts as private.
• When another processor accesses a block in a private page,
the page becomes shared.
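The classification rules above can be sketched as a small state machine. This is a toy model, not the paper's OS implementation: `PageTable`, `owner`, and `shared` are hypothetical names, and a real system would keep this state in page table entries and TLBs.

```python
class PageTable:
    """Toy classifier: a page starts private and turns shared on a second accessor."""

    def __init__(self):
        self.owner = {}      # page number -> first core that touched the page
        self.shared = set()  # pages already reclassified as shared

    def access(self, core, page):
        """Return 'private' or 'shared' for this access, updating the classification."""
        if page in self.shared:
            return "shared"
        first = self.owner.setdefault(page, core)  # first touch makes the page private
        if first != core:
            # A different core touched a private page: the page becomes shared.
            # Coherence recovery for the former owner would be triggered here.
            self.shared.add(page)
            return "shared"
        return "private"
```

Once a page is shared it stays shared in this sketch; the transition back to private is not modeled.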
Increasing the Effectiveness of Directory Caches by Deactivating Coherence for
Private Memory Blocks
Increasing the Effectiveness of Directory Caches by Deactivating Coherence for
Private Memory Blocks: Coherence Recovery Mechanism
• Flushing-based Recovery Mechanism
- Flushing all the blocks within a page may increase
the miss rate.
• Updating-based Recovery Mechanism
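A minimal sketch of the flushing-based option, assuming that recovery means evicting every block of the reclassified page from the former owner's private cache and writing dirty data back first. `flush_page`, the dictionary-based cache, and the value of `BLOCKS_PER_PAGE` are illustrative assumptions. The updating-based option is not sketched because the slide gives no details.

```python
BLOCKS_PER_PAGE = 64  # assumption: 4 KiB pages with 64-byte blocks


def flush_page(cache, memory, page):
    """Evict every cached block of `page`, writing dirty data back first.

    cache:  dict mapping block address -> (value, dirty) for the former owner
    memory: dict mapping block address -> value
    Returns the number of blocks evicted.
    """
    first = page * BLOCKS_PER_PAGE
    evicted = 0
    for addr in range(first, first + BLOCKS_PER_PAGE):
        entry = cache.pop(addr, None)
        if entry is None:
            continue  # block was not cached
        value, dirty = entry
        if dirty:
            memory[addr] = value  # write back before dropping the line
        evicted += 1
    return evicted
```

Every evicted block misses on its next access, which is the miss-rate cost noted in the first bullet.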
Increasing the Effectiveness of Directory Caches by Deactivating Coherence for
Private Memory Blocks: Results
• Directory caches can avoid tracking about
57% of blocks.
• Either shortens the runtime of parallel applications by 15%
at the same directory cache size, or maintains
system performance with directory caches 8
times smaller.
Complexity-Effective Multicore Coherence
• Similarity
- Motivation
- Private and Shared blocks
• Difference
- Simplifying the protocol
- directory-less
Complexity-Effective Multicore Coherence:
Simplifying the protocol
• Dynamic write policy
- Write-back vs. Write-through
• VIPS Cache coherency protocol
- Valid/Invalid – Private/Shared
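The dynamic write policy can be illustrated as follows, assuming that private blocks are written back and shared blocks are written through. The function `write` and the dictionary-based cache are hypothetical, and the write-through here is immediate, which is a simplification.

```python
def write(cache, memory, addr, value, is_shared):
    """Apply the write policy selected by the block's private/shared classification.

    cache:  dict mapping address -> (value, dirty)
    memory: dict mapping address -> value (stands in for the next cache level)
    """
    if is_shared:
        # Write-through: the next level is updated, so the line stays clean.
        cache[addr] = (value, False)
        memory[addr] = value
    else:
        # Write-back: only mark the line dirty; memory is updated on eviction.
        cache[addr] = (value, True)
```

With this split, a line needs only a valid bit and a private/shared classification, which matches the Valid/Invalid and Private/Shared naming.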
Complexity-Effective Multicore Coherence:
Directory-less
• Self-invalidation
- Readers are allowed to make unregistered copies
of a memory location, as long as they promise to
invalidate these at the next synchronization point.
- Does this follow cache coherency?
• Selective Flushing
• Write-through at a word granularity with per-word
dirty bit
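Self-invalidation can be sketched as below. The sketch assumes that selective flushing means only shared lines are dropped at a synchronization point while private lines are kept. `L1Cache`, `read`, and `sync_point` are hypothetical names, and the per-word write-through is not modeled.

```python
class L1Cache:
    """Toy private cache that self-invalidates shared lines at synchronization points."""

    def __init__(self):
        self.lines = {}  # address -> (value, is_shared)

    def read(self, addr, llc, is_shared):
        """Return the cached value, fetching an unregistered copy from the LLC on a miss."""
        if addr not in self.lines:
            # No directory is informed about this copy.
            self.lines[addr] = (llc[addr], is_shared)
        return self.lines[addr][0]

    def sync_point(self):
        """Drop shared lines only; private lines survive (selective flushing)."""
        self.lines = {a: line for a, line in self.lines.items() if not line[1]}
```

Between synchronization points a reader may observe a stale value of a shared location, which is what prompts the question above about whether the scheme still counts as coherent.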
Complexity-Effective Multicore Coherence:
Simplifying the protocol: Synchronization
• Synchronization relies on data races
• Atomic instructions spin locally in the core's L1 until the
condition is changed by another core.
• In this paper, a core does not send an invalidation
signal to other cores when it executes a write instruction.
• Solution?
Complexity-Effective Multicore Coherence:
Simplifying the protocol: Results
• Outperformed MESI directory protocol by 4.8%
• Reduced network energy consumption by 14.2%
• Simulated for 15 parallel benchmarks, on 16 cores