Architecture Support for Data Isolation & Memory Monitoring
Arrvindh Shriraman, Sandhya Dwarkadas, and Michael L. Scott
Department of Computer Science, University of Rochester
Motivation
Multi-core processors based on shared memory
programming will soon dominate the computing spectrum
[Figure: multiple processors (P) sharing a memory (M)]
RTM [ ISCA’07 ]
Memory Monitoring
Programmer’s view
Alert-On-Update (AOU)
New instruction, ALoad, loads and marks cache line
A-tagged line on invalidation jumps to handler
➡ trigger event type can be capacity eviction or coherence
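The AOU behaviour described above can be sketched as a small software model. This is only an illustration: the class `AouCache`, its method names, and the event strings are hypothetical and are not part of the proposed ISA. ALoad is modelled as `aload`, and the handler is an ordinary callback.

```python
class AouCache:
    """Toy model of a cache whose lines can carry an alert (A) tag."""

    def __init__(self, handler):
        self.lines = {}          # line address -> cached value
        self.a_tagged = set()    # line addresses marked by aload
        self.handler = handler   # called as handler(addr, event)

    def load(self, addr, memory):
        """Ordinary load: fill the line without tagging it."""
        self.lines[addr] = memory[addr]
        return self.lines[addr]

    def aload(self, addr, memory):
        """Load the line and mark it so a later invalidation raises an alert."""
        self.a_tagged.add(addr)
        return self.load(addr, memory)

    def invalidate(self, addr, event):
        """Drop a line because of a coherence message or a capacity eviction."""
        self.lines.pop(addr, None)
        if addr in self.a_tagged:
            self.a_tagged.discard(addr)
            self.handler(addr, event)   # models the jump to the handler


alerts = []
memory = {0x80: 35, 0x100: 7}
cache = AouCache(lambda addr, event: alerts.append((addr, event)))
cache.aload(0x80, memory)
cache.load(0x100, memory)
cache.invalidate(0x100, "coherence")   # untagged line: no alert
cache.invalidate(0x80, "coherence")    # A-tagged line: handler runs
```

Only the invalidation of the A-tagged line reaches the handler; the untagged line is dropped silently.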
Coordinating and synchronizing data shared
across multiple threads is hard!
[Figure: execution pipeline (Ld, Add, ...) diverted to a handler when a remote store or eviction hits an A-tagged cache line; ALoad/Clear sets and clears the A tag]
FlexTM [ ISCA’08 ]
Integrated hardware-software approach to flexible transactions
[Figure labels: HTM cache entry; STM organization; hardware-software transactional memory]
FlexTM deploys
➡ Signatures for detecting and notifying conflicts
➡ CSTs for noticing and managing conflicts
➡ Lazy caches for in-cache data isolation and Redo-Buffer for
handling cache overflows
➡ AOU for propagating abort events to remote transactions
➡ DIMM mechanisms accelerate common STM operations
➡ software makes policy decisions
➡ software routines support uncommon events
[Figure labels: meta data and data with W tag; version management; conflict resolution & management]
Rochester Transactional Memory
FlexTM software
➡ checkpoints registers at Begin_Tx
➡ manages conflicts; controls Tx aborts using AOU trigger
➡ controls commit phase
Tracking memory location accesses is difficult
because of transparent coherence events
Cannot issue speculative operations to memory because
hardware protocol does not support undoing of writes
Shared Memory ++
Memory Monitoring (MM)
➡ provides read/write access summaries of code blocks
➡ event-style notification of desired coherence events
Apps: Reliability, Security, Watchpoints, and Debugging
Access summary Signatures
Insert addresses accessed by a thread into hardware Bloom
filters (reads update Rsig; writes update Wsig)
+ unboundedness, decouples tracking from caches
- false positives
Special instructions access cache blocks and insert physical
address into bloom filter
Coherence requests snoop signatures, test for membership
and piggy-back conflict type on response message
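A minimal software sketch of the signature mechanism, assuming 64-byte cache lines, a 1024-bit filter, and two hash functions. The hash choice (BLAKE2b), the class and function names, and the convention of labelling the conflict from the requester's point of view are all assumptions made for illustration; real signatures would use simple hardware hashes.

```python
import hashlib

LINE_BITS = 6  # assumption: 64-byte cache lines


class Signature:
    """Bloom-filter summary of the cache lines a thread has accessed."""

    def __init__(self, m_bits=1024, num_hashes=2):
        self.m_bits = m_bits
        self.num_hashes = num_hashes
        self.bits = 0

    def _positions(self, addr):
        line = (addr >> LINE_BITS).to_bytes(8, "little")
        for i in range(self.num_hashes):
            digest = hashlib.blake2b(line, digest_size=8, salt=bytes([i])).digest()
            yield int.from_bytes(digest, "little") % self.m_bits

    def insert(self, addr):
        for pos in self._positions(addr):
            self.bits |= 1 << pos

    def member(self, addr):
        # May report false positives, never false negatives.
        return all(self.bits >> pos & 1 for pos in self._positions(addr))

    def clear(self):
        self.bits = 0


def snoop(rsig, wsig, addr, is_write):
    """Test a remote request against local signatures; return the conflict type."""
    if wsig.member(addr):
        return "W-W" if is_write else "R-W"
    if is_write and rsig.member(addr):
        return "W-R"
    return None


rsig, wsig = Signature(), Signature()
wsig.insert(0xff83ff48)
rsig.insert(0x1000)
# 0xff83ff40 lies in the same 64-byte line as 0xff83ff48.
remote_read = snoop(rsig, wsig, 0xff83ff40, is_write=False)
```

Because the filter is keyed on the line address rather than on cache state, tracking is independent of cache capacity, at the cost of occasional false positives.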
[Figure: signature lookup. The address of a coherence request (FWD_REQ 0xff83ff48) is hashed by h1 and h2]
[Figure: RTM fastpath and overflow transactions. Transaction descriptors (TxD_1, TxD_2) commit with CAS; OH(A) records owner, overflow readers, and the current version of A]
[Figure: FlexTM conflict example at the L2 directory. Line A moves from M@C0 to M@C0,C1; Wsig:{A}, Rsig:{}; a W-W conflict is recorded]
➡ Alert-On-Update: precise but bounded size
➡ Signatures: imprecise but unbounded
➡ CST: track inter-processor conflicts for all watched locations
Data Isolation primitives
➡ PDI: private caches act as a speculative-write buffer
➡ Redo-Log: holds cache overflows in virtual memory
[Figure: FlexTM processor context. Registers and control registers; read and write signatures (read/write locations summary); R-W, W-R, and W-W conflict tables (inter-processor data conflicts); cache block isolation (T and A bits track stores to a cache line); overflow table controller (overflow signature, base address, hash parameters, overflow count)]
Issue TLoad/TStore for speculative
memory operations
Foreach i set in W-R or W-W
Iterate over CSTs and update status
word of conflicting transactions
CAS (Status[i], ACT, ABORT)
Logically commit on status word; start
physical commit of hardware state
CAS-Commit Status[id]
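The commit steps above can be sketched in software as follows. The names `StatusWords` and `commit`, the representation of CSTs as sets of processor ids, and the use of a lock to stand in for the atomicity of hardware CAS are assumptions for illustration only.

```python
import threading

ACTIVE, COMMITTED, ABORTED = "ACTIVE", "COMMITTED", "ABORTED"


class StatusWords:
    """One status word per transaction, updated only through compare-and-swap."""

    def __init__(self, n):
        self._words = [ACTIVE] * n
        self._lock = threading.Lock()   # stands in for hardware CAS atomicity

    def cas(self, i, expected, new):
        with self._lock:
            if self._words[i] == expected:
                self._words[i] = new
                return True
            return False

    def read(self, i):
        return self._words[i]


def commit(tx_id, status, cst_w_r, cst_w_w):
    """Abort every transaction named in the W-R or W-W table, then try to commit."""
    for i in sorted(cst_w_r | cst_w_w):
        status.cas(i, ACTIVE, ABORTED)           # CAS (Status[i], ACT, ABORT)
    return status.cas(tx_id, ACTIVE, COMMITTED)  # logical commit on the status word


status = StatusWords(3)
first = commit(0, status, cst_w_r={1}, cst_w_w={2})   # tx 0 aborts 1 and 2, then commits
late = commit(1, status, cst_w_r=set(), cst_w_w={0})  # tx 1 was already aborted
```

The second call fails because transaction 1's status word is no longer ACTIVE, and it cannot abort transaction 0, whose status word is already COMMITTED.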
[Figure: performance chart for HashTable, RBTree, and Delaunay; annotated values 2.3X, 1.8X, and 3.8X]
Caches detach lines selectively from coherence protocol
➡ track coherence messages and choose time to enforce rules
Cache protocol extended by two ‘T’ bit tagged states
TMI allows concurrent sharers & isolates data in cache
TMI & TI require just a flash-clear to convert lines to MESI
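A small sketch of what the flash-clear might do at the end of a transaction. The mapping used here (TMI becomes M on commit and I on abort; TI always becomes I) is an assumption about the protocol; the text above states only that a flash-clear suffices. The function name and the dictionary representation of the cache are hypothetical.

```python
def flash_clear(states, committed):
    """Return the line states after a transaction ends.

    Assumed mapping: TMI -> M on commit, TMI -> I on abort, TI -> I in both cases.
    Lines in ordinary MESI states are left unchanged.
    """
    result = {}
    for addr, state in states.items():
        if state == "TMI":
            result[addr] = "M" if committed else "I"
        elif state == "TI":
            result[addr] = "I"
        else:
            result[addr] = state
    return result


cache_states = {0x40: "TMI", 0x80: "TI", 0xC0: "S"}
after_commit = flash_clear(cache_states, committed=True)
after_abort = flash_clear(cache_states, committed=False)
```

Under this assumption a commit makes buffered TStores visible in one step, while an abort discards them without any per-line undo.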
FlexWatcher
Compiler/programmer specifies addresses to be tracked
Hardware triggers trampoline on snoop hits

Benchmark   Bug   FlexWatcher   Discover
BC          BO    1.5X          75X
GZIP        BO    1.15X         17X
GZIP2       IV    1.05X         N/A
Man         BO    1.80X         65X
Squid       ML    2.50X         N/A

Discover is a SPARC binary instrumentation tool from OpenSPARC
Discover overheads were estimated on a Sun T1000 server
[Figure: Redo-Buffer controller registers (Base, OSig, Config: sets and ways, Count, Commtd) and an L1 data cache example, Store %o1, (80) /* o1 = 35 */]
Other Uses
Debugging
➡ watchpoints and race detectors
Synchronization
➡ fast mutexes and asynchronous messages
Extend ISA to support signatures
and AOU as first-class entities
➡ insert, member, activate, clear, etc.
[Figure: performance charts for RBTree, Vacation-L, Vacation-H, and RandomGraph; axis ticks 1, 2, 4, 8, 16]
Redo-Buffer
A per-thread hash-table in virtual memory
Hardware controller
➡ fills table with “TMI” write-back data blocks
➡ performs look-aside transparently on L1 misses
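The Redo-Buffer behaviour can be sketched as a software model. The class and method names are hypothetical, and an exact Python set stands in for the overflow signature (a hardware signature could give false positives, in which case the lookup would fall through to memory).

```python
class RedoBuffer:
    """Per-thread table holding speculative lines evicted from the L1."""

    def __init__(self):
        self.table = {}     # line address -> speculative data
        self.osig = set()   # exact stand-in for the overflow signature

    def evict_tmi(self, addr, data):
        """Controller fills the table when a TMI line is written back."""
        self.table[addr] = data
        self.osig.add(addr)

    def l1_miss(self, addr, memory):
        """Look aside in the buffer before going to memory."""
        if addr in self.osig and addr in self.table:
            return self.table[addr]
        return memory[addr]

    def commit(self, memory):
        """Copy buffered lines to memory and empty the buffer."""
        memory.update(self.table)
        self.table.clear()
        self.osig.clear()

    def abort(self):
        """Discard buffered speculative lines."""
        self.table.clear()
        self.osig.clear()


memory = {80: 23}
rb = RedoBuffer()
rb.evict_tmi(80, 35)                  # speculative store of 35 overflows the cache
seen_by_owner = rb.l1_miss(80, memory)
seen_in_memory = memory[80]           # other threads still see the old value
rb.commit(memory)
```

Until commit, the speculative value is visible only through the owner's look-aside path; memory keeps the old value.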
FlexWatcher Memory Debugger
Lazy Coherence
Lazy encourages progress
[Figure: chart comparing Lazy, Eager, and STM at 16 threads; annotated values 1.9X and 4.1X]
➡ TMI buffers TStores; TI allows incoherence with remote TMI
Memory Monitoring primitives
Conflict Summary Tables (CST)
Record inter-processor R-W, W-W & W-R conflicts
Decouple conflict tracking from access tracking
DIMM aids improve software-controlled TMs
Hardware-acceleration of Software-controlled transactions
DIMM Hardware Support
Decoupled hardware primitives for DIMM help
➡ refine architecture incrementally
➡ software evolve the API and use in varying applications
➡ decouple policy from mechanism
Begin_Tx abort_pc
Checkpoint processor registers and
record abort handler PC
Programmable-Data-Isolation
for data versioning
Alert-On-Update
for conflict detection
Data Isolation (DI)
➡ allows control over propagation of writes to remote threads
➡ buffer written locations and commit or undo as an atomic unit
Apps: Sand-boxing, Transactional programming, Speculation
Security
➡ buffer overflow attacks, information-flow trackers &
drivers/plugin isolation
Speculation
Buffer Overflow (BO)
Pad all heap-allocated buffers with 64 bytes and watch
the padded locations.
Memory Leak (ML)
Monitor all heap-allocated objects and update the
address’s timestamp on access.
Invariant Violation (IV)
ALoad the cache line for the variable X of interest. When the AOU
handler triggers, assert program-specific invariants.
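The buffer-overflow check can be illustrated with a toy allocator. This is a sketch under simplifying assumptions: the `WatchedHeap` class is hypothetical, addresses are watched at byte granularity in a Python set, and the alert is a list append rather than a hardware-triggered trampoline.

```python
PAD = 64  # bytes of padding placed after each buffer


class WatchedHeap:
    """Bump allocator that pads every buffer and watches the padding."""

    def __init__(self):
        self.next_free = 0
        self.watched = set()   # byte addresses inside padding regions
        self.alerts = []       # addresses of stores that hit padding

    def malloc(self, size):
        base = self.next_free
        pad_start = base + size
        self.watched.update(range(pad_start, pad_start + PAD))
        self.next_free = pad_start + PAD
        return base

    def store(self, addr):
        """Model a store; a hit on a watched address raises an alert."""
        if addr in self.watched:
            self.alerts.append(addr)


heap = WatchedHeap()
buf = heap.malloc(16)
for i in range(17):          # off-by-one loop writes one byte past the buffer
    heap.store(buf + i)
```

The first sixteen stores fall inside the buffer; the seventeenth lands in the padding and is reported.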
Conclusion
Data-Isolation and Memory-Monitoring primitives will help
multi-core chips achieve widespread use across traditional
and emerging application domains
Decoupling the hardware components will help refine the
architecture incrementally and help software evolve the API
Use simple hardware to accelerate the common case,
minimize hardware state and employ software for the
uncommon case
➡ Thread-level speculation and lock elision
Web : http://www.cs.rochester.edu/research/cosyn/
Email: {ashriram, sandhya, scott}@cs.rochester.edu