Log-based Transactional Memory
Mark D. Hill
Multifacet Project, Univ. of Wisconsin-Madison
April 13, 2007 @ Michigan (Go Blue!)

Multicore here: “Intel has 10 projects in the works that contain four or more computing cores per chip” (Intel CEO, Fall ’05)
How program? “Blocking on a mutex is a surprisingly delicate dance” (OpenSolaris, mutex.c)

© 2007 Multifacet Project, University of Wisconsin-Madison

LogTM Contributors
• Faculty
  – Mark Hill, Ben Liblit, Mike Swift, David Wood
• Students
  – Jayaram Bobba, Derek Hower, Kevin Moore, Haris Volos, Luke Yen
• Alumna
  – Michelle Moravan
• Funding
  – Grants from U.S. National Science Foundation
  – Donations from Intel and Sun
7/26/2016 3 Wisconsin Multifacet Project

Summary
• Our Transactional Memory (TM) goals
  – Unlimited TM model: even large/long transactions
  – Facilitate SW composition: unlimited nesting
  – Accelerate with some HW support
• Log-based TM (Signature Edition)
  – Supports unlimited TM w/ nesting
  – Accelerates commit by writing new values in place (after saving old values in a per-thread log)
  – Signatures summarize read/write sets
  – HW mechanisms: simple, policy-free, SW accessible

Outline
• TM Motivation & Background
  – Why TM?, Terminology, & Taxonomy
• LogTM Hardware Preview
• LogTM Version Management
• LogTM Conflict Detection
• LogTM Evaluation
• LogTM Operating System Interactions (optional)
• Summary & Future Directions

Locks are Hard
// WITH LOCKS
void move(T s, T d, Obj key){
  LOCK(s);
  LOCK(d);
  tmp = s.remove(key);
  d.insert(key, tmp);
  UNLOCK(d);
  UNLOCK(s);
}
Thread 0: move(a, b, key1);
Thread 1: move(b, a, key2);
DEADLOCK!
Moreover:
• Coarse-grain locking limits concurrency
• Fine-grain locking difficult

Transactional Memory (TM)
• Programmer says
  – “I want this atomic”
• TM system
  – “Makes it so”

void move(T s, T d, Obj key){
  atomic {
    tmp = s.remove(key);
    d.insert(key, tmp);
  }
}

• Software TM (STM) Implementations
  – Currently slower than locks
  – Always slower than hardware?
• Hardware TM (HTM) Implementations
  – Leverage cache coherence & speculation
  – Fast
  – But hardware finite & should be policy-free

Some Transaction Terminology
Transaction: State transformation that is:
  (1) Atomic (all or nothing)
  (2) Consistent
  (3) Isolated (serializable)
  (4) Durable (permanent)
Commit: Transaction successfully completes
Abort: Transaction fails & must restore initial state
Read (Write) Set: Items read (written) by a transaction
Conflict: Two concurrent transactions conflict if either’s write set overlaps with the other’s read or write set

Nested Transactions for Software Composition
• Modules expose interfaces, NOT implementations
• Example
  – Insert() calls getID() from within a transaction
  – The getID() transaction is nested inside the insert() transaction

int getID() {            // child TX
  begin_transaction();
  id = global_id++;
  commit_transaction();
  return id;
}

void insert(object o){   // parent TX
  begin_transaction();
  t.insert(getID(), o);
  commit_transaction();
}

Closed Nesting
Child transactions remain isolated until parent commits
• On Commit child transaction is merged with its parent
• Flat
  – Nested transactions “flattened” into a single transaction
  – Only outermost begins/commits are meaningful
  – Any conflict aborts to outermost transaction
• Partial rollback
  – Child transaction can be aborted independently
  – Can avoid costly re-execution of parent transaction

Implementing TM
• Version Management: large state (must be precise)
  – new values for commit
  – old values for abort
  –
Must keep both
• Conflict Detection: checked often (must be fast)
  – Find read-write, write-read or write-write conflicts among concurrent transactions
  – Allows multiple readers OR one writer

How Do Hardware TM Systems Differ?
• Version Management
  – Lazy: buffer updates & move on commit
  – Eager: update “in place” after saving old values
• Conflict Detection
  – Lazy: check on commit
  – Eager: check before read/write
• Design space
  – Lazy conflict detection, lazy version management: like databases with optimistic concurrency control; Stanford TCC, Illinois Bulk
  – Lazy conflict detection, eager version management: no HTMs (yet)
  – Eager conflict detection, lazy version management: Herlihy/Moss TM, MIT LTM, Intel/Brown VTM
  – Eager conflict detection, eager version management: like databases with conservative concurrency control; MIT UTM, Wisconsin LogTM

Transactional Memory Goals/Challenges
• Unlimited TM Model
  – Large transactions: cache victimization & even paging
  – Long transactions: thread switching/migration
  – OS traps/calls?
• Facilitate SW composition
  – Unlimited closed nesting (open nesting?)
• Accelerate with at most modest HW support
  – Make the common case fast
  – Make HW simple, policy-free, & SW exposed

Outline
• TM Motivation & Background
• LogTM Hardware Preview
• LogTM Version Management
• LogTM Conflict Detection
• LogTM Evaluation
• LogTM Operating System Interactions (optional)
• Summary & Future Directions

Single-CMP/Multicore System
[Diagram: 16 cores (Core0 … Core15), each with a private L1 cache, connected by an interconnect to a shared L2 cache and DRAM]

LogTM Per-Core Hardware
• Processor (SMT context)
  – Registers and register checkpoint
  – Version management, pointers to segmented log: TMCount, LogFrame, LogPtr
  – Conflict detection, signatures: Read, Write, SummaryRead, SummaryWrite
• Data caches (tag, data): no explicit TM state

Outline
• Motivation & Background
• LogTM Hardware Preview
• LogTM Version Management
  – Basic Logging & Segmented Logs for Nesting
• LogTM Conflict Detection
• LogTM Evaluation
• LogTM Operating System
Interactions (optional)
• Summary & Future Directions

LogTM’s Eager Version Management
• New values stored in place
• Old values stored in transaction log
  – Allocated per-thread in virtual memory (like per-thread stacks)
  – Filled by hardware (during transactions)
  – Read by software (on abort)
[Diagram: memory blocks at virtual addresses 00, 40, and C0 with their R/W bits, next to a transaction log (Log Base 1000, Log Ptr 1090, TM count 1) that holds the addresses and old values of the overwritten blocks]

Segmented Transaction Log for Nesting
• LogTM’s log is a stack of frames (like activation records)
• A frame contains:
  – Header (including saved registers and pointer to parent’s frame)
  – Undo records (block address, old value pairs)
  – Garbage headers (headers of committed closed transactions)
  – Commit action records
  – Compensating action records
[Diagram: a log with two frames, each a header followed by undo records; LogFrame points to the current header, LogPtr to the end of the log; TM count steps 0, 1, 2]

Closed Nested Commit
• Merge child’s log frame with parent’s
  – Mark child’s header as “dummy header”
  – Copy pointer from child’s header to LogFrame
[Diagram: the same two-frame log after the child commits; TM count goes from 2 to 1]

LogTM Version Management Discussion
• Eager Version Management via Segmented Log
• Advantages:
  – Transactions read new values normally (w/o bypassing)
  – No data movement at commit
  – Both old & new data in (virtual) memory
  – Both old & new data can be cached or victimized
  – Supports unbounded nesting
  – No extra indirection (unlike STM)
• Disadvantages
  – Aborts slower & handled by software
  – Adds HW to write log
  – Requires eager conflict detection?
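
The eager version management described above can be sketched in software. This is a minimal illustration under stated assumptions, not LogTM's implementation: the names `TxLog`, `tx_begin`, `tx_store`, `tx_commit`, and `tx_abort` are hypothetical, the log is a fixed-size array rather than a growable region of virtual memory, records cover single 32-bit words rather than cache blocks, and every store is logged (the slides do not say how repeated stores to one block are handled).

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LOG_CAPACITY 64  /* hypothetical fixed capacity, for illustration only */

/* One undo record: where a store happened and what was there before. */
typedef struct {
    uint32_t *addr;
    uint32_t  old_value;
} UndoRecord;

/* Per-thread log; log_ptr and tm_count loosely mirror LogPtr and TMCount. */
typedef struct {
    UndoRecord records[LOG_CAPACITY];
    size_t     log_ptr;   /* index of the next free record */
    int        tm_count;  /* 1 while a transaction is active (no nesting here) */
} TxLog;

void tx_begin(TxLog *log) {
    log->log_ptr = 0;
    log->tm_count = 1;
}

/* Eager update: save the old value in the log, then write the new value in place. */
void tx_store(TxLog *log, uint32_t *addr, uint32_t value) {
    assert(log->tm_count > 0 && log->log_ptr < LOG_CAPACITY);
    log->records[log->log_ptr].addr = addr;
    log->records[log->log_ptr].old_value = *addr;
    log->log_ptr++;
    *addr = value;
}

/* Commit moves no data: new values are already in place, so just reset the log. */
void tx_commit(TxLog *log) {
    log->log_ptr = 0;
    log->tm_count = 0;
}

/* Abort walks the log from newest to oldest record, restoring old values. */
void tx_abort(TxLog *log) {
    while (log->log_ptr > 0) {
        log->log_ptr--;
        *log->records[log->log_ptr].addr = log->records[log->log_ptr].old_value;
    }
    log->tm_count = 0;
}
```

Walking the log in reverse is what keeps abort correct even when the same location was stored to more than once: the oldest record is applied last, so the pre-transaction value wins. The sketch also shows the trade-off named in the discussion: commit is a pointer reset, while abort does the copying.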

Outline
• TM Motivation & Background
• LogTM Hardware Preview
• LogTM Version Management
• LogTM Conflict Detection
  – Signatures, Nesting, & Detection via Coherence
• LogTM Evaluation
• LogTM Operating System Interactions (optional)
• Summary & Future Directions

LogTM-SE Read/Write Set Summary
• Use Per-Thread Signatures (adapted from Bulk)
• (Original LogTM used in-cache read/write bits)
[Animation: the program xbegin; LD A; ST B; LD C; LD D; ST C; … hashes each address into the R or W signature bit vector through the hash function(s). External requests are then checked against the signatures, with three outcomes shown: no conflict, conflict, and an alias that yields a false positive]

Conflict Detection for Unbounded Nesting
• Nesting Affects Signatures (not coherence, which comes next)
• Nested Begin: Save R/W Signatures on Log
• Partial Abort: Restore R/W Signatures
• (Closed) Nested Commit: Discard Saved Signatures
• Open Nesting also handled
• Recall LogTM’s Segmented Log already supports version management for unbounded nesting
  – Add saved signature space to frame header
• <Skip Nested Signature Example>

Nested Begin
[Animation, two frames: the program runs xbegin, LD, ST, then a nested xbegin. At the nested xbegin the current signatures (R 01001000, W 01010010) are copied into a new transaction header on the log, Log Frame and Log Ptr advance, and TMCount goes from 1 to 2]

Partial Abort
[Animation: inside the nested transaction, further LD and ST operations change the signatures to R 01001001, W 01110110. On ABORT! the child's undo entries are unrolled, the saved signatures (R 01001000, W 01010010) are restored from the child's header, and TMCount goes from 2 to 1]
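
The signature check and the save/restore rules for nesting can be sketched as follows. This is a hedged illustration, not the LogTM-SE hardware: the 64-bit signature width, the single bit-select hash, the fixed nesting limit, and all names (`Signature`, `TxSigState`, `sig_conflicts`, `nested_begin`, `partial_abort`, `closed_commit`) are assumptions, and the saved copies live in a small array rather than in log frame headers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define SIG_BITS   64  /* assumed signature width */
#define BLOCK_BITS 6   /* 64-byte blocks */
#define MAX_DEPTH  8   /* assumed nesting limit for this sketch only */

typedef struct { uint64_t r, w; } Signature;

/* Assumed hash: block number modulo the signature width. */
static unsigned sig_hash(uint64_t addr) {
    return (unsigned)((addr >> BLOCK_BITS) % SIG_BITS);
}

void sig_note_read(Signature *s, uint64_t addr)  { s->r |= 1ULL << sig_hash(addr); }
void sig_note_write(Signature *s, uint64_t addr) { s->w |= 1ULL << sig_hash(addr); }

/* A remote read conflicts with our write set; a remote write conflicts with
 * our read or write set. Aliased addresses can report a false conflict. */
bool sig_conflicts(const Signature *s, uint64_t addr, bool remote_is_write) {
    uint64_t bit = 1ULL << sig_hash(addr);
    if (s->w & bit) return true;
    return remote_is_write && (s->r & bit) != 0;
}

typedef struct {
    Signature current;
    Signature saved[MAX_DEPTH];  /* stands in for space in log frame headers */
    int       depth;             /* number of saved copies */
} TxSigState;

/* Nested begin: save the current R/W signatures. */
void nested_begin(TxSigState *t) {
    assert(t->depth < MAX_DEPTH);
    t->saved[t->depth++] = t->current;
}

/* Partial abort: restore the signatures saved at the matching begin. */
void partial_abort(TxSigState *t) {
    assert(t->depth > 0);
    t->current = t->saved[--t->depth];
}

/* Closed nested commit: discard the saved copy; the child's accesses
 * stay in the current signatures and so merge into the parent. */
void closed_commit(TxSigState *t) {
    assert(t->depth > 0);
    --t->depth;
}
```

With this hash, addresses 0x1000 and 0x2000 map to the same bit, so a transaction that read 0x1000 reports a conflict for a remote write to 0x2000. That is the false-positive case: it costs performance but never misses a real conflict.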

Nested Commit
[Animation: on xend of the nested transaction, the child's header in the log becomes a garbage header, TMCount goes from 2 to 1, and the current signatures (R 01001001, W 01110110) are kept]

Unbounded Nesting Support Summary
• Closed nesting:
  – Begin: save signatures
  – Abort: restore signatures
  – Commit: No signature action
• Open nesting:
  – Begin: save signatures
  – Abort: restore signatures
  – Commit: restore signatures

LogTM’s Eager Conflict Detection (before access)
LogTM detects conflicts using coherence
(1) Requesting core issues coherence request
(2) L2 directory forwards to other core(s)
(3) Responding core
  – Detects conflict using local signatures
  – Informs requesting processor of conflict
(4) Requesting core resolves conflict

Protocol Animation: Transactional Write
• Core C0 store
  – C0 sends get exclusive (GETX) request
  – L2 Directory responds with data (old)
  – C0 executes store
[Animation: the L2 directory entry goes from I [old] to M@C0 [old]. C0 (TM mode 1) sets its write signature (W-) and its cache line goes from I [none] through IM to M [new]. C1 (TM mode 0, empty signature) stays at I [none]]

Protocol Animation: Transactional Conflict
• In-cache transaction conflict
  – C1 sends get shared (GETS) request
  – L2 Directory forwards to C0
  – C0 detects conflict and sends NACK
[Animation: the directory (M@C0 [old]) forwards C1's GETS to C0 as Fwd_GETS. C0 (TM mode 1, signature W-, line M [new]) finds the block in its write signature and answers with a NACK. C1 (TM mode 0) stays at I [none]]

Cache Victimization Gracefully Handled!
• Consider eviction of transactional data from Core C0
• No Effect on R/W Set Summary via Signatures
• For Conflict Detection, Forward Coherence Requests After Victimization
  – Trivial with broadcast coherence
  – Silent S replacements w/ directory: S @ C0 -> S @ C0
  – Writeback to directory sticky: M @ C0 -> Sticky-M @ C0
• Recall Eager Version Management via Log
  – On commit: no need to re-fetch victimized block
  – On abort: SW log walk naturally re-fetches victimized block

Sticky States: No New Bits in L1 Cache or L2 Directory
[Table: private (L1) cache state (M, E, S, I) against shared (L2) directory state (M, E, S, I). The combinations in which the directory still records M or S for a block that the L1 no longer holds are labeled “Sticky-M” and “Sticky-S”]

Conflict Resolution
• Conflict Resolution
  – Can wait risking deadlock or abort risking livelock
  – Wait/abort transaction at requesting or responding proc?
• LogTM resolves conflicts at requesting processor
• Original LogTM included HW timestamps
  – Requesting processor can wait (using nacks/retries)
  – or abort if the other processor is waiting (deadlock possible) & it is logically younger
• Current LogTM has the requesting processor trap to a software contention manager that decides who waits/aborts

LogTM Conflict Detection Discussion
• Eager Conflict Detection via Signatures & Coherence
• Advantages:
  – Supports unbounded nesting
  – Signatures are compact HW
  – Signatures software-accessible: save/restore for nesting
  – Coherence provides efficient conflict detection
• Disadvantages
  – Signatures have false positives
  – Requires modest coherence protocol changes
  – Does not (yet) handle thread migration & paging (but coming later)

Outline
• TM Motivation & Background
• LogTM Hardware Preview
• LogTM Version Management
• LogTM Conflict Detection
• LogTM Evaluation
  – Methods, vs. Lock, & vs.
Perfect Signatures
• LogTM Operating System Interactions (optional)
• Summary & Future Directions

Single-CMP LogTM System
[Diagram: 16 two-way SMT cores (Core0 … Core15), each with a private L1 cache, connected by an interconnect to a shared L2 cache and DRAM. Each SMT context holds registers, a register checkpoint, TMCount, LogFrame, LogPtr, and Read, Write, SummaryRead, and SummaryWrite signatures]

Experimental Methodology
• Infrastructure
  – Virtutech Simics full-system simulation
  – Wisconsin GEMS timing modules
• System
  – 32 transactional threads (16 cores x 2 SMT threads/core)
  – 32 kB 4-way L1 I and D, 64-byte blocks, 1-cycle latency
  – 8 MB 8-way unified L2, 34-cycle latency
  – L2 directory for coherence, maintains full sharer bit vector
• Workloads
  – Radiosity, Raytrace, Mp3d, Cholesky, Berkeley DB

Lock Results

Perfect Signature Results
Perfect signatures similar or better than Locks

Realistic Signature Results
Realistic Signatures similar to Perfect Signatures and Locks
For our workloads, false positives are not a problem

What about scalability?
• Bigger system
• Bigger transactions
• False positives are a function of:
  – Transaction size
  – Transactional duty cycle
  – Number of concurrent transactional threads
  – Filtering due to on-chip directory protocol
• Signatures gracefully degrade to serialization

LogTM Evaluation Discussion
• LogTM Running Splash & BerkeleyDB
• Good News:
  – Works!
  – Performs similar to locks
  – Signature false positives not (yet) an issue
• Bad News
  – Baby workloads
  – Baby workloads
  – Baby workloads

Outline
• TM Motivation & Background
• LogTM Hardware Preview
• LogTM Version Management
• LogTM Conflict Detection
• LogTM Evaluation
• LogTM Operating System Interactions (optional)
  – Escape Actions, Thread Switching, (& Paging)
• Summary & Future Directions

Escape Actions
• Allow non-transactional escapes from a transaction
  – (e.g., system calls, I/O)
  – Similar to Zilles’s pause/unpause
• Escape actions never:
  – Abort
  – Stall
  – Cause other transactions to abort
  – Cause other transactions to stall
• Commit and compensating actions
  – similar to open nests
Not recommended for the average programmer!

Thread Switching Support
• Why? Support long-running transactions
• What?
  – Conflict Detection for descheduled transactions
• How?
  – Summary Read / Write Signatures w/ Invariant: If thread t of process P is scheduled to use an active signature, the corresponding summary signature holds the union of the saved signatures from all descheduled threads from process P.
Updated using TLB-shootdown-like mechanism <skip example>

Handling Thread Switching
[Animation, four frames: threads T1, T2, and T3 run on processors P1 to P4; each processor holds R/W signatures and a summary signature, and the OS keeps a per-process summary, initially all zeros. The OS deschedules T1, merges T1's saved signatures (W 01001000, R 01010010) into the process summary, installs that summary on the other processors, and later reschedules T1 on a different processor]

Thread Switching Support Summary
• Summary Read / Write signatures
  – Summarizes descheduled threads with active transactions
• One OS structure per process
• Check summary signature on every memory access Coherence
• Updated on transaction deschedule
  – Similar to TLB shootdown

Paging Support Summary
Problem:
  – Changing page frames
  – Need to maintain isolation on transactional blocks
Solution:
On Page-Out:
  – Save Virtual -> Physical mapping
On Page-In:
  – If
different page frame, update signatures with physical address of transactional blocks in new page frame.

LogTM OS Interaction Discussion
• OS Call/Traps via Escape Actions, Thread Migration via Summary Signatures, & Crude Paging Support
• Advantages:
  – Summary Signatures are compact, software-accessible HW
  – Most complexity for rare events relegated to SW
  – Coherence not used/modified for rare events (e.g., conflict detection after thread migration)
• Disadvantages
  – Summary Signatures have false positives
  – TLB-shootdown-like mechanism is unscalable
  – Paging support still crude

Summary
• Our Transactional Memory (TM) goals
  – Unlimited TM model: even large/long transactions
  – Facilitate SW composition: unlimited nesting
  – Accelerate with some HW support
• Log-based TM (Signature Edition)
  – Supports unlimited TM w/ nesting
  – Accelerates commit by writing new values in place (after saving old values in a per-thread log)
  – Signatures summarize read/write sets
  – HW mechanisms: simple, policy-free, SW accessible

Improving Transactional Memory
• Non-Transactional Events
  – I/O & system calls
• Development tools for TM
  – Profiling transaction conflicts
  – Detecting & preventing false sharing
• What is the right interface for TM?
  – ISA changes will outlast implementations

Deconstructing Transactional Memory
• Hardware should provide primitives not solutions
• TM is a programming language solution
• Instead let hardware provide:
  – Checkpointing (version management)
  – Address Matching (conflict detection)
• Software can build transactional memory
• Other uses for these primitives

http://www.cs.wisc.edu/multifacet/
[HPCA 2006] LogTM: Log-based Transactional Memory
[ASPLOS 2006] Supporting Nested Transactional Memory in LogTM
[HPCA 2007] LogTM-SE: Decoupling Hardware Transactional Memory from Caches
[ISCA 2007] Performance Pathologies in Hardware Transactional Memory
  FriendlyFire, StarvingWriter, SerializedCommit, FutileStall, StarvingElder, RestartConvoy, & DuelingUpgrades

Google “Wisconsin GEMS”
• Works w/ Simics (free to academics), commercial OS/apps
• SPARC out-of-order, x86 in-order, CMPs, SMPs, & LogTM
• GPL release of GEMS used in seven non-Multifacet papers
[Diagram labels: Simics, Microbenchmarks, Trace file, Contended locks, Random Tester, Deterministic, Opal Detailed Processor Model]

BACKUP SLIDES

HTM Virtualization Mechanisms
Column groups: Before Virtualization, After Virtualization
Column labels: Thread Switch, Paging, $Eviction, Abort, Commit, $Miss, $Eviction, Abort, Commit, $Miss
UTM: - - - H H H HC H H H
VTM: - - - S S SC S S S SWV
UnrestrictedTM: - - - A B B B B AS AS
XTM, XTM-g: - - - ASC SC - SCV SCV S S SC SC SC SC AS AS
PTM-Copy, PTM-Select: - - - SC S S H S S SC S SC S S S S S
LogTM-SE: - - SC - - S SC - S S
Legend: Shaded = virtualization event; - = handled in simple HW; H = complex hardware; S = handled in software; A = abort transaction; C = copy values; W = walk cache; V = validate read set; B = block other transactions

LogTM Overview
• Hardware Transactional Memory promising
• But, most HTMs require a
slow, complex virtualization scheme to handle cache overflow, thread switch etc.
• New LogTM: Log-based Transactional Memory
  – Virtualization “Built-In”
    • Policy-Free Hardware
    • Simple hardware primitives
    • Software-accessible state
  – Supports Transactions with:
    • Large memory footprints
    • Thread switching
    • Unbounded nesting
    • Paging

Paging Support
• Why?
  – Support Large Transactions.
• What?
  – Physical Relocation of Virtual Pages
• How?
  – Update Signatures on paging activity

Updating Signatures
Suppose: Virtual Page (VP) 0x40000 -> Physical Frame (PP) 0x1000
Signature A: {0x1040, 0x1080, 0x30c0}
At Page Out: Remember 0x40000 -> 0x1000
At Page In: Suppose 0x40000 -> 0x2000
Signature A: {0x1040, 0x1080, 0x2040, 0x2080, 0x30c0}
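
The page-in update in the example above can be sketched in a few lines. This is an illustration under assumptions rather than the actual mechanism: the 4 KB page size, the 1024-bit signature, the bit-select hash, and the names `PagedSignature`, `psig_insert`, `psig_contains`, and `psig_remap_page` are all hypothetical. The idea it shows is the one on the slide: for every block of the old frame that the signature reports as present, also insert the block at the same offset in the new frame.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BLOCK_SIZE 64u    /* 64-byte blocks, as in the evaluated system */
#define PAGE_SIZE  4096u  /* assumption: 4 KB pages */
#define SIG_BITS   1024u  /* assumption: signature width in bits */

typedef struct { uint8_t bits[SIG_BITS / 8]; } PagedSignature;

/* Assumed hash: physical block number modulo the signature width. */
static unsigned block_index(uint64_t paddr) {
    return (unsigned)((paddr / BLOCK_SIZE) % SIG_BITS);
}

void psig_insert(PagedSignature *s, uint64_t paddr) {
    unsigned i = block_index(paddr);
    s->bits[i / 8] |= (uint8_t)(1u << (i % 8));
}

bool psig_contains(const PagedSignature *s, uint64_t paddr) {
    unsigned i = block_index(paddr);
    return ((s->bits[i / 8] >> (i % 8)) & 1u) != 0;
}

/* On page-in to a different frame: copy membership block by block from the
 * remembered old frame to the new frame. Old bits stay set, since a
 * signature of this kind cannot remove individual addresses. */
void psig_remap_page(PagedSignature *s, uint64_t old_frame, uint64_t new_frame) {
    assert(old_frame % PAGE_SIZE == 0 && new_frame % PAGE_SIZE == 0);
    if (old_frame == new_frame) return;
    for (uint64_t off = 0; off < PAGE_SIZE; off += BLOCK_SIZE) {
        if (psig_contains(s, old_frame + off))
            psig_insert(s, new_frame + off);
    }
}
```

Starting from {0x1040, 0x1080, 0x30c0} and remapping frame 0x1000 to 0x2000 adds 0x2040 and 0x2080 while leaving the other entries in place, matching the slide. Because the old entries remain, the signature only grows, which preserves isolation at the cost of more potential false positives.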