MetaTM & TxLinux

advertisement
MetaTM &
TxLinux
Hany Ramadan, Christopher Rossbach, Donald Porter,
Owen Hofmann, Aditya Bhandari, Emmett Witchel
University of Texas at Austin
TM Background


Transactional programming is an
emerging alternative to locks

Avoids problems such as deadlock

Avoids performance-complexity tradeoffs
HTM holds the promise of

simpler programming and

good performance
TM: “What’s the OS got to do with it?”

Lack of realistic workloads
(counter, splash-2)

Will current results hold on real programs?

Unclear design tradeoffs; Feature set unsettled

OS is a real-life, parallel workload

OS will benefit from transactions


Reduces synchronization complexity

System-call and interrupt control paths will benefit
Architectural support is needed for OS
Average Transaction Count
3,500,000
Average Tx/Benchmark
3,000,000
2,500,000
2,000,000
1,500,000
1,000,000
500,000
0
Other TMs
Neares t TM
MetaTM
Outline

TxLinux

MetaTM

Goals

Features

Interrupt handling

Issue: Stack memory

Experimental results
TxLinux 2.6.16.1

Converted ~30% of dynamic synchronization to transactions
Sequence
locks
Spin-locks
Directory
cache
IP routing
Pathname
translation
Socket
locking
File system
Networking
RCU
(read-copyupdate)
Slab
allocator
Zone
Various MM
allocator structures
Memory management
MetaTM: Design goals


HTM model co-designed with TxLinux

Extensions to x86 ISA

Architectural support for OS

Execution-driven simulation
A platform for TM research

Multiple HTM design points

Eager & lazy version management

Eager conflict detection
MetaTM: Model features
Tx demarcation
xbegin
xend
Multiple Tx
xpush
xpop
Contention
management
polite
karma
eruption
timestamp
polka
sizematters
exponential
linear
random
(eager)
Backoff policy
Version
management
commit cost
(lazy)
abort cost
(eager)
TxLinux: Interrupt handling




Question: What happens to active tx on
an interrupt?
Interrupt handlers allowed to use
transactions
Factors weighing against abort

Transaction length growing

Interrupt frequency
Answer: Active transactions are
suspended on interrupt
MetaTM: Multiple Tx support

Multiple active transactions on a processor



At most one running, all others are suspended
Interface

xpush suspends current transaction

xpop resumes suspended transaction

Suspended transactions maintained in LIFO order
New execution context is unrelated to old one

Same conflict semantics with all other transactions

May start new transactions
Outline

TxLinux

MetaTM

Goals

Features

Interrupt handling

Issue: Stack memory

Experimental results
Issue: Stack memory

Transactions can span stack frames

Why: Retain same flexibility as locks

Problem: Live stack overwrite (correctness)

Solution: Stack Pointer Checkpoint
foo()
{
atomic
{
}
}
foo()
{
bar()
baz()
}
bar() { xbegin }
baz() { xend }
Live stack overwrite
0xC0
StkPtr
0x80
foo
locals
foo+4: call bar
foo+8: <work>
foo+12:xend
Error: invalid
bar
do_IRQ
return address
intr
foo+8
state
bar+0: xbegin
bar+4: ret
0x40
do_irq: iret
0x00
Tx Reg. Checkpoint
PC: bar+4
StkPtr: 0x40
Conflict
(other regs..)

Only interrupts that arrive in kernel mode have this problem
Live stack overwrite, fixed
0xC0
StkPtr
0x80
0x40
foo
bar
locals
foo+8
do_irq
intr state
0x00
foo+4: call bar
foo+8: <work>
foo+12:xend
bar+0: xbegin
bar+4: ret
do_irq: iret
Tx Reg. Checkpoint
PC: bar+4
StkPtr: 0x40
Conflict
(other regs..)

Fixed by setting ESP to Checkpointed ESP on interrupt
Outline

TxLinux

MetaTM

Goals

Features

Interrupt handling

Issue: Stack memory

Experimental results
Experiments



Setup
Workloads
System characteristics




Execution time
Transaction rates
Transaction origins
Studies


Contention management
Commit & Abort penalties
Setup

Simics 3.0.17

8-processor, x86 system (1 Ghz)

Memory hierarchy


L1: sep D/I, 16KB, 4-way, 1-cycle hit

L2: 4MB, 8-way, 16-cycle hit, MESI protocol

Main memory: 1GB, 200-cycle hit
Other devices

Disk device (DMA, 5.5ms latency)

Tigon3 gigabit nic (DMA,0.1ms latency)
Workloads to exercise TxLinux

counter


shared counter microbenchmark (8 threads)
Runs make -j 8 to
compile files from
libFLAC 1.1.2
netcat

streams data over TCP
network conn.
MAB

pmake




configure


simulates software
development file
system workloads
8 instances of
configure for tetex
find

8 instances of find on
a 78MB directory
searching for text
Note: Only TxLinux creates transactions
Kernel Execution Time
Kernel Execution Time (s)
14
12
Linux
10
TxLinux
8
6
4
2
0
%Kern. time

counter
counter
pmake
netcat
MAB
config
find
91%
13%
54%
57%
43%
50%
High kernel time justifies transactions in the OS
Transaction Rates
449,322
1,000,000
Transactions / Sec
100,000
32,486
182,072
121,808
16,635
10,000
1,000
100
10
1
pmake
Restart Rate

2.6%
netcat
MAB
config
find
3.1%
1.7%
2.1%
10.2%
Find workload has highest contention in TxLinux
Transaction Origins
% Transactions
100
80
60
40
20
0
pmake
netcat
MAB
config
find
System calls
Interrupts, kthreads

Kernel locks accessed from both system call and
interrupt handling contexts
Contention Management Study
4.00
counter
counter
pmake
Normalized Restart Rate .
3.50
netcat
MAB
3.00
conf igure
f ind
2.50
2.00
1.50
1.00
0.50
0.00
eruption
karma
kindergarten
polka
size matters
timestamp
Policy

Polka best performer, but complex to implement; SizeMatters viable

Stall-on-conflict – reduces conflicts, but not always performance
Commit & Abort Study
Time
Kernel
Normalized
Relative System
Time
3
0
Commit Cost
2.5
100
1,000
2
10,000
1.5
1
0.5
0
Kernel
Normalized
Relative System
TimeTime
counter
pmake
1.25
1.20
1.15
1.10
1.05
1.00
netcat
MAB
configure
find
0
Abort Cost
100
1,000
10,000
0.95
0.90
0.85
0.80
0.75
counter
pmake
netcat
MAB
configure
find

Performance sensitive to commit penalty, not abort

Confirms benefit of eager version management (fast commits)
Related Work

TM Models


Suspension techniques


Escape actions [Zilles06] – can’t start tx
Interrupt handling


TCC [Hammond04], UTM [Anaian05],
LogTM [Moore06], VTM [Rajwar05]
XTM [Chung06] – also tries to avoid aborts
Contention management

Scherer & Scott [PODC’05] – in STM context
Conclusions

TM needs realistic workloads


OS needs TM


TxLinux the largest TM benchmark
Complex synchronization; large % of runtime
Building & running TxLinux reveals much



Architectural support needed (Tx suspension)
Contention management is important
Cost studies confirm fast commits
… more in the paper
Download