LightTx: A Lightweight Transactional Design in Flash-based SSDs to Support Flexible Transactions
Youyou Lu, Jiwu Shu, Jia Guo, Shuai Li (Tsinghua University); Onur Mutlu (Carnegie Mellon University)
Executive Summary
•  Problem: Flash-based SSDs support transactions naturally (with out-of-place updates) but inefficiently:
   –  Only a limited set of isolation levels is supported (inflexible)
   –  Identifying transaction status is costly (heavyweight)
•  Goal: a lightweight design to support flexible transactions
•  Observations and Key Ideas:
   –  Simultaneous updates can be written to different physical pages, and the FTL mapping table determines the ordering => (Flexibility) make the commit protocol page-independent
   –  Transactions have birth and death, and the near-logged update pattern enables efficient tracking => (Lightweight) track recently updated flash blocks, and retire the dead transactions
•  Results: up to 20.6% performance improvement, stable GC overhead, fast recovery with negligible persistence overhead
SSD Basics
[Figure: SSD architecture, with the labels Host, Interconnect, H/W Interface, FTL, Flash Memory Pkg #0 to #3, Die 0 and Die 1, Plane 0 and Plane 1.]
•  FTL (Flash Translation Layer)
   –  Address mapping, garbage collection, wear leveling
•  Out-of-place update (address mapping)
   –  Pages are updated to new physical pages instead of overwriting original pages
•  Internal parallelism
   –  New pages are allocated from different pkgs/planes
•  Page metadata (OOB): (4096 + 224) bytes
Two Observations
[Figure: the same SSD architecture as on the SSD Basics slide, with an additional Block label.]
•  Simultaneous updates and FTL ordering
   –  (Out-of-place update) Pages for the same LBA can be updated simultaneously
   –  (Ordering in mapping table) A write becomes visible externally only when the mapping table is updated
•  Near-logged update pattern
   –  Pages are allocated from blocks over different parallel units
   –  Pages are sequentially allocated from each block
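The first observation can be illustrated with a toy model. This is only a sketch under simplifying assumptions, not the FTL from the talk: the class `TinyFTL` and its methods `program`, `publish` and `read` are hypothetical names, flash is modelled as an append-only list, and parallel units are ignored. It shows two updates of the same LBA landing on different physical pages, with the mapping-table update alone deciding which version is visible.

```python
class TinyFTL:
    """Toy FTL: out-of-place writes plus a mapping table that decides visibility."""

    def __init__(self):
        self.flash = []      # physical pages, append-only; list index = PPN
        self.mapping = {}    # LBA -> PPN of the externally visible version

    def program(self, lba, data):
        """Write data to a fresh physical page; the mapping table is not touched."""
        self.flash.append((lba, data))
        return len(self.flash) - 1   # PPN of the newly written page

    def publish(self, lba, ppn):
        """Point the mapping table at ppn, which makes that version visible."""
        self.mapping[lba] = ppn

    def read(self, lba):
        ppn = self.mapping.get(lba)
        return None if ppn is None else self.flash[ppn][1]


ftl = TinyFTL()
old = ftl.program(7, "v0")
ftl.publish(7, old)

# Two simultaneous updates of LBA 7 go to two different physical pages.
a = ftl.program(7, "v1-from-txA")
b = ftl.program(7, "v1-from-txB")
before = ftl.read(7)   # the old version is still the visible one

# Only the mapping-table update orders the versions.
ftl.publish(7, b)
after = ftl.read(7)
```

The old and both new versions coexist on flash, which is what lets the ordering decision be deferred to the mapping table.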
Outline
•  Executive Summary
•  Background
   –  Traditional Software Transactions
   –  Existing Hardware Transactions
•  LightTx Design
•  Evaluation
•  Conclusions
Traditional S/W Transaction
•  Transaction: Atomicity and Durability
•  Software Transaction
   –  Duplicate writes
   –  Synchronization for ordering
[Figure: logical view of a software transaction on an HDD and on an SSD. Each side shows a Data Area and a Log Area; the SSD side also shows the FTL Mapping Table.]
The SSD already holds both the old and the new versions (out-of-place update). Why should we still write a log? Why not support transactions inside the SSD?
Existing H/W Approaches
•  Atomic-Write [HPCA’11]
   –  Log-structured FTL
   –  Commit protocol: tag the last page “1” and the others “0”
   –  Limited parallelism: one tx at a time
   –  High mapping persistence overhead: persistence on each commit
•  SCC/BPCC (cyclic commit protocols) [OSDI’08]
   –  Commit protocol: link all flash pages of a tx in a cyclic list by keeping pointers in page metadata
   –  High overhead in differentiating the broken cyclic lists of partially erased committed txs from those of aborted txs
   –  SCC forces aborted pages to be erased before the new one is written
   –  BPCC delays the erase of pages until their previous aborted pages are erased
   –  Limited parallelism: only txs without overlapping accesses are allowed
[HPCA’11] Beyond Block I/O: Rethinking Traditional Storage Primitives
[OSDI’08] Transactional Flash
Problems:
•  Tx support is inflexible (limited parallelism)
   –  Cannot meet the flexible demands from software
   –  Cannot fully exploit the internal parallelism of SSDs
•  Tx state tracking causes high overhead in the device
Our Goal: A lightweight design to support flexible transactions
Outline
•  Executive Summary
•  Background
•  LightTx Design
   –  Design Overview
   –  Page-independent Commit Protocol
   –  Zone-based Transaction State Tracking
•  Evaluation
•  Conclusions
Goal
A lightweight design to support flexible transactions
•  Flexible
   –  Page-independent commit protocol: support simultaneous updates, to enable flexible isolation level choices in the system
•  Lightweight
   –  Zone-based transaction state tracking scheme: track only blocks that have live txs and retire the dead ones, to lower the cost
Page-independent Commit Protocol
•  Observations:
   –  Simultaneous updates (out-of-place update)
   –  Version order (FTL mapping table)
•  How to support this?
   –  Extend the storage interface
   –  Make the commit protocol page-independent
[Figure: versions of pages A to E plotted against version numbers 1 to 3.]
Design Overview
[Figure: LightTx architecture. Applications, Database Systems and File Systems issue READ, WRITE, BEGIN, COMMIT and ABORT to the FTL. Labelled FTL components: Active TxTable, Commit Logic, Recovery Logic, Read/Write Cache, FTL Mapping Table, Garbage Collection, Wear Levelling, Free Blocks and Zone Mgmt. Labelled flash media contents: Data Area and Mapping Table.]
•  Transaction primitives
   –  BEGIN(TxID)
   –  COMMIT(TxID)
   –  ABORT(TxID)
   –  WRITE(TxID, LBA, len …)
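How a host might drive these primitives can be sketched as follows. This is an illustrative model only: the class `TxDevice`, its lowercase method names and the in-memory dictionaries are assumptions made for the example, and the sketch ignores flash pages, persistence and isolation-level policy entirely. It only shows the contract that writes belong to a TxID and become visible on COMMIT, while ABORT discards them.

```python
class TxDevice:
    """Toy device exposing BEGIN / WRITE / COMMIT / ABORT (hypothetical API)."""

    def __init__(self):
        self.mapping = {}   # committed LBA -> data; stands in for the FTL mapping table
        self.active = {}    # TxID -> {LBA: data}; stands in for the active TxTable

    def begin(self, txid):
        if txid in self.active:
            raise ValueError(f"tx {txid} is already active")
        self.active[txid] = {}

    def write(self, txid, lba, data):
        # The update is recorded under its transaction; it is not visible yet.
        self.active[txid][lba] = data

    def commit(self, txid):
        # All updates of the transaction become visible together.
        self.mapping.update(self.active.pop(txid))

    def abort(self, txid):
        # The updates of the transaction are dropped.
        self.active.pop(txid)

    def read(self, lba):
        return self.mapping.get(lba)


dev = TxDevice()
dev.begin(1)
dev.begin(2)
dev.write(1, 10, "from-tx1")
dev.write(2, 10, "from-tx2")       # same LBA, updated by a second tx
visible_before = dev.read(10)       # nothing committed yet
dev.abort(2)
dev.commit(1)
visible_after = dev.read(10)
```

Whether two transactions may touch the same LBA at once, as they do here, is a policy left to the software above the device in this sketch.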
Page-independent Commit Protocol
•  Transactional metadata: <TxID, TxCnt, TxVer>
   –  TxID
   –  TxCnt: (00…0N)
   –  TxVer: commit sequence
•  Keep it in the page metadata of each flash page
[Figure: flash page layout. Page Data is followed by Page Metadata (OOB), which holds LPN, TxID, TxCnt, TxVer and ECC. An accompanying example plots versions of pages A to E against version numbers, each version labelled with its transactional metadata.]
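A sketch of how this metadata could identify completed transactions is given below. It rests on an assumed reading of the `(00…0N)` notation: every page of a transaction carries TxCnt 0 except one, which carries N, the total number of pages in that transaction. The type `PageMeta`, the function `completed_txs` and the sample values are hypothetical; placing TxVer on every page is also an assumption of the sketch.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class PageMeta:
    lpn: int     # logical page number
    txid: int    # transaction that wrote the page
    txcnt: int   # assumed: 0 on all pages but one, which holds the page count N
    txver: int   # commit sequence number


def completed_txs(pages):
    """Return the TxIDs whose observed page count equals their announced TxCnt."""
    seen = defaultdict(int)   # TxID -> number of pages found
    expected = {}             # TxID -> N, taken from the page with non-zero TxCnt
    for page in pages:
        seen[page.txid] += 1
        if page.txcnt:
            expected[page.txid] = page.txcnt
    return {tx for tx, n in expected.items() if seen[tx] == n}


pages = [
    PageMeta(lpn=0, txid=3, txcnt=0, txver=1),
    PageMeta(lpn=1, txid=3, txcnt=0, txver=1),
    PageMeta(lpn=2, txid=3, txcnt=0, txver=1),
    PageMeta(lpn=3, txid=3, txcnt=4, txver=1),   # announces 4 pages: all present
    PageMeta(lpn=4, txid=5, txcnt=0, txver=0),   # Tx5 never announced a count
    PageMeta(lpn=5, txid=5, txcnt=0, txver=0),
    PageMeta(lpn=6, txid=7, txcnt=2, txver=2),   # announces 2 pages, only 1 found
]
result = completed_txs(pages)
```

Because the check only counts pages per TxID, it does not care which physical pages or blocks the transaction touched, which is the sense in which the protocol is page-independent.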
Zone-based Transaction State Tracking
•  Transaction lifetime
[Figure: transaction lifetime timeline marked by the events BEGIN, COMMIT/ABORT, CHECKPOINT and ERASED, with the state labels Active, Completed, Live and Dead.]
   –  Retire the dead: write back the mapping table, and remove the dead txs from tx state maintenance
   –  Can we write back the mapping table on each commit?
      -  Ordering cost (waiting for mapping table persistence)
      -  Mapping persistence is not atomic
•  Writes are appended in the free flash blocks
   –  Track the recently updated flash blocks
•  Block Zones
   –  Free block: all pages are free
   –  Available block: pages are available for allocation
   –  Unavailable block: all pages have been written to, but some pages belong to (1) a live tx, or (2) a dead tx that has at least one page in some available block
   –  Checkpointed block: all pages have been written to and all pages belong to dead txs
•  Respectively, we have Free, Available, Unavailable and Checkpointed Zones.
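The four block definitions can be turned into a small classifier. The sketch below is an interpretation under stated assumptions rather than the device's actual logic: a block is modelled as a list of page slots holding a TxID or `None` for a free page, any partially written block is treated as available, and the names `Zone`, `classify` and `txs_in_available` are invented for the example.

```python
from enum import Enum


class Zone(Enum):
    FREE = "free"
    AVAILABLE = "available"
    UNAVAILABLE = "unavailable"
    CHECKPOINTED = "checkpointed"


def txs_in_available(blocks):
    """TxIDs that own at least one page in a partially written block."""
    found = set()
    for pages in blocks.values():
        if any(p is None for p in pages) and any(p is not None for p in pages):
            found.update(p for p in pages if p is not None)
    return found


def classify(pages, live_txs, in_available):
    """Classify one block; pages is a list of TxID or None (free page)."""
    if all(p is None for p in pages):
        return Zone.FREE
    if any(p is None for p in pages):
        return Zone.AVAILABLE          # assumption: partially written means available
    for tx in pages:
        # Rule (1): page of a live tx. Rule (2): dead tx still present in an available block.
        if tx in live_txs or tx in in_available:
            return Zone.UNAVAILABLE
    return Zone.CHECKPOINTED


blocks = {
    "b0": [3, 3, 4, 4],         # full; Tx4 also has a page in b1
    "b1": [4, 5, None, None],   # still has free pages
    "b2": [3, 3, 3, 3],         # full; Tx3 is dead and appears in no available block
    "b3": [None, None, None, None],
}
live = {5}                       # Tx3 and Tx4 are dead in this example
spanning = txs_in_available(blocks)
zones = {name: classify(pages, live, spanning) for name, pages in blocks.items()}
```

Rule (2) is what keeps `b0` out of the checkpointed zone here: Tx4 is dead, but part of it still sits in a block that is open for allocation.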
•  Checkpoint
   –  Periodically write back the mapping table (making the txs dead)
   –  Then slide the zones (available + unavailable)
•  Zone Sliding
   –  Check all blocks in the available and unavailable zones
      -  Move the block to the checkpointed zone if the block is checkpointed
      -  Move the block to the unavailable zone if the block is unavailable
   –  Pre-allocate free blocks to the available zone
   –  Garbage collection is only performed on the checkpointed zone
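The sliding step can be sketched as a single pass over the available and unavailable zones. The function `slide_zones`, its `prealloc` parameter and the data layout (zone name to list of block ids, block id to list of TxID or `None`) are assumptions made for illustration; `live_txs` stands for whatever transactions the checkpoint did not make dead.

```python
def slide_zones(zones, blocks, live_txs, prealloc=1):
    """Re-sort available/unavailable blocks after a checkpoint, then pre-allocate."""
    candidates = zones["available"] + zones["unavailable"]

    # Blocks that still have free pages stay in the available zone.
    still_open = [b for b in candidates if None in blocks[b]]

    # Txs that pin a full block: live txs, plus any tx with a page in an open block.
    pinned = set(live_txs)
    for b in still_open:
        pinned.update(tx for tx in blocks[b] if tx is not None)

    zones["available"], zones["unavailable"] = [], []
    for b in candidates:
        if b in still_open:
            zones["available"].append(b)
        elif pinned.intersection(blocks[b]):
            zones["unavailable"].append(b)
        else:
            zones["checkpointed"].append(b)   # only these are eligible for GC

    # Pre-allocate free blocks to the available zone.
    for _ in range(min(prealloc, len(zones["free"]))):
        zones["available"].append(zones["free"].pop(0))
    return zones


blocks = {
    "b0": [3, 3, 4, 4],
    "b1": [4, 9, None, None],
    "b2": [3, 3, 3, 3],
    "b4": [None] * 4,
    "b5": [None] * 4,
}
zones = {
    "free": ["b4", "b5"],
    "available": ["b0", "b1"],
    "unavailable": ["b2"],
    "checkpointed": [],
}
slide_zones(zones, blocks, live_txs={9})
```

In this example `b2` slides to the checkpointed zone, `b0` becomes unavailable because Tx4 still has a page in the open block `b1`, and one free block is pre-allocated.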
(1) Available Zone Updating
[Figure: pages of Tx3, Tx4 and Tx5 being appended to blocks of the available zone. Each written page is tagged with its TxID and TxCnt, for example one page of Tx3 carries TxCnt 4 while its other pages carry 0.]
(2) Zone Sliding
[Figure: zone layout after further writes. Pages of Tx3 to Tx9 are shown with their TxID and TxCnt tags, and Tx5 is marked as aborted.]
(3) Zone Sliding
[Figure: next step of the zone sliding example, showing the same pages of Tx3 to Tx9 with their TxID and TxCnt tags and Tx5 marked as aborted.]
(4) System Failure
[Figure: zone layout at the time of a system failure. Pages of Tx3 to Tx11 are shown with their TxID and TxCnt tags, and Tx5 is marked as aborted.]
Recovery
•  Scan the available zone
•  Scan the unavailable zone
[Figure: the zones scanned during recovery, with the transactions Tx7 to Tx11 labelled and each page shown with its TxID and TxCnt tag.]
•  Recovery
   –  Scan the available zone
      -  If TxCnt matches, completed tx
      -  If not, add the tx to the pending list
   –  Scan the unavailable zone
      -  If TxID is in the pending list, check TxCnt again. If TxCnt matches, completed tx
      -  If TxID is not in the pending list, discard it
      -  If TxCnt still doesn’t match, uncompleted tx
   –  Replay with the sequence of TxVer
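The two-pass scan can be sketched as follows. Several details are assumptions of the example rather than statements from the talk: a page is reduced to a `(txid, txcnt, txver)` tuple, "TxCnt matches" is read as "the number of pages found for the tx equals the non-zero TxCnt one of its pages carries", and TxVer is taken from that same page. The function name `recover` is hypothetical.

```python
from collections import Counter


def recover(available_pages, unavailable_pages):
    """Return completed TxIDs in TxVer order; pages are (txid, txcnt, txver) tuples."""
    seen = Counter()   # TxID -> pages found so far
    expected = {}      # TxID -> announced page count
    version = {}       # TxID -> TxVer, used for the replay order

    def absorb(page):
        txid, txcnt, txver = page
        seen[txid] += 1
        if txcnt:
            expected[txid] = txcnt
            version[txid] = txver

    def matches(txid):
        return expected.get(txid) == seen[txid]

    # Pass 1: the available zone decides which txs are completed or pending.
    for page in available_pages:
        absorb(page)
    completed = {tx for tx in seen if matches(tx)}
    pending = set(seen) - completed

    # Pass 2: the unavailable zone only matters for pending txs.
    for page in unavailable_pages:
        if page[0] in pending:
            absorb(page)
        # Pages of txs that are not pending are discarded.

    completed |= {tx for tx in pending if matches(tx)}
    # Txs still in pending without a match are uncompleted and are not replayed.
    return sorted(completed, key=version.get)


available = [(11, 0, 0), (11, 3, 7), (11, 0, 0), (8, 3, 5), (9, 0, 0)]
unavailable = [(8, 0, 0), (8, 0, 0), (9, 0, 0), (4, 5, 2)]
order = recover(available, unavailable)
```

Here Tx11 completes within the available zone, Tx8 completes only after the unavailable zone is scanned, Tx9 never matches, and the Tx4 page is discarded because Tx4 is not pending.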
Outline
•  Executive Summary
•  Background
•  LightTx Design
•  Evaluation
•  Conclusions
Experimental Setup
•  SSD simulator
   –  SSD add-on from Microsoft on DiskSim
   –  Parameters from Samsung K9F8G08UXM NAND flash
•  Trace
   –  TPC-C benchmark: DBT2 on PostgreSQL 8.4.10
Flexibility
(1)  For a given isolation level, LightTx provides tx throughput as good as or better than that of the other protocols.
(2)  In LightTx, no-page-conflict and serialization isolation improve throughput by 19.6% and 20.6% over strict isolation.
Garbage Collection Cost
[Figure: transaction throughput (txs/s) of Atomic-Write, SCC, BPCC and LightTx at abort ratios of 0%, 10%, 20% and 50%.]
[Figure: normalized garbage collection cost of Atomic-Write, SCC, BPCC and LightTx at abort ratios of 0%, 10%, 20% and 50%.]
(1)  LightTx significantly outperforms SCC/BPCC when the abort ratio is not zero.
(2)  Garbage collection overhead in SCC/BPCC becomes extremely high as the abort ratio goes up.
Recovery Time and Persistence Overhead
[Figure: recovery time (seconds) of Atomic-Write, SCC/BPCC and LightTx versus the size of the available zone (4M to 1G bytes).]
[Figure: mapping persistence overhead (%) of Atomic-Write, SCC/BPCC and LightTx versus the size of the available zone (4M to 1G bytes).]
LightTx achieves fast recovery with low mapping persistence overhead.
Outline
•  Executive Summary
•  Background
•  LightTx Design
•  Evaluation
•  Conclusions
Conclusion
•  Problem: Flash-based SSDs support transactions naturally (with out-of-place updates) but inefficiently:
   –  Only a limited set of isolation levels is supported (inflexible)
   –  Identifying transaction status is costly (heavyweight)
•  Goal: a lightweight design to support flexible transactions
•  Observations and Key Ideas:
   –  Simultaneous updates can be written to different physical pages, and the FTL mapping table determines the ordering => (Flexibility) make the commit protocol page-independent
   –  Transactions have birth and death, and the near-logged update pattern enables efficient tracking => (Lightweight) track recently updated flash blocks, and retire the dead transactions
•  Results: up to 20.6% performance improvement, stable GC overhead, fast recovery with negligible persistence overhead
Thanks
Backup Slides
Existing Approaches (1)
•  Atomic Writes
   –  Log-structured FTL
   –  Transaction state: <00…1>
   + No logging, no commit record
   + No tx state maintenance cost
   - Poor parallelism
   - Mapping persistence overhead
[HPCA’11] Beyond Block I/O: Rethinking Traditional Storage Primitives
Existing Approaches (2)
•  SCC/BPCC (cyclic commit protocol)
   –  Use pointers in the page metadata to put all pages in a cycle for each tx
[Figure: versions of pages A to E plotted against version numbers 1 to 4.]
[OSDI’08] Transactional Flash
•  SCC
   –  Block erase forced for aborted pages
   - Low garbage collection efficiency: lots of data moves due to forced block erase
[Figure: versions of pages A to E plotted against version numbers 1 to 4.]
•  BPCC
   –  SRS: Straddle Responsibility Set
   –  Erasable only after SRS is empty
   - Complex and costly SRS updates
   - Low garbage collection efficiency: wait until SRS is empty
[Figure: versions of pages A to E plotted against version numbers 1 to 4, annotated with SRS(D3) = {C2, D2}, SRS(E3) = {D2}, SRS(D4) = {C2, D2, D3}.]
Atomic Writes
   + No logging, no commit record
   + No tx state maintenance overhead
   - Poor parallelism
   - Mapping persistence overhead
SCC/BPCC
   + No logging, no commit record
   + Improved parallelism
   - Limited parallelism
   - High tx state maintenance overhead
Problems:
•  Tx support is inflexible (limited parallelism)
   –  Cannot meet the flexible demands from software
   –  Cannot fully exploit the internal parallelism of SSDs
•  Tx state tracking causes high overhead in the device
A lightweight design to support flexible transactions