Query Processing and Optimizing on SSDs
Flash Group
Qingling Cao qingling1220@sina.com
Outline
Introduction
Page Layout on SSD
Scan Approaches
Join Algorithms
Conclusion
Introduction
• Page layout and data structure
• Leverage fast random reads to speed up
selection, projection, and join operations
• Database query processing engines traditionally
emphasize sequential I/O
Page Layout on SSD
Row Layout
slot
Column Layout
- Values of one column are stored in contiguous pages
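To make the two layouts concrete, here is a minimal sketch. The three-attribute relation and all variable names are hypothetical and not taken from the slides.

```python
# Hypothetical relation (id, name, age), used only for illustration.
rows = [(1, "Ann", 30), (2, "Bob", 25), (3, "Cid", 41)]

# Row layout: each page slot holds one complete tuple.
row_page = list(rows)

# Column layout: the values of each attribute are stored together.
column_pages = [list(col) for col in zip(*rows)]

# Reading one attribute walks every slot in the row layout,
# but touches only one column's pages in the column layout.
ages_from_rows = [t[2] for t in row_page]
ages_from_columns = column_pages[2]
```

Both reads return the same values; the difference is how much unrelated data sits next to them on storage.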
Page Layout on SSD
PAX Layout
PAX Layout is efficient for SSD but not for
disk. Why?
Page Layout on SSD
• Disk: sequential read speed is 100 MB/s and a skip
takes 3-4 ms, so a mini-page should be 300-400 KB.
The full page size is then in the MB range.
• IDE flash drive: sequential read bandwidth is
28 MB/s and seek time is 0.25 ms, so a mini-page should be
7 KB. The full page size can then be 32-128 KB.
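The mini-page sizes above follow from one multiplication: the amount of data that can be read sequentially in the time of one skip. A small sketch of that arithmetic follows. The function name is hypothetical, and reading the rule as "bandwidth times skip time" is inferred from the numbers on the slide.

```python
def min_minipage_kb(bandwidth_mb_per_s, skip_ms):
    """Data (in KB) that streams past during one skip.

    1 MB/s equals 1 KB/ms (decimal units assumed), so MB/s * ms = KB.
    """
    return bandwidth_mb_per_s * skip_ms

disk_low = min_minipage_kb(100, 3)     # disk, 3 ms skip
disk_high = min_minipage_kb(100, 4)    # disk, 4 ms skip
flash = min_minipage_kb(28, 0.25)      # IDE flash drive, 0.25 ms seek
```

With the slide's inputs this gives 300-400 KB for disk and 7 KB for flash. A mini-page smaller than this is cheaper to read through than to skip.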
Scan Approaches
• NSMScan – Always read the whole relation.
• FlashScan – Read only the related columns.
e.g. select S from R where J
Scan Approaches
• FlashScanOPT(U) – read only the mini-pages
that contain the needed tuples.
e.g. select S from R where J
• FlashScanOPT(S) – attributes are sorted, so
each mini-page is read at most once.
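The difference between the scan variants can be sketched by counting mini-page reads for `select S from R where J`. The page contents, the extra column `X`, the predicate, and the function names below are all hypothetical. The sketch models only the read pattern, not the actual implementation.

```python
# Hypothetical PAX-style relation: each page holds one mini-page per column.
pages = [
    {"J": [1, 2], "S": ["a", "b"], "X": [0, 0]},
    {"J": [3, 4], "S": ["c", "d"], "X": [0, 0]},
    {"J": [5, 6], "S": ["e", "f"], "X": [0, 0]},
]

def nsm_scan(pages, pred):
    """Read every mini-page of every page (the whole relation)."""
    reads, out = 0, []
    for page in pages:
        reads += len(page)
        out += [s for j, s in zip(page["J"], page["S"]) if pred(j)]
    return out, reads

def flash_scan(pages, pred):
    """Read only the J and S mini-pages of every page."""
    reads, out = 0, []
    for page in pages:
        reads += 2
        out += [s for j, s in zip(page["J"], page["S"]) if pred(j)]
    return out, reads

def flash_scan_opt(pages, pred):
    """Read J first; read S only if the page has a qualifying tuple."""
    reads, out = 0, []
    for page in pages:
        reads += 1
        hits = [i for i, j in enumerate(page["J"]) if pred(j)]
        if hits:
            reads += 1
            out += [page["S"][i] for i in hits]
    return out, reads

pred = lambda j: j >= 5   # hypothetical selective predicate on J
```

All three return the same tuples. On this toy data the read counts fall from 9 to 6 to 4 mini-pages, and the gap widens as the predicate gets more selective.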
Scan Approaches
Table: 70M tuples, 11 columns, 10 GB
System: Intel Core 2 Duo at 2.33GHz, 4GB of RAM
Mtron 32GB SSD
Join Algorithms – past lessons
• Block Nested Loops Join
• Sort-Merge Join
• Grace Hash Join
• Hybrid Hash Join
Join Algorithms – past lessons
☆ Algorithms that stress random reads and avoid random
writes as much as possible see bigger improvements on flash
Customer: 4.5 million tuples, 730 MB
Order: 45 million tuples, 5 GB
HDD: 5400 RPM, 320 GB
SSD: OCZ Core series, 60 GB, SATA II
Join Algorithms – RARE-join
Select Name, Team from Player, Game where Player.Team=Game.Team
J1
J2
Player
Blue, P:4
Green, P:3
Red, P:2 → Red, P:5
Orange, P:1 → Orange, P:6
Game
Blue, G:4
Red, G:1
Orange, G:2 → Orange, G:3
Join Algorithms – RARE-join
Join Index:
<G:4 , P:4>
<G:1 , P:2>
<G:1 , P:5>
<G:2 , P:1>
<G:2 , P:6>
<G:3 , P:1>
<G:3 , P:6>
Join Result:
<Sarah , Blue>
<Julie , Red>
<Alex , Red>
<Ben , Orange>
<Lena , Orange>
<Ben , Orange>
<Lena , Orange>
Total I/O cost: |J1| + σ1|V1| + |J2| + σ2|V2|
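The join index above can be reproduced from the two join columns alone, without touching the other attributes. The sketch below uses the Team values from the Player and Game slide. The function name and the choice of a hash-based match are assumptions for illustration, not necessarily how RARE-join computes the index.

```python
from collections import defaultdict

# Join columns from the slide: row id -> Team value.
player_team = {"P:1": "Orange", "P:2": "Red", "P:3": "Green",
               "P:4": "Blue", "P:5": "Red", "P:6": "Orange"}
game_team = {"G:1": "Red", "G:2": "Orange", "G:3": "Orange", "G:4": "Blue"}

def build_join_index(left, right):
    """Hash the left join column, probe with the right one, emit id pairs."""
    buckets = defaultdict(list)
    for lid, value in left.items():
        buckets[value].append(lid)
    return [(rid, lid)
            for rid, value in right.items()
            for lid in buckets.get(value, [])]

join_index = build_join_index(player_team, game_team)
```

The result holds the same seven <G, P> pairs as the slide. Green (P:3) has no matching game and drops out.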
Join Algorithms – FlashJoin
[Figure: FlashJoin operator pipeline. Labels, top to bottom: id1,id2,id3; hashG, id1,id2; hashK, id3; id1,id2; hashA, id1; Read(A); hashD, id2; Read(D)]
Join Algorithms – Fetch Kernel
Join Index (as produced by the join):
<G:4 , P:4>
<G:1 , P:2>
<G:1 , P:5>
<G:2 , P:1>
<G:2 , P:6>
<G:3 , P:1>
<G:3 , P:6>
Join Index (sorted on G ids):
<G:1 , P:2>
<G:1 , P:5>
<G:2 , P:1>
<G:2 , P:6>
<G:3 , P:1>
<G:3 , P:6>
<G:4 , P:4>
Each page is read no more than once.
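Why sorting helps can be shown by counting page reads with a one-page buffer. The sketch below uses the G ids from the join index. The function name and the layout of two rows per page are assumptions for illustration only.

```python
def page_fetches(ids, rows_per_page=2):
    """Count page reads with a one-page buffer.

    A read happens whenever the next row id falls on a different page
    than the previous one. rows_per_page=2 is a hypothetical layout.
    """
    fetches, current = 0, None
    for row_id in ids:
        page = (row_id - 1) // rows_per_page
        if page != current:
            fetches += 1
            current = page
    return fetches

g_unsorted = [4, 1, 1, 2, 2, 3, 3]   # G ids in the order the join emitted them
g_sorted = sorted(g_unsorted)        # G ids after sorting the join index
```

Under these assumptions the unsorted order reads a page twice (3 reads for 2 pages), while the sorted order reads each page once.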
Join Algorithms – Fetch Kernel
Join Index (sorted on G ids, with the Team value):
<Red, G:1, P:2>
<Red, G:1, P:5>
<Orange, G:2, P:1>
<Orange, G:2, P:6>
<Orange, G:3, P:1>
<Orange, G:3, P:6>
<Blue, G:4, P:4>
Join Index (re-sorted on P ids):
<Orange, G:2, P:1>
<Orange, G:3, P:1>
<Red, G:1, P:2>
<Blue, G:4, P:4>
<Red, G:1, P:5>
<Orange, G:2, P:6>
<Orange, G:3, P:6>
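The second pass can be sketched as a stable re-sort on the P ids followed by an in-order fetch from Player. The player names below are inferred by aligning the join index with the join result on the RARE-join slide. P:3 never joins, so its name is a placeholder. Variable names are hypothetical.

```python
# Join index sorted on G ids, carrying the Team value already read from Game.
by_game = [("Red", 1, 2), ("Red", 1, 5), ("Orange", 2, 1), ("Orange", 2, 6),
           ("Orange", 3, 1), ("Orange", 3, 6), ("Blue", 4, 4)]

# Names inferred from the slides; "?" marks the unknown name of P:3.
player_name = {1: "Ben", 2: "Julie", 3: "?", 4: "Sarah", 5: "Alex", 6: "Lena"}

# Re-sort on P ids so Player is also visited in storage order.
by_player = sorted(by_game, key=lambda entry: entry[2])

reads = []     # Player ids actually fetched, in order
result = []    # (Name, Team) output tuples
last = None
for team, g, p in by_player:
    if p != last:          # consecutive duplicates reuse the buffered row
        reads.append(p)
        last = p
    result.append((player_name[p], team))
```

After the re-sort each Player row is fetched once, in ascending id order. The price is an extra sort of the join index for every table that contributes output columns.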
Join Algorithms – FlashJoin
R: 70M tuples, 10 GB; S: 7M tuples, 1 GB
System: Intel Core 2 Duo at 2.33GHz, 4GB of RAM
Mtron 32GB SSD
Join Algorithms – DigestJoin
• Row-based
• {JI, idx, idy}
• Minimize the IO to fetch the join result
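A minimal sketch of the digest idea, assuming the first step projects each row-based table down to (join attribute, tuple id) and joins those small digest tables to get {JI, idx, idy} entries. The tables, values, and function names are hypothetical.

```python
# Hypothetical row-based tables: tuple id -> (join attribute, payload).
X = {"x1": (10, "payload-a"), "x2": (20, "payload-b"), "x3": (10, "payload-c")}
Y = {"y1": (10, "payload-d"), "y2": (30, "payload-e")}

def digest(table):
    """Project each tuple down to (join attribute, tuple id)."""
    return [(attrs[0], tid) for tid, attrs in table.items()]

def digest_join(dx, dy):
    """Join two digest tables, emitting (JI, idx, idy) entries."""
    by_value = {}
    for ji, idy in dy:
        by_value.setdefault(ji, []).append(idy)
    return [(ji, idx, idy) for ji, idx in dx for idy in by_value.get(ji, [])]

digest_result = digest_join(digest(X), digest(Y))
```

Only the matching tuple ids survive this step. The full tuples still have to be fetched afterwards, which is the I/O that the page-fetching strategies aim to minimize.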
Join Algorithms – Page Fetching(1)
• Sort-merge join
• Join results are clustered
• Memory is enough
• Fetch the pages of the tuples as soon as they
are produced
Join Algorithms – Page Fetching(2)
• Fetching instruction table
• Join candidate table
Join Index: (x1,A:1,C:1) (x2,B:1,D:1)
(x3,A:2,C:2) (x4,B:2,D:2)
ft1={A:1, B:1, A:2, B:2} ft2={C:1, D:1, C:2, D:2}
ft1={A:1, A:2, B:1, B:2} ft2={C:1, C:2, D:1, D:2}
jct1={x1,x2,x3,x4}
jct2={y1,y2,y3,y4}
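The two versions of ft1 and ft2 above differ only in order. A sketch of how the fetching instruction tables could be derived from the join index and then sorted follows. Reading `A:1` as "page A, slot 1" is an assumption, as are the variable names.

```python
# Join index entries from the slide: (join value, address in table 1, address in table 2).
join_index = [("x1", "A:1", "C:1"), ("x2", "B:1", "D:1"),
              ("x3", "A:2", "C:2"), ("x4", "B:2", "D:2")]

# Fetching instruction tables in join-result order.
ft1 = [left for _, left, _ in join_index]
ft2 = [right for _, _, right in join_index]

# Sorting groups addresses on the same page together, so each page
# is fetched once. Plain string sort is enough for these short labels.
ft1_sorted = sorted(ft1)
ft2_sorted = sorted(ft2)
```

The unsorted tables alternate between pages A and B (and between C and D), while the sorted ones finish one page before moving to the next.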
Join Algorithms – Page Fetching(3)
• Join Graph G = (V1 ∪ V2, E), E ⊆ V1 × V2
• Segment e.g. {1, a, b, c}, {a, 1, 2}
Join Algorithms – Page Fetching(3)
• Required storage size(RSS)
• Required cache size(RCS)
• <join_attr, tid1, tid2>
Conclusion
PAX:
• Scan algorithm has little room for improvement.
• RARE-join, FlashJoin.
• No write.
• Join index will be sorted many times.
• The size of minipage is not fixed.
Conclusion
Row:
• DigestJoin.
• I/O cost is much higher than for other join algorithms.
Column:
• None
• Storage is more flexible.
• Utilize the technology of tuple reconstruction.