Query Processing and Optimizing on SSDs
Flash Group
Qingling Cao
qingling1220@sina.com

Outline
• Introduction
• Page Layout on SSD
• Scan Approaches
• Join Algorithms
• Conclusion

Introduction
• Page layout and data structure
• Leverage fast random reads to speed up selection, projection, and join operations
• Database query processing engines traditionally emphasize sequential I/O

Page Layout on SSD
• Row Layout [figure: page with slots]
• Column Layout – attributes of one column are stored in contiguous pages

Page Layout on SSD
• PAX Layout [figure]
• PAX layout is efficient on SSD but not on disk. Why?

Page Layout on SSD
• Disk: the sequential read speed is 100 MB/s and a skip takes 3-4 ms, so a mini-page should be 300-400 KB (100 MB/s × 3-4 ms, the amount of data that can be read sequentially in the time of one skip). The full page size is then in the MB range.
• IDE flash drive: the sequential read bandwidth is 28 MB/s and the seek time is 0.25 ms, so a mini-page should be 7 KB (28 MB/s × 0.25 ms). The full page size can then be 32-128 KB.

Scan Approaches
• NSMScan – always reads the whole relation.
• FlashScan – reads only the relevant columns, e.g. select S from R where J
• FlashScanOPT(U) – reads only the mini-pages that contain needed tuples, e.g. select S from R where J
• FlashScanOPT(S) – attributes are sorted, so each mini-page is read at most once.
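The difference between the scan approaches above can be illustrated by counting mini-page reads. This is a minimal sketch under stated assumptions, not the presented implementation: each PAX page is modelled as a dict from column name to mini-page, every mini-page read costs one unit, and FlashScanOPT is interpreted as reading the predicate mini-page first and the output mini-page only when the page holds a qualifying tuple. The function names, the column X, and the sample data are hypothetical.

```python
def nsm_scan(pages, out_col, pred_col, pred):
    """Row-style scan: every mini-page of every page is read."""
    reads, result = 0, []
    for page in pages:
        reads += len(page)  # all columns are fetched
        result += [s for s, j in zip(page[out_col], page[pred_col]) if pred(j)]
    return result, reads


def flash_scan(pages, out_col, pred_col, pred):
    """Read only the mini-pages of the columns the query references."""
    reads, result = 0, []
    for page in pages:
        reads += len({out_col, pred_col})
        result += [s for s, j in zip(page[out_col], page[pred_col]) if pred(j)]
    return result, reads


def flash_scan_opt(pages, out_col, pred_col, pred):
    """Read the predicate mini-page first; read the output mini-page
    only when the page holds at least one qualifying tuple (assumed reading
    of the FlashScanOPT bullet)."""
    reads, result = 0, []
    for page in pages:
        reads += 1  # predicate mini-page
        hits = [i for i, j in enumerate(page[pred_col]) if pred(j)]
        if hits:
            if out_col != pred_col:
                reads += 1  # output mini-page, fetched on demand
            result += [page[out_col][i] for i in hits]
    return result, reads


# Hypothetical relation R with columns J, S, X: three pages, two tuples each.
pages = [
    {"J": [1, 2], "S": [10, 20], "X": [0, 0]},
    {"J": [3, 4], "S": [30, 40], "X": [0, 0]},
    {"J": [5, 6], "S": [50, 60], "X": [0, 0]},
]
pred = lambda j: j >= 5  # stands in for the "where J" condition

for scan in (nsm_scan, flash_scan, flash_scan_opt):
    rows, reads = scan(pages, "S", "J", pred)
    print(scan.__name__, rows, reads)
```

On this toy data all three scans return the same rows, while the read count drops from 9 (whole relation) to 6 (two columns) to 4 (output mini-page fetched only for the one matching page). The gap depends on selectivity and on how qualifying tuples are clustered.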
Scan Approaches
Experimental setup:
• Table: 70M tuples, 11 columns, 10 GB
• System: Intel Core 2 Duo at 2.33 GHz, 4 GB of RAM, Mtron 32 GB SSD

Join Algorithms – past lessons
• Block Nested Loops Join
• Sort-Merge Join
• Grace Hash Join
• Hybrid Hash Join

Join Algorithms – past lessons
☆ Algorithms that stress random reads and avoid random writes as much as possible see bigger improvements on flash.
• Customer: 4.5M tuples, 730 MB
• Order: 45M tuples, 5 GB
• HDD: 5400 RPM, 320 GB
• SSD: OCZ Core series, 60 GB, SATA II

Join Algorithms – RARE-join
Select Name, Team from Player, Game where Player.Team=Game.Team
J1 (Player): Blue, P:4 | Green, P:3 | Red, P:2 → Red, P:5 | Orange, P:1 → Orange, P:6
J2 (Game): Blue, G:4 | Red, G:1 | Orange, G:2 → Orange, G:3

Join Algorithms – RARE-join
Join Index: <G:4, P:4> <G:1, P:2> <G:1, P:5> <G:2, P:1> <G:2, P:6> <G:3, P:1> <G:3, P:6>
Join Result: <Sarah, Blue> <Julie, Red> <Alex, Red> <Ben, Orange> <Lena, Orange> <Ben, Orange> <Lena, Orange>
Total I/O cost: |J1| + σ1|V1| + |J2| + σ2|V2|

Join Algorithms – FlashJoin
[Figure: FlashJoin operator tree with node labels: id1,id2,id3; hashG, id1,id2; hashK, id3; id1,id2; hashA, id1 with Read(A); hashD, id2 with Read(D)]

Join Algorithms – Fetch Kernel
Join Index (as produced): <G:4, P:4> <G:1, P:2> <G:1, P:5> <G:2, P:1> <G:2, P:6> <G:3, P:1> <G:3, P:6>
Join Index (sorted on G): <G:1, P:2> <G:1, P:5> <G:2, P:1> <G:2, P:6> <G:3, P:1> <G:3, P:6> <G:4, P:4>
• Each page is read no more than once.
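The effect of sorting the join index before fetching can be checked with a small count. This is a rough sketch, assuming the G:n and P:n ids can be treated as page numbers and that only one page per relation is buffered at a time; `page_reads` and the variable names are hypothetical, and the index entries are the ones listed on the slide.

```python
from itertools import groupby


def page_reads(page_ids):
    """Count fetches with a one-page buffer: a new fetch happens
    whenever the requested page differs from the previous one."""
    return sum(1 for _ in groupby(page_ids))


# Join index entries as (G, P) pairs, in the order they were produced.
join_index = [(4, 4), (1, 2), (1, 5), (2, 1), (2, 6), (3, 1), (3, 6)]

# Sort on G before fetching from Game, then on P before fetching from Player.
by_game = sorted(join_index, key=lambda e: e[0])
by_player = sorted(by_game, key=lambda e: e[1])

game_reads = page_reads(g for g, _ in by_game)
player_reads_unsorted = page_reads(p for _, p in by_game)
player_reads_sorted = page_reads(p for _, p in by_player)

print(game_reads, player_reads_unsorted, player_reads_sorted)  # 4 7 5
```

With the index sorted on the id column being fetched, the number of reads equals the number of distinct pages (4 for Game, 5 for Player). Fetching Player in G order instead touches pages P:1 and P:6 twice each, giving 7 reads. The price is one extra sort of the join index per fetched relation.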
Join Algorithms – Fetch Kernel
Join Index (sorted on G): <Red, G:1, P:2> <Red, G:1, P:5> <Orange, G:2, P:1> <Orange, G:2, P:6> <Orange, G:3, P:1> <Orange, G:3, P:6> <Blue, G:4, P:4>
Join Index (sorted on P): <Orange, G:2, P:1> <Orange, G:3, P:1> <Red, G:1, P:2> <Blue, G:4, P:4> <Red, G:1, P:5> <Orange, G:2, P:6> <Orange, G:3, P:6>

Join Algorithms – FlashJoin
Experimental setup:
• R: 70M tuples, 10 GB
• S: 7M tuples, 1 GB
• System: Intel Core 2 Duo at 2.33 GHz, 4 GB of RAM, Mtron 32 GB SSD

Join Algorithms – DigestJoin
• Row-based
• {JI, idx, idy}
• Minimize the I/O needed to fetch the join result

Join Algorithms – Page Fetching(1)
• Sort-merge join
• Join results are clustered
• Memory is sufficient
• Fetch the pages of the tuples as soon as they are produced

Join Algorithms – Page Fetching(2)
• Fetching instruction table
• Join candidate table
Join Index: (x1,A:1,C:1) (x2,B:1,D:1) (x3,A:2,C:2) (x4,B:2,D:2)
Before sorting: ft1={A:1, B:1, A:2, B:2} ft2={C:1, D:1, C:2, D:2}
After sorting: ft1={A:1, A:2, B:1, B:2} ft2={C:1, C:2, D:1, D:2}
jct1={x1,x2,x3,x4} jct2={y1,y2,y3,y4}

Join Algorithms – Page Fetching(3)
• Join Graph G=(V1 ∪ V2, E) [figure: graph with vertex sets V1 and V2 connected by edges E]
• Segment, e.g. {1, a, b, c}, {a, 1, 2}

Join Algorithms – Page Fetching(3)
• Required storage size (RSS)
• Required cache size (RCS)
• <join_attr, tid1, tid2>

Conclusion
PAX:
• Scan algorithms have little room for improvement.
• RARE-Join, FlashJoin.
• No writes.
• The join index is sorted many times.
• The size of a mini-page is not fixed.

Conclusion
Row:
• DigestJoin.
• I/O is much higher than in the other join algorithms.
Column:
• None
• Storage is more flexible.
• Can make use of tuple reconstruction techniques.