The Rebirth of Database Machines Dina Bitton Jim Gray 1 Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt Outline • • • • Active Disks are coming Disk Tutorial (not presented, but slides in deck) Disk Arms are important (optimize them) The Rebirth of Database Machines 2 Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt Disks of 30 Years Ago • 10 MB • Failed every few weeks • Cost more than 400$ 3 Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt Disk Arrays • 24 cpus • 384 disks • More mips in the disks than in the cpus 4 Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt Year 2003 Disks • Big disk (10 $/GB) – – – – 3” 200 GB 150 kaps (k accesses per second) 30 MBps sequential • Small disk (20 $/GB) – – – – 2” 40 GB 100 kaps 20 MBps sequential • Both running DBMS, Mail, Web, and OS Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt 5 • From CMU Active Disk web site http://www.pdl.cs.cmu.edu/Active/ 6 Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt Research Problem: When every disk is a super-computer… And there are thousands of them... • Who manages data placement? • Query plans among 1,000 severs? • How does – mirroring work? – backup work? • Where does my program run? 7 Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt Relevant University Research on Active Disks • Kim Keeton & Dave Patterson @ UC Berkeley http://www.cs.berkeley.edu/~pattrsn/talks/sigmod98-keynote.ppt • Erik Riedel & Garth Gibson @ CMU http://www.pdl.cs.cmu.edu/Active/ • Mike Franklin @ U Maryland http://www.cs.umd.edu/projects/bdisk • Anurag Acharya, Mustafa Uysal @ UC SB http://www.cs.ucsb.edu/TRs/TRCS98-06.html 8 Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt Outline • • • • Active Disks are coming Disk Tutorial (not presented, but slides in deck) Disk Arms are important The Rebirth of Database Machines 9 Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt Disk Access Time • Access time = SeekTime + RotateTime + ReadTime • Rotate time: 6 ms 3 ms 1 ms – 5,000 to 10,000 rpm • ~ 12 to 6 milliseconds per rotation • ~ 6 to 3 ms rotational latency 12 Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt Disk Access Time Improves Slowly • Access time = SeekTime + RotateTime + ReadTime • Other useful facts: 6 ms 3 ms 1 ms 8%/y 8%/y 40%/y – Power rises more than size3 (small is indeed beautiful) – Small devices are more rugged – Small devices can use plastics (forces are much smaller) e.g. bugs fall without breaking anything 13 Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt Disk Seek Time • Seek time is ~ Sqrt(distance) (distance = 1/2 acceleration x time2) • Specs assume seek is 1/3 of disk • Short seeks are common. (over 50% are zero length) • Typical 1/3 seek time: 6 ms • 4x improvement in 20 years. time 14 Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt Disk Access Ratios Have Changed • Key metrics: $/GB Kaps/GB (KB accesses per second per GB) SCAN: time to scan the disk • Scan going from minutes to days • Disk arms are precious resource (disk capacity is no longer the precious resource) Kaps/GB went from 500 to 7 and going to 1 year 1988 1998 2003 Capacity kaps/ Scan Scan GB $/GB kaps GB Sequential Random 0.25 20,000 30 1200 2 minutes 20 minutes 18 50 120 7 20 minutes 5 hrs 200 5 200 1 2 hrs 1.2 days Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt 20 Stripe For More Bandwidth • N-stores have N-times the bandwidth • Works great! • Supported by most file systems 21 Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt Mirrors: Replicate Stores for Availability • • • • Read one, write all If one fails, rebuild from survivor Run scrubber in background to fix faults N-replicas can give N-times the bandwidth • UnAvailabity ~ MTTF MTTF N 1day 50 years 2 1,000 ,000 years A Million Years!!! 22 Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt RAID5: Parity Saves Storage Space • Mirrors: 50% storage overhead – read one, write both • RAID5: 12% Storage overhead: – read one, write one plus parity PARITY 23 Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt Interesting Fact: Mirrored Disks Optimize Disk Arms seek is min seek. • Doubles write cost (write both) – Write time increases because seek is max seek. Seek Distance vs Disks 0.8 Fraction of disk surface for seek • Doubles read bandwidth Sequential: Read stagger reads from each drive (stripe) Random: Read closest arm 0.7 0.6 0.5 Write Seek 0.4 0.3 Read Seek 0.2 0.1 0 1 2 3 4 5 6 7 8 Number of disks Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt 24 If Mix Reads & Writes Mirror is Better Than Partition • 2 servers are better than one • Benefit is better than 2x write cost if reads writes Fraction of disk surface for seek Normalized Seek Time Write 2.0 25% Read 1.5 50% read 1.0 75% read 0.5 Read 0.0 1 2 3 4 5 6 7 8 Number of disks 25 Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt What if you have LOTS of Disks • When you have BIG disks (200 GB), arms are precious, space is cheap. • If you replicate 1000x – write seek time asymptotically approaches 1.7x – read seek time asymptotically approaches zero. Distance to Seek 1000 Time to Seek Time to Seek 900 Write 700 600 2 2 Write 1.8 1.8 1.6 1.6 1.4 400 1.2 time 500 300 1 0.8 200 100 Read 0 1 10 100 Write 1.4 time 800 1.2 1 0.8 0.6 0.6 0.4 0.4 0.2 0.2 0 1000 0 1 2 3 4 5 6 disks 7 Read 0 8 Read 9 10 200 400 600 disks Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt 800 1000 26 Outline • • • • Active Disks are coming Disk Tutorial (not presented, but slides in deck) Disk Arms are important The Rebirth of Database Machines 27 Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt The Rebirth of Database Machines Dina Bitton IDS Jim Gray Microsoft 28 Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt Outline • Performance hungry databases • History: life and death of database machines • What has changed that can make database machines work today • Shared-Nothing Database Machine • Where is the required bandwidth • DMP : Shared-Nothing & SharedEverything 29 Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt Demand for Database Performance • Larger Databases: – marketing data warehouses: TB of historical data – daily news broadcasts: 1 TB of searchable video/audio data • Large Scans: Searches require access to large fraction of database • Repeated Scans: DSS queries, Data mining algorithms 30 Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt Life, Death & Reincarnation • Database Machines are coming, Database Machines are coming ... (Hsiao 1979) • Then there was Britton-Lee, Direct, ICL … – Teradata builds highly-parallel shared-nothing SQL server – many university “paper” designs • “Database Machines, An Idea whose time has Passed?” (Boral- DeWitt 1983) • Then there was MMDBs, Grace, Gamma and more Teradata • Then there was Software (Parallel Database Query) • Next: PDQ + lots of disks with power controllers 31 Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt And All Along Stonebraker’s Opinion: “The history of DBMS research is littered with innumerable proposals to construct hardware database machines to provide high performance operations. In general these have been proposed by hardware types with a clever solution in searchof a problem on which it might work.” Readings in Database Systems, Morgan-Kaufmann 32 Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt Why Not then, but Yes now • Too early: small databases on 1 disk TB databases span thousands disks, need partitioning • Disk filter designs: addressed only small part of DBMS requirements disk controllers are fast computers • Exotic technologies (bubbles, CCD…) went away • Special purpose hardware increased design time and cost Higher level of integration, VLSI design tools better • Parallel query processing was not well-understood Large body of research, successful commercial implementations Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt 33 Parallel Query Processing [DeWitt-Gray CACM91] Merge Sort Sort Sort Sort Sort Scan Scan Scan Scan Scan Source Data Source Data Source Data Source Data Source Data Pipelining Partitioning data streams flow from one operator to the next tables are partitioned to allow concurrent processing on partitions 34 Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt Data Pathway Contention [Patterson Sigmod 1998] • Disk external I/O bus bottleneck to transfer rate, cost • Network internal I/O bus interface is bottleneck to delivered bandwidth • Memory-Processor processor-memory interface (cache+memory bus) is bottleneck to delivered bandwidth 35 Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt A Shared-Nothing Database Machine Scalable Interconnect Processor & Memory Processor & Memory . . . Processor & Memory Processor & Memory No contention in memory access or parallel disk access => “Embarrassingly Parallel” Scan [Patterson] But: how fast need Interconnect be? Each processor has own OS, communication protocols,DB instance Exchange data streams for pipelining ops, for sort, merge Can’t support M:N mapping between disks & threads Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt 36 Share-Everything? • Need more bandwidth for shipping data streams than network can provide • Need M:N mapping from disks to processors for sort/merge • Control & synchronization: Data-flow best to synchronize processors 37 Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt Where to Get the Bandwidth? Level of Components Integration Chip Board System Transistor / Gate Chips / Discrete components Board / Interface Links Throughput Connection lines 30 GB/Sec (16 64-bit registers 1-8 internal clocks at 200 MHz) 1. Point-topoint connections 2. Buses 1. Crossbars 2. Buses 1. 800 MB/Sec 2. 150 MB/Sec 1. 200-500 MB/Sec Latency 1.Half of transaction (10 clocks of the slowest device ) 1. 10-50 bus clocks 1. 10 crossbar clocks 2. 10-50 bus clocks 2. 80 MB/Sec Network Node (Systems) Fibre Channel / Bridges Ethernet / VIA Fibre Channel : 100–200 MB/Sec Sender overhead + Receiver overhead + transmission latency + link availability 38 Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt The Data Manipulation Platform To Host Computer To other DMP Boards via high-speed switch I/O interface adapter NP 1 NP 2 Bus adapter ... RFM NP 16 ... P4 P1 BAM • Massive Parallel Operation data-flow control • M:N thread-to-disk RAM Direct processor to disk access Direct disk to memory connect Direct connection ... 1 80 DMP BOARD 39 Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt A DSS Query Execution Plan Exchange 5 Sort 1 Exchange 4 Select sum(tabX.amount*.08), tabY.region from tabX,tabY where tabX.key=tabY.region Sort 2 group by tabY.region, order by tabY.region; Group 1 Group 2 1/10 grouped Exchange 3 Temp Disks HJoin HJoin Exchange 1 Scan tabX 1 Scan tabX 2 ... Exchange 2 Scan tabX 32 1/3 selected Scan tabY 1 ... 1 2 1/10 joined HJoin ... Scan tabY 3 ... 32 1 Database Disks Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt 3 40 Bandwidth Requirements Exchange 5 Sort 1 Sort 2 2.1 MB/s Exchange 4 Group 1 Group 2 21 MB/s Exchange 3 Temp Disk Contentio n HJoin HJoin HJoin Exchange 1 Scan tabX 1 Scan tabX 2 ... Exchange 2 Scan tabX 32 Scan tabY 1 ... Database Disks 1 2 210 MB/s Scan tabY 3 ... 32 1 32*20MB/s= 640 MB/s Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt 3 41 Conclusion DMP: shared-nothing and shared-everything IT ISN’T THAT YOU CAN’T SHARE IT IS WHERE YOU SHARE ON A CHIP ON A BOARD ON A NETWORK 42 Bitton & Gray: The Rebirth of Database Machines, http://research.microsoft.com/~Gray/talks/DB_Machine_Rebirth.ppt