When Data Management Systems Meet Approximate Hardware: Challenges and Opportunities
Author: Bingsheng He (Nanyang Technological University, Singapore)
Speaker: Jiong He (Nanyang Technological University, Singapore)

What is Approximate Hardware?
• Approximate hardware can trade off the accuracy of results for increased performance, reduced energy consumption, or both.
• Computer architecture researchers have proposed approximate designs for several components, such as the CPU, main memory and storage.

What is Approximate Hardware? (cont'd)
• Precise CPU vs. approximate CPU: a multiplier using 8-bit mantissas consumes over 78% less energy than a full 24-bit multiplier.
• Precise SSD vs. approximate SSD: the approximate SSD improves write latencies by 1.7× on average by trading off less than 10% of accuracy.

A Radical Vision: Data Management on Approximate Hardware
• ApproxiDB is a radical data management system whose design, implementation and optimization are aware of approximate hardware.
– ApproxiDB will run on a hybrid machine consisting of both approximate hardware and precise hardware.
– Enabling techniques: approximate query processing and precise query processing.
– Several examples demonstrate the effectiveness of ApproxiDB in performance and energy consumption.

Outline
• Motivations
• Our proposal: ApproxiDB
• Open problems
• Summary

Hardware Evolution Drives Database Architectural Innovations
Disk-based databases
• Open-source: PostgreSQL, MySQL
• Most commercial databases
• Taught in current textbooks
Main memory databases
• Built on large main memory
• More efficient data access and algorithm optimization
• Hot research topic

Hardware Evolution Drives Database Architectural Innovations (cont'd)
Parallel databases
• Emerged with CMP, SMP and SMT
• Hardware-conscious optimizations
Query co-processors
• Hardware: GPUs, FPGAs, etc.
• Efficient query processing with massively parallel processors
• Examples: GPUQP, Ocelot, etc.

Prediction is Always Difficult, Especially about the Future
What is next? Approximate hardware!

Approximate Hardware: An Example with Solid State Storage
[Figure: (a) Precise MLC]
1. Guard bands separate different analog values so that they can safely represent digital values.
2. Solid-state storage adopts iterative program-and-verify (P&V).
Figures are reproduced from [14] in our paper.

Approximate Hardware: An Example with Solid State Storage (cont'd)
[Figure: (a) Precise MLC; (b) Approximate MLC]
• The guard band is reduced, so fewer P&V iterations are needed to reach the acceptable accuracy.

What's New?
• O1: The data are inherently imprecise, and thus can tolerate loss of accuracy.
– For example, the reading from a temperature sensor may not need to be accurate to the "last decimal".
• O2: The query processing itself can tolerate loss of accuracy, and the result can be imprecise/approximate.
– Approximate query processing
• O3: Although the query processing requires a precise final result, a hybrid execution on precise hardware and approximate hardware could have better performance/energy consumption than execution on precise hardware only.
– The focus of this talk.

Our Proposal: Approximate and Refine
• Design for hybrid hardware (including both precise hardware and approximate hardware).
• Approximate-and-refine consists of two steps:
– Step 1: use approximate hardware to obtain intermediate results (a superset of the final results) within some query processing steps.
– Step 2: use precise hardware to refine the intermediate results and obtain the final precise results.
• We show two examples (selection and merge sort) to illustrate this paradigm.
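
The two steps above can be sketched for a range selection. This is only an illustrative sketch, not ApproxiDB's implementation: `approx_read` is a hypothetical stand-in for approximate hardware that returns a value with a bounded error, and widening the predicate by that error bound is an assumed way of making Step 1 return a superset of the final result. The function names, the error bound and the input values are all chosen for illustration.

```python
import random


def approx_read(value, max_error, rng):
    """Hypothetical model of approximate hardware: the value read back
    deviates from the stored value by at most max_error."""
    return value + rng.uniform(-max_error, max_error)


def approximate_and_refine_select(values, low, high, max_error, rng):
    """Return (tuples with low < x < high, number of candidates refined)."""
    # Step 1 (approximate hardware): evaluate a predicate widened by
    # max_error on the approximate values. Every true match passes,
    # so the candidate set is a superset of the final result.
    candidates = [
        i for i, v in enumerate(values)
        if low - max_error <= approx_read(v, max_error, rng) <= high + max_error
    ]
    # Step 2 (precise hardware): re-check the exact predicate on the
    # precise values of the candidates only.
    result = [values[i] for i in candidates if low < values[i] < high]
    return result, len(candidates)


if __name__ == "__main__":
    rx = [10.5, 10.1, 1.1, 12.5, 5.7, 8.5, 4.3, 8.2]
    result, n_candidates = approximate_and_refine_select(
        rx, 4.5, 5.9, max_error=0.3, rng=random.Random(0)
    )
    print(result, n_candidates)
```

The refine step touches only the candidates, so the precise hardware does less work whenever Step 1 filters out most tuples.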

Example 1: Selection
Select tuples where: R.x > 4.5 and R.x < 5.9
[Figure: R.x = 10.5, 10.1, 1.1, 12.5, 5.7, 8.5, 4.3, 8.2; the selected tuple is 5.7. Legend: precise storage, precise execution, approximate execution, approximate storage.]
(a) Selection on precise CPU. Cost: 1 * 8 = 8
(b) Selection on hybrid processor. Cost: 0.5 * 8 + 1 * 2 = 6, i.e. 8 tuples processed at approximate cost 0.5 each, plus 2 candidate tuples refined at precise cost 1 each.

Example 2: Merge Sort
Sort input in ascending order
[Figure: input r1..r8 with R.x = 10.5, 10.1, 1.1, 12.5, 5.7, 8.5, 4.3, 8.2; sorted output 1.1, 4.3, 5.7, 8.2, 8.5, 10.1, 10.5, 12.5. Legend: precise storage, precise execution, approximate execution, approximate storage.]
(c) Sort on precise storage
(d) Sort on hybrid storage: the intermediate runs on approximate storage hold slightly perturbed key values, and the final output is the precise sorted order.

Initial Design of ApproxiDB
[Figure: architecture of ApproxiDB in three layers]
• CPU: precise CPU, approximate CPU
• DBMS (ApproxiDB): query optimizer, cost estimation, query operators (e.g., joins and sort), access methods (e.g., scans), other components in DBMS
• Storage: precise storage, approximate storage

Extensions to Existing DBMS
• Allow users to specify what should be stored in approximate storage, as well as the accuracy requirement.
• We propose four query processing modes:
– Precise storage + precise query processing
– Approximate storage + precise query processing
– Precise storage + approximate query processing
– Approximate storage + approximate query processing

Extensions to Existing DBMS (cont'd)
• The cost model needs to consider new factors, such as the tradeoff between performance/energy consumption and accuracy.
• The query optimizer should revisit the physical operator implementations and query execution plans to make the best use of the hybrid system.
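
The cost arithmetic of Example 1 hints at one simple shape such a cost model could take. The sketch below is an assumption that generalizes that arithmetic into a linear model; it is not the cost model of ApproxiDB. The unit costs 1 and 0.5 are taken from Example 1, while the function names and the break-even rule are hypothetical.

```python
def precise_cost(n_tuples, c_precise=1.0):
    """Cost of processing every tuple on precise hardware only."""
    return c_precise * n_tuples


def hybrid_cost(n_tuples, n_candidates, c_approx=0.5, c_precise=1.0):
    """Cost of approximate-and-refine: all tuples on approximate
    hardware, then only the candidates on precise hardware."""
    return c_approx * n_tuples + c_precise * n_candidates


def break_even_candidates(n_tuples, c_approx=0.5, c_precise=1.0):
    """Hybrid execution is cheaper while the number of candidates
    stays below this value (from hybrid_cost < precise_cost)."""
    return n_tuples * (1 - c_approx / c_precise)


if __name__ == "__main__":
    print(precise_cost(8))           # cost (a) in Example 1: 1 * 8
    print(hybrid_cost(8, 2))         # cost (b) in Example 1: 0.5 * 8 + 1 * 2
    print(break_even_candidates(8))  # candidates at which both costs are equal
```

Under this assumed model, an optimizer would pick the hybrid plan only when the expected number of candidates is below the break-even point; accuracy and energy terms would have to be added to cover the other factors named above.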

Open Problems
• ApproxiDB raises many open problems:
– Automatic physical design to ease the burden on users.
– Problems in multi-level approximate hardware.
– Integrating techniques from probabilistic databases into ApproxiDB.
– Query-level tradeoff between accuracy and performance/energy.

Summary
• We sketch a radical vision of ApproxiDB on hybrid hardware with both approximate hardware and precise hardware.
• We demonstrate our initial design of ApproxiDB to exploit those optimization opportunities.
• We conjecture that approximate hardware can be one of the interesting driving forces in the database community in the future.

Q&A
• Thank you.
• Our research group: Xtra Computing Group http://pdcc.ntu.edu.sg/xtra/