In-Memory BLU Acceleration in IBM’s DB2 and dashDB: Optimized for Modern Workloads and Hardware Architectures
Guy Lohman
Research Manager, Disruptive Information Management Architectures
IBM Research – Almaden
14 April 2015
© 2015 IBM Corporation

“In-Memory” BLU Acceleration: Agenda
1. Who cares about in-memory?
   a. In-memory is too expensive!
   b. In-memory is too limiting!
   c. In-memory is too slow for BLU!
2. What is BLU Acceleration?
3. The cloud is what is important!
4. Conclusions

Moore’s Law has snookered us!
[Chart: Main Memory. Source: http://www.jcmit.com/mem2013.htm]

So, we conclude, …
Ergo, memory is:
– Unlimited
– Free
“It all fits!”
Ergo, it must all fit! Right?

WRONG!!!

In-Memory is Too Expensive!
Economics + Greed
There will always be a “memory hierarchy”
– Yes, DRAM is getting cheaper
  • Moore’s Law has not (yet) been repealed!
– BUT our appetite for data is growing even faster
  • The death of update-in-place (time travel)
  • “Big Data” Analytics craves large volumes of data
– Why pay for DRAM for cold columns?
Some (cold) data will spill to disk
  • Infrequently-referenced columns
  • Infrequently-referenced rows
We’ve just moved up one level in our focus…

Focus of Memory Hierarchy Has Shifted Up 1 Level
  Old focus:  DRAM  / DISK / TAPE
  New focus:  CACHE / DRAM / DISK
“Disk is the new tape; Memory is the new disk.” -- Jim Gray

In-Memory is Too Limiting!
DBA must choose which tables can fit in-memory
– Adds database design complexity for DBA
– Workloads and tables referenced change over time
Base tables aren’t the whole story! Must also include:
– Indexes
– Temporary tables
– Materialized views
– Query working space for each user (typically 1000s):
  • Hash tables for joins, GROUP BY
  • Space for sorts
  • …and much more!
Have to persist anyway! (DRAM is still volatile)

In-Memory is Too Slow!
CPU cache is many times faster than DRAM
BLU’s run-time is carefully designed to:
– Operate on compressed values, bit-aligned as vectors
– Auto-detect HW cache sizes
– Adapt algorithms to them:
  • Partition data into cache-sized blocks
  • Exploit L2 & L3 caches
  • Minimize cache-line misses (to DRAM)

What is BLU Acceleration?
New technology for accelerating BI queries
• 2nd generation of Almaden’s Blink Research technology
• Columnar database within DB2 for Linux, UNIX, & Windows
• Run-time that is optimized to exploit modern hardware:
  − Multi-core for data parallelism
  − Cache and large main memories
• Operates on compressed, bit-aligned data vectors
Order-of-magnitude benefits
1. Performance
2. Storage savings
3. Simplicity and Time to Value!
Deeply integrated within DB2 10.5
• New columnar page format & run-time
• Memory-optimized (not limited to “in-memory”)
• Exploits DB2 full functionality, utilities, & tools
“Revolution via Evolution”
• Easy conversion of row tables to BLU (columnar) tables
• BLU tables can co-exist with traditional row tables
  − In same query, schema, storage, & memory
• Query any combination of BLU or row data
• No need to change applications or SQL queries
• DB2 run-time compensates for any missing functionality in BLU
[Diagram: DB2 LUW with BLU. Components shown: DB2 compiler; DB2 classic run-time and BLU run-time; DB2 bufferpool; storage holding DB2 classic row tables and BLU encoded columnar tables (columns C1–C8).]

Memory-Optimized ≠ In-Memory Buzzwords
BLU is Memory-Optimized Analytics to Accelerate Your Applications…
…and Improve Your Productivity!
…and is now in the Cloud, too!

Business Value 1: PERFORMANCE!
[Chart: Workload Speedup on Terabyte Class Data. Relative performance of DB2 10.5 with BLU Accel. vs. DB2 10.1 for four workloads (Intel, Large European ISV, Wall Street, Cognos Dynamic Cubes; one on 4TB and three on 1TB of data). Bar labels: 133x, 44x, 25x, and 18x faster.]
“It was amazing to see the faster query times compared to the performance results with our row-organized tables. The performance of four of our queries improved by over 100-fold! The best outcome was a query that finished 137x faster by using BLU Acceleration.”
- Kent Collins, Database Solutions Architect, BNSF Railway

Business Value 2: Storage Savings!
~2x-3x storage reduction vs. DB2 10.1 (comparing all objects – tables, indexes, etc.)
– Patented columnar compression techniques
– Fewer storage objects (indexes, materialized views) required

Business Value 3: SIMPLICITY and Time to Value!
Create, LOAD, and then… Run Queries!
– Significantly reduced or no need to tune
  • No indexes (other than primary keys and foreign keys)
  • No storage reclaim (it’s automated)
  • No memory configuration (it’s automated)
  • No process model configuration (it’s automated)
  • No statistics collection (it’s automated)
  • No materialized views
  • No statistical views
  • No optimizer profiles or hints
BLU Acceleration automatically adapts to:
– Any size RAM
– Any number of CPUs and cores
– Any number of disks or SSDs
“The BLU Acceleration technology has some obvious benefits: … But it’s when I think about all the things I don't have to do with BLU, it made me appreciate the technology even more: no tuning, no partitioning, no indexes, no aggregates.”
- Andrew Juarez, Lead SAP Basis and DBA

What About Transactions?
BLU tables may be updated with UPDATE, DELETE, and INSERT commands
Changes made directly to BLU (column-organized) tables
– No row-organized staging tables, unlike SAP HANA and SQL Server
Multi-versioning – no in-place updates!
Maintains DB2’s usual Isolation, Concurrency Control, and Durability
– Fully logged, so recoverable
– Supported:
  • Isolation Levels: CS + CC, UR
  • Searched UPDATE & DELETE
  • INSERT from VALUES, INSERT from sub-select, MERGE
– Not supported:
  • Positioned update & delete (by cursor), select-from-UDI, update & delete of UNION views
Insert speed on par with row-organized tables
– Sometimes faster, because far fewer (or no) indexes
– Best performance for large transactions, to amortize logging overheads
  • INSERTing or UPDATEing 100s or 1000s of rows, or more
  • DELETEing, if the clustering of pages matches that of the DELETE (e.g., time)
New values compressed with page-level dictionaries, if beneficial
– In addition to (on top of) column-level dictionary

The cloud is what’s important!
Introducing IBM dashDB!
• Fully managed service in the cloud using IBM BlueMix
• Cloudant JSON ready! Tightly integrated with Cloudant, providing analytics on JSON data
• Or, import data from Excel or CSV files
• In-database analytics
  − Statistical analysis with R
  − Spatial analytics with Esri
• In under an hour, anyone can access awesome data warehousing and BI using BLU Acceleration
• No infrastructure or IT resources required
• Visit: http://dashDB.com

dashDB – Use R Studio for Predictive Analytics

Conclusions
In-memory is:
– Too expensive!
– Too limiting!
– Too slow!
DB2’s BLU Acceleration columnar run-time:
– Exploits cache, large memories, & multi-core parallelism
– Provides
  • >10x faster BI querying… and transactions, too!
  • 10x storage savings
  • Simpler, much less tuning:
    − No secondary indexes or MVs to choose
    − Automated stats collection, WLM, etc.
    − More predictable and reliable performance
    − Adapts automatically to your hardware
Now available in the cloud as IBM dashDB: DB2 BLU + Cloudant + R
For more details on BLU:
– V. Raman et al., “DB2 with BLU Acceleration: So Much More than a Column Store”, VLDB 2013

Thank You / Gracias / Obrigado / Danke / Grazie / Merci
[The slide also shows thanks in Hindi, Greek, Thai, Russian, Traditional Chinese, Arabic, Simplified Chinese, Tamil, Korean, and Japanese.]

BACKUP

“Related Work”
[Timeline figure, 1995–2015. Systems shown: Sybase IQ, TREX, P*TIME, MaxDB, HANA, Ingres, X100, MonetDB (CWI), Vectorwise, Data Distilleries, SPSS, C-Store, Blink, IWA, ISAO, DB2 BLU.]

Frequency (Dictionary) Compression
Example table: Sales (Volume, Product, Origin)
A histogram on Origin (number of occurrences per value) splits the column dictionary into partitions:
– Partition 1 (1 bit), common values (China, USA): 0 = CN, 1 = US
– Partition 2 (3 bits) (GER, FRA, …): 000 = BR, 001 = FR, 010 = GE, 011 = IN, …, 111 = UK
– Partition 3 (8 bits), rare values (the rest): 00000000 = AU, 00000001 = CA, …
NOTE: Within each partition, dictionary codes are:
– Fixed in length!
– Order-preserving!

7 Big Ideas: 2 Operate on Compressed Values
Frequency compression (approximate Huffman encoding) exploits skew
– The more frequent the value, the fewer bits it is encoded with
– For example, typically a few populous states may dominate the number of sales
  • New York and California may be encoded with only 1 or 2 bits
  • Alaska and Rhode Island may be encoded in 6 bits
[Diagram: conceptual compression dictionary for STATE, listing New York, California, Illinois, Michigan, Florida, Alaska, and Rhode Island with their encodings.]
Perform SQL Operations on the Encoded Data!
– Apply predicates (=, <, >, >=, <=, <>, BETWEEN, IN, etc.)
– Perform joins & grouping
Encoded data is smaller, uses fewer machine resources
– Encoded values packed together densely in register-width chunks
– Fewer I/Os, better memory & cache utilization, fewer CPU cycles to process

7 Big Ideas: 4 Core- & Cache-Friendly Parallelism
BLU’s legacy: main-memory DBMS
BLU’s run-time was built from the ground up to automatically:
– Exploit multi-core parallelism within queries
– Minimize sharing of common data structures, to minimize latching
– Pay careful attention to physical attributes of the server, e.g. cache sizes
– Maximize CPU cache hit rate & cache-line efficiency
[Diagram: cache-line ‘ping-pong’ between cores that share a cache line, contrasted with minimal traffic when core 0 works on blue data and core 1 works on green data, each in its own cache.]

Joins
BLU supports all
– SQL join types (inner-, LEFT OUTER, RIGHT OUTER, ANTI-, …)
– Data types (VARCHARs, trailing blanks,…)
No assumption that anything fits in memory, including inners
– Partition to fit in L3 cache, if memory-resident
– Else first partition to fit in memory
Novel compacted hash table for cache-mostly processing
[Diagram, Build Phase: for each dimension table, a thread scans and applies local predicates, loads the join column(s), re-encodes, and builds a join filter, then loads payloads and partitions them (P1–P4) into compacted hash tables (HT 1–HT 4).]
[Diagram, Probe Phase: scan the fact table and apply local predicates, load join columns FK1 and FK2, apply the join filters on FK1 and FK2, partition a stride (P1–P4), look up HT 1–HT 4 (threads A–D), de-partition the result payloads (Dim1 payloads), then join with Dim2.]

Group By / Aggregation
Need to perform well on queries that output from a few tens to billions of groups
Cache- and NUMA-aware (NUMA = Non-Uniform Memory Architecture)
[Diagram, Phase 1: each thread takes a work unit of encoded or unencoded keys and [P1] probes its local, fixed-size hash table (HT, 1 per thread), appending to overflow buckets (OBs) when needed; [P2] the OBs are merged into global lists of overflow blocks (1 per partition).]
[Diagram, Phase 2: final partition merging into the global partitioned HTs.]

Business Value 3: SIMPLICITY and Time to Value!
“Super Fast, Super Easy” – Just Create, Load, and Go!
Database Design and Tuning
1. Decide on partition strategies
2. Select Compression Strategy
3. Create Table
4. Load data
5. Create Auxiliary Performance Structures
   • Materialized views
   • Create indexes
     − B+ indexes
     − Bitmap indexes
6. Tune memory
7. Tune I/O
8. Statistics collection
9. Add Optimizer hints
(Repeat)
DB2 with BLU Acceleration
1. Create Table
2. Load data

“Super Fast, Super Easy” – Just Create, Load, and Go!
Create
• Single parameter to configure entire database for BLU:
    db2set DB2_WORKLOAD=ANALYTICS
• Create the database, table spaces, bufferpools, and tables
• Tip: Useful to define “mem_percent”
    db2 "create database mydb autoconfigure using mem_percent 95 apply db and dbm"
    db2 "create table mytable (c1 integer not null, …)"
Load your data
• Same as before - no new syntax!
    db2 "load from file.dat of del replace into mytable"
Go!
• Begin running your workload
    db2 "select SUM(SALES) from mytable where PURCHASEDATE > '20140101' group by CITY"

Cloudant – Create a dashDB Warehouse

dashDB Welcome Page
Automatic schema discovery: analyzes your JSON data in Cloudant, then discovers and automatically creates a relational schema for dashDB.
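
To make the schema-discovery idea concrete, here is a minimal sketch of how a relational schema might be inferred from a handful of flat JSON documents. It is an illustration only, not the algorithm Cloudant or dashDB actually uses: the function names (`sql_type`, `discover_schema`, `create_table_ddl`), the type mapping, and the widening rule (integer plus float becomes DOUBLE, anything else conflicting becomes VARCHAR) are all assumptions made for this example.

```python
def sql_type(value):
    """Map one JSON value to a SQL column type (hypothetical mapping)."""
    if isinstance(value, bool):      # check bool first: bool is a subclass of int
        return "BOOLEAN"
    if isinstance(value, int):
        return "BIGINT"
    if isinstance(value, float):
        return "DOUBLE"
    return "VARCHAR(255)"            # strings, nested objects, arrays


def discover_schema(docs):
    """Return {column: sql_type} for a list of flat JSON documents."""
    schema = {}
    for doc in docs:
        for key, value in doc.items():
            if value is None:
                schema.setdefault(key, None)   # type still unknown
                continue
            new = sql_type(value)
            old = schema.get(key)
            if old is None or old == new:
                schema[key] = new
            elif {old, new} == {"BIGINT", "DOUBLE"}:
                schema[key] = "DOUBLE"         # widen integer to floating point
            else:
                schema[key] = "VARCHAR(255)"   # conflicting types: fall back to text
    # Columns that were only ever null default to text.
    return {k: (v or "VARCHAR(255)") for k, v in schema.items()}


def create_table_ddl(table, schema):
    """Render the discovered schema as a CREATE TABLE statement."""
    cols = ", ".join(f"{name} {typ}" for name, typ in schema.items())
    return f"CREATE TABLE {table} ({cols})"


docs = [
    {"city": "SF", "sales": 10},
    {"city": "NY", "sales": 12.5, "vip": True},
]
schema = discover_schema(docs)
ddl = create_table_ddl("sales", schema)
print(ddl)
```

A real service would also have to handle nested objects and arrays (for example by flattening them into child tables), which this sketch simply treats as text.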
dashDB – Load data from CSV or Excel

dashDB – Getting Started

Shadow Tables for Mixed Workloads
• Faster OLTP – fewer indexes
• Dramatic reduction in indexes on the row table
• Faster Reporting – BLU Acceleration!
• 10X-40X faster.
• Dual representation. Data stored as both row and column. The best of both worlds.
• No application change. Database query compiler decides which format to access. Fully automated.
• Small memory needs.
[Diagram: a Sales table stored in both row-organized and column-organized form.]

Shadow Tables Architecture
Optimizer can route queries to shadow tables if data is not older than the desired refresh age.
    SET CURRENT REFRESH AGE … ;
[Diagram: OLTP workload and OLAP reporting both pass through the DB2 optimizer; the DB2 log feeds the CDC capture and apply engine, which maintains the shadow tables; replication latency is recorded in SYSTOOLS.REPL_MQT_LATENCY.]
IBM InfoSphere Change Data Capture (CDC) included in DB2 AWSE and AESE (for shadow table usage)
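
The routing rule on the architecture slide (use the shadow table only if its data is not older than the desired refresh age) can be sketched as a simple decision function. This is a conceptual illustration, not DB2's optimizer logic: the `Query` class, the `is_analytic` flag, and the use of plain seconds for both latency and refresh age are assumptions made for this example; DB2 expresses the refresh age in its own duration format and reads latency from SYSTOOLS.REPL_MQT_LATENCY.

```python
from dataclasses import dataclass


@dataclass
class Query:
    sql: str
    is_analytic: bool  # hypothetical flag; assumed to be decided elsewhere


def choose_table(query, replication_latency_s, refresh_age_s):
    """Return 'shadow' (column-organized copy) or 'row' (base table).

    replication_latency_s: how far the shadow table lags the row table.
    refresh_age_s: the staleness the session is willing to tolerate.
    """
    if query.is_analytic and replication_latency_s <= refresh_age_s:
        return "shadow"
    return "row"  # OLTP queries, or a shadow table that is too stale


report = Query("select sum(sales) from sales group by city", is_analytic=True)
lookup = Query("select * from sales where id = 42", is_analytic=False)

print(choose_table(report, replication_latency_s=5, refresh_age_s=60))
print(choose_table(report, replication_latency_s=120, refresh_age_s=60))
print(choose_table(lookup, replication_latency_s=5, refresh_age_s=60))
```

The point of the sketch is the fallback: when replication lags beyond the tolerated age, the query still runs, just against the row-organized table, so applications need no changes either way.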