Slide - ICDE 2015

In-Memory BLU Acceleration
in IBM’s DB2 and dashDB:
Optimized for Modern Workloads and Hardware Architectures
Guy Lohman
Research Manager
Disruptive Information Management Architectures
IBM Research – Almaden
14 April 2015
© 2015 IBM Corporation
“In-Memory” BLU Acceleration: Agenda
1. Who cares about in-memory?
a. In-memory is too expensive!
b. In-memory is too limiting!
c. In-memory is too slow for BLU!
2. What is BLU Acceleration?
3. The cloud is what is important!
4. Conclusions
Moore’s Law has snookered us!
[Figure: Main Memory]
Source: http://www.jcmit.com/mem2013.htm
So, we conclude, …
Ergo
Memory is:
– Unlimited
– Free
“It all fits!”
Ergo
It must all fit!
Right?
WRONG!!!
In-Memory is Too Expensive!
▪ Economics + Greed → There will always be a “memory hierarchy”
– Yes, DRAM is getting cheaper
• Moore’s Law has not (yet) been repealed!
– BUT our appetite for data is growing even faster
• The death of update-in-place (time travel)
• “Big Data” Analytics craves large volumes of data
– Why pay for DRAM for cold columns?
Some (cold) data will spill to disk
• Infrequently-referenced columns
• Infrequently-referenced rows
▪ We’ve just moved up one level in our focus…
Focus of Memory Hierarchy Has Shifted Up 1 Level
DRAM → CACHE
DISK → DRAM
TAPE → DISK
“Disk is the new tape; Memory is the new disk.” -- Jim Gray
In-Memory is Too Limiting!
▪ DBA must choose which tables can fit in-memory
– Adds database design complexity for DBA
– Workloads and tables referenced change over time
▪ Base tables aren’t the whole story! Must also include:
– Indexes
– Temporary tables
– Materialized views
– Query working space for each user (typically 1000s):
• Hash tables for joins, GROUP BY
• Space for sorts
• …and much more!
▪ Have to persist anyway! (DRAM is still volatile)
In-Memory is Too Slow!
▪ CPU cache is many times faster than DRAM
▪ BLU’s run-time is carefully designed to:
– Operate on compressed values, bit-aligned as vectors
– Auto-detect HW cache sizes
– Adapt algorithms to them:
• Partition data into cache-sized blocks
• Exploit L2 & L3 caches
• Minimize cache-line misses (to DRAM)
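The idea of partitioning data into cache-sized blocks can be sketched as follows. This is a minimal illustration, not BLU's implementation: the cache size is passed in as a parameter (the name `cache_bytes` and its 256 KiB default are assumptions, whereas BLU is described as detecting hardware cache sizes itself), and the column is a plain Python list of fixed-width values.

```python
def cache_sized_blocks(column, bytes_per_value, cache_bytes=256 * 1024):
    """Yield consecutive slices of `column` that each fit in `cache_bytes`.

    `cache_bytes` is an assumed, caller-supplied cache size.
    """
    rows_per_block = max(1, cache_bytes // bytes_per_value)
    for start in range(0, len(column), rows_per_block):
        yield column[start:start + rows_per_block]


def blockwise_sum(column, bytes_per_value, cache_bytes=256 * 1024):
    """Aggregate one block at a time so each block's working set stays small."""
    total = 0
    for block in cache_sized_blocks(column, bytes_per_value, cache_bytes):
        total += sum(block)
    return total
```

Python itself does not expose cache behavior, so the sketch only shows how the block size would be derived from the cache size and the value width.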
What is BLU Acceleration?
New technology for accelerating BI queries
• 2nd generation of Almaden’s Blink Research technology
• Columnar database within DB2 for Linux, UNIX, & Windows
• Run-time that is optimized to exploit modern hardware:
− Multi-core for data parallelism
− Cache and large main memories
• Operates on compressed, bit-aligned data vectors
Order-of-magnitude benefits
1. Performance
2. Storage savings
3. Simplicity and Time to Value!
Deeply integrated within DB2 10.5
• New columnar page format & run-time
• Memory-optimized (not limited to “in-memory”)
• Exploits DB2 full functionality, utilities, & tools
“Revolution via Evolution”
• Easy conversion of row tables to BLU (columnar) tables
• BLU tables can co-exist with traditional row tables
− In same query, schema, storage, & memory
• Query any combination of BLU or row data
• No need to change applications or SQL queries
• DB2 run-time compensates for any missing functionality in BLU
[Figure: DB2 LUW with BLU. Components shown: DB2 Compiler; Run-time (DB2 Classic Run-time, BLU Run-time); DB2 Classic Bufferpool; Storage (DB2 Classic Row Tables and BLU Encoded Columnar Tables, each with columns C1–C8)]
Memory-Optimized ≠ In-Memory
Buzzwords
is Memory-Optimized Analytics
to Accelerate Your Applications…
…and Improve Your Productivity!
…and is now in the Cloud, too!
Business Value 1: PERFORMANCE!
[Figure: Workload Speedup on Terabyte Class Data. Relative performance of DB2 10.5 with BLU Accel. vs. DB2 10.1 for four workloads: Intel, Large European ISV, Wall Street, and Cognos Dynamic Cubes. Data sizes shown: 4TB, 1TB, 1TB, 1TB. Speedup labels shown: 133x, 44x, 25x, and 18x faster.]
“It was amazing to see the faster query times compared to the performance results with our
row-organized tables. The performance of four of our queries improved by over 100-fold!
The best outcome was a query that finished 137x faster by using BLU Acceleration.”
- Kent Collins, Database Solutions Architect, BNSF Railway
Business Value 2: Storage Savings!
▪ ~2x-3x storage reduction vs. DB2 10.1 (comparing all objects – tables, indexes, etc.)
– Patented columnar compression techniques
– Fewer storage objects (indexes, materialized views) required
Business Value 3: SIMPLICITY and Time to Value!
▪ Create, LOAD, and then… Run Queries!
– Significantly reduced or no need to tune
• No indexes (other than primary keys and foreign keys)
• No storage reclaim (it’s automated)
• No memory configuration (it’s automated)
• No process model configuration (it’s automated)
• No statistics collection (it’s automated)
• No materialized views
• No statistical views
• No optimizer profiles or hints
▪ BLU Acceleration automatically adapts to:
– Any size RAM
– Any number of CPUs and cores
– Any number of disks or SSDs
“The BLU Acceleration technology has some obvious benefits: … But it’s when I
think about all the things I don't have to do with BLU, it made me appreciate
the technology even more: no tuning, no partitioning, no indexes, no
aggregates.”
-Andrew Juarez, Lead SAP Basis and DBA
What About Transactions?
▪ BLU tables may be updated with UPDATE, DELETE, and INSERT commands
▪ Changes made directly to BLU (column-organized) tables
– No row-organized staging tables, unlike SAP HANA and SQL Server
▪ Multi-versioning – no in-place updates!
▪ Maintains DB2’s usual Isolation, Concurrency Control, and Durability
– Fully logged, so recoverable
– Supported:
• Isolation Levels: CS + CC, UR
• Searched UPDATE & DELETE
• INSERT from VALUES, INSERT from sub-select, MERGE
– Not supported:
• Positioned update & delete (by cursor), select-from-UDI, update & delete of UNION views
▪ Insert speed on par with row-organized tables
– Sometimes faster, because far fewer (or no) indexes
– Best performance for large transactions, to amortize logging overheads
• INSERTing or UPDATEing 100s or 1000s of rows, or more
• DELETEing, if the clustering of pages matches that of the DELETE (e.g., time)
▪ New values compressed with page-level dictionaries, if beneficial
– In addition to (on top of) column-level dictionary
The cloud is what’s important!
Introducing IBM dashDB!
▪ Fully managed service in the cloud using
– IBM BlueMix
– Cloudant
▪ JSON ready! Tightly integrated with Cloudant, providing analytics on JSON data
▪ Or, import data from Excel or CSV files
▪ In-database analytics
– Statistical analysis with R
– Spatial analytics with Esri
• In under an hour, anyone can access awesome data warehousing and BI using BLU Acceleration
• No infrastructure or IT resources required
• Visit: http://dashDB.com
dashDB – Use R Studio for Predictive Analytics
Conclusions
In-memory is:
▪ Too expensive!
▪ Too limiting!
▪ Too slow!
DB2’s BLU Acceleration columnar run-time:
▪ Exploits cache, large memories, & multi-core parallelism
▪ Provides
– >10x faster BI querying… and transactions, too!
– 10x storage savings
– Simpler, much less tuning:
• No secondary indexes or MVs to choose
• Automated stats collection, WLM, etc.
• More predictable and reliable performance
• Adapts automatically to your hardware
▪ Now available in the cloud as IBM dashDB: DB2 BLU + Cloudant + R
▪ For more details on BLU:
– V. Raman et al., “DB2 with BLU Acceleration: So Much More than a Column Store”, VLDB 2013
Thank You
Gracias (Spanish) · Obrigado (Brazilian Portuguese) · Danke (German) · Grazie (Italian) · Merci (French)
(also shown in Hindi, Greek, Thai, Russian, Traditional Chinese, Arabic, Simplified Chinese, Tamil, Korean, and Japanese)
BACKUP
“Related Work”
[Figure: timeline, 1995–2015, of related systems: Sybase IQ, TREX, P*TIME, MaxDB, HANA, Ingres, X100, MonetDB (CWI), Vectorwise, Data Distilleries, SPSS, C-Store, Blink, IWA, ISAO, DB2 BLU]
Frequency (Dictionary) Compression
Example table: Sales (Volume, Product, Origin)
A histogram on Origin (number of occurrences per value) drives the dictionary for Origin and splits the column into partitions:
Partition 1 (1 bit), common values: 0 = CN, 1 = US
Partition 2 (3 bits): 000 = BR, 001 = FR, 010 = GE, 011 = IN, …, 111 = UK
Partition 3 (8 bits), rare values: 00000000 = AU, 00000001 = CA, …
NOTE: Within each partition, dictionary codes are:
▪ Fixed in length!
▪ Order-preserving!
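A frequency-partitioned dictionary of the kind described above can be sketched as follows. This is an illustration under stated assumptions, not BLU's actual algorithm: the function name `build_frequency_dictionary` and the fixed bit widths (1, 3, 8, taken from the example partitions) are hypothetical, and values are assigned to partitions by simple frequency rank.

```python
from collections import Counter


def build_frequency_dictionary(values, bit_widths=(1, 3, 8)):
    """Return a list of (bit_width, {value: code}) partitions.

    The most frequent values go to the narrowest partition. Within a
    partition, codes are fixed-length and assigned in sorted value order,
    so comparing codes gives the same result as comparing values.
    """
    by_frequency = [v for v, _ in Counter(values).most_common()]
    partitions, start = [], 0
    for width in bit_widths:
        members = sorted(by_frequency[start:start + 2 ** width])
        if not members:
            break
        partitions.append((width, {v: code for code, v in enumerate(members)}))
        start += 2 ** width
    if start < len(by_frequency):
        raise ValueError("more distinct values than the partitions can encode")
    return partitions
```

For example, a column dominated by "CN" and "US" would place those two values in the 1-bit partition and the remaining origins in the 3-bit partition.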
7 Big Ideas: 2. Operate on Compressed Values
▪ Frequency compression (approximate Huffman encoding) exploits skew
– The more frequent the value, the fewer bits it is encoded with
– For example, typically a few populous states may dominate the number of sales
• New York and California may be encoded with only 1 or 2 bits
• Alaska and Rhode Island may be encoded in 6 bits
[Figure: conceptual compression dictionary for STATE (New York, California, Illinois, Michigan, Florida, Alaska, Rhode Island) and the encoding of each value, packed to register length]
▪ Perform SQL Operations on the Encoded Data!
– Apply predicates (=, <, >, >=, <=, <>, BETWEEN, IN, etc.)
– Perform joins & grouping
▪ Encoded data is smaller, uses fewer machine resources
– Encoded values packed together densely in register-width chunks
– Fewer I/Os, better memory & cache utilization, fewer CPU cycles to process
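Applying a predicate directly to packed, encoded values can be sketched as follows. This is a simplified illustration, assuming one partition with fixed-width, order-preserving codes and a 64-bit word; the function names are hypothetical, and the loop extracts codes one at a time rather than comparing many codes per instruction as a real run-time would.

```python
def pack_codes(codes, width, word_bits=64):
    """Pack fixed-width integer codes densely into word_bits-wide integers."""
    per_word = word_bits // width
    words = []
    for i in range(0, len(codes), per_word):
        word = 0
        for slot, code in enumerate(codes[i:i + per_word]):
            word |= code << (slot * width)
        words.append(word)
    return words


def filter_less_equal(words, count, width, threshold_code, word_bits=64):
    """Return row positions whose code <= threshold_code, without decoding.

    Valid only because codes within a partition are order-preserving.
    """
    per_word = word_bits // width
    mask = (1 << width) - 1
    hits = []
    for row in range(count):
        word = words[row // per_word]
        code = (word >> ((row % per_word) * width)) & mask
        if code <= threshold_code:
            hits.append(row)
    return hits
```

The predicate constant is encoded once (for instance, the code for "FR"), and every comparison after that is an integer comparison on codes.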
7 Big Ideas: 4. Core- & Cache-Friendly Parallelism
▪ BLU’s legacy: main-memory DBMS
▪ BLU’s run-time was built from the ground up to automatically:
– Exploit multi-core parallelism within queries
– Minimize sharing of common data structures, to minimize latching
– Pay careful attention to physical attributes of the server, e.g. cache sizes
– Maximize CPU cache hit rate & cache-line efficiency
[Figure: cores contending for one cache line cause cache-line ‘ping-pong’; with core 0 working on blue data and core 1 working on green data, each in its own cache, traffic is minimal]
Joins
▪ BLU supports all
– SQL join types (inner-, LEFT OUTER, RIGHT OUTER, ANTI-, …)
– Data types (VARCHARs, trailing blanks,…)
▪ No assumption that anything fits in memory, including inners
– Partition to fit in L3 cache, if memory-resident
– Else first partition to fit in memory
▪ Novel compacted hash table for cache-mostly processing
[Figure: join processing over Dimension Table(s) and the Fact Table.
Build Phase stages (Thread 1, Thread 2): Scan & Apply Local Predicates; Load Join Column(s), Re-encode, & Build Join Filter; Load Payloads; Partition (P1–P4); Compacted Hash Tables HT 1–HT 4 (Threads A–D).
Probe Phase stages: Scan & Apply Local Predicates; Load Join Column FK1 and FK2; Apply Join Filter on FK1 and FK2; Partition a stride (P1–P4); Lookup in HT 1–HT 4 (Compacted Hash Tables); De-partition; Result payloads, Dim1 payload(s); Join with Dim2.]
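The partition-then-join approach can be sketched as a partitioned hash join. This is a minimal illustration under stated assumptions: rows are Python tuples, `num_partitions` is a caller-supplied parameter standing in for "enough partitions that each one fits the target level", and an ordinary dict stands in for the compacted hash table, whose layout the slide does not describe. Join filters and re-encoding are omitted.

```python
def partition(rows, key_index, num_partitions):
    """Split rows into num_partitions lists by hashing the join key."""
    parts = [[] for _ in range(num_partitions)]
    for row in rows:
        parts[hash(row[key_index]) % num_partitions].append(row)
    return parts


def partitioned_hash_join(dim_rows, fact_rows, dim_key=0, fact_key=0,
                          num_partitions=4):
    """Inner-join fact rows to dimension rows one partition at a time."""
    dim_parts = partition(dim_rows, dim_key, num_partitions)
    fact_parts = partition(fact_rows, fact_key, num_partitions)
    result = []
    for dim_part, fact_part in zip(dim_parts, fact_parts):
        # Build phase: hash table over this dimension partition only.
        table = {}
        for row in dim_part:
            table.setdefault(row[dim_key], []).append(row)
        # Probe phase: only fact rows hashed to the same partition can match.
        for row in fact_part:
            for match in table.get(row[fact_key], []):
                result.append(row + match)
    return result
```

Because both inputs are partitioned with the same hash function, each build table is small and is probed only by the fact rows that could match it.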
Group By / Aggregation
▪ Need to perform well on queries that output from a few tens to billions of groups
▪ Cache- and NUMA*-aware (* Non-Uniform Memory Architecture)
[Phase 1] Local Hash Table (HT) probes and appends to Overflow Buckets (OBs)
[Phase 2] Final partition merging
[Figure labels: WorkUnit (encoded keys, unencoded keys); Threads; Local HTs, fixed size (1 per thread); [P1] Probe local HT; Overflow Buckets (OBs); Global lists of Overflow Blocks (1 per partition); [P2] Merge OBs; Global partitioned HTs]
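The two phases can be sketched as follows. This is a simplified illustration, not BLU's implementation: threads are simulated by processing each work unit in turn, `local_capacity` and `num_partitions` are hypothetical parameters, and flushing each local table into the overflow buckets at the end of phase 1 is an assumption made to keep the merge step uniform.

```python
from collections import defaultdict


def grouped_sum(work_units, local_capacity=4, num_partitions=2):
    """Sum values per key using fixed-size local tables plus overflow buckets.

    work_units: one list of (key, value) pairs per (simulated) thread.
    """
    # One overflow list per partition, shared by all threads.
    overflow = [[] for _ in range(num_partitions)]

    # Phase 1: each thread aggregates into its own small hash table.
    for unit in work_units:
        local = {}
        for key, value in unit:
            if key in local:
                local[key] += value
            elif len(local) < local_capacity:
                local[key] = value
            else:
                # Local table is full: append to the key's overflow bucket.
                overflow[hash(key) % num_partitions].append((key, value))
        # Assumption: flush local partial sums to the overflow buckets too.
        for key, value in local.items():
            overflow[hash(key) % num_partitions].append((key, value))

    # Phase 2: merge each partition independently into the final result.
    result = {}
    for bucket in overflow:
        merged = defaultdict(int)
        for key, value in bucket:
            merged[key] += value
        result.update(merged)
    return result
```

A key always hashes to the same partition, so partitions can be merged independently without coordinating on shared groups.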
Business Value 3: SIMPLICITY and Time to Value!
“Super Fast, Super Easy” – Just Create, Load, and Go!
Database Design and Tuning
1. Decide on partition strategies
2. Select Compression Strategy
3. Create Table
4. Load data
5. Create Auxiliary Performance Structures
• Materialized views
• Create indexes
• B+ indexes
• Bitmap indexes
6. Tune memory
7. Tune I/O
8. Statistics collection
9. Add Optimizer hints
Repeat
DB2 with BLU Acceleration
1. Create Table
2. Load data
“Super Fast, Super Easy” – Just Create, Load, and Go!
Create
• Single parameter to configure entire database for BLU:
db2set DB2_WORKLOAD=ANALYTICS
• Create the database, table spaces, bufferpools, and tables
• Tip: Useful to define “mem_percent”
db2 "create database mydb autoconfigure using mem_percent 95 apply db and dbm"
db2 "create table mytable (c1 integer not null, …)"
Load your data
• Same as before - no new syntax!
db2 "load from file.dat of del replace into mytable"
Go!
• Begin running your workload
db2 "select SUM(SALES) from mytable where PURCHASEDATE > '20140101' group by CITY"
Cloudant – Create a dashDB Warehouse
dashDB Welcome Page
Automatic schema discovery: analyzes your JSON data in Cloudant, then discovers and automatically creates a relational schema for dashDB.
dashDB – Load data from CSV or Excel
dashDB – Getting Started
Shadow Tables for Mixed Workloads
• Faster OLTP – fewer indexes
– Dramatic reduction in indexes on the row table
• Faster Reporting – BLU Acceleration!
– 10X-40X faster.
• Dual representation. Data stored as both row and column. The best of both worlds.
• No application change. Database query compiler decides which format to access. Fully automated.
• Small memory needs.
[Figure: the Sales table stored in both Row-organized and Column-organized form]
Shadow Tables Architecture
Optimizer can route queries to shadow tables if data is not older than the desired refresh age.
SET CURRENT REFRESH AGE … ;
[Figure components: OLTP Workload; OLAP Reporting; DB2 Optimizer; Log; SYSTOOLS.REPL_MQT_LATENCY; CDC Capture and Apply Engine; Change Data Capture Server]
IBM InfoSphere Change Data Capture (CDC) included in DB2 AWSE and AESE (for shadow table usage)
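The routing rule stated on this slide can be sketched as a single comparison. This is a conceptual illustration only: the function and parameter names are hypothetical, the `query_is_analytic` flag stands in for whatever the optimizer's own (cost-based) decision is, and the latency value is assumed to be read from the replication latency table.

```python
def choose_table(query_is_analytic, replication_latency_seconds,
                 refresh_age_seconds):
    """Return "shadow" if the column-organized copy may serve the query.

    The shadow table qualifies only when its data is not older than the
    refresh age the session has asked for; otherwise use the row table.
    """
    if query_is_analytic and replication_latency_seconds <= refresh_age_seconds:
        return "shadow"
    return "row"
```

A session that tolerates older data sets a larger refresh age, which makes more queries eligible for the shadow table.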