Oracle Database In-Memory Tirthankar Lahiri Vice President Oracle Data Technologies and TimesTen Vineet Marwah Senior Director Oracle Data Technologies Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | Oracle Database In-Memory Goals Orders of Magnitude Faster Analytics 100x Accelerate Mixed Workload OLTP 2x-10x No Changes to Applications Simple Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | Cost Effective $$$ $ 3 What is a Controversy? “A discussion marked especially by the expression of opposing views” Merriam Webster Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | Decades Long Controversy in Database Systems “A discussion marked especially by the expression of opposing views” Merriam Webster Column Row Format Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | Row Format Databases vs. Column Format Databases SALES – Example: Insert or query a sales order – Fast processing for few rows, many columns Row SALES Column Transactions run faster on row format Analytics run faster on column format – Example : Report on sales totals by region – Fast accessing few columns, many rows Until Now Must Choose One Format and Suffer Tradeoffs Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 6 Oracle Database In-Memory: Dual Format Architecture Existing Buffer Cache New In-Memory Format SALES SALES Row Format Column Format • BOTH row and column formats for same table • Simultaneously active and consistent • OLTP uses existing row format • Analytics uses new In-Memory Column format Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 7 In-Memory Columnar Format Pure In-Memory Columnar • Highly optimized compressed columnar format • Pure in-memory format: • Cheap to maintain – no logging or IO • Allows efficient OLTP • No changes to disk format SALES • Transparent to Applications • Can be enabled for subset of database Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 8 Selective Enabling In-Memory Columnar Storage ALTER TABLE orders INMEMORY; • Selectively enable in-memory storage CREATE TABLE PARTITION BY (PARTITION (PARTITION • New INMEMORY clause sales …… RANGE(date) p1 …… INMEMORY, p2 …… NO INMEMORY); ALTER TABLE accounts INMEMORY NO INMEMORY (photo); • Applies to Tables, Partitions, Sub-Partitions , Materialized Views, Tablespaces • Can also exclude unneeded columns • Coming – automatic inmemory placement Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 9 Populating the In-Memory Column Store CREATE TABLE orders (c1 number, c2 varchar(20), c3 number) INMEMORY PRIORITY CRITICAL; ALTER TABLE sales INMEMORY PRIORITY MEDIUM; ALTER TABLE accounts INMEMORY PRIORITY NONE; • In-Memory objects are populated in the background • Always accessible via buffer cache in the meantime • Queries do not block for populate • Default behavior - initiate populate on first access • Optional pre-populate via PRIORITY sub-clause • Initiates populate without waiting for queries • CRITICAL > HIGH > MEDIUM > LOW • Controls order (not speed) of populate Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 10 In-Memory Column Store Performance Optimizations • Vector Processing • Software on Silicon • Operation pushdown • In-Memory Storage Index • Elimination of Analytic Indexes Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 11 Vector Processing: Additional Advantage of Column Format Memory • Each CPU core scans only required columns STATE Example: Find all sales in state of CA CPU Load multiple region values Vector Register CA CA CA Vector Compare all values in 1 instruction CA > 100x Faster • SIMD vector instructions used to process multiple values in each instruction • E.g. Intel AVX instructions with 256 bit vector registers • Billions of rows/sec scan rate per CPU core • Row format is millions/sec Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 12 DBIM Software On Silicon SPARC M7 On-Chip DBIM Accelerator/Compressor MEMORY or L3$ Row Format DB MEMORY or L3$ Column Format Compressed DB Up to 32 Concurrent DB Streams Bit/Byte-packed, Padded, Indexed Vectors M7 In-Silicon Query Engines Up to 32 Concurrent Result Streams • SIMD Vector Instructions were originally designed for High Performance Computing and Graphics – not for Databases • New SPARC M7 chip has 32 Database Acceleration Engines (DAX) on chip • Like having 32 specialized cores for DBIM query processing Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Highly Restricted 13 Performance: Database In-Memory Acceleration Engines SPARC M7 Core Core Core • DAX includes specialized query functions for predicates, conversions, set membership tests, etc. Core Shared Cache DB Accel DB Accel DB Accel DB Accel • Independently process streams of columns: –E.g. find all values that match ‘California’ –10x performance gains –Up to 170 billion rows per second 32 Database Accelerators (DAX) Copyright © 2015, 2014, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Highly Restricted 14 Capacity: Database In-Memory Decompression Engines • Compression increases in-memory capacity • FOR QUERY mode compresses column values Fastest for queries - no decompression needed Core Core Core Core Data must be decompressed prior to access Uses custom Oracle Zip (OZIP): Superfast bit pattern decompressor Shared Cache DB OZIP DB OZIP DB OZIP • FOR CAPACITY mode compresses column bit patterns DB OZIP 32 OZIP Decompressors 2x Memory capacity • DAX includes specialized OZIP decompression engine • Run OZIP decompress at full memory speed, > 120 GB/sec • Pipelines decompression and data processing (predicate evaluation, aggregation, comparisons, etc.) in hardware • Doubles memory capacity with no performance penalty Copyright © 2015, 2014, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Highly Restricted 15 Operation Pushdown: Bloom Filter Example: Find total sales in outlet stores Sales Stores • Bloom filter created on dimension scan Type=‘Outlet’ Amount StoreID in 15, 38, 64 Store ID Store ID Type Bloom Filter • Bloom filter pushdown: • Filtering pushed down to fact scan • Returns only rows that are likely to be join candidates • Joins tables 10x faster Sum Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 16 Operation Pushdown: Vector Group By Example: Report sales of footwear in outlet stores Products In-Memory Report Outline Sales Outlets Stores report outline during dimension scan • Push down report outline Footwear Footwear • Create (empty) in-memory aggregation to fact scan $ $$ • Reduces complex aggregations $ to series of fast inmemory scans $$$ • Reports run 10x faster • Without predefined cubes Outlets Sales Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 17 In-Memory Storage Index Example: Find stores with sales greater than $10,000 • In-Memory Compression Units (IMCU) – unit of column store allocation ~ 0.5 million rows • Per-column min/max values – Check predicates against min/max values – Skip IMCU if predicate not satisfied Min $4000 Max $7000 Min $8000 Max $12000 • Eliminates accessing unnecessary IMCUs • Eliminates predicate evaluation when all values pass Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | Min $13000 Max $15000 18 Complex OLTP is Slowed by Analytic Indexes Table 1–3 OLTP Indexes 10 – 20 Analytic Indexes • Most Indexes in complex OLTP (e.g. ERP) databases are only used for analytic queries • Inserting one row into a table requires updating 10-20 analytic indexes: Slow! • Indexes only speed up predictable queries & reports Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 19 OLTP is Slowed Down by Analytic Indexes Insert rate decreases as number of indexes increases # of Fully Cached Indexes (Disk Indexes are much slower) Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 20 Column Store Replaces Analytic Indexes Table 1–3 OLTP Indexes In-Memory Column Store • Fast analytics on any columns • Better for unpredictable analytics • Less tuning & administration • Column Store not persistent so update cost is much lower • OLTP & batch run faster Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 21 Schneider In-Memory Compression Schneider General Ledger Compression Factors 20 • Over 2 billion General Ledger Entries 15 10 5 8.6x 13x 16x 19x 0 Query Low Query High Capacity Low Capacity High Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 22 Schneider Speedup Across 1545 Queries 7x to 128x faster Seconds per Query 100 • 2 billion General Ledger Entries 90 80 Buffer Cache 70 • 1545 queries 60 50 IN-MEMORY – Originally took 34 hours to complete – Combination of filter queries, aggregations and summations 40 30 20 10 0 2000M 300M 30M 5M 0.5M Million rows returned by query Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 23 Schneider Transactions Speedup 4 000 3 500 • Data – Sales Accounts Primary Index Only 3 000 2 500 • Main table has 1 Primary Key + 21 secondary indexes Primary Index Plus In-Memory Columns No Index 2 000 No Index + IM 1 500 Full Index • Test - 303 million transactions 1 000 500 All Indexes 75 70 65 60 55 50 45 40 35 30 25 20 15 10 5 - 0 Millions of transaction per day From 5x to 9x faster Millions of records in the target table Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 24 Schneider Storage and IO Reduction Over 70% reduction in storage usage due to analytic index removal Size in GBs SECONDARY INDEXES 350 300 250 200 150 TABLES & PK INDEXES 100 50 0 Objects Redo Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 25 Scale-Out In-Memory Database to Any Size • Scale-Out across servers to grow memory and CPUs • DISTRIBUTE clause: by Partition, Sub-Partition, or Rowid Range • In-Memory queries parallelized across servers to access local column data • Direct-to-Wire InfiniBand protocol speeds messaging Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 26 Scale-Out In-Memory with Fault Tolerance • Ability to Duplicate IMCUs on another node • Enabled via DUPLICATE subclause • Application transparent • Similar to storage mirroring • Downtime eliminated by using duplicate after failure Only Available on Engineered Systems Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 27 Conclusion: The End of a Controversy Dual Format Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | In-Memory Database Technology Across Tiers In-Memory Row Store Application Tier Application Application Application Application Application Application In-Memory Column Store Database Tier TimesTen In-Memory Database • Embeddable In-Memory Database for Application Tier • Primary Usecase: Latency-critical custom OLTP Microsecond Response Time • Standalone Database or as Application-Tier Cache for Oracle Database Oracle Database In-Memory • Dual-Format In-Memory Database • Primary Usecase: Real Time Analytics on any source Billions of Rows/Sec analytic data access • Faster mixed-workload enterprise OLTP • Storage-Tiering: Combines best of memory, flash, disk • Transparent: packaged apps run with no changes Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | Oracle Confidential – Highly Restricted 29 Summary • Dual Format Architecture – Fully consistent row and column format – Best of both worlds OLTP and Analytics performance. – Typically, row format (Buffer cache) memory < 10% of column format memory • New In-Memory Column Format – In-memory only representation – Seamlessly built into Oracle Database Engine – Compatible with all Oracle Database features • Cost Effective – Use in-memory for hot data, flash for intermediate data, disk for cold data Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 30 Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | Backup Slides Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | In-Memory Area: Composition In Memory Area IMCU IMCU IMCU IMCU • Contains two subpools: IMCU SMU SMU SMU SMU SMU SMU SMU SMU IMCU IMCU IMCU Metadata – Pool of In Memory Compression Units (IMCUs) – Pool of Snapshot Metadata Units (SMUs) • IMCUs contain column formatted data • SMUs contain metadata and transactional information Column Format Data Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 34 Column Compression Unit (CU) Dictionary VALUE Audi BMW Cadillac ID 0 1 2 Column value list BMW Audi BMW Cadillac BMW Audi Audi Column CU • Contiguous storage per column in an IMCU Min: Audi Max: Cadillac 2 0 2 1 2 0 0 • All CUs automatically store Min/Max values • Multiple formats: – For example, Dictionary Compression: CU stores (smaller) dictionary IDs instead of full values – Additional compression schemes Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 35 In-Memory Compression Unit (IMCU) IMCU header Column CUs ROWID EMPID NAME DEPT • Unit of column store allocation SALARY – Columnar representation of a large number of rows (e.g. 0.5 million) – Rows in one or more table extents – Variable size • Contains contiguous runs for each column (column compression units) Extent #13 Blocks 20-120 Extent #14 Blocks 82-182 Extent #15 Blocks 201-301 Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 36 In-Memory Column Store Populate IMCU SMU • Populate: Initial creation of IMCU from Row Format • Repopulate: Recreation of IMCU after modification – Threshold Repopulate, after exceeding change threshold – Trickle Repopulate, constant activity, any changed IMCU is a potential candidate Change threshold or trickle action DML operations Workload transactions IMCU SMU Repopulate (Initial) Populate Row Format Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 37 In-Memory Column Store Compression • IMCUs compressed during population ALTER MATERIALIZED VIEW mv1 INMEMORY MEMCOMPRESS FOR QUERY; • Controlled by MEMCOMPRESS sub-clause • Ascending order of compression levels: • FOR DML (for heavily updated tables, slower for queries) CREATE TABLE history (Name varchar(20), Desc varchar(200)) INMEMORY MEMCOMPRESS FOR CAPACITY LOW; • FOR QUERY LOW (default: light compression and fastest) • FOR QUERY HIGH (slightly more compression) • FOR CAPACITY LOW (balances capacity and performance) • FOR CAPACITY HIGH (maximizes capacity) • Allows in-memory storage tiering Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 38 DML QUERY LOW QUERY HIGH Hotter Partitions In-Memory Column Store Compression • IMCUs compressed during population • Controlled by MEMCOMPRESS sub-clause • Ascending order of compression levels: • FOR DML (for heavily updated tables, slower for queries) CAPACITY LOW CAPACITY HIGH Colder Partitions • FOR QUERY LOW (default: light compression and fastest) • FOR QUERY HIGH (slightly more compression) • FOR CAPACITY LOW (balances capacity and performance) • FOR CAPACITY HIGH (maximizes capacity) • Allows in-memory storage tiering Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 39 In-Memory Column Store Compression Mode Compression Factor Query Speed Typical Range Observed Max QUERY 2x-7x Up to 10x Fastest CAPACITY LOW 4x-9x Up to 20x Very Fast CAPACITY HIGH 7x-12x Up to 30x Fast • MEMCOMPRESS FOR QUERY LOW or HIGH – LOW: Fastest performance – HIGH: Slightly greater compression – Lightweight compression schemes – Queries run on compressed data Compression ratios can be highly inflated by choosing a bad uncompressed format, or reporting most compressible table. Our results are measured relative to Oracle’s efficient row format for customer data. . – Faster than on uncompressed data Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 40 In-Memory Column Store Compression Mode Compression Factor Query Speed Typical Range Observed Max QUERY 2x-6x Up to 10x Fastest CAPACITY LOW 4x-9x Up to 20x Very Fast CAPACITY HIGH 7x-12x Up to 30x Fast • MEMCOMPRESS FOR CAPACITY LOW – Balances throughput and capacity – Adds Oracle custom ZIP (OZIP) on top of COMPRESS FOR QUERY – World’s fastest decompressor, tuned for DB query performance Compression ratios can be highly inflated by choosing a bad uncompressed format, or reporting most compressible table. Our results are measured relative to Oracle’s efficient row format for customer data. . – 2x - 3x faster than LZO (standard for fast zip) – Further optimized on SPARC M7 via Software on Silicon Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 41 In-Memory Column Store Compression Mode Compression Factor Query Speed Typical Range Observed Max QUERY 2x-6x Up to 10x Fastest CAPACITY LOW 4x-9x Up to 20x Very Fast CAPACITY HIGH 7x-12x Up to 30x Fast Compression ratios can be highly inflated by choosing a bad uncompressed format, or reporting most compressible table. Our results are measured relative to Oracle’s efficient row format for customer data. . • MEMCOMPRESS FOR CAPACITY HIGH – Compress with emphasis on space-savings – Heavier weight decompression required before query processing – Trades off some performance for capacity – Extra 1.5-2x compression Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 42 Scale-Up for Maximum In-Memory Performance • Scale-Up on large SMPs • NUMA optimizations • Parallel Execution • SMP scaling removes overhead of distributing queries across servers • Memory interconnect far faster than any network Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 43 M6-32 Big Memory Machine World’s largest SMP System •32 TB DRAM •32 Sockets •384 Cores •3072 processing threads •3 Terabyte/sec Bandwidth Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 44 Scale-Out In-Memory Database to Any Size • Distribution allows in memory segments larger than single host memory ALTER TABLE sales INMEMORY DISTRIBUTE BY PARTITION; ALTER TABLE COSTS INMEMORY DISTRIBUTE AUTO; • Policy is automatic or user-specifiable • Controlled by DISTRIBUTE subclause • • • • Distribute by rowid range Distribute by partition Distribute by sub-partition Distribute AUTO (default) Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 45 Scale-Out: Distribute by Partition • Distribute by Partition (toplevel partition for composite partitioned tables) • Ideal for Hash Partitions ORDERS PARTITION BY HASH ON ORDER_ID 0 • Also for other partition types if uniformly accessed 1 • Allows in-memory partitionwise joins 4 2 3 …. Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 46 Scale-Out: Distribute by Sub-Partition • For composite partitions, can distribute by Sub-Partition • Ideal for Hash Sub-Partitions • Also for other sub-partition types if uniformly accessed • Allows in-memory partitionwise joins ORDERS PARTITION BY RANGE ON ORDER_DATE SUBPARTITION BY HASH ON ORDER_ID Nov ‘13 1 Nov ‘13 2 Nov ’13 3 Nov ’13 4 Dec ‘13 1 …. Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 47 Scale-Out: Distribute by Rowid Range • Distributes IMCUs by uniform hash on first rowid • For non-partitioned tables • Also for partitioned tables with skewed access across partitions • Ensures uniform distribution of load across instances ORDERS Rowid Ranges 1-105 106-201 202-310 311-421. 422-535 …. Copyright © 2014, Oracle and/or its affiliates. All rights reserved. | 48