InfiniDB Overview
What is InfiniDB?
• Massively Parallel MySQL Storage Engine for Fast Analytics
• Linear scale to handle exponential growth
• Open-Source
• Runs on premise, on AWS cloud or Hadoop HDFS cluster
• Standard ANSI SQL compliance
• First MySQL storage engine to support ANSI SQL11-compliant windowing functions
Copyright © 2014 InfiniDB. All Rights Reserved.
Custom Handler Class
User Connections
MySQL Functions
• MySQL Client
• MySQL Connectivity (JDBC, ODBC)
• MySQL Security
• Initial SQL Statement Parsing
• Initial SQL Optimization
< Custom Handler Class >
• Execute final sort and final limit
• Display final results
InfiniDB ExeMgr Functions
• SQL Optimization
• Distribute work for scan, filter, join, functions, expressions, group by, aggregation, etc. to all available Performance Modules to be run in parallel.
• Collect the results returned by the Performance Modules
• Return the final results to MySQL for display
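The distribute, collect and return steps listed for ExeMgr can be pictured as a scatter-gather aggregation. The following is only a minimal sketch under stated assumptions: the function names `pm_partial_sum` and `exemgr_average` are hypothetical, each list in `slices` stands in for the data held by one Performance Module, and Python threads stand in for the real modules. It is not InfiniDB code.

```python
from concurrent.futures import ThreadPoolExecutor

def pm_partial_sum(rows, threshold):
    """Hypothetical Performance Module step: filter, then partially aggregate."""
    kept = [r for r in rows if r > threshold]
    return sum(kept), len(kept)

def exemgr_average(slices, threshold):
    """Hypothetical ExeMgr step: fan work out, collect partials, combine them."""
    with ThreadPoolExecutor(max_workers=len(slices)) as pool:
        partials = list(pool.map(lambda s: pm_partial_sum(s, threshold), slices))
    total = sum(p[0] for p in partials)
    count = sum(p[1] for p in partials)
    return total / count if count else None

# Three slices of one column, one per (assumed) Performance Module.
slices = [[1, 5, 9], [2, 6, 10], [3, 7, 11]]
result = exemgr_average(slices, 4)  # average of values greater than 4
```

Each worker returns a (sum, count) pair rather than an average, so the final combination stays correct no matter how unevenly the rows are spread across slices.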
[Diagram: InfiniDB Server with a User Module (MySQL and InfiniDB ExeMgr), Performance Module(s), and Storage]
InfiniDB® Design Principles
Scalable
Fast
Simple
InfiniDB Parallelism
• User Module – Processes SQL Requests
• Performance Module – Executes the Queries
Single Server or MPP
Tiered MPP Building Blocks
Module / Process: MySQL
Functionality:
• Hosts MySQL
• Connection management
• SQL parsing & optimization
Value:
• Familiar DBMS interface
• Leverages existing partner integrations
• Delivers full SQL syntax support
Module / Process: Extent Map
Functionality:
• Abstracts physical and logical storage
• Metadata store
Value:
• Enables shared nothing and shared everything storage
• Enables partition elimination
• Built-in failover
Module / Process: ExeMgr
Functionality:
• Work distribution
• Final results management and aggregation
Value:
• Independent scalability and tunable concurrency
• Multi-threaded to take advantage of multi-core HW platforms
Tiered MPP Building Blocks
Module / Process: PrimProc
Functionality:
• Scale-out cache management
• Distributed scan, filter, join and aggregation operations
• Resource management
Value:
• Independent scalability and tunable performance
• Multi-threaded to take advantage of multi-core HW platforms
Module / Process: Data
Functionality:
• High Speed Bulk Load
• Transactional DML and DDL
• Online schema extensions
Value:
• Enables concurrent reads and writes, non-blocking read enabled
• Multi-threaded to take advantage of multi-core HW platforms
InfiniDB Foundation - Parallelism
• Purpose-built C++ engine
• Parallelism is at the thread level
• Example: 12 Performance Module (PM) servers with 8 cores each yield 96 parallel processing engines.
• SQL is translated into thousands or tens of thousands of discrete jobs, or “primitives”.
• The User Module (UM) sends primitives to the processing engines.
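The arithmetic and the idea of breaking a query into primitives can be sketched as follows. This is an illustration only: treating a primitive as a scan over a fixed range of blocks, the block counts, and the round-robin assignment are all assumptions made for the example, and `make_primitives` is a hypothetical helper, not part of InfiniDB.

```python
def make_primitives(total_blocks, blocks_per_primitive):
    """Split a scan over total_blocks into (first_block, last_block) jobs."""
    return [
        (start, min(start + blocks_per_primitive, total_blocks) - 1)
        for start in range(0, total_blocks, blocks_per_primitive)
    ]

pm_servers, cores_per_server = 12, 8
engines = pm_servers * cores_per_server  # 12 x 8 = 96 processing engines

# Assumed workload: one column scan over 1,000,000 blocks, 100 blocks per job.
primitives = make_primitives(total_blocks=1_000_000, blocks_per_primitive=100)

# Assumed placement: hand jobs to the PM servers round-robin.
per_pm = [primitives[i::pm_servers] for i in range(pm_servers)]
```

With these assumed numbers a single scan becomes 10,000 primitives, which is in the "thousands or tens of thousands" range the slide describes.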
InfiniDB Parallelism – Fixed Thread Pool
• User Module – Processes SQL Requests
• Performance Module – Executes the Queries
Single Server or MPP
Primitives are issued into a thread queue within each Performance Module.
Storage: Local disk / EBS, GlusterFS / HDFS
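A fixed thread pool fed by a queue can be sketched in a few lines. This is a generic illustration of the pattern, not InfiniDB's C++ implementation: `run_fixed_pool` is a hypothetical name, the primitives are plain Python callables, and the pool size of 8 simply mirrors the 8-core example used earlier in the deck.

```python
import queue
import threading

def run_fixed_pool(primitives, num_threads):
    """Run callables from one shared queue on a fixed number of worker threads."""
    jobs = queue.Queue()
    results = []
    lock = threading.Lock()

    def worker():
        while True:
            job = jobs.get()
            if job is None:  # sentinel: no more work for this thread
                return
            value = job()  # a thread is tied up only for this one small job
            with lock:
                results.append(value)

    threads = [threading.Thread(target=worker) for _ in range(num_threads)]
    for t in threads:
        t.start()
    for p in primitives:
        jobs.put(p)
    for _ in threads:
        jobs.put(None)  # one sentinel per worker
    for t in threads:
        t.join()
    return results

# 100 small jobs, each summing one hypothetical block of 10 values.
block_sums = run_fixed_pool(
    [lambda b=b: sum(range(b * 10, b * 10 + 10)) for b in range(100)],
    num_threads=8,
)
```

Results arrive in completion order, not submission order, so anything that depends on ordering has to be handled by whoever collects them.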
Architectural Differentiation
[Diagram: Greenplum, Netezza, etc. use parent and worker processes in two database layers (Database Layer 1 and Database Layer 2), each executing SQL. The contrasting design has a single Database Layer executing SQL on top of a Block Processing Layer with custom DoW.]
Architectural Differentiation
Greenplum, Netezza, etc. (parent and worker processes): threads dedicated for the duration of a query.
InfiniDB: threads operate from a queue, dedicated for a fraction of a second.
InfiniDB® Design Principles
Scalable
Fast
Simple
Row-Oriented vs. Column-Oriented
Row-oriented: rows stored sequentially
Key  Fname     Lname  State  Zip    Phone           Age  Sex
1    Bugs      Bunny  NY     11217  (718) 938-3235  34   M
2    Yosemite  Sam    CA     95389  (209) 375-6572  52   M
3    Daffy     Duck   NY     10013  (212) 227-1810  35   M
4    Elmer     Fudd   ME     04578  (207) 882-7323  43   M
5    Witch     Hazel  MA     01970  (978) 744-0991  57   F
Column-oriented: each column is stored in a separate file
Key:   1, 2, 3, 4, 5
Fname: Bugs, Yosemite, Daffy, Elmer, Witch
Lname: Bunny, Sam, Duck, Fudd, Hazel
State: NY, CA, NY, ME, MA
Zip:   11217, 95389, 10013, 04578, 01970
Phone: (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
Age:   34, 52, 35, 43, 57
Sex:   M, M, M, M, F
Each column for a given row is at the same offset.
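The two layouts, and the "same offset" rule that ties a row back together, can be sketched with the slide's own sample data. In-memory Python lists stand in for the per-column files here, and the helper names `average_age` and `fetch_row` are hypothetical.

```python
columns = ("Key", "Fname", "Lname", "State", "Zip", "Phone", "Age", "Sex")

# Row-oriented: each record is stored as one unit.
row_store = [
    (1, "Bugs", "Bunny", "NY", "11217", "(718) 938-3235", 34, "M"),
    (2, "Yosemite", "Sam", "CA", "95389", "(209) 375-6572", 52, "M"),
    (3, "Daffy", "Duck", "NY", "10013", "(212) 227-1810", 35, "M"),
    (4, "Elmer", "Fudd", "ME", "04578", "(207) 882-7323", 43, "M"),
    (5, "Witch", "Hazel", "MA", "01970", "(978) 744-0991", 57, "F"),
]

# Column-oriented: one list per column (standing in for one file per column).
column_store = {
    name: [row[i] for row in row_store] for i, name in enumerate(columns)
}

def average_age(store):
    """An aggregate over one column touches only that column's data."""
    return sum(store["Age"]) / len(store["Age"])

def fetch_row(store, offset):
    """Rebuild a full row by reading the same offset in every column."""
    return tuple(store[name][offset] for name in columns)
```

`average_age(column_store)` reads a single list, while `fetch_row(column_store, 2)` has to visit all eight, which is the trade-off the rest of the deck builds on.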
2-Dimensional Data Partitioning
• Vertical Partitioning by Column
  o Not Column-Family (no relation to HBase)
  o Only do I/O for columns requested
• Horizontal Partitioning by range of rows
  o Meta-data stored within in-memory structure
• 10 TB of data maps to ~150k-300k discrete files.
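One way to picture horizontal partitioning with in-memory metadata is a list of extents, each covering a range of rows, that can be checked before any I/O is done. The sketch below is an assumption-laden illustration: keeping a minimum and maximum value per extent is assumed here as the metadata that makes partition elimination possible, the extent size of 10 rows is arbitrary, and `Extent`, `build_extents` and `extents_to_scan` are hypothetical names.

```python
from dataclasses import dataclass

@dataclass
class Extent:
    column: str
    first_row: int
    last_row: int
    min_value: int  # assumed per-extent metadata
    max_value: int  # assumed per-extent metadata

def build_extents(column, values, rows_per_extent):
    """Cut one column into row ranges and record min/max for each range."""
    extents = []
    for start in range(0, len(values), rows_per_extent):
        chunk = values[start:start + rows_per_extent]
        extents.append(
            Extent(column, start, start + len(chunk) - 1, min(chunk), max(chunk))
        )
    return extents

def extents_to_scan(extents, low, high):
    """Keep only extents whose [min, max] range can hold a value in [low, high]."""
    return [e for e in extents if e.max_value >= low and e.min_value <= high]

order_day = list(range(1, 101))  # 100 rows whose values arrive in increasing order
extent_map = build_extents("order_day", order_day, rows_per_extent=10)
hits = extents_to_scan(extent_map, 42, 57)  # WHERE order_day BETWEEN 42 AND 57
```

In this toy case only 2 of the 10 extents need to be read. The benefit depends on values being clustered by row range, as they are for the increasing `order_day` column used here.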
Column Restriction and Projection
[Diagram labels: Column #17, Column #6, Column #4; Extent #27, Extent #5; Filter 1, Filter 2, Filter 3; Projection]
• Automatic Vertical Partitioning + Horizontal Partitioning
• Just-In-Time Materialization
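Restriction followed by projection, with rows materialized only at the end, can be sketched as below. This is a simplified illustration rather than InfiniDB's algorithm: the filter columns, the predicates and the helper names `filter_offsets` and `project` are assumptions, and the data is the sample table from the row/column slide.

```python
def filter_offsets(column, predicate, candidates=None):
    """Return the row offsets that pass, checking only the candidate offsets."""
    if candidates is None:
        candidates = range(len(column))
    return [i for i in candidates if predicate(column[i])]

def project(table, names, offsets):
    """Materialize the requested columns only for the surviving offsets."""
    return [tuple(table[n][i] for n in names) for i in offsets]

table = {
    "fname": ["Bugs", "Yosemite", "Daffy", "Elmer", "Witch"],
    "lname": ["Bunny", "Sam", "Duck", "Fudd", "Hazel"],
    "state": ["NY", "CA", "NY", "ME", "MA"],
    "age": [34, 52, 35, 43, 57],
}

# Roughly: SELECT fname, lname FROM t WHERE state = 'NY' AND age > 34
offsets = filter_offsets(table["state"], lambda s: s == "NY")
offsets = filter_offsets(table["age"], lambda a: a > 34, offsets)
result = project(table, ["fname", "lname"], offsets)
```

The projected columns are never read for rows that fail a filter, and each later filter looks only at offsets that survived the earlier ones.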
InfiniDB® Design Principles
Scalable
Fast
Simple
Simplicity – Automated Everything
Column storage
Compression / compression type
No index build or maintenance required
Extent Map partitioning – Vertical / Horizontal
Distribution of data across server/disk resources
Distribution of work
Ad-hoc performance
InfiniDB® What’s New
• Open Source – GPL v2
• New Company Name
• Funding
• InfiniDB for Hadoop
• Windowing Analytic Functions
What is InfiniDB for Hadoop?
• Fast SQL for Hadoop offering for real-time and ad-hoc reporting and analytics
• Non-map/reduce engine for real-time SQL
• 40x to 100x faster than Hive
• SQL in Hadoop
• Reads and writes directly to HDFS/GPFS
• Best of breed SQL in Hadoop
• Superior ad-hoc usage, syntax vs. Impala/Presto
• MySQL Compatibility
• InfiniDB presents Hadoop as MySQL data source
InfiniDB Background – InfiniDB for Hadoop
• InfiniDB is a non-map/reduce engine
• Reads and writes natively to HDFS
[Diagram: Hadoop stack showing Pig/Hive, HBase, Map Reduce and InfiniDB for Hadoop above the Hadoop Distributed File System]
Value Proposition for InfiniDB for Hadoop
• Enables access to Hadoop data via familiar interface
• Response to competitive challenge from Cloudera Impala
• Complete the Hadoop Checklist
  o Cost-effective storage
  o Robust transforms via map/reduce
  o Real-time SQL for analytics with InfiniDB for Hadoop
Benchmark Hive, Presto, Impala, InfiniDB
http://infinidb.co/system/files/RadiantAdvisors_Benchmark_SQL-on-Hadoop_2014Q1.pdf
PARTITION and FRAME
• For each row, the aggregate is calculated over a FRAME of rows.
• The PARTITION of a row is the group of rows that have the same value in a specific column as the current row.
• The FRAME for each row is a subset of that row's PARTITION.
SELECT x, y, sum(x) OVER (PARTITION BY y RANGE BETWEEN
CURRENT ROW AND UNBOUNDED FOLLOWING) FROM a
Row Number   X    Y    Partition       Frame sum(x)
1            1    1    rows 1 to 4     22
2            4    1    rows 1 to 4     21
3            7    1    rows 1 to 4     17
4            10   1    rows 1 to 4     10
5            2    2    rows 5 to 7     15
6            5    2    rows 5 to 7     13
7            8    2    rows 5 to 7     8
8            3    3    rows 8 to 10    18
9            6    3    rows 8 to 10    15
10           9    3    rows 8 to 10    9
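The per-row frame sums in this example can be reproduced with a short sketch. It assumes the rows of each partition are ordered by x, as an ORDER BY x inside the OVER clause would arrange; the slide's figures imply that ordering even though the query text does not spell it out. Because every x value is distinct here, a frame from the current row to the end of the partition gives the same result under RANGE and ROWS semantics. The function name `frame_sums` is hypothetical.

```python
from itertools import groupby

# (x, y) pairs for rows 1 to 10 of the example.
rows = [(1, 1), (4, 1), (7, 1), (10, 1),
        (2, 2), (5, 2), (8, 2),
        (3, 3), (6, 3), (9, 3)]

def frame_sums(rows):
    """For each row, sum x from the current row to the end of its partition."""
    out = []
    ordered = sorted(rows, key=lambda r: (r[1], r[0]))  # partition by y, order by x
    for y, group in groupby(ordered, key=lambda r: r[1]):
        xs = [x for x, _ in group]
        for i, x in enumerate(xs):
            out.append((x, y, sum(xs[i:])))  # frame: current row .. last row
    return out

sums = [s for _, _, s in frame_sums(rows)]
```

The resulting list matches the sum(x) values shown for rows 1 through 10.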
InfiniDB® Use Cases
• Who is using it?
• When to use it?
InfiniDB Customers
InfiniDB’s place in the Big Data world
• Designed for high performance analytics
• Provides flexibility for ad hoc queries
• Not suited for OLTP, NoSQL, key-value workloads
Copyright © 2014 Calpont. All Rights Reserved.
Workload – Query Vision/Scope
[Chart: Query Vision/Scope axis, log scale from 1 to 10,000,000,000, with OLTP/NoSQL Workloads at the low end and Analytic Workloads at the high end]
General DBMS missed the target (dated database technology generally suboptimal)
What is your typical query?
[Chart: Query Vision/Scope axis, log scale from 1 to 10,000,000,000, with OLTP/NoSQL Workloads at the low end and Analytic Workloads at the high end]
• There is no “average” query.
• The challenges are at the extremes:
o The challenge of high concurrency levels with OLTP/NoSQL.
o The challenge of latency for very large queries.
• Most use cases imply multiple data technologies.
Columnar Appropriate Workloads
[Chart: Query Vision/Scope axis, log scale from 1 to 10,000,000,000, with OLTP/NoSQL Workloads at the low end and ROLAP/Analytic/Reporting Workloads at the high end]
• Pure columnar: about 10x worse I/O for single record lookups
• Pure columnar: about 10x better I/O for large data access patterns
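A toy I/O model shows how ratios of this order can arise. It is not a benchmark and not the source of the slide's figures: the 10-column table, fixed-width values, 1,000 values per block and the absence of caching or compression are all assumptions chosen for illustration, and both function names are hypothetical.

```python
import math

def row_store_blocks(rows, total_columns, values_per_block):
    """Row layout: every block holds whole rows, so all columns are always read."""
    return math.ceil(rows * total_columns / values_per_block)

def column_store_blocks(rows, columns_read, values_per_block):
    """Column layout: read at least one block per requested column."""
    return columns_read * math.ceil(rows / values_per_block)

TOTAL_COLUMNS = 10       # assumed table width
VALUES_PER_BLOCK = 1000  # assumed block capacity

# Single record lookup that needs all 10 columns of one row.
lookup_row = row_store_blocks(1, TOTAL_COLUMNS, VALUES_PER_BLOCK)
lookup_col = column_store_blocks(1, TOTAL_COLUMNS, VALUES_PER_BLOCK)

# Large scan that needs 1 column of 1,000,000 rows.
scan_row = row_store_blocks(1_000_000, TOTAL_COLUMNS, VALUES_PER_BLOCK)
scan_col = column_store_blocks(1_000_000, 1, VALUES_PER_BLOCK)
```

Under these assumptions the lookup costs 1 block in the row layout against 10 in the column layout, and the scan costs 10,000 blocks against 1,000. Both ratios track the number of columns in the table, so wider or narrower tables shift them accordingly.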
Benefits of InfiniDB
Real-time, Consistent Query Performance
Linear Scale for Massive Data
Removes Limits to Dimensions and Granularity
Easy to Deploy and Maintain
Core Features of InfiniDB
• Scalable MPP architecture
• Performant ad hoc analysis
• Consistent query response time
• Simplified data administration
• Analytic window functions
• Native MySQL® driver support
• Open source license
• Deployable on premise, in the cloud, & on Apache Hadoop™
• Optional Enterprise support subscription