InfiniDB_Overview_Q2_2014

InfiniDB Overview

What is InfiniDB?
• Massively Parallel MySQL Storage Engine for Fast Analytics
• Linear scale to handle exponential growth
• Open source
• Runs on premises, on the AWS cloud, or on a Hadoop HDFS cluster
• Standard ANSI SQL compliance
• First MySQL storage engine to support ANSI SQL11-compliant windowing functions

Copyright © 2014 InfiniDB. All Rights Reserved.

Custom Handler Class

User Connections

MySQL Functions
• MySQL Client
• MySQL Connectivity (JDBC, ODBC)
• MySQL Security
• Initial SQL Statement Parsing
• Initial SQL Optimization
< Custom Handler Class >
• Execute final sort and final limit
• Display final results

InfiniDB ExeMgr Functions
• SQL Optimization
• Distribute work for scan, filter, join, functions, expressions, group by, aggregation, etc. to all available Performance Modules, to be run in parallel
• Collect the results returned by the Performance Modules
• Return the final results to MySQL for display

InfiniDB Server (diagram labels): User Module (MySQL, InfiniDB ExeMgr); Performance Module(s); Storage

InfiniDB Design Principles
Scalable, Fast, Simple

InfiniDB Parallelism
• User Module: Processes SQL Requests
• Performance Module: Executes the Queries
• Single Server or MPP
Tiered MPP Building Blocks

MySQL
Functionality:
• Hosts MySQL
• Connection management
• SQL parsing & optimization
Value:
• Familiar DBMS interface
• Leverages existing partner integrations
• Delivers full SQL syntax support

Extent Map
Functionality:
• Abstracts physical and logical storage
• Metadata store
Value:
• Enables shared nothing and shared everything storage
• Enables partition elimination
• Built-in failover

ExeMgr
Functionality:
• Work distribution
• Final results management and aggregation
Value:
• Independent scalability and tunable concurrency
• Multi-threaded to take advantage of multicore HW platforms

PrimProc
Functionality:
• Scale-out cache management
• Distributed scan, filter, join and aggregation operations
• Resource management
Value:
• Independent scalability and tunable performance
• Multi-threaded to take advantage of multicore HW platforms

Data
Functionality:
• High Speed Bulk Load
• Transactional DML and DDL
• Online schema extensions
Value:
• Enables concurrent reads and writes, with non-blocking reads
• Multi-threaded to take advantage of multicore HW platforms

InfiniDB Foundation: Parallelism
• Purpose-built C++ engine
• Parallelism is at the thread level
• Example: 12 PM servers with 8 cores each yield 96 parallel processing engines
• SQL is translated into thousands or tens of thousands of discrete jobs, or "primitives"
• The UM sends primitives to the processing engines

InfiniDB Parallelism: Fixed Thread Pool
• User Module: Processes SQL Requests
• Performance Module: Executes the Queries
• Single Server or MPP
• Primitives are issued into a thread queue within each Performance Module
• Storage (diagram labels): Local disk / EBS; GlusterFS / HDFS
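The fixed thread pool idea above can be sketched in a few lines of Python. This is only an illustration of the pattern (many short "primitives" pulled from a queue by a fixed set of worker threads), not InfiniDB's C++ implementation. The names scan_primitive and run_primitives, the block layout, and the filter are all hypothetical; the 12 x 8 = 96 figure is the example from the slide.

```python
from concurrent.futures import ThreadPoolExecutor

# Figures from the slide: 12 Performance Module servers, 8 cores each.
PM_SERVERS = 12
CORES_PER_PM = 8
TOTAL_ENGINES = PM_SERVERS * CORES_PER_PM  # 96 parallel processing engines


def scan_primitive(block, threshold):
    """Hypothetical primitive: count values in one block above a threshold."""
    return sum(1 for value in block if value > threshold)


def run_primitives(blocks, threshold, workers=CORES_PER_PM):
    """Run one short primitive per block on a fixed pool of worker threads.

    The executor keeps an internal work queue; each thread takes the next
    primitive, finishes it quickly, and returns to the queue for more.
    """
    with ThreadPoolExecutor(max_workers=workers) as pool:
        futures = [pool.submit(scan_primitive, block, threshold) for block in blocks]
        # Stand-in for the final aggregation step: combine partial results.
        return sum(future.result() for future in futures)


# Ten blocks of 100 consecutive integers covering 0..999.
blocks = [list(range(start, start + 100)) for start in range(0, 1000, 100)]
matching_rows = run_primitives(blocks, threshold=499)
```

Because each primitive holds a thread only briefly, the pool size can stay fixed at the core count regardless of how many primitives a query generates.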
Architectural Differentiation (diagram)
Labels: Greenplum, Netezza, etc.; Parent Process; Worker Process; Database Layer 1 - Executing SQL; Database Layer 2 - Executing SQL; Database Layer - Executing SQL; Block Processing Layer - Custom DoW

Architectural Differentiation (diagram, continued)
Labels: Greenplum, Netezza, etc.; Parent Process; Worker Process
• Threads dedicated for the duration of a query.
• Threads operate from queue, dedicated for a fraction of a second.

InfiniDB Design Principles
Scalable, Fast, Simple

Row-Oriented vs. Column-Oriented

Row-oriented: rows stored sequentially

Key  Fname     Lname  State  Zip    Phone           Age  Sex
1    Bugs      Bunny  NY     11217  (718) 938-3235  34   M
2    Yosemite  Sam    CA     95389  (209) 375-6572  52   M
3    Daffy     Duck   NY     10013  (212) 227-1810  35   M
4    Elmer     Fudd   ME     04578  (207) 882-7323  43   M
5    Witch     Hazel  MA     01970  (978) 744-0991  57   F

Column-oriented: each column is stored in a separate file

Key:    1, 2, 3, 4, 5
Fname:  Bugs, Yosemite, Daffy, Elmer, Witch
Lname:  Bunny, Sam, Duck, Fudd, Hazel
State:  NY, CA, NY, ME, MA
Zip:    11217, 95389, 10013, 04578, 01970
Phone:  (718) 938-3235, (209) 375-6572, (212) 227-1810, (207) 882-7323, (978) 744-0991
Age:    34, 52, 35, 43, 57
Sex:    M, M, M, M, F

Each column for a given row is at the same offset.

2-Dimensional Data Partitioning
• Vertical Partitioning by Column
  o Not Column-Family (no relation to HBase)
  o Only do I/O for columns requested
• Horizontal Partitioning by range of rows
  o Metadata stored within in-memory structure
• 10 TB of data maps to ~150k-300k discrete files.
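The two partitioning dimensions above can be illustrated with a small Python sketch using the sample table from the row-versus-column slide. It is a toy model under stated assumptions, not InfiniDB's storage format: the extent size of two rows, the per-extent min/max values used as the in-memory metadata, and the function names build_extents and select_where_greater are all hypothetical.

```python
EXTENT_ROWS = 2  # hypothetical, tiny on purpose; real extents are far larger

# Vertical partitioning: one independent list ("file") per column.
columns = {
    "Fname": ["Bugs", "Yosemite", "Daffy", "Elmer", "Witch"],
    "State": ["NY", "CA", "NY", "ME", "MA"],
    "Age": [34, 52, 35, 43, 57],
}


def build_extents(values, rows_per_extent=EXTENT_ROWS):
    """Horizontal partitioning: split a column into row ranges with metadata."""
    extents = []
    for start in range(0, len(values), rows_per_extent):
        chunk = values[start:start + rows_per_extent]
        extents.append(
            {"start": start, "values": chunk, "min": min(chunk), "max": max(chunk)}
        )
    return extents


def select_where_greater(table, filter_col, threshold, project_col):
    """Return project_col values where filter_col > threshold.

    Only the two named columns are touched, and any extent whose max cannot
    satisfy the filter is skipped using its metadata alone.
    """
    results = []
    extents_read = 0
    for extent in build_extents(table[filter_col]):
        if extent["max"] <= threshold:
            continue  # extent eliminated without reading its values
        extents_read += 1
        for offset, value in enumerate(extent["values"]):
            if value > threshold:
                # Same offset in another column identifies the same row.
                results.append(table[project_col][extent["start"] + offset])
    return results, extents_read


names, extents_read = select_where_greater(columns, "Age", 50, "Fname")
```

In this toy run the State column is never read, and the middle Age extent (35, 43) is skipped because its maximum is below the filter value.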
Column Restriction and Projection (diagram)
Labels: Column # Seventeen; Column # Six; Column # Four; Extent # 27; Extent # 5; Filter 1; Filter 2; Filter 3; Projection
• Automatic Vertical Partitioning + Horizontal Partitioning
• Just-In-Time Materialization

InfiniDB Design Principles
Scalable, Fast, Simple

Simplicity: Automated Everything
• Column storage
• Compression / compression type
• No index build or maintenance required
• Extent Map partitioning: Vertical / Horizontal
• Distribution of data across server/disk resources
• Distribution of work
• Ad-hoc performance

InfiniDB: What's New
• Open Source: GPL v2
• New Company Name
• Funding
• InfiniDB for Hadoop
• Windowing Analytic Functions

What is InfiniDB for Hadoop?
• Fast SQL for Hadoop offering for real-time and ad-hoc reporting and analytics
• Non-map/reduce engine for real-time SQL
• 40x to 100x faster than Hive
• SQL in Hadoop
• Reads and writes directly to HDFS/GPFS
• Best of breed SQL in Hadoop
• Superior ad-hoc usage, syntax vs. Impala/Presto
• MySQL Compatibility
• InfiniDB presents Hadoop as MySQL data source

InfiniDB Background: InfiniDB for Hadoop
• InfiniDB is a non-map/reduce engine
• Reads and writes natively to HDFS
Diagram labels: Pig/Hive; HBase; Map Reduce; InfiniDB for Hadoop; Hadoop Distributed File System

Value Proposition for InfiniDB for Hadoop
• Enables access to Hadoop data via familiar interface
• Response to competitive challenge from Cloudera Impala
• Complete the Hadoop Checklist
  o Cost-effective storage
  o Robust transforms via map/reduce
  o Real-time SQL for analytics with InfiniDB for Hadoop

Benchmark: Hive, Presto, Impala, InfiniDB
http://infinidb.co/system/files/RadiantAdvisors_Benchmark_SQL-on-Hadoop_2014Q1.pdf
PARTITION and FRAME
• For each row, the calculation for an aggregation is done over a FRAME of rows
• The PARTITION of a row is the group of rows that have the same value as the current row for a specific column
• The FRAME for each row is a subset of the PARTITION for that row
• SELECT x, y, sum(x) OVER (PARTITION BY y ORDER BY x RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) FROM a

Row Number  X   Y  Partition      sum(x) over the row's frame
1           1   1  rows 1 to 4    22
2           4   1  rows 1 to 4    21
3           7   1  rows 1 to 4    17
4           10  1  rows 1 to 4    10
5           2   2  rows 5 to 7    15
6           5   2  rows 5 to 7    13
7           8   2  rows 5 to 7    8
8           3   3  rows 8 to 10   18
9           6   3  rows 8 to 10   15
10          9   3  rows 8 to 10   9

InfiniDB Use Cases
• Who is using it?
• When to use it?

InfiniDB Customers

InfiniDB's place in the Big Data world
• Designed for high performance analytics
• Provides flexibility for ad hoc queries
• Not suited for OLTP, NoSQL, KeyValue

Copyright © 2014 Calpont. All Rights Reserved.

Workload: Query Vision/Scope (chart)
Axis: Query Vision/Scope, with ticks at 1; 100; 10,000; 1,000,000; 100,000,000; 10,000,000,000
Regions: OLTP/NoSQL Workloads; Analytic Workloads
• General DBMS missed the target (dated database technology generally suboptimal)

What is your typical query? (chart)
Axis: Query Vision/Scope, with ticks at 1; 100; 10,000; 1,000,000; 100,000,000; 10,000,000,000
Regions: OLTP/NoSQL Workloads; Analytic Workloads
• There is no "average" query.
• The challenges are at the extremes:
  o The challenge of high concurrency levels with OLTP/NoSQL.
  o The challenge of latency for very large queries.
• Most use cases imply multiple data technologies.
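Returning to the PARTITION and FRAME slide, the per-row sums can be reproduced with a short Python sketch. It assumes the frame is ordered by x within each partition of y, which is what the sums on the slide imply, and that x values are distinct within a partition so that RANGE and ROWS frames coincide. The function name sum_current_to_unbounded_following is hypothetical, and this is plain Python rather than anything InfiniDB executes.

```python
from collections import defaultdict

# (x, y) pairs for rows 1 to 10 on the slide.
rows = [
    (1, 1), (4, 1), (7, 1), (10, 1),
    (2, 2), (5, 2), (8, 2),
    (3, 3), (6, 3), (9, 3),
]


def sum_current_to_unbounded_following(rows):
    """Compute sum(x) over the frame from the current row to the end of its partition."""
    partitions = defaultdict(list)
    for x, y in rows:
        partitions[y].append(x)  # PARTITION BY y

    output = []
    for y in sorted(partitions):
        xs = sorted(partitions[y])  # assumed ordering by x inside the partition
        running = 0
        suffix_sums = []
        # Walk backwards so each row's frame (itself plus all following rows)
        # is a running total.
        for x in reversed(xs):
            running += x
            suffix_sums.append(running)
        suffix_sums.reverse()
        output.extend((x, y, total) for x, total in zip(xs, suffix_sums))
    return output


window_sums = sum_current_to_unbounded_following(rows)
```

Each tuple in window_sums is (x, y, sum over the frame), so the first partition yields 22, 21, 17 and 10, matching the frames shown for rows 1 to 4.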
Columnar Appropriate Workloads (chart)
Axis: Query Vision/Scope, with ticks at 1; 100; 10,000; 1,000,000; 100,000,000; 10,000,000,000
Regions: OLTP/NoSQL Workloads; ROLAP/Analytic/Reporting Workloads
• Pure Columnar: about 10x worse I/O for single record lookups
• Pure Columnar: about 10x better I/O for large data access patterns

Benefits of InfiniDB
• Real-time, Consistent Query Performance
• Linear Scale for Massive Data
• Removes Limits to Dimensions and Granularity
• Easy to Deploy and Maintain

Core Features of InfiniDB
• Scalable MPP architecture
• Performant ad hoc analysis
• Consistent query response time
• Simplified data administration
• Analytic window functions
• Native MySQL® driver support
• Open source license
• Deployable on premise, in the cloud, & on Apache Hadoop™
• Optional Enterprise support subscription
