Towards a Web-scale Data Management Ecosystem
Demonstrated by SAP HANA
Stefan Bäuerle, Jonathan Dees, Franz Faerber, Wolfgang Lehner
Agenda
• Motivation & Requirements
• Different Processing Engines and Integration
• Scale-Out Edition Engine
© 2015 SAP SE or an SAP affiliate company. All rights reserved.
Public
Application requirements for a modern DBMS
Applications differ in their:
• data types
• consumption models
• data models
• notions of consistency
• application and query languages
• levels of scaling
• hardware capabilities
HANA Platform
HANA System
Beyond relational data processing (1/3)
• Integrate as deeply as possible into the engine
Bringing OLAP and OLTP together
• Proven: works in thousands of customer systems
• Simplicity: get rid of extracts, loads, and redundancy; one system
• OLAP dominates OLTP in real-world systems: optimize accordingly
Data mining and prediction
• Examples: basket analysis, different forecasting algorithms, …
• Easy interaction with R and SAS
Unstructured data
• Text search support for more than 30 languages, including:
  • Stemming, part-of-speech tagging, noun extraction, …
  • Classification, clustering, named entity recognition, sentiment analysis
Planning extensions
• Planning: define and align business figures for the foreseeable future
• Data-heavy operators like disaggregation or logical snapshots
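Disaggregation, listed above among the data-heavy planning operators, pushes a planned total down to finer-grained members. A minimal Python sketch, assuming proportional distribution by reference values (for example last year's actuals) with an equal split as fallback; the function name, the rounding rule, and the example figures are hypothetical, not HANA's planning engine.

```python
from typing import Dict


def disaggregate(total: float, reference: Dict[str, float], decimals: int = 2) -> Dict[str, float]:
    """Distribute a planned `total` across members in proportion to `reference`.

    Falls back to an equal split when every reference value is zero. The
    rounding remainder goes to the last member, so the parts add up to
    `total` at the chosen precision.
    """
    if not reference:
        raise ValueError("at least one member is required")
    members = list(reference)
    weight_sum = sum(reference.values())
    if weight_sum == 0:
        shares = {m: 1.0 / len(members) for m in members}
    else:
        shares = {m: reference[m] / weight_sum for m in members}
    result = {m: round(total * shares[m], decimals) for m in members[:-1]}
    # Last member absorbs the rounding difference.
    result[members[-1]] = round(total - sum(result.values()), decimals)
    return result


# Hypothetical example: spread a regional plan of 1000 by last year's sales.
plan = disaggregate(1000.0, {"North": 300.0, "South": 100.0, "East": 100.0})
print(plan)
```

Assigning the remainder to one member is only one possible policy; a real planning engine may distribute rounding differences differently.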
Beyond relational data processing (2/3)
Graph processing
• Real-world business data often resembles graphs
• Modeling data as a graph allows more explicit and more efficient operators
• Distance, siblings, shortest path, reachability, transitive closure, …
Hierarchy processing
• Hierarchies are a special type of general graph
• Used by almost every business application
• Support for time-dependent and versioned hierarchies
• Extended graph operators: level, neighbor, is_ancestor, …
Geospatial processing & time series
• Native relational data types
• Existing compression techniques + powerful specializations for sensor data
• Spatial: WithinDistance, Contains, Area, …
• Time series: Group by time interval, Interpolate Missing Values, …
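The graph operators above (distance, shortest path, reachability) can be illustrated with breadth-first search. A minimal sketch, assuming an unweighted directed graph held as an adjacency dictionary; the function names and the example graph are hypothetical and do not reflect HANA's graph API.

```python
from collections import deque
from typing import Dict, List, Optional, Set


def shortest_path(edges: Dict[str, List[str]], start: str, goal: str) -> Optional[List[str]]:
    """Breadth-first search over an unweighted directed graph.

    Returns one path with the fewest edges from `start` to `goal`, or None
    if `goal` is unreachable. The distance is len(path) - 1.
    """
    parent: Dict[str, Optional[str]] = {start: None}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            path: List[str] = []
            current: Optional[str] = node
            while current is not None:  # walk parent links back to start
                path.append(current)
                current = parent[current]
            return path[::-1]
        for neighbor in edges.get(node, []):
            if neighbor not in parent:  # first visit is along a shortest path
                parent[neighbor] = node
                queue.append(neighbor)
    return None


def reachable(edges: Dict[str, List[str]], start: str) -> Set[str]:
    """All vertices reachable from `start`, including `start` itself."""
    seen = {start}
    queue = deque([start])
    while queue:
        for neighbor in edges.get(queue.popleft(), []):
            if neighbor not in seen:
                seen.add(neighbor)
                queue.append(neighbor)
    return seen


# Hypothetical supply chain: an edge X -> Y means "X supplies Y".
supplies = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(shortest_path(supplies, "A", "E"))
print(sorted(reachable(supplies, "C")))
```

Running `reachable` from every vertex yields the transitive closure; an in-engine operator would avoid materializing the graph as Python objects.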
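Grouping by time interval and interpolating missing values can be sketched as two small steps: average the samples in fixed-width buckets, then fill empty interior buckets linearly. The names, the bucket boundaries, and the readings are hypothetical; HANA's time series functions may handle edge cases (such as leading or trailing gaps) differently.

```python
from typing import List, Optional, Tuple


def bucket_average(samples: List[Tuple[int, float]], start: int, end: int, step: int) -> List[Optional[float]]:
    """Group (timestamp, value) samples into intervals of width `step` covering
    [start, end) and average each interval; empty intervals become None."""
    n = (end - start) // step
    sums = [0.0] * n
    counts = [0] * n
    for ts, value in samples:
        if start <= ts < start + n * step:
            i = (ts - start) // step
            sums[i] += value
            counts[i] += 1
    return [s / c if c else None for s, c in zip(sums, counts)]


def interpolate_missing(values: List[Optional[float]]) -> List[Optional[float]]:
    """Fill interior gaps by linear interpolation between the nearest known
    neighbors; gaps at the start or end are left as None."""
    result = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    # Only gaps with a known value on both sides are filled.
    for left, right in zip(known, known[1:]):
        lo, hi = values[left], values[right]
        for i in range(left + 1, right):
            result[i] = lo + (hi - lo) * (i - left) / (right - left)
    return result


# Hypothetical sensor readings: (seconds since start, value).
readings = [(0, 10.0), (30, 14.0), (60, 20.0), (240, 40.0)]
per_minute = bucket_average(readings, start=0, end=300, step=60)
filled = interpolate_missing(per_minute)
print(per_minute)
print(filled)
```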
Beyond relational data processing (3/3)
Scientific processing
• Bring prominent operators into the engine
• Simplifies and speeds up operations in the scientific and financial areas
• Matrix operators: Eigenvalue, Multiply, …
• Financial operators: Interest Rates, Garman-Kohlhagen Process, …
NoSQL processing
• Document-based models: XML, JSON, …
• Key-value stores
• Flexible schema, supported in HANA via a specific flexible table type
Massive scale-out
• Conventional business applications fit on a single box, but a new kind of application requires massive scale-out
• Deep and seamless integration with the Hadoop system
• Scale-out and single-box applications act as one system
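The Garman-Kohlhagen model named among the financial operators adapts Black-Scholes to currency options by treating the foreign interest rate like a continuous dividend yield. The sketch below prices a European FX option in closed form; it illustrates the model only, and the function signature and inputs are assumptions, not the interface of HANA's operator.

```python
import math


def norm_cdf(x: float) -> float:
    """Standard normal cumulative distribution function via erf."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def garman_kohlhagen(spot: float, strike: float, t: float, r_dom: float,
                     r_for: float, sigma: float, call: bool = True) -> float:
    """Price a European FX option under the Garman-Kohlhagen model.

    spot, strike: domestic currency per unit of foreign currency
    t: time to expiry in years
    r_dom, r_for: continuously compounded domestic / foreign rates
    sigma: annualized volatility of the exchange rate
    """
    if t <= 0 or sigma <= 0:
        raise ValueError("t and sigma must be positive")
    vol_t = sigma * math.sqrt(t)
    d1 = (math.log(spot / strike) + (r_dom - r_for + 0.5 * sigma ** 2) * t) / vol_t
    d2 = d1 - vol_t
    df_dom = math.exp(-r_dom * t)  # domestic discount factor
    df_for = math.exp(-r_for * t)  # foreign discount factor
    if call:
        return spot * df_for * norm_cdf(d1) - strike * df_dom * norm_cdf(d2)
    return strike * df_dom * norm_cdf(-d2) - spot * df_for * norm_cdf(-d1)


# Made-up inputs: spot 1.10, strike 1.05, six months, rates 1% / 3%, volatility 10%.
fx_call = garman_kohlhagen(1.10, 1.05, 0.5, 0.01, 0.03, 0.1)
fx_put = garman_kohlhagen(1.10, 1.05, 0.5, 0.01, 0.03, 0.1, call=False)
print(fx_call, fx_put)
```

With a foreign rate of zero the formula reduces to Black-Scholes, and call and put prices always satisfy put-call parity, which makes both easy sanity checks.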
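A flexible schema means a table can accept columns that were not declared up front. A hypothetical in-memory model, assuming that inserting a row with an unknown column adds that column and that older rows read it as NULL (None); this is not HANA's DDL or storage implementation.

```python
from typing import Any, Dict, List, Tuple


class FlexibleTable:
    """Hypothetical table whose column list grows on insert."""

    def __init__(self, columns: List[str]) -> None:
        self.columns = list(columns)
        self.rows: List[Dict[str, Any]] = []

    def insert(self, row: Dict[str, Any]) -> None:
        for name in row:
            if name not in self.columns:
                self.columns.append(name)  # schema grows on demand
        self.rows.append(dict(row))

    def select(self, *names: str) -> List[Tuple[Any, ...]]:
        for name in names:
            if name not in self.columns:
                raise KeyError(f"unknown column {name}")
        # Rows inserted before a column existed yield None for it.
        return [tuple(r.get(n) for n in names) for r in self.rows]


customers = FlexibleTable(["ID", "Name"])
customers.insert({"ID": 1, "Name": "Ann"})
customers.insert({"ID": 2, "Name": "Bob", "Email": "bob@example.com"})
print(customers.columns)
print(customers.select("ID", "Email"))
```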
Application integration (examples)
• Currency conversion
• Hierarchy handling
• Aging / dynamic tiering
• Dictionary maintenance
• Graph optimizations
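Currency conversion is a typical piece of application logic that can move into the engine. A minimal sketch, assuming a rate table keyed by currency pair with rates valid from a given date, where the applicable rate is the latest one valid on or before the posting date; the table layout, names, and rates are made up for illustration and omit details such as rate types.

```python
import bisect
from datetime import date
from typing import Dict, List, Tuple

# Hypothetical rate table: (source, target) -> [(valid_from, rate)], sorted by valid_from.
RateTable = Dict[Tuple[str, str], List[Tuple[date, float]]]


def convert(amount: float, source: str, target: str, on: date, rates: RateTable) -> float:
    """Convert `amount` with the latest rate whose valid_from is <= `on`."""
    if source == target:
        return amount
    entries = rates[(source, target)]
    valid_from = [d for d, _ in entries]
    i = bisect.bisect_right(valid_from, on) - 1  # last entry not after `on`
    if i < 0:
        raise LookupError(f"no {source}->{target} rate valid on {on}")
    return amount * entries[i][1]


# Made-up rates for illustration only.
rates: RateTable = {("EUR", "USD"): [(date(2015, 1, 1), 1.21), (date(2015, 3, 1), 1.08)]}
print(convert(100.0, "EUR", "USD", date(2015, 2, 15), rates))
```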
HANA Data Platform
Dynamic Tiering
HANA Dynamic Tiering
• Declare a table to use disk storage
• Cost-efficient for big data
• Optimized disk-based processing powered by SAP IQ
A new warm option alongside:
• Hot (in-memory)
• Cold (nearline storage)
CREATE TABLE "demo"."SalesOrders_WARM" (
  ID         INTEGER NOT NULL,
  CustomerID INTEGER NOT NULL,
  OrderDate  DATE    NOT NULL,
  …,
  PRIMARY KEY (ID)
) USING EXTENDED STORAGE;
INSERT INTO "demo"."SalesOrders_WARM" VALUES ( … );
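The DDL above places a whole table in extended storage. Deciding which rows belong in the warm table is application-specific; a minimal sketch, assuming an aging rule based on OrderDate with a hypothetical 90-day hot window, could look like this (the function name and threshold are illustrative, not HANA's aging mechanism).

```python
from datetime import date, timedelta
from typing import Any, Dict, List, Tuple

Row = Dict[str, Any]


def split_hot_warm(rows: List[Row], today: date, hot_days: int) -> Tuple[List[Row], List[Row]]:
    """Partition sales orders into (hot, warm) by OrderDate.

    Orders placed within the last `hot_days` days stay in memory (hot);
    older ones are candidates for the extended-storage (warm) table.
    """
    cutoff = today - timedelta(days=hot_days)
    hot = [r for r in rows if r["OrderDate"] >= cutoff]
    warm = [r for r in rows if r["OrderDate"] < cutoff]
    return hot, warm


orders = [
    {"ID": 1, "CustomerID": 7, "OrderDate": date(2015, 1, 5)},
    {"ID": 2, "CustomerID": 8, "OrderDate": date(2015, 5, 20)},
    {"ID": 3, "CustomerID": 7, "OrderDate": date(2015, 6, 1)},
]
hot, warm = split_hot_warm(orders, today=date(2015, 6, 30), hot_days=90)
print([r["ID"] for r in hot], [r["ID"] for r in warm])
```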
HANA Data Platform
BigData | Vision
[Diagram: HANA Data Management Platform offering Information Management | Text | Search | Graph | Geospatial | Predictive over three tiers: SAP HANA In-Memory (instant results, 0.1 sec), HANA Dynamic Tiering (warm data), and Hadoop / HANA Scale Out (infinite storage, raw data); plus Smart Data Streaming and Administration | Monitoring | Operations | User Management | Security.]
HANA native BigData
• Dynamic Tiering
• Smart Data Streaming
• NoSQL | Graph | Geo | TimeSeries
HANA & Hadoop
• SDA: Hive | Spark, MapReduce | HDFS
• Admin & Monitoring
• User Mgmt / Security
• Hadoop Extension: Velocity Engine, integrated with HANA and Hadoop
SAP HANA Massive Scale Out Edition (Velocity)
Motivation:
• Engine for massive scale-out and big data
Key features:
• Scale to thousands of nodes
  • Different data freshness and consistency levels
  • Efficient fail-safe design
• First-class citizen within Hadoop (Spark)
• Support for a variety of hardware and operating systems
• Extreme query performance by compiling SQL to native code
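Compiling queries avoids re-walking the expression tree for every row. The sketch below illustrates that principle on a tiny, hypothetical predicate format by generating Python source for the whole predicate once and compiling it to Python bytecode; the Velocity engine targets native machine code, which this sketch does not attempt.

```python
from typing import Any, Callable, Dict, Tuple

Row = Dict[str, Any]
# Hypothetical predicate AST: ("and" | "or", left, right) or (op, column, constant).
Expr = Tuple[Any, ...]
COMPARISONS = {"=": "==", "<": "<", ">": ">"}


def interpret(expr: Expr, row: Row) -> bool:
    """Interpretation: walk the AST again for every row."""
    op = expr[0]
    if op == "and":
        return interpret(expr[1], row) and interpret(expr[2], row)
    if op == "or":
        return interpret(expr[1], row) or interpret(expr[2], row)
    _, column, const = expr
    if op == "=":
        return row[column] == const
    if op == "<":
        return row[column] < const
    if op == ">":
        return row[column] > const
    raise ValueError(f"unsupported operator {op!r}")


def to_source(expr: Expr) -> str:
    """Translate the AST into a single Python expression over `row`."""
    op = expr[0]
    if op in ("and", "or"):
        return f"({to_source(expr[1])} {op} {to_source(expr[2])})"
    if op not in COMPARISONS:
        raise ValueError(f"unsupported operator {op!r}")
    _, column, const = expr
    return f"(row[{column!r}] {COMPARISONS[op]} {const!r})"


def compile_predicate(expr: Expr) -> Callable[[Row], bool]:
    """Compilation: generate and compile code once, then call it per row.
    Use only with trusted ASTs, since the generated source is evaluated."""
    code = compile(f"lambda row: {to_source(expr)}", "<predicate>", "eval")
    return eval(code)


where = ("and", (">", "Amount", 100), ("=", "Region", "EMEA"))
rows = [
    {"Amount": 250, "Region": "EMEA"},
    {"Amount": 50, "Region": "EMEA"},
    {"Amount": 300, "Region": "APJ"},
]
predicate = compile_predicate(where)
print([r for r in rows if predicate(r)])
```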
SAP HANA SOE (Velocity) and Hadoop (1/2)
Hadoop ecosystem
• Ambari: cluster management
• MLlib: machine learning
• Hive: SQL
• Spark SQL: SQL
• YARN: processing
• Spark: processing
• HDFS: distributed file system
• HBase: database
• ZooKeeper: coordination
• Pig: scripting
SAP HANA SOE (Velocity) and Hadoop (2/2)
Steps
• Stage 1: Integration with Spark (2015)
• Stage 2: Independent execution cluster
Benefits
• Integration of SAP data with data lakes
• HANA features add value to Hadoop (e.g. SQL extensions like time series, hierarchies, …)
• Performance
• Holistic data platform
Architecture to Support Different Data Freshness Levels
• Separate component for transactions
• Options:
  • read your own writes
  • up-to-date data vs. data of a certain age
[Diagram components: Connections 1 … n (session data), DQP, Transaction Broker, Version Table, DTX, Query Engine 1 (A, B, C), Query Engine 2 (A, D), Query Engine 3 (A, C, D), Storage 1 … n, Distributed Log, Storage (checkpoints).]
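The slide lists the freshness options but not the mechanism. A minimal sketch, assuming each query engine replays the distributed log up to some commit version and that the transaction broker's version table records commit times; the class names, the staleness definition, and the selection rule are hypothetical, not the Velocity design.

```python
from typing import Dict, List, Optional, Tuple


class TransactionBroker:
    """Hypothetical broker: assigns increasing commit versions and records
    each version's commit time (standing in for the Version Table)."""

    def __init__(self) -> None:
        self.version = 0
        self.commit_time: Dict[int, float] = {0: 0.0}

    def commit(self, now: float) -> int:
        self.version += 1
        self.commit_time[self.version] = now
        return self.version

    def staleness(self, applied: int, now: float) -> float:
        """Age of the oldest commit an engine has not applied yet (0.0 if current)."""
        if applied >= self.version:
            return 0.0
        return now - self.commit_time[applied + 1]


class QueryEngine:
    """Hypothetical engine that has applied the log up to version `applied`."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.applied = 0
        self.rows: Dict[str, List[Tuple[int, str]]] = {}

    def apply(self, version: int, key: str, value: str) -> None:
        self.rows.setdefault(key, []).append((version, value))
        self.applied = version

    def read(self, key: str) -> Optional[str]:
        versions = self.rows.get(key, [])
        return versions[-1][1] if versions else None


def pick_engine(engines: List[QueryEngine], broker: TransactionBroker, now: float,
                min_version: int = 0, max_age: Optional[float] = None) -> Optional[QueryEngine]:
    """Return the first engine meeting the session's freshness requirements.

    min_version: read your own writes (the session's last commit version).
    max_age: tolerated staleness in seconds; None demands up-to-date data.
    """
    limit = 0.0 if max_age is None else max_age
    for engine in engines:
        if engine.applied >= min_version and broker.staleness(engine.applied, now) <= limit:
            return engine
    return None


broker = TransactionBroker()
lagging, current = QueryEngine("qe1"), QueryEngine("qe2")
v1 = broker.commit(now=10.0)
for engine in (lagging, current):
    engine.apply(v1, "order42", "open")
v2 = broker.commit(now=20.0)  # this session's write; only `current` applies it
current.apply(v2, "order42", "shipped")
print(pick_engine([lagging, current], broker, now=25.0, max_age=10.0).name)
```

Relaxing `max_age` lets more engines qualify (spreading load), while `min_version` keeps a session from reading data older than its own last write.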
SAP HANA scale out integration
Conclusion
• Today's applications have a multidimensional set of specialized requirements
• Gains from moving these requirements into a (single) DBMS:
  • Simplified and more explicit data modeling and processing for applications
  • Increased performance
  • No complicated data transfer between specialized engines
• Powerful orchestration is required
• Web-scale processing is key to supporting new applications
SAP HANA strives to answer all these requirements in a single data management platform.
Thank you