1% - Spark Summit

Why Spark on Hadoop Matters
© 2014 MapR Technologies
MapR Overview
• Exponential growth: 3X bookings, Q1 ’13 – Q1 ’14
• Top ranked
• 500+ customers
• Cloud leaders
• 90% software licenses
• 80% of accounts expand 3X
• < 1% lifetime churn
• > $1B in incremental revenue generated by 1 customer
Rapidly Evolving Landscape
[Diagram: Apache Hadoop and OSS ecosystem on the MapR Data Platform. * = 2014 timeline]
Layers: Management; Apache Hadoop and OSS Ecosystem (Execution Engines; Data Governance and Operations); MapR Data Platform
Categories: Batch; ML, Graph; SQL; NoSQL & Search; Streaming; Data Integration & Access; Security; Workflow & Data Governance; Provision
Components: Spark, Tez*, Cascading, Pig, MR v1 & v2, YARN, GraphX, MLLib, Mahout, Drill*, Shark, Impala, Hive, Accumulo*, Solr, HBase, Storm*, Spark Streaming, Hue, HttpFS, Flume, Sqoop, Knox*, Sentry*, Falcon*, Oozie, Savannah*, Juju, Whirr, ZooKeeper
The Complete Spark Stack on Hadoop
[Diagram: the same ecosystem diagram as on the previous slide. Spark stack components shown: Spark, Spark Streaming, Shark, MLLib, GraphX. * = 2014 timeline]
A Winning Combination
Spark Advantages:
EASE OF DEVELOPMENT
• Easier APIs
• Python, Scala, Java
IN-MEMORY PERFORMANCE
• RDDs
• DAGs unify processing
COMBINE WORKFLOWS
• Shark, ML, Streaming, GraphX
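The RDD and DAG bullets can be illustrated with a toy sketch in plain Python. This is not Spark's API: `ToyRDD` and the sample click data are hypothetical, invented only to show the idea that transformations are recorded as a chain and nothing runs until an action asks for a result.

```python
from functools import reduce


class ToyRDD:
    """Hypothetical stand-in for an RDD: records transformations lazily."""

    def __init__(self, source, steps=()):
        self._source = source        # any re-iterable collection of records
        self._steps = tuple(steps)   # recorded (kind, function) pairs

    def map(self, fn):
        # Transformation: returns a new ToyRDD, nothing is computed yet.
        return ToyRDD(self._source, self._steps + (("map", fn),))

    def filter(self, pred):
        # Transformation: also only recorded.
        return ToyRDD(self._source, self._steps + (("filter", pred),))

    def _run(self):
        # Replay the recorded chain over the source as one lazy pipeline.
        records = iter(self._source)
        for kind, fn in self._steps:
            records = map(fn, records) if kind == "map" else filter(fn, records)
        return records

    def collect(self):
        # Action: materialize every record.
        return list(self._run())

    def reduce(self, fn):
        # Action: fold the records into a single value.
        return reduce(fn, self._run())


seen = []

def amount(kv):
    seen.append(kv[0])   # record when this function really runs
    return kv[1]

clicks = ToyRDD([("ad1", 3), ("ad2", 0), ("ad1", 5), ("ad3", 1)])
pipeline = clicks.filter(lambda kv: kv[1] > 0).map(amount)
ran_before_action = list(seen)                 # still empty: only a plan exists
total = pipeline.reduce(lambda a, b: a + b)    # the action triggers the work
print(ran_before_action, total)
```

In real Spark the recorded chain is a DAG that the scheduler splits across a cluster and can keep in memory between steps; the sketch keeps only the lazy, chained shape of the programming model.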
Hadoop Advantages:
UNLIMITED SCALE
• Multiple data sources
• Multiple applications
• Multiple users
ENTERPRISE PLATFORM
• Reliability
• Multi-tenancy
• Security
WIDE RANGE OF APPLICATIONS
• Files
• Databases
• Semi-structured
The Combination of Spark on Hadoop
UNLIMITED SCALE
EASE OF DEVELOPMENT
IN-MEMORY PERFORMANCE
ENTERPRISE PLATFORM
WIDE RANGE OF APPLICATIONS
COMBINE WORKFLOWS
Operational Applications Augmented by In-Memory Performance
Case Studies
Industry Leading Ad-Targeting Platform
• High-performance analytics over MapR M7 NoSQL
• Load from M7 table into RDD to augment scoring in real time
• Results fed back to M7 for other applications
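The load, augment, and write-back loop in these bullets can be sketched in plain Python. Everything here is an assumption made for illustration: dictionaries stand in for M7 tables, and `augment_scores`, the profile fields, and the weights are hypothetical rather than taken from the customer's system.

```python
def augment_scores(profile_table, requests, weights):
    """Join each request with its stored profile and add a weighted boost."""
    results = {}
    for user_id, base_score in requests:
        profile = profile_table.get(user_id, {})   # missing users get no boost
        boost = sum(weights.get(name, 0.0) * value for name, value in profile.items())
        results[user_id] = base_score + boost
    return results


# Hypothetical stand-in for the table that would be loaded into an RDD.
profile_table = {
    "u1": {"sports": 1.0, "travel": 0.0},
    "u2": {"sports": 0.0, "travel": 2.0},
}
weights = {"sports": 0.5, "travel": 0.25}
requests = [("u1", 1.0), ("u2", 1.0), ("u3", 1.0)]

scores = augment_scores(profile_table, requests, weights)

# Write the results back so other applications can read them.
score_table = {}
score_table.update(scores)
print(score_table)
```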
Leading Pharma Company: NextGen Genomics
Existing process takes several weeks to align chemical compounds with genes
ADAM on Spark allows realignment in a few hours
Geneticists can minimize engineering dependency
Cisco: Security Intelligence Operations
Sensor data lands in M7
Spark Streaming on M7 for first check on known threats
Data next processed on GraphX and Mahout
Results queried using SQL via Shark and Impala
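The "first check on known threats" step can be pictured as a per-batch lookup against a set of known signatures. The sketch below is plain Python, not Spark Streaming, and assumes a hypothetical event shape (`src`, `bytes`) and blocklist. It only shows how each micro-batch might be split into flagged events and events passed on for deeper analysis.

```python
def first_check(batches, known_threats):
    """Split each micro-batch into known-threat events and everything else."""
    for batch in batches:
        flagged, unknown = [], []
        for event in batch:
            target = flagged if event["src"] in known_threats else unknown
            target.append(event)
        yield flagged, unknown


known_threats = {"10.0.0.66", "10.0.0.99"}   # hypothetical blocklist
batches = [
    [{"src": "10.0.0.1", "bytes": 120}, {"src": "10.0.0.66", "bytes": 900}],
    [{"src": "10.0.0.99", "bytes": 40}],
]

results = list(first_check(batches, known_threats))
flagged_sources = [event["src"] for flagged, _ in results for event in flagged]
print(flagged_sources)
```

Events in the `unknown` lists are the ones that would move on to the later graph and machine-learning stages.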
Insurance Giant: Addressing Health Care Regulations
Patient information in M7 combined with clinical records to compute readmittance probability
Process uses Spark with transactional data in M7
Insurance options decided in real time on online portals
In Summary
Spark on Hadoop gains traction for real-time applications
Pick the Right Tool for the Job
MapR is Unbiased Open Source (a la Linux)
• Open source distribution is about providing choice
– Linux includes MySQL, PostgreSQL and SQLite
– Linux includes Apache httpd, nginx and Lighttpd
                | MapR Distribution for Hadoop                                      | Distribution C       | Distribution H
Spark           | Spark (all of it) and Shark                                       | Spark only           | No
Interactive SQL | Shark, Impala, Drill, Hive/Tez                                    | One option (Impala)  | One option (Hive/Tez)
Versions        | Hive 0.10, 0.11, 0.12, 0.13; Pig 0.11, 0.12; HBase 0.94, 0.98     | One version          | One version
Thank you
Engage with us!
@mapr
maprtech
mapr-technologies
MapR
srivas@mapr.com
maprtech