Why Spark on Hadoop Matters
© 2014 MapR Technologies

MapR Overview
• Exponential growth: 3X bookings, Q1 ’13 – Q1 ’14
• 500+ customers
• Top ranked
• Cloud leaders
• 90% software licenses
• 80% of accounts expand 3X
• < 1% lifetime churn
• > $1B in incremental revenue generated by 1 customer

Rapidly Evolving Landscape
[Diagram: the Apache Hadoop and OSS ecosystem on the MapR Data Platform. Categories: Management, Provisioning, Batch, ML & Graph, SQL, Streaming, NoSQL & Search, Data Integration & Access, Security, Workflow & Data Governance, plus layers for execution engines and for data governance and operations. Components: Tez*, Spark, Drill*, Cascading, GraphX, Shark, Accumulo*, Storm*, Hue, Savannah*, Pig, MLlib, Impala, Solr, HttpFS, Juju, MR v1 & v2, Mahout, Hive, HBase, Spark Streaming, YARN, Flume, Knox*, Falcon*, Whirr, Sqoop, Sentry*, Oozie, ZooKeeper. * = on the 2014 timeline.]

The Complete Spark Stack on Hadoop
[Diagram: the same ecosystem stack as on the previous slide.]

A Winning Combination

Spark Advantages
IN-MEMORY PERFORMANCE · EASE OF DEVELOPMENT · COMBINE WORKFLOWS
• Easier APIs: Python, Scala, Java
• Shark, ML, Streaming, GraphX
• RDDs, DAGs
• Unify processing

Hadoop Advantages
UNLIMITED SCALE · WIDE RANGE OF APPLICATIONS · ENTERPRISE PLATFORM
• Reliability, multi-tenancy, security
• Multiple data sources, multiple applications, multiple users
• Files, databases, semi-structured data

The Combination of Spark on Hadoop
UNLIMITED SCALE · EASE OF DEVELOPMENT · IN-MEMORY PERFORMANCE · ENTERPRISE PLATFORM · WIDE RANGE OF APPLICATIONS · COMBINE WORKFLOWS
Operational applications augmented by in-memory performance

Case Studies

Industry-Leading Ad-Targeting Platform
• High-performance analytics over MapR M7 NoSQL
• Data loaded from an M7 table into an RDD to augment scoring in real time
• Results fed back to M7 for other applications

Leading Pharma Company: NextGen Genomics
• The existing process takes several weeks to align chemical compounds with genes
• ADAM on Spark allows realignment in a few hours
• Geneticists can minimize their dependence on engineering

Cisco: Security Intelligence Operations
• Sensor data lands in M7
• Spark Streaming on M7 runs a first check for known threats
• Data is next processed with GraphX and Mahout
• Results are queried with SQL via Shark and Impala

Insurance Giant: Addressing Health Care Regulations
• Patient information in M7 is combined with clinical records to compute readmission probability
• The process uses Spark with transactional data in M7
• Insurance options are decided in real time on online portals

In Summary

Spark on Hadoop Gains Traction for Real-Time Applications

Pick the Right Tool for the Job

MapR Is Unbiased Open Source (à la Linux)
• Open source distribution is about providing choice
  – Linux includes MySQL, PostgreSQL and SQLite
  – Linux includes Apache httpd, nginx and Lighttpd

Feature | MapR Distribution for Hadoop | Distribution C | Distribution H
Spark | Spark (all of it) and Shark | Spark only | No
Interactive SQL | Shark, Impala, Drill, Hive/Tez | One option (Impala) | One option (Hive/Tez)
Versions | Hive 0.10, 0.11, 0.12, 0.13; Pig 0.11, 0.12; HBase 0.94, 0.98 | One version | One version

Thank You
Engage with us! @mapr · maprtech · mapr-technologies · MapR · maprtech · srivas@mapr.com
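The Spark Advantages slide lists RDDs and DAGs without saying what they buy you. The toy below is a conceptual sketch only, not Spark's API or implementation: the hypothetical `ToyRDD` class records each transformation (`map`, `filter`) lazily as a node in a lineage graph, computes nothing until an action (`collect`, `count`) runs, and lets `cache()` keep a materialized result in memory for reuse. It assumes a single process and single-parent chains; real Spark partitions data across a cluster, and its DAGs can have several parents (for example, after a join).

```python
# Conceptual toy only: models how Spark RDDs record transformations lazily
# as a lineage graph and evaluate them when an action runs.
# ToyRDD and its methods are hypothetical; this is not Spark's API.


class ToyRDD:
    def __init__(self, compute, parent=None, name="source"):
        self._compute = compute        # function producing this node's records
        self.parent = parent           # upstream node (lineage edge), or None
        self.name = name
        self._cache_requested = False
        self._cached = None            # records kept in memory after cache()
        self.compute_count = 0         # how many times this node was evaluated

    @classmethod
    def parallelize(cls, data):
        items = list(data)
        return cls(lambda: list(items))

    # Transformations: build a new node, compute nothing yet.
    def map(self, f):
        return ToyRDD(lambda: [f(x) for x in self._records()], self, "map")

    def filter(self, pred):
        return ToyRDD(lambda: [x for x in self._records() if pred(x)], self, "filter")

    def cache(self):
        # Mark for in-memory reuse; takes effect on the next evaluation.
        self._cache_requested = True
        return self

    def _records(self):
        if self._cached is not None:
            return self._cached
        self.compute_count += 1
        records = self._compute()      # recursively evaluates the parents
        if self._cache_requested:
            self._cached = records
        return records

    # Actions: trigger evaluation of the whole lineage.
    def collect(self):
        return list(self._records())

    def count(self):
        return len(self._records())

    def lineage(self):
        node, names = self, []
        while node is not None:
            names.append(node.name)
            node = node.parent
        return " <- ".join(names)


if __name__ == "__main__":
    nums = ToyRDD.parallelize(range(10))
    evens_sq = nums.filter(lambda x: x % 2 == 0).map(lambda x: x * x).cache()
    print(evens_sq.lineage())        # map <- filter <- source
    print(evens_sq.collect())        # [0, 4, 16, 36, 64]
    print(evens_sq.count())          # 5 (served from the cache)
    print(evens_sq.compute_count)    # 1
```

Here `collect()` walks the lineage once and, because of `cache()`, the following `count()` reuses the in-memory result; without `cache()`, every action recomputes the chain from the source. In real Spark the same pipeline would be written against an RDD with `sc.parallelize(...)`, `filter`, `map`, and `cache`.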
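The Cisco case study describes Spark Streaming making a first pass over incoming sensor data to check for known threats before deeper processing with GraphX and Mahout. The sketch below illustrates that micro-batch pattern in plain Python under stated assumptions: the event shape `(sensor_id, signature)`, the names `micro_batches`, `first_check`, and `run`, and matching against a set of known-bad signatures are all hypothetical, not Cisco's actual pipeline. For simplicity it cuts batches by count, whereas Spark Streaming groups incoming data by a time-based batch interval.

```python
# Toy sketch of a micro-batch "first check" against known threats.
# Event shape, names, and count-based batching are hypothetical;
# Spark Streaming itself cuts batches by a time interval, not by count.
from itertools import islice
from typing import Iterable, Iterator, List, Set, Tuple

# Assumed event shape: (sensor_id, signature)
Event = Tuple[str, str]


def micro_batches(stream: Iterable[Event], batch_size: int) -> Iterator[List[Event]]:
    """Cut an incoming event stream into batches of at most batch_size."""
    it = iter(stream)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch


def first_check(batch: List[Event], known_threats: Set[str]) -> Tuple[List[Event], List[Event]]:
    """Split a batch into events matching a known threat signature and the rest."""
    flagged = [e for e in batch if e[1] in known_threats]
    remaining = [e for e in batch if e[1] not in known_threats]
    return flagged, remaining


def run(stream: Iterable[Event], known_threats: Set[str], batch_size: int = 3):
    alerts: List[Event] = []
    downstream: List[Event] = []  # stand-in for the later graph/ML stages
    for batch in micro_batches(stream, batch_size):
        flagged, remaining = first_check(batch, known_threats)
        alerts.extend(flagged)
        downstream.extend(remaining)
    return alerts, downstream


if __name__ == "__main__":
    events = [("s1", "ok"), ("s2", "bad-sig-1"), ("s3", "ok"),
              ("s1", "bad-sig-2"), ("s4", "ok")]
    alerts, downstream = run(events, {"bad-sig-1", "bad-sig-2"})
    print(alerts)      # [('s2', 'bad-sig-1'), ('s1', 'bad-sig-2')]
    print(downstream)  # [('s1', 'ok'), ('s3', 'ok'), ('s4', 'ok')]
```

With a batch size of 3, the five events form two batches; the two events whose signatures are in the known-threat set come out as alerts, and the rest are passed on for further analysis.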