Oracle’s Big Data solutions
Jean-Philippe Breysse
Oracle Suisse

The following is intended to outline our general product direction. It is intended for information purposes only, and may not be incorporated into any contract. It is not a commitment to deliver any material, code, or functionality, and should not be relied upon in making purchasing decisions. The development, release, and timing of any features or functionality described for Oracle’s products remain at the sole discretion of Oracle.

USE CASE 3: LOG ANALYSIS OF SERVERS
Short description:
– Daily log analysis
Issues:
– Find correlations that show what leads to failures
– Log files are stored as flat files

Copyright © 2012, Oracle and/or its affiliates. All rights reserved.

Oracle Technology mapped to Analytics Landscape
[Diagram: Oracle products placed by data structure and by stage]
• Data structure: Unstructured, Semi-structured, Structured
• Data types: Master & Reference; Transactions; Machine Generated; Files; Text, Image, Video, Audio
• Stages: Acquire, Organize, Analyze, Decide
• Products shown: Oracle Data Integrator, Oracle 12g, Oracle BI Enterprise, Endeca MDEX, Oracle NoSQL, Oracle Hadoop HDFS, Oracle R Enterprise & Oracle Data Mining, Oracle TimesTen, Oracle GoldenGate, Oracle Hadoop MapReduce, Oracle Essbase, Oracle Real Time Decisions, Oracle Endeca Information Discovery

Agenda
• Big Data
• Solution Spectrum
• Inside the Big Data Appliance
• Big Data Applications Software
• Big Data Analytics
• Conclusions

Big Data
Why Everyone Should Care

Tapping into Diverse Data Sets
• Information architectures today: decisions based on database data
• Big Data: decisions based on all your data
• Data sets shown: Video and Images, Documents, Social Data, Machine-Generated Data, Transactions

A bit of history ...
Hadoop: Developed initially by Doug Cutting (Nutch open-source web search engine) and Yahoo. Inspired by Google’s papers on MapReduce and GFS (2003-2004), this work resulted in Apache Hadoop (2006).
Amazon Dynamo (2007): distributed systems technologies.
Cassandra: developed at Facebook (2008) to power its Inbox Search feature. A column-oriented distributed database, based initially on Dynamo and on Bigtable (built by Google).
Voldemort: a distributed data store designed as a key-value store, used by LinkedIn for high-scalability storage (NoSQL key-value).
Cloudera: contributes to Hadoop and related Apache projects and provides a commercial distribution of Hadoop.

So What is Big Data Anyway?
It’s a matter of perspective. Big Data is both:
• LARGE AND VARIABLE DATASETS that are difficult for traditional database tools to manage, including datasets that once seemed unimportant or too problematic to deal with. Big Data datasets include:
  – Extremely large files of unstructured or semi-structured data
  – Large and highly distributed datasets that are otherwise difficult to manage as a single unit of information
• A NEW SET OF TECHNOLOGIES that can economically capture, store, manage, and extract value from Big Data datasets, thus facilitating better, more informed business decisions

Structured Data vs. Unstructured Data
Relational databases work best with structured data: data that has an underlying structure (schema) and a size that fits easily within the confines of database columns and rows. Unstructured data is highly variable, lacks a fixed structure, and is often too large for an RDBMS to handle easily.
Source: IDC Digital Universe Study, Extracting Value from Chaos, June 2011 (sponsored by EMC)

Drive Value from Big Data
Building a Big Data Platform

Divided Solution Spectrum
[Diagram: data variety (Unstructured, Schema-less, Schema) against the stages Acquire, Organize, Analyze]
• NoSQL (Flexible, Specialized, Developer Centric): Distributed File Systems, Transaction (Key-Value) Stores, MapReduce Solutions
• SQL (Trusted, Secure, Administered): DBMS (OLTP), ETL, DBMS (DW), Advanced Analytics

Hadoop to Oracle – Bridging the Gap
[Diagram: data variety (Unstructured, Schema-less, Schema) against the stages Acquire, Organize, Analyze]
• Components shown: HDFS, Hadoop MapReduce, Cassandra, Oracle Loader for Hadoop, RDBMS (OLTP), ETL, RDBMS (DW), Advanced Analytics

Oracle Integrated Software Solution
[Diagram: data variety (Unstructured, Schema-less, Schema) against the stages Acquire, Organize, Analyze]
• Components shown: HDFS, Oracle NoSQL DB, Hadoop, Oracle Loader for Hadoop, Oracle Data Integrator, Oracle (OLTP), Oracle (DW), OBI EE
• Oracle Analytics: Data Mining, R, Spatial, Graph, MapReduce

Inside the Big Data Appliance
Overview

Oracle Engineered Solutions
[Diagram: axes Data Variety (Unstructured to Schema) and Information Density; stage shown: Acquire]
• Big Data Appliance: Hadoop, NoSQL Database, Oracle Loader for Hadoop, Oracle Data Integrator
• Underlying labels: HDFS, Hadoop, Oracle NoSQL DB, Oracle Loader for Hadoop, Oracle Data Integrator, Oracle Database (OLTP)

Copyright © 2011, Oracle and/or its affiliates. All rights reserved.
• In-DB Analytics: “R”, Mining, Text, Graph, Spatial
• Oracle Exadata: OLTP & DW, Data Mining & Oracle R, Semantics, Spatial
• Exalytics: Speed of Thought Analytics
• Underlying labels: Oracle Database (DW), Oracle BI EE
• Stages shown: Organize, Analyze

Big Data Appliance Usage Model
[Diagram: Oracle Big Data Appliance, Oracle Exadata, and Oracle Exalytics connected by InfiniBand; flow: Stream, Acquire, Organize, Analyze & Visualize]

Oracle Big Data Appliance Hardware
• 18 Sun X4270 M2 servers
  – 48 GB memory per node = 864 GB memory
  – 12 Intel cores per node = 216 cores
  – 24 TB storage per node = 432 TB storage
• 40 Gb/s InfiniBand
• 10 Gb/s Ethernet

Scale Out to Infinity
Scale out by connecting racks to each other using InfiniBand
• Expand up to eight racks without additional switches
• Scale beyond eight racks by adding an additional switch

Oracle Big Data Appliance Software
• Oracle Enterprise Linux 5.6
• Oracle HotSpot Java VM
• Cloudera’s Distribution including Apache Hadoop
• Cloudera Manager
• Open Source Distribution of R
• Oracle NoSQL Database Community Edition

Big Data Application Software
Acquire New Information

Key-Value Store Workloads
• Large dynamic schema based data repositories
• Data capture
  – Web applications (click-through capture)
  – Online retail
  – Sensor/statistics/network capture (factory automation, for example)
  – Backup services for mobile devices
• Data services
  – Scalable authentication
  – Real-time communication (MMS, SMS, routing)
  – Personalization
  – Social networks

Oracle NoSQL DB
A distributed, scalable key-value database
• Simple Data Model
  – Key-value pair with major+sub-key paradigm
  – Read/insert/update/delete operations
• Scalability
  – Dynamic data partitioning and distribution
  – Optimized data access via intelligent driver
[Diagram: two applications, each with a NoSQLDB Driver]
• High availability
  – One or more replicas
  – Disaster recovery through location of replicas
  – Resilient to partition master failures
  – No
single point of failure
• Transparent load balancing
  – Reads from master or replicas
  – Driver is network topology & latency aware
• Elastic (Planned for Release 2)
  – Online addition/removal of Storage Nodes
  – Automatic data redistribution
[Diagram: Storage Nodes in Data Center A and Data Center B]

Big Data Application Software
Organizing Data for Analysis

Oracle Loader for Hadoop
Features
• Load data into a partitioned or non-partitioned table
  – Single level, composite or interval partitioned table
  – Support for scalar datatypes of Oracle Database
  – Load into Oracle Database 11g Release 2
• Runs as a Hadoop job and supports standard options
• Pre-partitions and sorts data on Hadoop
• Online and offline load modes

Oracle Loader for Hadoop
[Diagram: INPUT 1 and INPUT 2 pass through MAP, SHUFFLE/SORT, and REDUCE tasks, with ORACLE LOADER FOR HADOOP shown as a MAP, SHUFFLE/SORT, REDUCE job]

Oracle Loader for Hadoop: Online Option
[Diagram: ORACLE LOADER FOR HADOOP job with MAP, SHUFFLE/SORT, and REDUCE tasks]
• Read target table metadata from the database
• Perform partitioning, sorting, and data conversion
• Connect to the database from reducer nodes, load into database partitions in parallel

Oracle Loader for Hadoop: Offline Option
[Diagram: ORACLE LOADER FOR HADOOP job with MAP, SHUFFLE/SORT, and REDUCE tasks]
• Read target table metadata from the database
• Perform partitioning, sorting, and data conversion
• Write from reducer nodes to Oracle Data Pump files
• Import into the database in parallel using the external table mechanism

Selecting an Output Option for the Use Case
Oracle Loader for Hadoop output option       Use case characteristics
Online load with JDBC                        The simplest use case for non-partitioned tables
Online load with Direct Path                 Fast online load for partitioned tables
Offline load with Data Pump files            Fastest load method for external tables
On Oracle Big Data Appliance: Direct HDFS    Leave data on HDFS; parallel access from database; import into database when needed

Automate Usage of Oracle Loader for Hadoop
Oracle Data Integrator (ODI)
• ODI has knowledge modules to
  – Generate data transformation code to run on Hive/Hadoop
  – Invoke Oracle Loader for Hadoop
• Use the drag-and-drop interface in ODI to
  – Include invocation of Oracle Loader for Hadoop in any ODI packaged flow
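
The Oracle Loader for Hadoop slides above describe a flow of partitioning, sorting, and then loading each partition in parallel. The following is a minimal in-memory sketch of that idea in plain Python. It is not the Oracle Loader for Hadoop API and does not run on Hadoop. The row layout and the names partition_and_sort, load_all, and write_partition are hypothetical and exist only for illustration.

```python
from collections import defaultdict
from typing import Any, Callable, Dict, Hashable, Iterable, List, Tuple

# Hypothetical row layout for illustration: (region, order_id, amount).
Row = Tuple[str, int, float]


def partition_and_sort(
    rows: Iterable[Row],
    partition_of: Callable[[Row], Hashable],
    sort_key: Callable[[Row], Any],
) -> Dict[Hashable, List[Row]]:
    """Group rows by target partition, then sort the rows inside each group.

    This imitates, in memory, the partitioning and sorting that the slides
    attribute to the Hadoop job before any data reaches the database.
    """
    buckets: Dict[Hashable, List[Row]] = defaultdict(list)
    for row in rows:
        buckets[partition_of(row)].append(row)
    return {name: sorted(group, key=sort_key) for name, group in buckets.items()}


def load_all(
    buckets: Dict[Hashable, List[Row]],
    write_partition: Callable[[Hashable, List[Row]], int],
) -> Dict[Hashable, int]:
    """Hand each pre-sorted bucket to a writer and collect rows written.

    Assumption: the writer stands in for either load mode. In the online
    option it would write through a database connection; in the offline
    option it would write a file that is imported later.
    """
    return {name: write_partition(name, group) for name, group in buckets.items()}


if __name__ == "__main__":
    sample = [("EU", 3, 30.0), ("US", 1, 10.0), ("EU", 2, 5.0)]
    # Partition by region, sort by order_id within each partition.
    buckets = partition_and_sort(sample, lambda r: r[0], lambda r: r[1])
    # The stand-in writer only counts rows instead of writing anywhere.
    print(load_all(buckets, lambda name, group: len(group)))
```

Because each bucket is independent once it has been partitioned and sorted, the buckets can be written concurrently, which is the property the online and offline options both rely on.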
Big Data Analytics
Real Time Analytics Platform

R Statistical Programming Language
• Open source language and environment
• Used for statistical computing and graphics
• Strength in easily producing publication-quality plots
• Highly extensible with open source community R packages

Drive Value from Big Data
Conclusions

Big Data Appliance
Big Data for the Enterprise
• Optimized and Complete
  – Everything you need to store and integrate your lower information density data
• Integrated with Oracle Exadata
  – Analyze all your data
• Easy to Deploy
  – Risk Free, Quick Installation and Setup
• Single Vendor Support
  – Full Oracle support for the entire system and software set

Oracle Integrated Solution Stack for Big Data
[Diagram: solution stack across the stages ACQUIRE, ORGANIZE, ANALYZE, DECIDE]
• Components shown: Oracle NoSQL Database, Enterprise Applications, HDFS, Hadoop (MapReduce), Oracle Loader for Hadoop, Oracle Data Integrator, Data Warehouse, In-Database Analytics, Oracle Analytic Applications

Oracle: Big Data for the Enterprise
• The most comprehensive solution
  – Includes everything needed to acquire, organize and analyze all your data
• Optimized for Extreme Analytics
  – Deepest analytics portfolio with access to all data
• Engineered to Work Together
  – Eliminate deployment risk and support risk
• Enterprise Ready
  – Deliver extreme performance and scalability

Questions