IBM Big Data Solutions
Marco Giovacchini, Client Technical Specialist, Power Systems
Rome, 6 November 2014

In one minute...

Big Data Is All Data
The increasing volume, variety and velocity of data are straining client IT infrastructures that were never designed to handle this magnitude, complexity or workload.
- Volume: Data at Scale
- Variety: Data in Many Forms
- Velocity: Data in Motion
- Veracity: Data Uncertainty

- Access Matters: To get new levels of visibility into customers and operations
- Speed Matters: To accelerate insights in real time at the point of impact
- Availability Matters: To consistently deliver insights to the people and processes that need them

Infrastructure matters: Transforming businesses with speed achieving new levels of insight…

Access Matters
To get new levels of visibility into customers and operations
- Scalability (-in, -out, -up)
- Efficient virtualization and resource management
- Optimized data storage and access
- Data tiering and compression
- Parallel processing
- Optimized compute

Speed Matters
To accelerate insights in real time at the point of impact
- Parallel processing
- Low-latency resources, such as flash memory
- Ability to scale up, in and out rapidly
- Optimized systems that bring analytics accelerators closer to systems of record
- Optimized hybrid in-memory analytics

Availability Matters
To consistently deliver insights to the people and processes that need them
- Low-latency resources
- Scalability (-in, -out, -up)
- Self-healing capabilities
- Enterprise-grade systems software that provides continuous management and access
- Single-site and multisite clustering solutions

BigData Innovation Center * Politecnico di Milano & IBM
http://www.mip.polimi.it/mip/it/globahls/news/IBM-e-Politecnico-di-Milano.html

IBM Value Proposition:
- Largest Big Data and Analytics end-to-end provider
- Largest IT Private Research Organization in the World
- Analytics Internationalization and Global Scale Working
- Go-to-Market/Management
- Technological Excellence

Politecnico Value Proposition:
- One of the main Technical Universities in Europe
- 150-year-old institution
- Scientific Pre-eminence in Engineering and Business Management
- Start-up/Spin-offs Incubator and Business Clients
- Specialized Knowledge

Joint capabilities and eminence:
- Grow cultural awareness, education and innovation on Analytics
- Support the usage of Analytics in businesses, both start-ups and corporate clients
- Foster the establishment of new Analytics-related jobs

[Diagram: cluster with one Management Node and three GPFS Data Nodes]

InfoSphere BigInsights = Hadoop + IBM Innovation
BigInsights includes the latest stable Open Source components, enhanced by enterprise edition components.

IBM InfoSphere BigInsights for Hadoop
[Component diagram. Legend: Open Source, IBM, * In Beta]
Areas shown: Visualization & Ad Hoc Analytics; Applications & Development; Advanced Analytics; Data Access; Runtime; Data Store; File System; Resource Management & Administration; Governance; Security; Search; Stream Computing.
Components shown: GPFS FPO, HDFS, HBase, MapReduce, Adaptive MapReduce, Flexible Scheduler, YARN*, Big SQL, Pig, Hive, Jaql, HCatalog, Sqoop, Flume, ETL, Oozie, ZooKeeper, Monitoring, Console, Dashboard, Charting, BigSheets, BigSheets Reader and Macro, Big R, R, Text Analytics, Text Analytics Extractors, Streams, Enterprise Search, Solr/Lucene, Data Privacy for Hadoop, Audit & History, Data Matching, Data Masking, Data Security for Hadoop, LDAP, Kerberos, Eclipse Tooling (MapReduce, Hive, Jaql, Pig, Big SQL, AQL).

IBM Solution for Hadoop – Power Systems Edition
Key requirements & design parameters – focused on customer value
- Best-in-class hardware
- Dense storage subsystem
- Advanced software capabilities
- Better reliability & management
- Best-in-class file system
- Automated cluster provisioning

Solution stack:
- IBM InfoSphere BigInsights or open-source Hadoop
- IBM Platform Symphony
- IBM Platform Cluster Manager
- Distributed File System: IBM Elastic Storage, HDFS
- Linux Operating Environment: RHEL
- IBM Power Systems: IBM POWER7+, POWER8

Architecture Requirements Vary by Variety of Data and Range of Analytics

POWER8 is Designed for Big Data

POWER8 – New Innovations that boost Performance

"POD-based" design: Standard Configurations

Summary: POWER8 Delivers Faster Insights at Lower Cost

Backup Chart

Big Data & Analytics: POWER Systems
[Diagram labels: Power Linux, InfoSphere BigInsights; Power Linux, InfoSphere Streams; Power AIX (linux), DB2 10.5 BLU Acceleration; Data sources (operational, structured); ETL; Data Integration; Data Quality; Data Delivery; Data Warehouse; Master Data; BI applications; OLAP; Dashboards; Spreadsheets; Cubes; Predictive]

Access Matters
To get new levels of visibility into customers and operations. Infrastructure must enable shared and secured access to all relevant data, no matter its type or where it resides.
Obtain new levels of customer intimacy and differentiation with shared and secure access to all relevant information, no matter what it is or where it resides.
IBM offers industry-leading capabilities:
- Scalability (-in, -out, -up)
- Efficient virtualization and resource management
- Optimized data storage and access
- Data tiering and compression
- Parallel processing
- Optimized compute
Storage-dense integrated big data platform optimized to simplify & accelerate unstructured big data analytics
1 Unformatted raw disk capacity

Speed Matters
To accelerate insights in real time at the point of impact. Infrastructure must build intelligence into operational events and transactions.
Optimize decisions in real time by embedding intelligence into operational processes using integrated high-performance infrastructure capabilities.
IBM offers industry-leading capabilities:
- Parallel processing
- Low-latency resources, such as flash memory
- Ability to scale up, in and out rapidly
- Optimized systems that bring analytics accelerators closer to systems of record
- Optimized hybrid in-memory analytics
Higher ingest rates deliver 37% faster insights than competitive Hadoop solutions, with 31% fewer data nodes.
1 Based on STG performance testing compared with a Cloudera/HP published benchmark

Availability Matters
To consistently deliver insights to the people and processes that need them. Infrastructure must maximize the availability of information and insights at the point of impact.
Empower employees with insights when they need them, maximizing right-time availability to improve collaboration, solve problems and grow opportunities.
IBM offers industry-leading capabilities:
- Low-latency resources
- Scalability (-in, -out, -up)
- Self-healing capabilities
- Enterprise-grade systems software that provides continuous management and access
- Single-site and multisite clustering solutions
Better reliability and resiliency, with 73% fewer outages and 92% fewer performance problems than x86.
1 CLAIMS: Solitaire Interglobal Paper - Power Boost Your Big Data Analytics Strategy – http://www-03.ibm.com/systems/power/solutions/assets/bigdata-analytics.html?LNK=wf

IBM Solution for Hadoop – Power Systems Edition
Providing an agile solution optimized for time-critical big data workflows

Integrated big data platform optimized to simplify and accelerate big data analytics, composed of:
- Compute nodes: IBM POWER8-based Power Systems
- Management software: IBM Platform™ Cluster Manager
- Application software: IBM InfoSphere® BigInsights™ GPFS™
  Contains: IBM Platform™ Symphony – Advanced Edition, IBM GPFS™

Benefits:
- Complete: easy to procure, deploy, use and manage
- Shorter time to results at lower TCO
- Optimal application performance, robustness
- Lower risk, based on IBM Reference Architecture and IBM solution-level support
- Pre-defined configurations
- Runbook and automated installation scripts

Faster time to insight, right-sized for your business needs

Typical BigInsights deployment
"Linear growth" design: Big Data clusters are built using storage-dense server offerings, with a fixed disk/core ratio (1:1 for most offerings).

Delivering faster time to value with an intuitive and powerful solution
- Clustered and optimized
- Highly evolved building blocks: IBM InfoSphere BigInsights, IBM Elastic Storage, IBM Platform Symphony Family, IBM Platform Cluster Manager
- Delivered as an integrated solution

© 2014 IBM Corporation

IBM Solutions for Big Data and Analytics – Detailed View