IBM Big Data Solutions - Open Source Day 2014

Marco Giovacchini
Client Technical Specialist
Power Systems
Rome, 6 November 2014
In one minute...
Big Data Is All Data
The increasing volume, variety and velocity of data is straining client IT infrastructures that were never designed to handle this magnitude, complexity or workload.
Volume: Data at Scale
Variety: Data in Many Forms
Velocity: Data in Motion
Veracity: Data Uncertainty
Access Matters
To get new levels of visibility into customers and operations
Speed Matters
To accelerate insights in real time at the point of impact
Availability Matters
To consistently deliver insights to the people and processes that need them
Infrastructure matters: Transforming businesses with
speed achieving new levels of insight…
Access Matters
To get new levels of visibility into customers and operations
– Scalability (-in, -out, -up)
– Efficient virtualization and resource management
– Optimized data storage and access
– Data tiering and compression
– Parallel processing
– Optimized compute

Speed Matters
To accelerate insights in real time at the point of impact
– Parallel processing
– Low-latency resources, such as flash memory
– Ability to scale up, in and out rapidly
– Optimized systems bring analytics accelerators closer to systems of record
– Optimized hybrid in-memory analytics

Availability Matters
To consistently deliver insights to the people and processes that need them
– Low-latency resources
– Scalability (-in, -out, -up)
– Self-healing capabilities
– Enterprise-grade systems software that provides continuous management and access
– Single-site and multisite clustering solutions
BigData Innovation Center * Politecnico di Milano & IBM
http://www.mip.polimi.it/mip/it/globahls/news/IBM-e-Politecnico-di-Milano.html
Politecnico Value Proposition:
– One of the main Technical Universities in Europe
– 150-year-old institution
– Scientific Pre-eminence in Engineering and Business Management
– Start-up/Spin-off Incubator and Business Clients
– Specialized Knowledge
IBM Value Proposition:
– Largest Big Data and Analytics end-to-end provider
– Largest IT Private Research Organization in the World
– Analytics
– Internationalization and Global Scale Working
– Go-to-Market / Management
– Technological Excellence
Joint capabilities and eminence
– Grow cultural awareness, education and innovation on Analytics
– Support the usage of Analytics in businesses, both start-ups and corporate clients
– Foster the establishment of new Analytics-related jobs
[Figure: cluster diagram showing one Management Node and three Data Nodes, each Data Node labeled GPFS]
InfoSphere BigInsights = Hadoop + IBM Innovation
BigInsights includes the latest stable Open Source components,
enhanced by enterprise edition components
IBM InfoSphere BigInsights for Hadoop
[Figure: component stack diagram. Legend: Open Source, IBM, * In Beta.
Areas shown: Visualization & Ad Hoc Analytics; Applications & Development; Advanced Analytics; Stream Computing; Data Access; Runtime; Data Store; File System; Search; ETL; Resource Management & Administration; Security; Governance.
Components named: BigSheets; BigSheets Reader and Macro; Dashboard; Charting; Eclipse Tooling (MapReduce, Hive, Jaql, Pig, Big SQL, AQL); Text Analytics Extractors; Big R; R; Text Analytics; Streams; Big SQL; Pig; Hive; HCatalog; Jaql; Sqoop; Flume; MapReduce; Adaptive MapReduce; Flexible Scheduler; YARN*; HBase; HDFS; GPFS FPO; Solr/Lucene; Enterprise Search; Console; Monitoring; Oozie; ZooKeeper; LDAP; Kerberos; Data Security for Hadoop; Data Privacy for Hadoop; Data Masking; Data Matching; Audit & History.]
IBM Solution for Hadoop – Power Systems Edition
Key requirements & design parameters – focused on customer value

– Best-in-class hardware
– Dense storage subsystem
– Advanced software capabilities
– Better reliability & management
– Best-in-class file system
– Automated cluster provisioning

IBM Platform Symphony
IBM InfoSphere BigInsights or open-source Hadoop
IBM Platform Symphony
IBM Platform Cluster Manager
Distributed File System: IBM Elastic Storage, HDFS
Linux Operating Environment: RHEL
IBM Power Systems: IBM POWER7+, POWER8
Architecture Requirements Vary by Variety of Data and Range of Analytics
POWER8 is Designed for Big Data
POWER8 – New Innovations that boost Performance
“POD–based” design: Standard Configurations
Summary: POWER8 Delivers Faster Insights at Lower Cost
Backup Chart
Big Data & Analytics: POWER Systems
[Figure: architecture diagram. Labels: Power Linux: InfoSphere BigInsights; Power Linux: InfoSphere Streams; Power AIX (Linux): DB2 10.5 BLU Acceleration; BI applications; OLAP; Data Warehouse; Dashboards; Spreadsheets; Cubes; Master Data; Predictive; ETL; Data Integration; Data Quality; Data Delivery; Data sources (operational, structured)]
Access Matters
To get new levels of visibility into customers and operations.
Infrastructure must enable shared and secured access to all relevant data, no matter its type or where it resides.
Obtain new levels of customer intimacy and differentiation with shared and secure access to all relevant information no matter what it is or where it resides.
IBM offers industry-leading capabilities
– Scalability (-in, -out, -up)
– Efficient virtualization and resource management
– Optimized data storage and access
– Data tiering and compression
– Parallel processing
– Optimized compute
1 Unformatted raw disk capacity
Storage-dense integrated big data platform optimized to simplify & accelerate unstructured big data analytics
Speed Matters
To accelerate insights in real time at the point of impact.
Infrastructure must build intelligence into operational events and transactions.
Optimize decisions in real time by embedding intelligence into operational processes using integrated high-performance infrastructure capabilities.
IBM offers industry-leading capabilities
– Parallel processing
– Low-latency resources, such as flash memory
– Ability to scale up, in and out rapidly
– Optimized systems bring analytics accelerators closer to systems of record
– Optimized hybrid in-memory analytics
1 Based on STG performance testing compared with a published Cloudera/HP benchmark
Higher ingest rates deliver 37% faster insights than competitive Hadoop solutions, with 31% fewer data nodes
Availability Matters
To consistently deliver insights to the people and processes that need them.
Infrastructure must maximize the availability of information and insights at the point of impact.
Empower employees with insights when they need them, maximizing right-time availability to improve collaboration to solve problems and grow opportunities.
IBM offers industry-leading capabilities
– Low-latency resources
– Scalability (-in, -out, -up)
– Self-healing capabilities
– Enterprise-grade systems software that provides continuous management and access
– Single-site and multisite clustering solutions
Better reliability and resiliency, with 73% fewer outages and 92% fewer performance problems than x86.
1 CLAIMS: Solitaire Interglobal Paper - Power Boost Your Big Data Analytics Strategy – http://www-03.ibm.com/systems/power/solutions/assets/bigdata-analytics.html?LNK=wf
IBM Solution for Hadoop – Power Systems Edition
Providing an agile solution optimized for time-critical big data workflows

Integrated big data platform optimized to simplify and accelerate big data analytics, comprising:
– Compute nodes: IBM POWER8-based Power Systems
– Management software: IBM Platform™ Cluster Manager
– Application software: IBM InfoSphere® BigInsights™
Contains: IBM Platform™ Symphony – Advanced Edition, IBM GPFS™
Benefits
– Complete: easy to procure, deploy, use and manage
– Shorter time to results at lower TCO
– Optimal application performance, robustness
– Lower risk, based on IBM Reference Architecture and IBM solution-level support
– Pre-defined configurations
– Runbook and automated installation scripts
Faster time to insight, right-sized for your business needs
Typical BigInsights deployment
“Linear growth” design
Big Data clusters are built using storage-dense server offerings, with a fixed disk/core ratio (1:1 for most offerings)
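The "linear growth" idea can be illustrated with a small sizing sketch: when every data node has the same core and disk counts, total cores, disks and raw capacity all scale in proportion to the node count, and the disk/core ratio stays fixed. The node figures below (12 cores, 12 disks, 4 TB per disk) and the names NodeSpec, cluster_totals and nodes_for_capacity are hypothetical placeholders for illustration, not values or tools from the slides.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class NodeSpec:
    """Hypothetical storage-dense data node; all figures are placeholders."""
    cores: int
    disks: int
    disk_tb: float  # raw capacity per disk, in TB


def cluster_totals(node: NodeSpec, node_count: int) -> dict:
    """Aggregate resources for a cluster of identical data nodes."""
    if node_count < 1:
        raise ValueError("node_count must be at least 1")
    return {
        "nodes": node_count,
        "cores": node.cores * node_count,
        "disks": node.disks * node_count,
        "raw_tb": node.disks * node.disk_tb * node_count,
        # The ratio depends only on the node design, not on cluster size.
        "disks_per_core": node.disks / node.cores,
    }


def nodes_for_capacity(node: NodeSpec, raw_tb_needed: float) -> int:
    """Smallest node count whose raw disk capacity covers raw_tb_needed."""
    per_node_tb = node.disks * node.disk_tb
    return max(1, math.ceil(raw_tb_needed / per_node_tb))


if __name__ == "__main__":
    node = NodeSpec(cores=12, disks=12, disk_tb=4.0)  # assumed 1:1 disk/core node
    for count in (5, 10, 20):
        print(cluster_totals(node, count))
    print(nodes_for_capacity(node, 500.0))
```

Doubling the node count doubles cores, disks and raw capacity together, which is why capacity planning for such a cluster reduces to choosing a node count. Raw capacity here ignores replication and file system overhead.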
Delivering faster time to value with an intuitive and powerful solution
Clustered and optimized
Highly evolved building blocks
IBM InfoSphere BigInsights
IBM Elastic Storage
Delivered as an integrated solution
IBM Platform Symphony Family
IBM Platform Cluster Manager
© 2014 IBM Corporation
IBM Solutions for Big Data and Analytics – Detailed View