From Experiment to Production
Mario Vosschmidt
Consulting Systems Engineer
© 2014 NetApp, Inc. All rights reserved. NetApp Proprietary – Limited Use Only
Big Data or Smart Data?
1) What is “Big Data”?
2) Requirements and challenges
3) Which scenarios do we focus on?
4) What do solution approaches look like?
5) How do I implement these solutions?
6) Summary
The 3V Paradigm
Multiple data sources
Multiple data formats
High speed processing
Fast changing requirements
Huge amounts of data
Process and persist
NetApp Confidential - Internal Use Only
The ABCs of Big Data at NetApp
Insight from extremely large datasets
Secure boundless data storage
Performance for data intensive workloads
[Figure: hype cycle, visibility over time. Phases: Technology Trigger, Peak of Inflated Expectations, Trough of Disillusionment, Slope of Enlightenment, Plateau of Productivity]
35 Zettabytes
Estimated size of the digital universe in 2020
30 Billion
Pieces of new content added to Facebook per month
5 Billion
Smart phones
80%
Unstructured data
A Lot of Hype and Buzz – Everyone is Jumping In
[Chart: Funding for Hadoop and NoSQL, Jan-08 to Nov-11, vertical axis 0 to 400 (source: 451 Research). Labeled funding rounds: Cloudera series B, C and D; MapR series A and B; 10gen series D; DataStax series B; Neo Technology series A; Opera Solutions series A; Platfora series A; Couchbase series C]
Market is expected to grow from $3.2 billion in 2010 to $16.9 billion in 2015
Most firms are taking a pragmatic approach
Big data is in the very early stages of maturity
Best practices are not mature
IDC Big Data Survey
"The Big Data market is expanding rapidly …
For technology buyers, opportunities exist to use Big Data technology to improve operational efficiency and to drive innovation.
Use cases are already present across industries and geographic regions."
Dan Vesset, Vice President, IDC
“Big Data” refers to datasets whose size is beyond the ability of typical tools to capture, store, manage and analyze
[Figure: volume, speed and complexity of data from 2010 to 2020, with an inflection point. Labels: “Information Becomes a Propellant to Business” and “Data Becomes a Burden to IT Infrastructure”]
Financial Services
Fraud detection & prevention
Anti-money laundering
Risk management
Government
Law enforcement
Counter-terrorism
Research and Education
Manufacturing
Supply chain optimization
Defect tracking
Root cause analysis
RFID correlation
Healthcare
Drug development
Patient Records
Evidence-based medicine
It’s the Value of Your Data
– Leverage their data assets into business advantage
5 Billion Records
Anywhere, Anytime
Faster time to market
50% Increase in Revenue
– Lower the cost of compliance
– Manage ever growing data efficiently
Over 1PB of data
Growth of 175% YOY
90 days of data within 24 hours of a failure
Practical solutions that solve today’s problems
NetApp helps you turn your exploding data from threat to opportunity. Manage your data effectively and affordably.
Break through the limits. With NetApp, you can take on even the most massive and complex data projects.
Turn insight to action. NetApp helps you get to clarity and insight faster and more reliably.
NetApp’s Largest Customer
100 PB
50 PB
10 Customers
20 PB
50 Customers
10 PB
100 Customers
Best of breed storage for Big Data
Applications
Built on open standards with best-in-class partnerships
Validated with ecosystem leaders
Complete server, network and storage
“Racks”
Delivered via trusted high-value partners
Smart Data
Solutions partners include IBM, Oracle, Microsoft, ParAccel, Exasol and SAND
Enterprise class Hadoop-based solutions
MapR, Hortonworks, Cloudera
Solutions for validated server, network and storage
Data Warehouse
Fast, space-efficient backup and recovery with storage utilization up to 90%. Less raw capacity with modular scalability
Mixed Use Database, Cubes
Optimized for IBM, Oracle and Microsoft.
Simplified data management and protection. Zero down time
Hadoop
Enterprise class Hadoop with lower total cost of ownership and based on open standards
Some problems require an Enterprise Class Hadoop Solution
Enterprise Class Hadoop
Packaged ready-to-deploy modular
Compute / Memory intensive Hadoop cluster
Compute intensive applications
Tick Data Analysis
Extremely tight Service Level expectations
Severe financial consequences if the analytic run is late
Enterprise Class Hadoop
Packaged ready-to-deploy modular Hadoop cluster
The Data has intrinsic value $$$
Usable capacity must expand faster than compute
Higher storage performance
Real human consequences if the system fails
(Threats, treatments, financial losses)
System has to allow for asymmetric growth
White Box Hadoop
Values associated with early adopters of Hadoop
Social Media Space
Contributors to Apache
Strong bias to JBOD
Skeptical of ALL vendors
Enterprise Class Hadoop
Bounded Compute algorithm / Memory intensive Hadoop cluster
Compute intensive applications
Additional CPUs do not improve run time
Extremely tight Service Level expectations
Severe financial consequences if the analytic run is late
Need for deeper storage per datanode
Storage Capacity
Availability
NameNode is a single point of failure
Slow recovery from disk drive failure
Expensive process to replace failed disks online
Most common Hadoop support issue is disk drive failure
Operations
Requires three copies of data, larger footprint, and more storage
Limited flexibility; storage and servers tied together affects scalability
Low cluster efficiency, higher network congestion
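The footprint implied by the three-copy requirement listed under Operations can be estimated with simple arithmetic. The following is a minimal sketch: the function names and the 100 TB dataset are illustrative assumptions, not figures from this deck, and the calculation ignores compression, temporary space and file system overhead.

```python
def raw_capacity_tb(usable_tb: float, replication_factor: int = 3) -> float:
    """Raw disk capacity needed when every block is stored `replication_factor` times."""
    if usable_tb < 0 or replication_factor < 1:
        raise ValueError("usable_tb must be >= 0 and replication_factor >= 1")
    return usable_tb * replication_factor


def storage_efficiency(replication_factor: int = 3) -> float:
    """Fraction of raw capacity that holds unique data."""
    if replication_factor < 1:
        raise ValueError("replication_factor must be >= 1")
    return 1.0 / replication_factor


if __name__ == "__main__":
    dataset_tb = 100.0  # hypothetical dataset size, for illustration only
    print(f"raw capacity needed: {raw_capacity_tb(dataset_tb):.0f} TB")
    print(f"storage efficiency:  {storage_efficiency():.0%}")
```

With three copies, only about a third of the raw capacity holds unique data, which is the "larger footprint" the list refers to.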
Implementation
Need to keep up with fast-paced patches, projects of open source platform
Need to decide on distribution of Hadoop
Skills are not common
Integration with existing IT infrastructure can be difficult
Tuning expertise needed to make Hadoop perform optimally
Cisco and NetApp Confidential. For Internal Use Only. Do Not Distribute.
FlexPod® Express
MSB/Branch Office
For smaller, less-dynamic requirements and VAR velocity
App App App
FlexPod Data Center
Enterprise/Service Provider
Massively scalable shared virtual data center infrastructure
FlexPod Select
Dedicated
Big data analytics, scientific,
HPC
App App App App App App App
Compute Pool
Network Pool
Compute Pool
Network Pool
Compute
Nodes
Network / Direct
Storage Pool
Cisco UCS C-Series
Nexus® 3K
FAS2xx0,
Two fixed pod sizes
Cisco UCS Director, VMware®, and Microsoft®
Storage Pool
Cisco UCS C-Series/B-Series,
Nexus® 5k
FAS Storage
Flexible pod sizes
FlexPod validated management and ecosystem
Storage
Cisco UCS C-Series
Nexus, Catalyst®, MDS
E-Series, FAS
Reference architecture and/or designs
Application-based management
Cisco UCS® C-Series Rack Mount Servers
Cisco UCS Fabric Interconnect
Cisco UCS Manager
Converged big data platform from NetApp and Cisco for Hadoop
Enterprise-class Hadoop: Innovative storage, servers, networking validated with leading Hadoop distributions
Faster time to value: Prevalidated configuration accelerates deployment
High availability: Less downtime, higher serviceability to meet tight SLAs around data applications and processes
Flexible scaling: Independently scale servers and storage; modular design for scaling as data needs grow
NetApp® FAS Storage Systems
NetApp E-Series Storage Array
* NetApp 50% Storage Guarantee http://www.netapp.com/us/solutions/infrastructure/virtualization/guarantee.html
NetApp and Cisco deliver enterprise class Hadoop for high availability, performance, scalability
Cloudera or Hortonworks Distribution of Hadoop
Master
…
…
Expansion
Architected for the enterprise
Superior NameNode protection
Faster recovery from failover
Lower cluster downtime
Faster time to value
Validated, presized configurations
Low-latency, high-bandwidth networking
12 DataNodes in master, 16 in expansion
Coexistence with current applications and infrastructure
Supports existing applications from SAP, Microsoft, Oracle
Data management and monitoring with Cloudera Manager, Cisco UCS® Manager
High-Value Time-Sensitive Problems
Accelerate time to insights
Fast deployment with validated, preconfigured, reference designs
Store, process, analyze all data for new opportunities and business impact
More time to focus on data analysis rather than deal with cluster downtime
Making the Hadoop experience better
Optimized, tuned, fully configured cluster
Hadoop integrated with storage, compute, networking
Monitoring and management tools with SANtricity® and from partners (Cloudera Manager, Cisco UCS® Manager)
High density and capacity reduce data center footprint
Reduce risk in an open ecosystem
Compatibility with existing infrastructure and applications
Best-in-class partnerships, not entire stack from one vendor
Future-proof against lock-in and benefit from evolving ecosystem
FlexPod Select for Hadoop with Cloudera
Preconfigured – Pre-Validated
Phone-home data representing information about the status of NetApp storage controllers
Correlate disk latency (hot) with disk type
24 billion records
4 weeks to run query
Hadoop implementation 10.5 hours
Bug detection through pattern matching
240 billion records – Too large to run
Hadoop implementation 18 hours
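The two results above can be put into perspective with a little arithmetic. This is a rough sketch, assuming that "4 weeks" means 28 full days of wall-clock time; the function names are illustrative and not part of any tool mentioned here.

```python
HOURS_PER_WEEK = 7 * 24


def speedup(baseline_hours: float, new_hours: float) -> float:
    """How many times faster the new run is compared with the baseline."""
    return baseline_hours / new_hours


def records_per_hour(records: float, hours: float) -> float:
    """Average processing rate over the whole run."""
    return records / hours


if __name__ == "__main__":
    # Disk latency correlation: 24 billion records, 4 weeks vs. 10.5 hours.
    baseline_hours = 4 * HOURS_PER_WEEK  # assumption: 4 weeks of continuous run time
    print(f"speedup: {speedup(baseline_hours, 10.5):.0f}x")
    print(f"rate:    {records_per_hour(24e9, 10.5):.2e} records/hour")

    # Bug detection: 240 billion records in 18 hours (no baseline, the job was too large to run).
    print(f"rate:    {records_per_hour(240e9, 18):.2e} records/hour")
```

Under that assumption the first query runs roughly 64 times faster on Hadoop than before.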
Archiving & Indexing Tools
NetApp Hadoop Solution
DN DN DN DN
DN DN DN DN
Hadoop Distributed File System (HDFS)
Agent Servers
AS AS AS
Remote Site
Collector Servers
CS CS CS
Central Site
Agent Servers
AS AS AS
Remote Site
The solution consists of an eight-node Hadoop cluster at the central site. All data from the remote sites is transported over the WAN to the central site.
The data is collected, ingested, compressed and archived into the Hadoop cluster via HDFS. It is then categorized, put into separate containers, and indexed based on its record-keeping tags.
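The compress, categorize and index steps described above can be sketched in a few lines. This is a toy illustration only: in-memory dictionaries stand in for HDFS containers, and the record fields (`category`, `body`, `tags`) and function names are hypothetical, not taken from the customer solution.

```python
import gzip
from collections import defaultdict


def archive(records, containers, index):
    """Compress each record, store it in its category container, and index it by tag."""
    for record in records:
        category = record["category"]
        payload = gzip.compress(record["body"].encode("utf-8"))
        containers[category].append(payload)
        position = len(containers[category]) - 1
        for tag in record["tags"]:
            index[tag].append((category, position))


def lookup(tag, containers, index):
    """Return the decompressed bodies of all records carrying the given tag."""
    return [
        gzip.decompress(containers[category][position]).decode("utf-8")
        for category, position in index.get(tag, [])
    ]


if __name__ == "__main__":
    containers = defaultdict(list)  # stand-in for per-category containers in HDFS
    index = defaultdict(list)       # tag -> list of (category, position)
    archive(
        [
            {"category": "voice", "body": "call record 1", "tags": ["2014", "site-a"]},
            {"category": "data", "body": "session record 1", "tags": ["2014"]},
        ],
        containers,
        index,
    )
    print(lookup("site-a", containers, index))
```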
Telco Industry
Provides wireless voice and data services globally
OLAP
OLTP
Mobile Devices
Location/GPS
Logs
Sensors
Applications
ETL
Other
Data Sources
Reporting/Dashboard/Visualization
Applications
Analytics
Data Management
Storage File Systems
ETL
Content Shared Storage
Infrastructure
Storage
Data Management
OLAP
(All other storage, i.e. internal DAS)
Full Motion Video
Scalable density and performance to ingest and simultaneously analyze UAV and satellite video data
Video Storage for Surveillance
High bandwidth & density supporting hundreds or thousands of HD cameras
Media Content Management
High ingest & play-out rates with support for media and entertainment workflows
HPC: Lustre, GPFS, BeeGFS
Massively parallel distributed file systems for large-scale cluster computing and O&G seismic processing
Applications
Storage File Systems
Density
Reliability
Modularity
E-Series Storage
Performance
Efficiency
Flexibility
High bandwidth HD Video Ingest
• Satellite
• UAV
Full-Motion Video
Built on E-Stack
E5460 Stack
Quantum® StorNext File System
Massively Scalable
Single Data Container
Multi-Stream
Video Playout
• Processing
• Exploitation
• Analyst Viewing
Turnkey solution in a 40U industry-standard rack
Single architecture for ingest, exploitation and dissemination
1.8PB Raw Capacity
– 4000+ hours of uncompressed 720p HD video
>20 GB/s R/W Performance,
>30 GB/s Peak Performance
Scale to multiple Petabytes in a single data container
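The capacity figure above implies an average video data rate that can be checked by hand. The sketch below assumes decimal petabytes (10^15 bytes) and, for comparison, an uncompressed 720p stream at 2 bytes per pixel and 60 frames per second; the pixel format and frame rate are assumptions, since the slide does not state them.

```python
def implied_rate_bytes_per_s(capacity_bytes: float, hours: float) -> float:
    """Average data rate if `capacity_bytes` holds `hours` of video."""
    return capacity_bytes / (hours * 3600)


def uncompressed_rate_bytes_per_s(width: int, height: int,
                                  bytes_per_pixel: int, fps: int) -> int:
    """Data rate of an uncompressed video stream."""
    return width * height * bytes_per_pixel * fps


if __name__ == "__main__":
    capacity = 1.8e15  # 1.8 PB, assuming decimal units
    hours = 4000       # the slide says "4000+", so this is a lower bound on hours

    implied = implied_rate_bytes_per_s(capacity, hours)
    # Assumed stream parameters: 1280x720, 2 bytes per pixel, 60 frames per second.
    stream = uncompressed_rate_bytes_per_s(1280, 720, 2, 60)

    print(f"implied by capacity: {implied / 1e6:.1f} MB/s")
    print(f"assumed 720p stream: {stream / 1e6:.1f} MB/s")
```

Because the slide gives the hours as a lower bound, the implied rate is an upper bound on the average per-stream rate.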
Performance to meet the needs of the world’s fastest supercomputers
High Bandwidth & Density
– 1.8PB & 30GB/s per 40U rack
Highly available
– No single points of failure
– Extensive RAS features
NetApp-provided 7x24 Lustre support
NetApp Professional Services
NetApp Confidential – Limited Use
Sequoia – announced as the fastest supercomputer and storage combination on the planet at ISC 2012
Supercomputer storage to support twenty thousand trillion arithmetic operations per second with access speeds up to 1 TB/sec
55PB of usable storage
Simulations for nuclear weapons viability
Counter Terrorism
Energy Security
Understanding Climate Change
Press Release: http://www.netapp.com/us/company/news/news-rel-20110928-990734.html
Enhance public safety with better physical security
Industry trends are exploding storage
Analog to Digital
SD to HD
7 days to 30+ Days
Open Platform Solution
Best of breed industry partners
Flexible deployments
Modular scalability
99.999% up time
No servers required between cameras and storage
Saves hardware, software, licensing and footprint; very robust; greatly reduces network cabling; easy to scale.
Highly scalable digital repository
Consolidates collaborative production
Multi-format distribution workflows
Industry-leading bandwidth per rack to reduce bottlenecks
Highest capacity density to minimize power and cooling
Single namespace for multi-petabyte repositories
Unmatched breadth of production client support
File Services Enterprise Content Repository
Multi-application workloads
Non-disruptive operation
Integrated data protection, efficiency
Distributed Content Repository
Infinite container
Fixed content
Non-disruptive operation
Integrated data protection, efficiency
Large, multi-site repository
Policy based data management
Metadata-enabled object storage
ONTAP Cluster Mode
Heterogeneous cluster:
A mix of controller types in a single cluster per workload needs
Entry, mid, and high-end platforms
Native and third-party storage
(FAS and V-Series)
Multiprotocol: NFS, pNFS, CIFS, iSCSI, FCP
Integrated Data Protection
Virtual storage tier:
Match data to disk price and performance
Manage multiple tiers in the same namespace or many
ONTAP Cluster Mode with Infinite Volume
Single large content repository
Scales to PBs and billions of files across cluster
Native storage efficiency
Simplified operations
Multi-tenancy
Simplifies application workflows
Load balances data at ingest
Starts small, grow granularly
High availability
Protects against disk and hardware failures
Snapshots & Replication for quick recovery
Manage & Upgrade non-disruptively
Object Storage Insights
Flat Namespace
No filesystem hierarchy
Metadata separated
Not within data space
Metadata serve as descriptors
Can change over time
However, the data itself is persistent
Objects referenced by ID
Index
Write once read many
Similar to library
Objects do not change
Single writer multiple readers
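The properties listed above (flat namespace, objects referenced by ID, metadata kept apart from the data, write once read many) can be illustrated with a small in-memory model. This is a conceptual sketch only; the class and method names are hypothetical and do not represent the StorageGRID API or any other product interface.

```python
import uuid
from typing import Dict, List


class ObjectStore:
    """Toy write-once object store with a flat namespace and separate metadata."""

    def __init__(self) -> None:
        self._data: Dict[str, bytes] = {}                # object ID -> immutable payload
        self._metadata: Dict[str, Dict[str, str]] = {}   # object ID -> mutable descriptors

    def put(self, data: bytes, **metadata: str) -> str:
        """Store a new object and return its generated ID (no paths, no hierarchy)."""
        object_id = uuid.uuid4().hex
        self._data[object_id] = bytes(data)
        self._metadata[object_id] = dict(metadata)
        return object_id

    def get(self, object_id: str) -> bytes:
        """Read an object by ID; there is no method that modifies stored data."""
        return self._data[object_id]

    def tag(self, object_id: str, **changes: str) -> None:
        """Metadata can change over time while the data stays untouched."""
        if object_id not in self._data:
            raise KeyError(object_id)
        self._metadata[object_id].update(changes)

    def find(self, **query: str) -> List[str]:
        """Return IDs of all objects whose metadata matches every query item."""
        return [
            object_id
            for object_id, md in self._metadata.items()
            if all(md.get(key) == value for key, value in query.items())
        ]


if __name__ == "__main__":
    store = ObjectStore()
    oid = store.put(b"image bytes", kind="scan")
    store.tag(oid, reviewed="yes")
    print(store.find(kind="scan", reviewed="yes") == [oid])
```

Lookups go through the metadata index rather than a directory tree, which is why the namespace can stay flat.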
Less data management overhead
High Metadata rates
Less space management
Data is replicated across geographies
Simplified rights management
StorageGRID
Large content repository for big, unstructured data
Billions of data sets, dozens of petabytes
Create, manage and consume content globally
Predictable access to data independent of location
Policy-controlled data stores at each site
Intelligent data classification and access
Metadata-based management
NAS
I/O
Object Ingest and Retrieval
NAS
Protocols
(SG 9)
HTTP API / CDMI
Metadata Tagging and Query
Global Object Namespace
Object-Level Data Management
Location-Transparent Distributed Object Store
Policy-Driven
Data Placement
Storage Systems
“We’ve increased the number of retail partners we work with from 2,000 to almost 20,000 in just a few years. In the past 6 years, we’ve seen a 1,900% increase in transactions. This plus the massive increase in digital images uploaded by consumers demanded a more robust and highly scalable storage infrastructure.”
– Zach Wickes, Vice President of Technology, PNI
High-performance, scalable storage infrastructure built to support 17 million revenue-generating transactions annually
100% uptime even during peak holiday access, when transactions increase 6 to 10 times
3PB of rich media data
Consumer access to 950 million digital images
20,000 worldwide retail locations, online fulfillment partners and in-store kiosks
WalMart Canada, Costco, Sam’s Club, Tesco, CVS/pharmacy, and Kodak
NetApp FAS6280 and FAS3200, Data ONTAP, and FlashCache
STaaS offering for healthcare providers
Medical Image Archive Cloud
Two sites with ~1PB each
2TB+ local cache at each edge site
8x growth in capacity last 12 months
100% uptime since start of service
“Forever” retention policies
~60% of customers use hybrid cloud model
The solution offers proven 100% uptime, with automated data movement from on-premises storage to off-premises public clouds, a “keep forever” retention policy, and indefinite growth
Press Release: http://www.netapp.com/us/company/news/news-rel-20111128-36413.html
Planning and implementation expertise for Big Data
Turn-key solution stacks and Big Data services
Big Data System Integrators Solutions Built on NetApp®
Common Architecture
Software Solution
Solution Rack
+
Appliance
Application Packaging
Visualization
Analytics
Integration
Management
Efficiency
Validated Architecture
& SKUs
Infrastructure Integration
& Distribution
Operational Integration
& System Integrators
Enable enterprise customers to gain business advantage
Practical solutions proven to reduce complexity, increase efficiency and lower cost of ownership
Open standards based with best-in-class partnerships
For more information: http://www.netapp.com/us/company/leadership/big-data/
Strategic Assessment
Business goals
Data growth needs
Use case discovery
(partner delivery)
Consult
Solution architecture and design (NetApp delivery)
Deploy
Installation and implementation
(NetApp delivery)
Solution implementation
(partner delivery)
Support options:
Global support available from
NetApp and partners