Hortonworks

Architecting the Future of Big Data
Eric Baldeschwieler – CEO
twitter: @jeric14 (@hortonworks)
© Hortonworks Inc. 2011
June 29, 2011
About Hortonworks
• Mission: Revolutionize and commoditize the storage and processing of big data via open source
• Vision: Half of the world’s data will be stored in Apache Hadoop within five years
• Strategy: Grow the Apache Hadoop ecosystem by making Apache Hadoop easier to consume; profit by providing training, support and certification
▪ An independent company
▪ Focused on making Apache Hadoop great
▪ Hold nothing back; Apache Hadoop will be complete
Credentials
• Technical: key architects and committers from Yahoo! Hadoop engineering team
− Highest concentration of Apache Hadoop committers
− Contributed >70% of the code in Hadoop, Pig and ZooKeeper
− Delivered every major/stable Apache Hadoop release since 0.1
− History of driving innovation across entire Apache Hadoop stack
− Experience managing world’s largest deployment
• Business operations: team of highly successful open source veterans
− Led by Rob Bearden, former COO of SpringSource & JBoss
• Investors: backed by Benchmark Capital and Yahoo!
− Benchmark was key investor in Red Hat, MySQL, SpringSource, Twitter & eBay
Hortonworks and Yahoo!
• Yahoo! is a development partner
− Leverage large Yahoo! development, testing & operations team
▪ More than 1,000 active & sophisticated users of Apache Hadoop
▪ Access to the Yahoo! grid for testing large workloads
▪ Only organization that has delivered a stable release of Apache Hadoop
− Yahoo! will continue to contribute Apache Hadoop code too!
• Yahoo! is a customer
− Hortonworks provides level 3 support and training to Yahoo!
− Yahoo! deploys Apache Hadoop releases across its 42,000-node grid
• Yahoo! is an investor
Current State of Adoption
Technology & Knowledge Gaps Prevent Apache Hadoop from Reaching Full Potential
Enterprise Adoption
• Early adopters
• Technology is hard to install, manage & use
• Technology lacks enterprise robustness
• Requires significant investment in technical staff or consulting
• Hard to find & hire experienced developer & operations talent
Customers are asking their vendors for help with Hadoop!
“We’re seeing Hadoop in all of our Fortune 2000 data accounts”
Vendor Ecosystem Adoption
• Early in vendor adoption lifecycle
• Hadoop is hard to integrate and extend
• Hard to find & hire experienced developer & operations talent
Hortonworks Role & Opportunity
Bridge the Gap!
Grow Market
Enterprise Adoption
Vendor Ecosystem Adoption
Sell training and support via Partners
Fundamental shift in enterprise data architecture strategy
• Apache Hadoop becomes standard for managing new types & scale of data
• New applications & solutions will be created to leverage data in Apache Hadoop
• Creates massive big data technology and services opportunity for ecosystem
Hortonworks Objectives
• Make Apache Hadoop projects easier to install, manage & use
− Regular sustaining releases
− Compiled code for each project (e.g. RPMs)
− Testing at scale
• Make Apache Hadoop more robust
− Performance gains
− High availability
− Administration & monitoring
• Make Apache Hadoop easier to integrate & extend
− Open APIs for extension & experimentation
All done within Apache Hadoop community
• Develop collaboratively with community
• Complete transparency
• All code contributed back to Apache
Anyone should be able to easily deploy the Hadoop projects directly from Apache
Technology Roadmap
Phase 1 (2011) – Making Apache Hadoop Accessible
• Release the most stable version of Hadoop ever
• Release directly usable code via Apache (RPMs, .debs…)
• Frequent sustaining releases off of the stable branches
Phase 2 (2012; alphas starting Oct 2011) – Next Generation Apache Hadoop
• Address key product gaps (HBase support, HA, Management…)
• Enable community & partner innovation via modular architecture & open APIs
• Work with community to define integrated stack
Phase 2 – Next Generation Apache Hadoop
• Core
− HDFS Federation
− Next Gen MapReduce
− New Write Pipeline (HBase support)
− HA (no SPOF) and Wire compatibility
• Data - HCatalog 0.3
− Pig, Hive, MapReduce and Streaming as clients
− HDFS and HBase as storage systems
− Performance and storage improvements
• Management & Ease of use
− All components fully tested and deployable as a stack
− Stack installation and centralized config management
− REST and GUI for user tasks
Phase 2 – Core - MapReduce
[Figure: architecture diagram with MapReduce App Client, MPI App Client, Resource Manager, ZooKeeper (No SPOF), and a Compute Machine running a MapReduce App's Application Master and Application Worker]
• Complete rewrite of the resource management layer
• Performance and scale improvements
• 6,000+ nodes / 100,000 concurrent tasks
• Supports better availability and fail-over
• Supports new frameworks beyond MapReduce
Phase 2 – Core – HDFS Federation
[Figure: Namespace layer with Namenodes NN-1 … NN-k … NN-n serving NS1 … NS k … Foreign NS n; Block storage layer with Block Pools (Pool 1 … Pool k … Pool n) on Common Storage across Datanode 1, Datanode 2 … Datanode m, plus a Balancer]
• Multiple independent Namenodes and Namespace Volumes in a cluster
− Scalability (6K nodes, 100K clients, 120PB disk), Workload isolation support
− Client side mount tables for Global Namespace
• Block storage as a generic shared storage service
− DataNodes store blocks for all Namespace volumes – no partitioning
− Non-HDFS namespaces (HBase, MR tmp and others) can share the same storage
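To make the "client side mount tables" idea concrete, here is a minimal sketch of how a client might map a path to the Namenode that owns it. It assumes longest-prefix matching over a small in-memory table. The `MountTable` class, its methods, and the mount points are hypothetical names chosen for illustration, not the HDFS API; only the Namenode labels (NN-1, NN-k, NN-n) come from the slide.

```python
from typing import Dict, Tuple


class MountTable:
    """Hypothetical client-side mount table: path prefix -> namenode name."""

    def __init__(self, mounts: Dict[str, str]) -> None:
        # Normalize prefixes so "/user/" and "/user" are treated alike.
        self._mounts = {self._norm(p): nn for p, nn in mounts.items()}

    @staticmethod
    def _norm(path: str) -> str:
        # Collapse repeated and trailing slashes; the root stays "/".
        return "/" + "/".join(part for part in path.split("/") if part)

    def resolve(self, path: str) -> Tuple[str, str]:
        """Return (namenode, normalized path) for the longest matching prefix."""
        path = self._norm(path)
        best = None
        for prefix in self._mounts:
            # Match whole path components only, so "/data" does not
            # capture "/database".
            matches = (
                prefix == "/"
                or path == prefix
                or path.startswith(prefix + "/")
            )
            if matches and (best is None or len(prefix) > len(best)):
                best = prefix
        if best is None:
            raise KeyError(f"no mount point covers {path}")
        return self._mounts[best], path


# Assumed example layout: three namespace volumes behind one global namespace.
table = MountTable({"/user": "NN-1", "/data": "NN-k", "/data/hbase": "NN-n"})
hbase_owner = table.resolve("/data/hbase/t1")
user_owner = table.resolve("/user/alice/")
```

In this sketch the client alone decides which Namenode to contact, which is consistent with the Namenodes staying independent of one another.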
Phase 2 – Core – HDFS Write Pipeline
• Limitations of HDFS write pipeline in 0.20
− Broken Flush, Sync, Append
− Node failures can cause data loss for slow writers
• Hadoop.Next
− Flush, Sync, and Append support
− New replicas are added dynamically on failures
[Figure: Client writing through a pipeline of three DNs (DataNodes), labeled Flush and Ack]
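The "new replicas are added dynamically on failures" behaviour can be illustrated with a toy model. This is a sketch under simplifying assumptions, not the HDFS implementation: the `WritePipeline` class, the spare-node list, and the copy-from-survivor recovery step are all hypothetical. It shows one property only: after a node fails mid-write, a replacement is brought up to date so that later writes still reach the full replica count.

```python
from typing import Dict, List


class WritePipeline:
    """Toy replication pipeline; every name here is illustrative."""

    def __init__(self, datanodes: List[str], spares: List[str]) -> None:
        self.replication = len(datanodes)
        # Each replica is modeled as an in-memory byte buffer.
        self.replicas: Dict[str, bytearray] = {dn: bytearray() for dn in datanodes}
        self.spares = list(spares)

    def write(self, data: bytes) -> int:
        """Append data to every live replica; return the number of acks."""
        for buf in self.replicas.values():
            buf.extend(data)
        return len(self.replicas)

    def report_failure(self, failed: str) -> None:
        """Drop a failed node, then restore the replica count from spares."""
        self.replicas.pop(failed)
        if not self.replicas:
            raise RuntimeError("all replicas lost")
        while len(self.replicas) < self.replication and self.spares:
            # Assumed recovery step: the new replica copies what a
            # surviving replica already holds before joining the pipeline.
            survivor = next(iter(self.replicas.values()))
            self.replicas[self.spares.pop(0)] = bytearray(survivor)


pipeline = WritePipeline(["DN1", "DN2", "DN3"], spares=["DN4"])
acks_before = pipeline.write(b"ab")
pipeline.report_failure("DN2")
acks_after = pipeline.write(b"c")
```

With the 0.20 behaviour described above, the pipeline would instead keep shrinking as nodes fail, which is how a slow writer could end up with too few copies.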
Phase 2 – Data – HCatalog
[Figure: MapReduce, Hive, Pig and Streaming as clients on top of HCatalog, with HDFS and HBase as storage underneath; a legend marks components as Phase 1 or Phase 2]
• Shared schema and data model
• Data can be shared between tool users
• Data located by table rather than file
• Clients independent of storage details
− format, compression, …
• Only one adaptor for new formats
− not one per tool
• Notifications when new data is available
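Two of the points above, "data located by table rather than file" and "only one adaptor for new formats", can be sketched with a small registry. This is an illustration under stated assumptions, not the HCatalog API: the `Catalog` class, its methods, and the table names are hypothetical, and in-memory strings stand in for stored files.

```python
import csv
import io
import json
from typing import Callable, Dict, List, Tuple

Row = Dict[str, str]
Reader = Callable[[str], List[Row]]


def read_csv(raw: str) -> List[Row]:
    """Adaptor for CSV text with a header row."""
    return [dict(row) for row in csv.DictReader(io.StringIO(raw))]


def read_jsonl(raw: str) -> List[Row]:
    """Adaptor for one JSON object per line."""
    return [json.loads(line) for line in raw.splitlines() if line.strip()]


class Catalog:
    """Hypothetical shared catalog: clients ask for tables, not files."""

    def __init__(self) -> None:
        self._readers: Dict[str, Reader] = {}
        self._tables: Dict[str, Tuple[str, str]] = {}

    def register_format(self, name: str, reader: Reader) -> None:
        # One adaptor per format, shared by every client tool.
        self._readers[name] = reader

    def register_table(self, table: str, fmt: str, raw: str) -> None:
        # The catalog, not the client, remembers where and how data is stored.
        self._tables[table] = (fmt, raw)

    def read(self, table: str) -> List[Row]:
        fmt, raw = self._tables[table]
        return self._readers[fmt](raw)


catalog = Catalog()
catalog.register_format("csv", read_csv)
catalog.register_format("jsonl", read_jsonl)
catalog.register_table("clicks", "csv", "user,n\na,1\n")
catalog.register_table("views", "jsonl", '{"user": "a", "n": "1"}\n')

clicks = catalog.read("clicks")
views = catalog.read("views")
```

Both calls return the same row shape even though the tables use different formats, so a client written against `read` would not change if a table were later re-stored in another format.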
Hortonworks Value
For Enterprises
• Make Apache Hadoop easier to consume
• Extend to broader developer audience
• Foster vibrant technology and services ecosystem
• Access to Hortonworks’ technical expertise
For Vendors
• Create larger market for Apache Hadoop technology and services
• Simplify process for supporting Hadoop
• Access to Hortonworks’ technical expertise
For Community
• Ensure Apache Hadoop remains unified and strong
• Expand value provided by core Apache projects
• Foster additional participation & contributions from ecosystem
Confidential Information
Hortonworks Differentiation
• Unmatched domain expertise
− Delivered every major release of Apache Hadoop to date
− Critical mass of committers
• Community leadership role
− Setting direction for core projects
• Yahoo! commitment and backing
− Access to 1,000+ Hadoop engineers, Yahoo! grid
• Absolute dedication to Apache & open source
− Focused on making Apache Hadoop the standard
• Focus on delivering significant value to technology vendors
− ISVs, OEMs, Systems Integrators and other service providers
Thank You.