Slides PDF - Spark Summit

Spark Integration Into
an Enterprise Stack
Open Source Successes & Challenges
About the presenter
Konstantin (Cos) Boudnik
Director of WANdisco Bigdata Engineering
In charge of delivering the company’s enterprise-grade NonStop Hadoop solution
ASF Hadoop, MRUnit committer
Co-author of ASF Bigtop
Spark/Shark contributor
Apply with caution: highly abrasive (according to most - now former - managers)
Shark Integration: Challenges and Lessons Learnt / page 2
Open source is a force of natural evolution
Anarchy: ἀν + ἀρχός (an + arkhos) without a ruler
Most apparent characteristics:
Fail-fast on your own dime
Hard or impossible to control by authority (!)
Resistant to political correctness bias (aka political bulls#$t)
Creates huge competitive advantage
Resulting in
Highly successful projects
Innovations up to the limit
Technologically disruptive
Rules the world (once mature)
Empirical evidence:
Everything on the planet is “Powered by Linux”
“Bad” news: Android market share will never double again
Firefox is THE web-browser of the world
I ran out of the slide space and my time slot is limited...
Shark Integration: Challenges and Lessons Learnt / page 3
What “open source” oftentimes is
I am not bashing open source: it is my bread & butter
Open => anyone can do what they’re most interested in doing
Innovative => creates formats & standards as it goes; abandons them in passing
Stable => we’ll fix it in the next release
Backward compatible => we might break it, but we’ll fix it
Fault tolerant and, at least, highly available => if you configure the hell out of it
Configuration management => shell scripts or Python to generate configuration
Deployment management (packages and Puppet) => here’s your tarball
Supported (there’s a throat to choke) => “Gone fishing!”
Secure => a million eyeballs will find all your bugs in no time
Shark Integration: Challenges and Lessons Learnt / page 4
Let’s call a spade a spade
What “enterprise grade” really is
Compatible with standards, scalable
Stable: feature set, release schedules, bug fixes, upgrades
Backward compatible with itself
Fault tolerant and, at least, highly available
Configuration management (you know your environment)
Deployment management (packages and Puppet)
Supported (there’s a throat to choke)
… and more
Shark Integration: Challenges and Lessons Learnt / page 5
The goals are aligned. How about semantics?
The devil is in the details
Open Source
Compatible with standards
Bugs get fixed;
“works for me”
- not a single change since 1867
We have all cool features
Backward compatible
Easy upgrade to next release;
fixed on “trunk”
Year 2013:
- we have to run on JDK 1.3
Fault tolerant & HA
Let’s restart the damn thing
$100m/min in downtime costs
Configuration Mgmt
A script, or sketchy docs
Change control, Puppet, etc.
Deployment Mgmt
A tarball
Staging environments,
long upgrade paths
A throat to choke
Shark Integration: Challenges and Lessons Learnt / page 6
Case study: major telecom SI
What we have built
OpenJDK 7
Hive 0.11’ish
It is 3 light years ahead of Hive 0.9 and 5 light years behind enterprise grade
Spark 0.8
Guess what? Not everybody is in love with Larry Ellison
Hello, Apache Incubator!
Shark 0.8’ish
Shark Integration: Challenges and Lessons Learnt / page 7
What does the stack look like?
What it implies for the development and customers alike
Shark Integration: Challenges and Lessons Learnt / page 8
Fixes that span multiple components
Memory leaks: JobConf held by a ThreadLocal
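The slide only names the symptom: a JobConf kept alive by a ThreadLocal. As a rough illustration (not the actual Hive or Shark code), the sketch below uses Python's `threading.local` as a stand-in for a static Java `ThreadLocal`, and a hypothetical `JobConf` class as a stand-in for Hadoop's. In a long-lived thread pool, every worker thread keeps the last conf it stored, so that memory is not released until the thread dies. Clearing the slot in a `finally` block (the Java counterpart is `ThreadLocal.remove()`) is one common remedy; it is an assumption here, not necessarily the fix that was contributed upstream.

```python
import gc
import threading
import weakref
from concurrent.futures import ThreadPoolExecutor

# Tracks every JobConf that is still reachable (entries vanish when an object is freed).
_live = weakref.WeakSet()


class JobConf:
    """Hypothetical stand-in for Hadoop's JobConf: a large, per-query configuration object."""

    def __init__(self, query_id):
        self.query_id = query_id
        self.payload = bytearray(1 << 20)  # ~1 MiB, to make the retained memory tangible
        _live.add(self)


# Module-level thread-local slot, playing the role of a static ThreadLocal<JobConf> in Java.
_local = threading.local()


def run_query_leaky(query_id):
    # The conf is stored in the worker thread's slot and never cleared,
    # so a pooled thread keeps its last conf alive after the query finishes.
    _local.conf = JobConf(query_id)
    return _local.conf.query_id


def run_query_fixed(query_id):
    _local.conf = JobConf(query_id)
    try:
        return _local.conf.query_id
    finally:
        # Drop the reference before the thread returns to the pool
        # (ThreadLocal.remove() in Java).
        del _local.conf


def live_confs_after(task, query_ids, n_threads=2):
    """Run task for each query on a small pool; count its confs still alive while the pool is up."""
    query_ids = list(query_ids)
    with ThreadPoolExecutor(max_workers=n_threads) as pool:
        list(pool.map(task, query_ids))
        gc.collect()
        wanted = set(query_ids)
        return sum(1 for conf in list(_live) if conf.query_id in wanted)


if __name__ == "__main__":
    print(live_confs_after(run_query_leaky, range(0, 8)))      # 1 or 2: one per worker thread used
    print(live_confs_after(run_query_fixed, range(100, 108)))  # 0
```

With a two-thread pool, the leaky variant leaves one or two confs reachable (one per worker thread that ran a query), while the fixed variant leaves none. In a server that never recycles its threads, the leaky pattern holds that memory for the lifetime of the process.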
Shark Integration: Challenges and Lessons Learnt / page 9
What does it mean?
Semantic and toolset barriers between JVM languages
Shark Integration: Challenges and Lessons Learnt / page 10
Unsynchronized release trains
Upstream components often live their own lives
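One way to keep unsynchronized release trains visible, in the spirit of using ASF Bigtop as the integration point, is to pin every component version in a single bill of materials (BOM) and check each component's declared requirements against it. The sketch below is hypothetical: the `Component` type, the major.minor matching rule, and the version numbers (borrowed loosely from the case-study stack, with an illustrative Shark requirement on Hive 0.9) are assumptions, and Bigtop's own BOM format is not modeled here.

```python
from dataclasses import dataclass, field


@dataclass
class Component:
    """One entry in a hypothetical bill of materials (BOM)."""
    name: str
    version: str
    # Versions of other components this release was built and tested against.
    requires: dict = field(default_factory=dict)


def major_minor(version):
    """'0.11.0' -> (0, 11); only the first two numeric fields are compared (an assumption)."""
    return tuple(int(part) for part in version.split(".")[:2])


def find_conflicts(bom):
    """Return (component, dependency, wanted, pinned) for every requirement the BOM does not meet."""
    pinned = {c.name: c.version for c in bom}
    conflicts = []
    for comp in bom:
        for dep, wanted in sorted(comp.requires.items()):
            have = pinned.get(dep)
            if have is None or major_minor(have) != major_minor(wanted):
                conflicts.append((comp.name, dep, wanted, have))
    return conflicts


# Illustrative stack, loosely modeled on the case study; the Shark requirement is an example.
stack = [
    Component("hive", "0.11.0"),
    Component("spark", "0.8.0"),
    Component("shark", "0.8.0", requires={"hive": "0.9.0", "spark": "0.8.0"}),
]

if __name__ == "__main__":
    for conflict in find_conflicts(stack):
        print(conflict)  # ('shark', 'hive', '0.9.0', '0.11.0')
```

Here the check reports that the illustrative Shark release expects Hive 0.9 while the stack pins Hive 0.11, the kind of mismatch that forces either a patch or a back-port before the stack can ship.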
Shark Integration: Challenges and Lessons Learnt / page 11
Impatient Customers
I want everything on the menu! NOW!
Shark Integration: Challenges and Lessons Learnt / page 12
What else can possibly go wrong?
“Hold my beer!” (Famous last words)
Shark Integration: Challenges and Lessons Learnt / page 13
Lessons learnt & principles applied
“What to do, what to do?” (r. Bender)
Proper system integration
Git & a well-thought-out branching model
ASF Bigtop as the integration point
Close collaboration with open source community
All fixes and features are offered to appropriate projects; most are accepted
Tireless and careful back-porting
Continuous Integration and Delivery
Simplifying development where possible
Switch from “org.apache.hive” to “edu.berkeley.cs.shark”
Keep your version control system open
Education and expectations management
“Released” in open source does not always mean “usable in the datacenter”
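Careful back-porting starts with knowing which upstream fixes a release branch still lacks. A minimal sketch, assuming every commit message cites an Apache JIRA key such as `HIVE-1234` (the regular expression is an assumption about commit-message style, and the keys in the example are made up): collect the keys mentioned upstream and subtract the ones already present on the branch.

```python
import re

# Apache JIRA-style keys such as HIVE-1234 or SPARK-56 (an assumption about commit-message style;
# the pattern can also match look-alikes such as "UTF-8").
JIRA_KEY = re.compile(r"\b([A-Z][A-Z0-9]+-\d+)\b")


def jira_keys(messages):
    """Collect every JIRA key mentioned in a list of commit messages."""
    keys = set()
    for message in messages:
        keys.update(JIRA_KEY.findall(message))
    return keys


def missing_backports(upstream_messages, branch_messages):
    """JIRA keys fixed upstream that no commit on the release branch mentions yet, sorted."""
    return sorted(jira_keys(upstream_messages) - jira_keys(branch_messages))


if __name__ == "__main__":
    upstream = ["HIVE-1111: fix a leak", "HIVE-2222 add a feature", "Merge branch 'trunk'"]
    branch = ["HIVE-1111: fix a leak (backport)"]
    print(missing_backports(upstream, branch))  # ['HIVE-2222']
```

The output is only a worklist: deciding whether each missing fix is safe to back-port still takes the careful review the slide calls for.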
Shark Integration: Challenges and Lessons Learnt / page 14
Thank you
Contact: Samantha Leggat | t: 925.396.1194 |
WANdisco, Bishop Ranch 8, 5000 Executive Pkwy, Suite 270, San Ramon, CA 94583