Slides PDF - Spark Summit

Spark Integration Into an Enterprise Stack
Open Source Successes & Challenges
About the presenter
Konstantin (Cos) Boudnik

Zen-Empiricist

Director of WANdisco Bigdata Engineering
– In charge of delivering the company’s enterprise-grade NonStop Hadoop solution

ASF Hadoop, MRUnit committer

ASF Bigtop’s co-author

Spark/Shark contributor

Apply with caution: highly abrasive (according to most - now former - managers)
Shark Integration: Challenges and Lessons Learnt / page 2
Open-source is a force of natural evolution
Anarchy: ἀν + ἀρχός (an + arkhos) without a ruler




Most apparent characteristics:
– Fail-fast on your own dime
– Hard or impossible to control by authority (!)
– Resistant to political correctness bias (aka political bulls#$t)
– Creates huge competitive advantage
Resulting in
– Highly successful projects
– Innovations up to the limit
– Technologically disruptive
– Rules the world (once matured)
Empirical evidence:
– Everything on the planet is “Powered by Linux”
– “Bad” news: Android market share will never double again
– Firefox is THE web-browser of the world
I ran out of slide space and my time slot is limited...
What “open source” often is
I am not bashing open source: it is my bread & butter

Open => anyone can do what they’re most interested in doing

Innovative => creates formats & standards as it goes; abandons them in passing

Stable => we’ll fix it in the next release

Backward compatible => we might break it, but we’ll fix it

Fault tolerant and, at least, highly available => if you configure the hell out of it

Configuration management => shell scripts or Python to generate configuration

Deployment management (packages and Puppet) => here’s your tarball

Supported (there’s a throat to choke) => “Gone fishing!”

Secure => a million eyeballs will find all your bugs in no time
Let’s call a spade a spade
What “enterprise grade” really is

Compatible with standards, scalable

Stable: feature set, release schedules, bug fixes, upgrades

Backward compatible with itself

Fault tolerant and, at least, highly available

Configuration management (you know your environment)

Deployment management (packages and Puppet)

Supported (there’s a throat to choke)

Secure

… and more
The goals are aligned. How about semantics?
The devil is in the details
Characteristic       | Open Source                                    | Enterprise
Open                 | Agile                                          | Compatible with standards
Stable               | Bugs get fixed; “works for me”                 | RHEL: not a single change since 1867
Innovative           | We have all the cool features                  | NaN
Backward compatible  | Easy upgrade to next release; fixed on “trunk” | Year 2013: we have to run on JDK1.3
Fault tolerant & HA  | Let’s restart the damn thing                   | $100m/min in downtime costs
Configuration Mgmt   | A script, or sketchy docs                      | Change control, Puppet, etc.
Deployment Mgmt      | A tarball                                      | Staging environments, long upgrade paths
Supported            | mailto:dev@project.org                         | A throat to choke
Case study: major telecom SI
What we have built

OpenJDK 7
– Guess what? Not everybody is in love with Larry Ellison
Hive 0.11’ish
– It is 3 light years ahead of Hive 0.9 and 5 light years behind enterprise grade
Spark 0.8
– Hello Apache Incubation!
Shark 0.8’ish
What does the stack look like?
What it implies for the development and customers alike
Fixes that span multiple components
Memory leaks: JobConf held by ThreadLocal
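The slide names only the symptom. As a minimal sketch of the general pattern (not the actual Shark or Hive code), the Java example below shows how an object stored in a ThreadLocal stays reachable on a pooled worker thread after the task that set it has finished, unless remove() is called. FakeJobConf, CURRENT_CONF and confSurvivesTask are hypothetical names introduced for illustration; FakeJobConf merely stands in for a heavyweight configuration object such as Hadoop's JobConf.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;

public class ThreadLocalLeakDemo {
    // Hypothetical stand-in for a heavyweight per-query configuration object
    // (illustrative only; this is not Hadoop's JobConf).
    static final class FakeJobConf {
        final byte[] payload = new byte[1 << 20]; // roughly 1 MiB per instance
    }

    // Each thread gets its own slot; the value lives as long as the thread
    // does unless it is explicitly removed.
    static final ThreadLocal<FakeJobConf> CURRENT_CONF = new ThreadLocal<>();

    // Runs one "query" on a single pooled worker thread, then asks the same
    // worker whether the configuration object is still attached to it.
    static boolean confSurvivesTask(boolean cleanUp) throws Exception {
        ExecutorService pool = Executors.newSingleThreadExecutor();
        try {
            pool.submit(() -> {
                CURRENT_CONF.set(new FakeJobConf());
                try {
                    // ... query work would happen here ...
                } finally {
                    if (cleanUp) {
                        CURRENT_CONF.remove(); // drop the per-thread reference
                    }
                }
            }).get();
            // The pool reuses its only thread, so this second task observes
            // whatever the first task left behind.
            return pool.submit(() -> CURRENT_CONF.get() != null).get();
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        System.out.println("without remove(): " + confSurvivesTask(false));
        System.out.println("with remove():    " + confSurvivesTask(true));
    }
}
```

In a long-running server with a fixed thread pool, every worker can pin one such object indefinitely. When the set() happens in one component and the thread pool belongs to another, the cleanup has to be coordinated across both, which is one way a fix ends up spanning multiple components.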
What does it mean?
Semantic and toolset barriers between JVM languages
それが何を意味している (Japanese: “What does it mean?”)
Unsynchronized release trains
Upstream components often live their own lives
Impatient Customers
I want everything on the menu! NOW!
What else can possibly go wrong?
“Hold my beer!” (Famous last words)
Lessons learnt & principles applied
“What to do, what to do?” (r. Bender)


Proper system integration
– Git & well-thought-out branching model
– ASF Bigtop as the integration point
Close collaboration with the open source community
– All fixes and features are offered to the appropriate projects; most are accepted
Tireless and careful back-porting
Continuous Integration and Delivery
Simplifying development where possible
– Switch from “org.apache.hive” to “edu.berkeley.cs.shark”
– Keep your version control system open
Education and expectations management
– “Released” in open source does not always mean “usable in the datacenter”
Thank you
Konstantin.Boudnik@wandisco.com
@c0sin
Contact: Samantha Leggat | t: 925.396.1194 | samantha.leggat@wandisco.com
WANdisco, Bishop Ranch 8, 5000 Executive Pkwy, Suite 270, San Ramon, CA 94583