Spark Integration Into an Enterprise Stack
Open Source Successes & Challenges

About the presenter
  Konstantin (Cos) Boudnik
  Zen-Empiricist
  Director of WANdisco Bigdata Engineering
    – In charge of delivering the company’s enterprise-grade NonStop Hadoop solution
  ASF Hadoop, MRUnit committer
  ASF Bigtop co-author
  Spark/Shark contributor
  Apply with caution: highly abrasive (according to most, now former, managers)

Shark Integration: Challenges and Lessons Learnt / page 2

Open source is a force of natural evolution
  Anarchy: ἀν + ἀρχός (an + arkhos), “without a ruler”
  Most apparent characteristics:
    – Fail-fast on your own dime
    – Hard or impossible to control by authority (!)
    – Resistant to political correctness bias (aka political bulls#$t)
    – Creates huge competitive advantage
  Resulting in:
    – Highly successful projects
    – Innovations up to the limit
    – Technologically disruptive
    – Rules the world (once matured)
  Empirical evidence:
    – Everything on the planet is “Powered by Linux”
    – “Bad” news: Android market share will never double again
    – Firefox is THE web browser of the world
  I ran out of slide space and my time slot is limited...

What “open source” oftentimes is
  I am not bashing open source: it is my bread & butter
  Open => anyone can do what they’re most interested in doing
  Innovative => creates formats & standards as it goes; abandons them in passing
  Stable => we’ll fix it in the next release
  Backward compatible => we might break it, but we’ll fix it
  Fault tolerant and, at least, highly available => if you configure the hell out of it
  Configuration management => shell scripts or Python to generate configuration
  Deployment management (packages and Puppet) => here’s your tarball
  Supported (there’s a throat to choke) => “Gone fishing!”
  Secure => a million eyeballs will find all your bugs in no time

Let’s call a spade a spade
What “enterprise grade” really is
  Compatible with standards, scalable
  Stable: feature set, release schedules, bug fixes, upgrades
  Backward compatible with itself
  Fault tolerant and, at least, highly available
  Configuration management (you know your environment)
  Deployment management (packages and Puppet)
  Supported (there’s a throat to choke)
  Secure
  … and more

The goals are aligned. How about semantics?
The devil is in the details

  Characteristic      | Open Source                                    | Enterprise
  Open                | Agile                                          | Compatible with standards
  Stable              | Bugs get fixed; “works for me”                 | RHEL: not a single change since 1867
  Innovative          | We have all the cool features                  | NaN
  Backward compatible | Easy upgrade to next release; fixed on “trunk” | Year 2013: we have to run on JDK 1.3
  Fault tolerant & HA | Let’s restart the damn thing                   | $100m/min in downtime costs
  Configuration Mgmt  | A script, or sketchy docs                      | Change control, Puppet, etc.
  Deployment Mgmt     | A tarball                                      | Staging environments, long upgrade paths
  Supported           | mailto:dev@project.org                         | A throat to choke

Case study: major telecom SI
What we have built
  OpenJDK 7
    – Guess what? Not everybody is in love with Larry Ellison
  Hive 0.11’ish
    – It is 3 light years ahead of Hive 0.9 and 5 light years behind enterprise grade
  Spark 0.8
    – Hello Apache Incubation!
  Shark 0.8’ish

What does the stack look like?
  What it implies for development and customers alike

Fixes that span multiple components
  Memory leaks: JobConf held by ThreadLocal

What does it mean?
  Semantic and toolset barriers between JVM languages
  それが何を意味している (“What does that mean?”)

Unsynchronized release trains
  Upstream components oftentimes live their own lives

Impatient customers
  I want everything on the menu! NOW!

What else can possibly go wrong?
  “Hold my beer!” (Famous last words)

Lessons learnt & principles applied
“What to do, what to do?” (r. Bender)
  Proper system integration
    – Git & a well-thought-out branching model
    – ASF Bigtop as the integration point
  Close collaboration with the open source community
    – All fixes and features are offered to the appropriate projects; most are accepted
  Tireless and careful back-porting
  Continuous Integration and Delivery
  Simplifying development where possible
    – Switch from “org.apache.hive” to “edu.berkeley.cs.shark”
    – Keep your version control system open
  Education and expectations management
    – “Released” in open source does not always mean “usable in the datacenter”

Thank you
  Konstantin.Boudnik@wandisco.com
  @c0sin
  Contact: Samantha Leggat | t: 925.396.1194 | samantha.leggat@wandisco.com
  WANdisco, Bishop Ranch 8, 5000 Executive Pkwy, Suite 270, San Ramon, CA 94583