Real-time and Long-time with Storm and Hadoop
MapR
©MapR Technologies - Confidential

Contact:
  – tdunning@maprtech.com
  – @ted_dunning
Slides and such:
  – http://info.mapr.com/ted-uk-05-2012
Hash tag: #mapr_uk
Collective notes: http://bit.ly/JDCRhc

Company Background
MapR provides the industry’s best Hadoop Distribution
  – Combines the best of the Hadoop community contributions with significant internally financed infrastructure development
Background of Team
  – Deep management bench with extensive analytic, storage, virtualization, and open source experience
  – Google, EMC, Cisco, VMware, Network Appliance, IBM, Microsoft, Apache Foundation, Aster Data, Brio, ParAccel
Proven
  – MapR used across industries (Financial Services, Media, Telecom, Health Care, Internet Services, Government)
  – Strategic OEM relationship with EMC and Cisco
  – Over 1,000 installs

Expanding Hadoop Use Cases
  – NFS for file-based applications
  – Hadoop APIs for Hadoop applications
  – ODBC (JDBC) for SQL-based applications
  – Mission-critical and SLA-dependent applications
  – Real-time applications
(Diagram legend: blue = MapR innovations)

MapR’s Complete Distribution for Apache Hadoop
  – Integrated, tested, hardened and supported
  – 100% Hadoop, HBase, HDFS API compatible
  – Easy portability/migration between distributions
  – Unique advanced features
  – No changes required to Hadoop applications
  – Runs on commodity hardware
[Architecture diagram. Labels: MapR Control System, MapR Heatmap™, LDAP/NIS Integration, Quotas/Alerts/Alarms, CLI/REST API, Nagios Integration, Ganglia Integration, Hive, Pig, Oozie, Mahout, Cascading, Sqoop, HBase, Flume, Zookeeper, Whirr, Direct Access NFS, Real-Time Streaming, Volumes, Mirrors, Snapshots, Data Placement, No NameNode Architecture, High Performance Direct Shuffle, Stateful Failover and Self Healing, MapR’s Storage Services™]

So what
about that real-time stuff?

The Challenge
Hadoop is great for processing vats of data
  – But sucks for real-time (by design!)
Storm is great for real-time processing
  – But lacks any way to deal with batch processing
It sounds like there isn’t a solution
  – Neither fashionable solution handles everything

This is not a problem. It’s an opportunity!

Hadoop is Not Very Real-time
[Timeline diagram, axis t ending at "now". Labels: Fully processed, Latest full period, Unprocessed Data, "Hadoop job takes this long for this data"]

Need to Plug the Hole in Hadoop
We have real-time data with limited state
  – Exactly what Storm does
  – And what Hadoop does not
We also have long-term analytics with lots of state
  – Exactly what Hadoop does
  – And what Storm does not
Can Storm and Hadoop be combined?

Real-time and Long-time together
[Timeline diagram, axis t ending at "now". Labels: Blended view, "Hadoop works great back here", "Storm works here"]

An Example
I want to know how many queries I get
  – Per second, minute, day, week
Results should be available
  – within <2 seconds 99.9+% of the time
  – within 30 seconds almost always
History should last >3 years
Should work for 0.001 q/s up to 100,000 q/s
Failure tolerant, yadda, yadda

Rough Design – Data Flow
[Data-flow diagram. Labels: Search Engine, Query Event Spout, Counter Bolt, Logger Bolt, Semi Agg, Snap, Raw Logs, Hadoop Aggregator, Long agg]

Counter Bolt Detail
Input: Labels to count
Output: Short-term semi-aggregated counts
  – (time-window, label, count)
Input is logged until next flush
Non-zero counts emitted on flush if
  – event count reaches threshold (typical 100K)
  – time since last count reaches threshold (typical 1-10s)
Tuples acked when counts emitted
Double count probability is > 0 but very small
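
The buffer-and-flush rule on the Counter Bolt Detail slide can be sketched in a few lines of R. This is only an illustration under stated assumptions, not Storm code and not the speaker's implementation: `new_counter`, `add`, `flush` and the parameters `window_s`, `max_events`, `max_age_s` are hypothetical names, timestamps are plain seconds, labels are assumed not to contain "|", and logging and tuple acking are reduced to comments.

```r
# Hypothetical sketch of the counter bolt's buffering logic (not the Storm API).
new_counter <- function(window_s = 60, max_events = 100000, max_age_s = 1) {
  counts <- integer(0)     # named vector: "window_start|label" -> count
  pending <- 0L            # events buffered since the last flush
  last_flush <- NA_real_   # set when the first event arrives

  # Emit every buffered count as (time_window, label, count), then reset.
  flush <- function(now) {
    if (length(counts) == 0) return(NULL)
    keys <- strsplit(names(counts), "|", fixed = TRUE)
    out <- data.frame(
      time_window = as.numeric(vapply(keys, `[`, "", 1)),
      label = vapply(keys, `[`, "", 2),
      count = as.integer(counts),
      stringsAsFactors = FALSE
    )
    counts <<- integer(0)
    pending <<- 0L
    last_flush <<- now
    out                    # a real bolt would ack the buffered tuples here
  }

  # Count one event; returns a data frame when a flush happens, else NULL.
  add <- function(label, now) {
    if (is.na(last_flush)) last_flush <<- now
    key <- paste(floor(now / window_s) * window_s, label, sep = "|")
    prev <- if (key %in% names(counts)) counts[[key]] else 0L
    counts[key] <<- prev + 1L
    pending <<- pending + 1L
    # Flush on either threshold: number of events or time since last flush.
    if (pending >= max_events || now - last_flush >= max_age_s) flush(now) else NULL
  }

  list(add = add, flush = flush)
}
```

Because a flush is triggered by event count or elapsed time rather than by the end of a time window, the same (time-window, label) pair can be emitted several times with partial counts. Downstream consumers are expected to sum these semi-aggregates.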

Counter Bolt Counterintuitivity
Counts are emitted for same label, same time window many times
  – these are semi-aggregated
  – this is a feature
  – tuples can be acked within 1s
  – time windows can be much longer than 1s
No need to send same label to same bolt
  – speeds failure recovery

Design Flexibility
Tuples can be ack’ed as soon as they hit the log
  – counter can recover state on failure
  – log is burn-after-write
Count flush interval can be extended without extending tuple timeout
  – Decreases currency of counts in semi-aggregates
Total bandwidth for log is typically not huge
  – All of Twitter @ 10,000 messages per second = 10K x 2KB = 20MB/s

Counter Bolt No-nos
Cannot accumulate entire period in-memory
  – Tuples must be ack’ed much sooner
  – State must be persisted before ack’ing
  – State can easily grow too large to handle without disk access
Cannot persist entire count table at once
  – Incremental persistence required

Guarantees
Counter output volume is small-ish
  – the greater of k tuples per 100K inputs or k tuple/s
  – 1 tuple/s/label/bolt for this exercise
Persistence layer must provide guarantees
  – distributed against node failure
  – must have either readable flush or closed-append
HDFS is distributed, but provides no guarantees and strange semantics
MapRfs is distributed, provides all necessary guarantees

Failure Modes
Bolt failure
  – buffered tuples will go un’acked
  – after timeout, tuples will be resent
  – timeout ≈ 10s
  – if failure occurs after persistence, before acking, then double-counting is possible
Storage (with MapR)
  – most failures invisible
  – a few continue within 0-2s, some take 10s
  – catastrophic cluster restart can take 2-3 min
  – logger can buffer this much easily

Presentation Layer
Presentation must
  – read recent output of Logger bolt
  – read relevant
    output of Hadoop jobs
  – combine semi-aggregated records
User will see
  – counts that increment within 0-2 s of events
  – seamless meld of short and long-term data

Example 2 – Real-time learning
My system has to
  – learn a response model
  – and select training data
  – in real-time
Data rate up to 100K queries per second

Door Number 3 – AB testing in real-time
I have 15 versions of my landing page
Each visitor is assigned to a version
  – Which version?
A conversion or sale or whatever can happen
  – How long to wait?
Some versions of the landing page are horrible
  – Don’t want to give them traffic

Real-time Constraints
Selection must happen in <20 ms almost all the time
Training events must be handled in <20 ms
Failover must happen within 5 seconds
Client should timeout and back-off
  – no need for an answer after 500ms
State persistence required

Rough Design
[Data-flow diagram. Labels: Selector Layer, DRPC Spout, Conversion Detector, Query Event Spout, Timed Join, Counter Bolt, Model, Logger Bolt, Model State, Raw Logs]

A Quick Diversion
You see a coin
  – What is the probability of heads?
  – Could it be larger or smaller than that?
I flip the coin and while it is in the air ask again
I catch the coin and ask again
I look at the coin (and you don’t) and ask again
Why does the answer change?
  – And did it ever have a single value?

A First Conclusion
Probability as expressed by humans is subjective and depends on information and experience

A Second Diversion
What is the mass of the moon?
  – 1/2 degree @ 385 Mm = ~3.8 Mm diameter (really about 3.4-ish)
  – V = 1/6 x pi x 3.8^3 x 10^18 m^3 = ~29 x 10^18 m^3 (really about 22)
  – m = rho V = 4 Mg/m^3 x 29 x 10^18 m^3 = 1.2 x 10^23 kg (really about 0.7)
Is that the exact number?
  – Shouldn’t we have confidence bounds?
Wikipedia says: 7.3477 × 10^22 kg
  – Is that the exact number?
  – Shouldn’t they have confidence bounds?

A Second Conclusion
A single number is a bad way to express uncertain knowledge
A distribution of values might be better

I Dunno

5 and 5

2 and 10

Bayesian Bandit
Compute distributions based on data
Sample p1 and p2 from these distributions
Put a coin in bandit 1 if p1 > p2
Else, put the coin in bandit 2

And it works!
[Plot: regret (y axis, 0 to 0.12) against n (x axis, 0 to 1100). Series: ε-greedy with ε = 0.05; Bayesian Bandit with Gamma-Normal]

Video Demo

The Code
Select an alternative
    # k is an n x 2 count matrix; from the rbeta call, column 2 appears to
    # count successes and column 1 failures.
    select = function(k) {
      n = dim(k)[1]
      p0 = rep(0, length.out=n)
      for (i in 1:n) {
        # sample a plausible conversion rate for alternative i
        p0[i] = rbeta(1, k[i,2]+1, k[i,1]+1)
      }
      return (which(p0 == max(p0)))
    }
Select and learn
    # Wrapper name is assumed; the slide shows only the loop body.
    # test(i) is not defined on the slide; it must return column index 1 or 2.
    select_and_learn = function(k, steps) {
      for (z in 1:steps) {
        i = select(k)
        j = test(i)
        k[i,j] = k[i,j]+1
      }
      return (k)
    }
But we already know how to count!

The Basic Idea
We can encode a distribution by sampling
Sampling allows unification of exploration and exploitation
Can be extended to more general response models

Contact:
  – tdunning@maprtech.com
  – @ted_dunning
Slides and such:
  – http://info.mapr.com/ted-uk-05-2012

MapR’s Innovations

Thank You
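
To try the Bayesian Bandit idea from the slides end to end, the select-and-learn loop needs some stand-in for real visitors. The sketch below is one way to do that, under assumptions that are not in the talk: `rates` is a made-up vector of true conversion rates, `test` simulates a visitor by drawing against that rate, and the step count of 2000 is arbitrary. The count matrix `k` follows the slide's layout as read from its `rbeta` call (column 1 failures, column 2 successes).

```r
set.seed(1)

# Assumed true conversion rates for three page versions (simulation only).
rates <- c(0.05, 0.10, 0.20)

# Sample one plausible rate per version from its Beta posterior and pick the
# largest sample. which.max returns a single index even if samples tie.
select <- function(k) {
  p <- rbeta(nrow(k), k[, 2] + 1, k[, 1] + 1)
  which.max(p)
}

# Hypothetical stand-in for a real visitor: returns 2 on conversion, 1 otherwise.
test <- function(i) if (runif(1) < rates[i]) 2 else 1

# k[i, 1] counts failures and k[i, 2] counts successes for version i.
k <- matrix(0, nrow = length(rates), ncol = 2)
for (z in 1:2000) {
  i <- select(k)
  j <- test(i)
  k[i, j] <- k[i, j] + 1
}

trials <- rowSums(k)   # traffic each version received
```

Inspecting `trials` after a run shows how traffic was divided. Versions whose posterior samples are rarely the largest get few further visitors, which is the unification of exploration and exploitation that the slides describe.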