Which Algorithms Really Matter?
©MapR Technologies 2013

Me, Us
– Ted Dunning, Chief Application Architect, MapR
  – Committer and PMC member: Mahout, Zookeeper, Drill
  – Bought the beer at the first HUG
– MapR
  – Distributes more open source components for Hadoop
  – Adds major technology for performance, HA, and industry-standard APIs
– Info
  – Hash tag: #mapr
  – See also: @ApacheMahout, @ApacheDrill, @ted_dunning and @mapR

Topic For Today
– What is important? What is not? Why?
– What is the difference from academic research?
– Some examples

What is Important?
– Deployable – clever prototypes don't count if they can't be standardized
– Robust – mishandling is common
– Transparent – will degradation be obvious?
– Skillset and mindset matched? – how long will your fancy data scientist enjoy doing standard ops tasks?
– Proportionate – where is the highest value per minute of effort?

Academic Goals vs Pragmatics
– Academic goals
  – Reproducible
  – Isolate theoretically important aspects
  – Work on novel problems
– Pragmatics
  – Highest net value
  – Available data is constantly changing
  – Diligence and consistency have larger impact than cleverness
  – Many systems feed themselves; exploration and exploitation are both important
  – Engineering constraints on budget and schedule

Example 1: Making Recommendations Better

Recommendation Advances
– What are the most important algorithmic advances in recommendations over the last 10 years?
  – Cooccurrence analysis?
  – Matrix completion via factorization?
  – Latent factor log-linear models?
  – Temporal dynamics?

The Winner – None of the Above
– The most important algorithmic advances in recommendations over the last 10 years:
  1. Result dithering
  2. Anti-flood

The Real Issues
– Exploration
– Diversity
– Speed
– Not the last fraction of a percent

Result Dithering
– Dithering is used to re-order recommendation results
  – Re-ordering is done randomly
– Dithering is guaranteed to make off-line performance worse
– Dithering also has a near perfect record of making actual performance much better
– "Made more difference than any other change"

Simple Dithering Algorithm
– Generate a synthetic score from log rank plus Gaussian noise: s = log r + N(0, ε)
– Pick the noise scale to provide the desired level of mixing: Δr ∝ r exp(ε)
– Typically ε ∈ [0.4, 0.8]
– Oh… and use floor(t/T) as the random seed, so the ordering is stable within each time window T

Example
[Tables: the same result list repeatedly re-ordered by dithering, once with ε = 0.5 and once with ε = log 2 ≈ 0.69; each row is one presentation. With ε = 0.5 the top few positions are nearly stable; with ε = log 2, results from deep in the list (ranks in the teens and beyond) occasionally surface near the top.]

Exploring The Second Page

Lesson 1: Exploration is good

Example 2: Bayesian Bandits

Bayesian Bandits
– Based on Thompson sampling
– Very general sequential test
– Near optimal regret
– Trades off exploration and exploitation
– Possibly the best known solution for exploration/exploitation
– Incredibly simple

Thompson Sampling
– Select each shell according to the probability that it is the best
– That probability can be computed using the posterior:
  P(i is best) = ∫ I[ E[r_i | θ] = max_j E[r_j | θ] ] P(θ | D) dθ
– But I promised a simple answer

Thompson Sampling – Take 2
– Sample θ ~ P(θ | D)
– Pick i to maximize reward: i = argmax_j E[r_j | θ]
– Record the result from using i

Fast Convergence
[Figure: regret vs n over roughly 1000 rounds; the Bayesian Bandit with a Gamma-Normal prior drives regret down much faster than ε-greedy with ε = 0.05.]

Thompson Sampling on Ads
– An Empirical Evaluation of Thompson Sampling – Chapelle and Li, 2011

Bayesian Bandits versus Result Dithering
– Many useful systems are difficult to frame in fully Bayesian form
– Thompson sampling cannot be applied without posterior sampling
– Can still do useful exploration with dithering
– But better to use Thompson sampling if possible

Lesson 2: Exploration is pretty easy to do and pays big benefits.
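The dithering recipe described above (synthetic score s = log r + N(0, ε), RNG seeded with floor(t/T)) fits in a few lines of Python. This is a minimal sketch; the function name and parameter defaults are illustrative, not from the deck:

```python
import math
import random
import time

def dither(results, epsilon=0.5, period=600.0, now=None):
    """Randomly re-order a ranked result list.

    Each item's synthetic score is log(rank) + N(0, epsilon).
    Seeding the RNG with floor(t / period) keeps the ordering stable
    within each window of `period` seconds (so paging stays consistent)
    but reshuffles it between windows.
    """
    t = time.time() if now is None else now
    rng = random.Random(int(t // period))
    scored = [(math.log(rank) + rng.gauss(0.0, epsilon), item)
              for rank, item in enumerate(results, start=1)]
    scored.sort(key=lambda pair: pair[0])
    return [item for _, item in scored]
```

With ε = 0 the original order is preserved; values around 0.4–0.8, as the slides suggest, occasionally promote second-page results onto the first page while leaving the top of the list mostly intact.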
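The "Take 2" loop for Thompson sampling (sample θ ~ P(θ | D), play i = argmax_j E[r_j | θ], record the result) is equally short in code. This sketch assumes Bernoulli rewards with Beta posteriors; the Gamma-Normal variant from the convergence comparison works the same way with a different posterior, and all names below are illustrative:

```python
import random

def thompson_step(successes, failures, rng=random):
    """One round of Thompson sampling for Bernoulli-reward arms.

    successes[i] and failures[i] count observed results for arm i,
    so arm i carries a Beta(successes[i] + 1, failures[i] + 1) posterior.
    """
    # Sample theta ~ P(theta | D) for every arm...
    samples = [rng.betavariate(s + 1, f + 1)
               for s, f in zip(successes, failures)]
    # ...and pick i to maximize expected reward under that sample.
    return max(range(len(samples)), key=samples.__getitem__)

def run_bandit(true_rates, rounds=5000, seed=42):
    """Simulate the select / observe / record loop."""
    rng = random.Random(seed)
    k = len(true_rates)
    successes, failures = [0] * k, [0] * k
    for _ in range(rounds):
        i = thompson_step(successes, failures, rng)
        # Record the result from using arm i.
        if rng.random() < true_rates[i]:
            successes[i] += 1
        else:
            failures[i] += 1
    return successes, failures
```

Because arms are chosen in proportion to the posterior probability that they are best, exploration fades automatically as the posteriors sharpen, which is where the fast convergence in the regret plot comes from.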
Example 3: On-line Clustering

The Problem
– K-means clustering is useful for feature extraction or compression
– At scale and at high dimension, the desirable number of clusters increases
– A very large number of clusters may require more passes through the data
– Super-linear scaling is generally infeasible

The Solution
– Sketch-based algorithms produce a sketch of the data
– Streaming k-means uses adaptive dp-means to produce this sketch in the form of many weighted centroids which approximate the original distribution
– The size of the sketch grows very slowly with increasing data size
– Many operations, such as clustering, are well behaved on sketches
– Fast and Accurate k-means For Large Datasets. Michael Shindler, Alex Wong, Adam Meyerson.
– Revisiting k-means: New Algorithms via Bayesian Nonparametrics. Brian Kulis, Michael Jordan.

An Example
[figures]

The Cluster Proximity Features
– Every point can be described by the nearest cluster
  – 4.3 bits per point in this case
  – Significant error that can be decreased (to a point) by increasing the number of clusters
– Or by the proximity to the 2 nearest clusters (2 × 4.3 bits + 1 sign bit + 2 proximities)
  – Error is negligible
  – Unwinds the data into a simple representation
– Or we can increase the number of clusters (an n-fold increase adds log n bits per point, decreases error by sqrt(n))

Diagonalized Cluster Proximity
[figure]

Lots of Clusters Are Fine
[figure]

Typical k-means Failure
– Selecting two seeds here cannot be fixed with Lloyd's algorithm
– Result is that these two clusters get glued together

Streaming k-means Ideas
– By using a sketch with lots (k log N) of centroids, we avoid pathological cases
– We still get a very good result if the sketch is created in one pass with approximate search
– In fact, adaptive dp-means works just fine
– In the end, the sketch can be used for clustering or …

Lesson 3: Sketches make big data small.

Example 4: Search Abuse

Recommendations
[Figure] Alice got an apple and a puppy; Charles got a bicycle
[Figure] Alice got an apple and a puppy; Bob got an apple; Charles got a bicycle
– What else would Bob like?

Log Files
[Figure: raw log entries for Alice, Charles, and Bob]

History Matrix: Users by Items
[Figure: binary matrix of checkmarks, one row per user, one column per item]

Co-occurrence Matrix: Items by Items
– How do you tell which co-occurrences are useful?
[Figure: item-by-item matrix of co-occurrence counts]

Co-occurrence Binary Matrix
[Figure]

Indicator Matrix: Anomalous Co-Occurrence
– Result: the marked row will be added to the indicator field in the item document

Indicator Matrix
– That one row from the indicator matrix becomes the indicator field in the Solr document used to deploy the recommendation engine:
  id: t4
  title: puppy
  desc: The sweetest little puppy ever.
  keywords: puppy, dog, pet
  indicators: (t1)
– Note: data for the indicator field is added directly to the meta-data for a document in the Solr index. You don't need to create a separate index for the indicators.

Internals of the Recommender Engine
[figures]

Looking Inside LucidWorks
– Real-time recommendation query and results: evaluation
– What to recommend if a new user listened to 2122: Fats Domino & 303: Beatles?
– Recommendation is "1710: Chuck Berry"

Real-life example
[figure]

Lesson 4: Recursive search abuse pays
– Search can implement recs
– Which can implement search

Summary
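A closing note on Example 4's question of which co-occurrences are useful: the slides don't name the test, but the one used for "anomalous co-occurrence" in Apache Mahout's cooccurrence recommenders is the log-likelihood ratio (from Dunning's 1993 paper) applied to the 2×2 contingency table for a pair of items. A hedged sketch:

```python
import math

def xlogx(x):
    """x * log(x), with the convention 0 * log(0) = 0."""
    return x * math.log(x) if x > 0 else 0.0

def llr(k11, k12, k21, k22):
    """G^2 log-likelihood ratio for a 2x2 contingency table.

    k11 = users who interacted with both item A and item B
    k12 = users with A but not B
    k21 = users with B but not A
    k22 = users with neither
    Large values flag the co-occurrence as anomalous (interesting).
    """
    cells = xlogx(k11) + xlogx(k12) + xlogx(k21) + xlogx(k22)
    rows = xlogx(k11 + k12) + xlogx(k21 + k22)
    cols = xlogx(k11 + k21) + xlogx(k12 + k22)
    total = xlogx(k11 + k12 + k21 + k22)
    return 2.0 * (cells - rows - cols + total)
```

An independent table scores near zero; pairs that co-occur far more (or less) often than chance score high, and the top-scoring items for each row become that item's indicator field in the Solr document.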