Intelligent Services: Serving Machine Learning
Joseph E. Gonzalez
jegonzal@cs.berkeley.edu; Assistant Professor @ UC Berkeley
joseph@dato.com; Co-Founder @ Dato Inc.

Contemporary Learning Systems
Big Data + Training -> Big Models
Examples: Dato Create, BIDMach, MLlib, VW, LIBSVM, MLC, Oryx 2

What Happens After We Train a Model?
Data -> Training -> Model, which then yields:
- conference papers
- dashboards and reports
- models that drive actions

Models drive actions in applications such as:
- suggesting items at checkout
- low-latency fraud detection
- cognitive assistance
- the personalized Internet of Things

Rapidly Changing Data
The loop Data -> Train Model -> Actions -> Data must run continuously: machine learning becomes an intelligent service.

The Life of a Query in an Intelligent Service
- The web serving tier issues a request ("items like x") to the intelligent service.
- The service answers a top-K query: model lookup, feature lookup, user data lookup, and content/product info lookup, followed by the scoring math.
- The top items are returned and rendered as a new page with images.
- User feedback (the preferred item) flows back to update the models.

Essential Attributes of Intelligent Services
- Responsive: intelligent applications are interactive.
- Adaptive: ML models are out-of-date the moment learning is done.
- Manageable: many models are created by multiple people.

Responsive: Now and Always
Compute predictions in < 20 ms for complex queries (top-K over models and features, e.g. SELECT * FROM users JOIN items, click_logs, pages WHERE ...), under heavy query load and in the presence of system failures.

Experiment: End-to-End Latency in Spark MLlib
Pipeline stages: HTTP request -> feature transformation -> model evaluation -> encode prediction -> to JSON -> HTTP response.
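The latency experiment can be reproduced in spirit with a small timing harness. Everything below (the toy feature transform, the random linear model, the percentile reporting) is an illustrative stand-in, not the MLlib pipeline measured in the talk.

```python
import time
import numpy as np

# Hypothetical stand-ins for the measured stages: a feature
# transformation followed by a linear model evaluation.
rng = np.random.default_rng(0)
weights = rng.normal(size=100)

def featurize(raw):
    # Toy feature transform: tile the raw values into a fixed-size vector.
    return np.resize(np.asarray(raw, dtype=float), 100)

def predict(raw):
    return float(featurize(raw) @ weights)

# Time many single-query predictions, the way a serving system sees them
# (one request at a time, tail latency matters more than the mean).
latencies = []
for _ in range(1000):
    start = time.perf_counter()
    predict([0.5, 1.2, -0.3])
    latencies.append((time.perf_counter() - start) * 1000.0)  # milliseconds

p99 = float(np.percentile(latencies, 99))
print(f"p99 latency: {p99:.3f} ms")
```

Tail percentiles (p99 rather than the mean) are the relevant metric for the "< 20 ms, now and always" requirement, since a responsive service is judged by its slowest common case.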
[Figure: measured end-to-end latencies in milliseconds: 418.7, 172.6, 137.7, 4.3, 21.8, 22.4, 50.5]

Adaptive to Change at All Scales
Two axes: granularity of data (population down to session) and rate of change (months down to minutes). "Shopping for Mom" vs. "shopping for me" can differ within a single user across sessions.
- Population granularity: the law of large numbers applies and statistics change slowly (months). Rely on efficient offline retraining with high-throughput systems.
- Session granularity: small data that changes rapidly (minutes). Requires low-latency online learning, which is sensitive to feedback bias.

The Feedback Loop
"I once looked at cameras on Amazon ..." and was then shown similar cameras and accessories everywhere. This is an opportunity for bandit algorithms, but bandits present new challenges:
- computational overhead
- they complicate caching and indexing

Management: Collaborative Development
- Teams of data scientists work on similar tasks, producing "competing" features and models.
- Complex model dependencies arise: e.g. a cat photo feeds a Cat Classifier (isCat), an Animal Classifier (isAnimal), and a Cuteness Predictor (Cute!).

Velox Model Serving System [CIDR'15, LearningSys'15]
An active research project from the UC Berkeley AMPLab: Daniel Crankshaw, Xin Wang, Joseph Gonzalez, Peter Bailis, Haoyuan Li, Zhao Zhang, Michael J. Franklin, Ali Ghodsi, and Michael I. Jordan.
Velox focuses on the multi-task learning (MTL) domain, e.g. spam classification, content recommendation scoring, and localized (per-session) anomaly detection.
Personalized models (multi-task learning): a "separate" model for each user/context maps input to output.
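The bandit-style exploration hinted at above can be sketched with a minimal epsilon-greedy recommender. The item names, reward probabilities, and class interface are all hypothetical; this is a generic bandit sketch, not a system from the talk.

```python
import random

class EpsilonGreedyRecommender:
    """Minimal epsilon-greedy bandit over a fixed set of items."""

    def __init__(self, items, epsilon=0.1, seed=0):
        self.items = list(items)
        self.epsilon = epsilon
        self.rng = random.Random(seed)
        self.counts = {i: 0 for i in self.items}
        self.rewards = {i: 0.0 for i in self.items}

    def select(self):
        # Explore with probability epsilon; otherwise exploit the item
        # with the best empirical mean reward (unplayed items first).
        if self.rng.random() < self.epsilon:
            return self.rng.choice(self.items)
        return max(
            self.items,
            key=lambda i: self.rewards[i] / self.counts[i]
            if self.counts[i] else float("inf"),
        )

    def update(self, item, reward):
        # Record observed feedback, e.g. click = 1.0, no click = 0.0.
        self.counts[item] += 1
        self.rewards[item] += reward

bandit = EpsilonGreedyRecommender(["camera", "lens", "tripod"])
for _ in range(500):
    item = bandit.select()
    p = 0.3 if item == "lens" else 0.1  # simulated conversion rates
    bandit.update(item, 1.0 if bandit.rng.random() < p else 0.0)
```

Even this toy version shows both challenges from the slide: `select` and `update` add per-request computation, and because the served item now depends on mutable bandit state, responses are no longer pure functions of the query, which complicates caching and indexing.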
Velox Model Serving System: Model Split
Each personalized model is split into a shared feature model and a per-user personalization model.

Hybrid Offline + Online Learning
- Update the feature functions offline using batch solvers:
  - leverage high-throughput systems (Apache Spark)
  - exploit the slow change in population statistics
- Update the user weights online:
  - simple to train and a more robust model
  - addresses rapidly changing user statistics

Hybrid Online + Offline Learning: Results
- Similar test error to full offline retraining, even as user preferences change.
- Substantially faster training than full offline retraining.

Evaluating the Model
Evaluation splits into feature evaluation (shared, cacheable) and the per-user step. Three techniques:
- feature caching across users
- approximate feature hashing
- anytime feature evaluation

Feature Caching
For a new input, hash the input and look it up in the feature hash table; on a miss, compute the feature and store it.

LSH Cache Coarsening
With an exact hash, a new (slightly different) input misses the cache, and reusing a nearby entry would return the wrong value. With a locality-sensitive hash function, similar inputs collide, and locality-sensitive caching deliberately uses the cached value anyway.

Anytime Predictions
Compute features asynchronously; if a particular feature value does not arrive in time, use an estimator instead. The system is then always able to render a prediction by the latency deadline.

Coarsening + Anytime Predictions
[Figure: accuracy vs. hash coarseness; approximating missing features by their expectation does best between the "no coarsening" and "overly coarsened" extremes. Check out our poster!]

Part of the Berkeley Data Analytics Stack
- Training: Spark (Spark Streaming, BlinkDB, Spark SQL, GraphX, ML library, MLbase)
- Management + serving: Velox (Model Manager, Prediction Service)
- Storage and resource management: Mesos, Tachyon, HDFS, S3, ...
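The offline/online split can be sketched as follows, assuming a fixed feature model (trained offline by a batch solver) and per-user linear weights updated by online SGD. The `tanh` feature map, learning rate, and class names are illustrative assumptions, not Velox's actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in for the shared feature model: in the hybrid scheme these
# feature functions are retrained offline (e.g. on Spark) because
# population statistics change slowly.
W_feat = rng.normal(size=(5, 3))

def features(x):
    return np.tanh(W_feat @ x)  # shared across all users

class UserModel:
    """Per-user weights over the shared features, updated online."""

    def __init__(self, dim=5, lr=0.1):
        self.w = np.zeros(dim)
        self.lr = lr

    def predict(self, x):
        return float(self.w @ features(x))

    def observe(self, x, y):
        # One SGD step on squared error per feedback event: cheap
        # enough to run online, tracking fast-moving user preferences.
        err = self.predict(x) - y
        self.w -= self.lr * err * features(x)

user = UserModel()
x = np.array([1.0, -0.5, 0.2])
for _ in range(200):
    user.observe(x, 1.0)  # the user repeatedly signals preference y = 1.0
# user.predict(x) has moved from 0.0 toward the observed value 1.0
```

Because only the small per-user weight vector is updated online, each update is a few vector operations, which is what makes "similar test error, substantially faster training" plausible relative to full offline retraining.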
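LSH cache coarsening can be illustrated with random-hyperplane hashing. The hash family, bit count, and cache interface here are assumptions for the sketch, not Velox's design.

```python
import numpy as np

# Random-hyperplane LSH: nearby inputs hash to the same bucket with
# high probability, so the cache can return a stored feature value
# for a *similar* (not identical) input.
rng = np.random.default_rng(0)
planes = rng.normal(size=(8, 3))  # up to 8 sign bits per key

def lsh_key(x, n_bits):
    bits = (planes[:n_bits] @ x) > 0
    return tuple(bits.tolist())

class LSHFeatureCache:
    def __init__(self, feature_fn, n_bits=8):
        self.feature_fn = feature_fn
        self.n_bits = n_bits  # fewer bits = coarser buckets = more hits
        self.table = {}

    def get(self, x):
        key = lsh_key(x, self.n_bits)
        if key not in self.table:
            self.table[key] = self.feature_fn(x)  # expensive path
        return self.table[key]  # may be an approximate (cached) value

expensive_feature = lambda x: float(np.linalg.norm(x))
cache = LSHFeatureCache(expensive_feature, n_bits=4)

a = np.array([1.00, 0.0, 0.0])
b = np.array([1.01, 0.0, 0.0])
cache.get(a)
# True here: b is a positive multiple of a, so every sign bit matches
# and both inputs land in the same bucket.
print(cache.get(b) == cache.get(a))
```

The `n_bits` knob is the coarsening trade-off from the plot: fewer bits raise the hit rate but make the cached value a worse approximation for the inputs that share its bucket.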
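Anytime feature evaluation can be sketched with futures and a per-feature fallback estimator. The feature functions, deadline, fallback means, and weights are invented for illustration; the point is only that a prediction is always rendered by the deadline.

```python
import time
import numpy as np
from concurrent.futures import ThreadPoolExecutor, TimeoutError as DeadlineHit

def fast_feature(x):
    return x + 1.0

def slow_feature(x):
    time.sleep(0.5)  # simulates an expensive feature computation
    return x * 2.0

FEATURE_FNS = [fast_feature, slow_feature]
FEATURE_MEANS = [0.0, 0.0]  # fallback estimators, assumed precomputed offline
WEIGHTS = np.array([1.0, 1.0])

def anytime_predict(x, deadline_s=0.1):
    # Launch all feature computations asynchronously.
    pool = ThreadPoolExecutor(max_workers=len(FEATURE_FNS))
    futures = [pool.submit(fn, x) for fn in FEATURE_FNS]
    start = time.perf_counter()
    values = []
    for fut, mean in zip(futures, FEATURE_MEANS):
        remaining = deadline_s - (time.perf_counter() - start)
        try:
            values.append(fut.result(timeout=max(remaining, 0.0)))
        except DeadlineHit:
            values.append(mean)  # missed the deadline: use the estimator
    pool.shutdown(wait=False)  # do not block the response on stragglers
    return float(WEIGHTS @ np.array(values))

print(anytime_predict(3.0))  # 4.0: fast feature arrives, slow one falls back to 0.0
```

Substituting a feature's expectation when it misses the deadline is what the "approx. expectation" curve on the slide measures: accuracy degrades gracefully instead of the response arriving late.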
Dato Predictive Services
A production-ready model serving and management system:
- elastic scaling and load balancing of docker.io containers
- AWS CloudWatch metrics and reporting
- serves Dato Create models, scikit-learn, and custom Python
- distributed shared caching: scale out to address latency
- REST management API
Demo?

Key Insights: Predictive Services
- Responsive: caching and the latency vs. accuracy trade-off (anytime predictions).
- Adaptive: bandits and hybrid online/offline learning.
- Manageable: model management for many collaboratively built models.

Future of Learning Systems
Close the loop: Data -> Train Model -> Actions -> Data.

Thank You
Joseph E. Gonzalez
jegonzal@cs.berkeley.edu, Assistant Professor @ UC Berkeley
joseph@dato.com, Co-Founder @ Dato