Slides - LearningSys

advertisement
Intelligent
Services
Serving Machine Learning
Joseph E. Gonzalez
jegonzal@cs.berkeley.edu; Assistant Professor @ UC
Berkeley
joseph@dato.com; Co-Founder @ Dato Inc.
Contemporary Learning Systems
Big
Data
Training
Big
Models
Contemporary Learning Systems
Create
BIDMach
MLlib
VW LIBSVM
MLC
Oryx 2
What happens after we train a model
Data
Training
Conference
Papers
Dashboards and
Reports
Model
Drive Actions
What happens after we train a model
Data
Training
Conference
Papers
Dashboards and
Reports
Model
Drive Actions
Suggesting Items
at Checkout
Low-Latency
Fraud
Detection
Cognitive
Assistance
Personalized
Internet of
Things
Rapidly Changing
Data
Train
Model
Actions
Data
Train
Model
Machine
Learning
Intelligent
Services
9
Top-K
Query
New Page
Images …
User
Feedback:
Preferred Item
Web Serving Tier
Request:
Items like x
Top
Items
Intelligent Service
The Life of a Query in an Intelligent
Service
Lookup
Model
μ ∫ σ Feature
Lookup
α math
ρ Feature
∑ β
Feedback
Model
Info
User
Data
Lookup
Content Request
Product
Info
Essential Attributes of Intelligent
Services
Responsi
Intelligent applications
ve
are interactive
Adaptiv
ML models out-of-date the
e is done
moment learning
Manageabl
Many models
e people
created by multiple
Responsive: Now and Always
Compute predictions in < 20ms for complex
Queries
Top K
Models
Features
SELECT * FROM
users JOIN items,
click_logs, pages
WHERE …
under heavy query load with system failures.
Experiment: End-to-end Latency in Spark
MLlib
To
JSON
4
HTTP Req.
Feature Trans.
Evaluate
Model
HTTP Response
Encode Prediction
Latency in
Milliseconds
450
400
350
300
250
200
150
100
50
0
418.7
172.6
137.7
4.3
21.8 22.4
50.5
Adaptive to Change at All Scales
Population
Granularity of Data
Session
Shopping
for Mom
Shopping
for Me
Months
Rate of Change
Minutes
Adaptive to Change at All Scales
Population
Population
Granularity of Data
Session
Law of Large Numbers
Shopping
 Change Slow
for Mom
Months
Months
Rely on efficientShopping
offline
for Me
retraining
Change
High-throughput
Rate of
Minutes
Systems
Adaptive to Change at All Scales
Population
Granularity of Data
Session
Small Data  Rapidly Changing
Low Latency  Online Learning
Sensitive to feedback bias
Months
Rate of Change
Shopping
for Mom
Shopping
for Me
Minutes
The Feedback Loop
I once looked at cameras on
Amazon …
Opportunity for
Similar
cameras and
accessories
Bandit Algorithms
Bandits present new challenges:
• computation overhead
• complicates caching + indexing
Management: Collaborative Development
Teams of data-scientists working on similar
tasks
“competing” features and models
Complex model dependencies:
Cat Photo
isCat
Cat Classifier
isAnimal
Animal
Classifier
Cute!
Cuteness
Predictor
UC Berkeley AMPLab
Daniel Crankshaw, Xin Wang, Joseph Gonzalez
Peter Bailis, Haoyuan, Zhao Zhang,
Michael J. Franklin, Ali Ghodsi,
and Michael I. Jordan
Predictive Services
UC Berkeley AMPLab
Daniel Crankshaw, Xin Wang, Joseph Gonzalez
Peter Bailis, Haoyuan, Zhao Zhang,
Michael J. Franklin, Ali Ghodsi,
and Michael I. Jordan
Active Research Project
Predictive Services
Velox Model Serving
System
[CIDR’15, LearningSys’15]
Focuses on the multi-task learning (MTL) domain
Spam
Classification
Content Rec.
Scoring
Session 1:
Session 2:
Localized
Anomaly Detection
Velox Model Serving
System
Personalized Models (Mulit-task Learning)
[CIDR’15, LearningSys’15]
Input
Output
“Separate” model for
each user/context.
Velox Model Serving
System
Personalized Models (Mulit-task Learning)
[CIDR’15, LearningSys’15]
Feature
Model
Personalization
Model
Split
Hybrid Offline + Online Learning
Update feature functions offline using batch
solvers
Feature
Personalization
Model systemsModel
• Leverage high-throughput
(Apache
Spark)
• Exploit slow change in population statistics
Update the user weights online:
Split
• Simple to train + more
robust model
• Address rapidly changing user statistics
Hybrid Online + Offline Learning
Results
Similar Test Error
Hybrid
Offline Full
User Pref. Change
Substantially Faster
Training
Offline Full
Hybrid
Evaluating the Model
Cache
Feature Evaluation
Input
Split
Evaluating the Model
Feature
Caching Across
InputUsers
Cache
Feature Evaluation
Approximate
Feature Hashing
Anytime Feature
Evaluation
Split
Feature Caching
New input:
Compute feature:
Hash input:
Feature Hash Table
LSH Cache Coarsening
New input
Hash new input:
Use Wrong Value!
 LSH hash fn.
Feature Hash Table
LSH Cache Coarsening
Locality-Sensitive Hashing:
Hash new input:
Locality-Sensitive Caching:
Use Value Anyways
 Req. LSH
Feature Hash Table
Anytime Predictions
Compute features asynchronously:
__
__
__
if a particular element does not arrive use estimator
instead
Always able to render a prediction by the latency
deadline
Coarsening + Anytime Predictions
More Features
Approx. Expectation
Best
Checkout our poster!
No Coarsening
Better
Overly Coarsened
Coarser Hash
Part of Berkeley Data Analytics
Stack
Management + Serving
Training
Spark
Streamin
g
BlinkDB
Spark
SQL
MLbas
e
Graph
ML
X
library
Spark
Velox
Predictio
Model
n
Manager
Service
Mesos
Tachyon
HDFS, S3, …
UC Berkeley AMPLab
Daniel Crankshaw, Xin Wang,
Peter Bailis, Haoyuan, Zhao Zhang,
Michael J. Franklin, Ali Ghodsi,
and Michael I. Jordan
Predictive Services
Dato Predictive Services
Production ready model serving and management system
Elastic scaling and load balancing of docker.io containers
AWS Cloudwatch Metrics and Reporting
Serves Dato Create models, scikit-learn, and custom
python
Distributed shared caching: scale-out to address latency
REST management API: Demo?
UC Berkeley AMPLab
Daniel Crankshaw, Xin Wang, Joseph Gonzalez
Peter Bailis, Haoyuan, Zhao Zhang,
Michael J. Franklin, Ali Ghodsi,
and Michael I. Jordan
Responsiv
e
Key Insights:
Predictive Services
Adaptive
Caching, Bandits, &
Management
Manageable
Online/Offline Learning
Latency vs. Accuracy
Future of Learning Systems
Actions
Dat
a
Train
Mode
l
Thank You
Joseph E. Gonzalez
jegonzal@cs.berkeley.edu, Assistant Professor @ UC Berkeley
joseph@dato.com, Co-Founder @ Dato
Download