2014-11-20

advertisement
15 millions de prédictions par seconde
Les défis de Criteo
Nicolas Le Roux
Scientific Program Manager - R&D
2014-11-20
Copyright © 2014 Criteo
What does Criteo do?
• We buy advertising spaces on websites
• We display ads for our partners
• We get paid if the user clicks on the ad
Copyright © 2014 Criteo
Retargeting
Copyright © 2014 Criteo
In practice
1. A user lands on a webpage
2. The website contacts an ad-exchange
3. The ad-exchange contacts Criteo and its competitors
4. It is an auction: each competitor tells how much it bids
5. The highest bidder wins the right to display an ad
Copyright © 2014 Criteo
Details of the auction
• Real-time bidding (RTB)
• Second-price auction: the winner pays the second highest price
• Optimal strategy: bid the expected gain
• Expected gain = price per click (CPC) * probability of click (CTR)
Copyright © 2014 Criteo
Goal of the arbitrage
• Choose the appropriate advertiser
• Only win the profitable displays: it is a prediction problem
• An overbid means you are losing money
• An underbid means you are losing potential revenue
Copyright © 2014 Criteo
What to do once we win the display?
• We are now directly in contact with the website
• Choose the best products
• Choose the color, the font and the layout
• Generate the banner
Copyright © 2014 Criteo
Goal of the recommendation
• Increase the value of a display: it is a ranking problem
• The potential gains are huge
Copyright © 2014 Criteo
Real-time constraints at Criteo
•
More than 2 billion banners displayed per day
Copyright © 2014 Criteo
Real-time constraints at Criteo
•
More than 2 billion banners displayed per day
Time to answer an RTB request (in ms)
100
Time to bid before timeout
Copyright © 2014 Criteo
Real-time constraints at Criteo
•
More than 2 billion banners displayed per day
Time to answer an RTB request (in ms)
50
Network time
Copyright © 2014 Criteo
10
Parsing + logging
35
Other computation
5
Available for prediction
Fitting within the real-time constraints
• To avoid timeouts, only simple models are allowed
• Logistic regression
• Decision trees
• What freedom do we have?
Copyright © 2014 Criteo
Predicting the CTR
•
𝑃 𝑐𝑙𝑖𝑐𝑘 = 1 𝑥 = 𝜎(𝜃 𝑇 𝑥)
•
𝑥: features containing historical and contextual information
•
𝜃: parameters of the model
Copyright © 2014 Criteo
Predicting the CTR
•
𝑃 𝑐𝑙𝑖𝑐𝑘 = 1 𝑥 = 𝜎(𝜃 𝑇 𝑥)
•
𝑥: [TimeSinceLastVisit, CurrentURL]
•
𝜃: parameters of the model
Copyright © 2014 Criteo
A large number of parameters
• 𝑥: [TimeSinceLastVisit, CurrentURL]
• One parameter per modality
Copyright © 2014 Criteo
A large number of parameters
• 𝑥: [TimeSinceLastVisit, CurrentURL]
• One parameter per modality
• TimeSinceLastVisit
• Less than 10 seconds: [1, 0, 0]
• Between 10 seconds and 5 minutes: [0, 1, 0]
• More than 5 minutes: [0, 0, 1]
Copyright © 2014 Criteo
A large number of parameters
• 𝑥: [TimeSinceLastVisit, CurrentURL]
• One parameter per modality
• TimeSinceLastVisit
• Less than 10 seconds: [1, 0, 0]
• Between 10 seconds and 5 minutes: [0, 1, 0]
• More than 5 minutes: [0, 0, 1]
• CurrentURL
• lemonde.fr : [1, 0, 0, ..., 0]
• facebook.com : [0, 1, 0, ..., 0]
• maisonetjardin.fr : [0, 0, 1, ..., 0]
• 𝑥: [More Than 5 minutes, facebook.com] = [0, 0, 1, 0, 1, 0, ..., 0]
Copyright © 2014 Criteo
Modeling higher-order information
• A linear model cannot represent higher order information
Copyright © 2014 Criteo
Modeling higher-order information
• A linear model cannot represent higher order information
• E.g. CurrentUrl = "disney.com" and Advertiser = "Guns4Life"
Copyright © 2014 Criteo
Modeling higher-order information
• A linear model cannot represent higher order information
• E.g. CurrentUrl = "disney.com" and Advertiser = "Guns4Life"
• We model these by creating "cross-features"
Copyright © 2014 Criteo
Modeling higher-order information
• A linear model cannot represent higher order information
• E.g. CurrentUrl = "disney.com" and Advertiser = "Guns4Life"
• We model these by creating "cross-features"
• CurrentUrl has 𝑝1 modalities, Advertiser has 𝑝2 modalities
• The cross-feature has 𝑝1 𝑝2 modalities
Copyright © 2014 Criteo
Hashing
• This model has estimation and computational issues
Copyright © 2014 Criteo
Hashing
• This model has estimation and computational issues
• Choose ℎ: ℕ → 1, … , 𝑝
Copyright © 2014 Criteo
Hashing
• This model has estimation and computational issues
• Choose ℎ: ℕ → 1, … , 𝑝
• Replace 𝑥𝑖 = 1 with 𝑥ℎ(𝑖) = 1
Copyright © 2014 Criteo
Hashing
• This model has estimation and computational issues
• Choose ℎ: ℕ → 1, … , 𝑝
• Replace 𝑥𝑖 = 1 with 𝑥ℎ(𝑖) = 1
• The original 𝑥 is projected to ℝ𝑝
Copyright © 2014 Criteo
Hashing
• This model has estimation and computational issues
• Choose ℎ: ℕ → 1, … , 𝑝
• Replace 𝑥𝑖 = 1 with 𝑥ℎ(𝑖) = 1
• The original 𝑥 is projected to ℝ𝑝
• 𝑃 𝑐𝑙𝑖𝑐𝑘 = 1 𝑥 = 𝜎(𝜃 𝑇 𝑥)
Copyright © 2014 Criteo
Dealing with collisions
• There are many pairs 𝑖1 , 𝑖2 such that ℎ(𝑖1 ) = ℎ(𝑖2 )
• These two features will become indistinguishable
Copyright © 2014 Criteo
Dealing with collisions
• There are many pairs 𝑖1 , 𝑖2 such that ℎ(𝑖1 ) = ℎ(𝑖2 )
• These two features will become indistinguishable
• First solution: increase 𝑝
Copyright © 2014 Criteo
Dealing with collisions
• There are many pairs 𝑖1 , 𝑖2 such that ℎ(𝑖1 ) = ℎ(𝑖2 )
• These two features will become indistinguishable
• First solution: increase 𝑝
• Second solution: do feature selection.
Copyright © 2014 Criteo
The cost of adding variables
• « Hey, I thought of this great variable: Time since last product view.
• Can we add it to the model? »
Copyright © 2014 Criteo
The cost of adding variables
• « Hey, I thought of this great variable: Time since last product view.
• Can we add it to the model? »
• Storage: #Banners/day x #Days x 4 = 480GB
Copyright © 2014 Criteo
The cost of adding variables
• « Hey, I thought of this great variable: Time since last product view.
• Can we add it to the model? »
• Storage: #Banners/day x #Days x 4 = 480GB
• RAM: #Users x #Campaigns x 4 = 40GB
Copyright © 2014 Criteo
Feature selection
• About 40 original features
Copyright © 2014 Criteo
Feature selection
• About 40 original features
• 780 level-2 cross-features
• 9880 level-3 cross-features
Copyright © 2014 Criteo
Feature selection
• About 40 original features
• 780 level-2 cross-features
• 9880 level-3 cross-features
• Each feature contains many modalities
Copyright © 2014 Criteo
Feature selection at scale
• Greedy methods score the variables independently
• Group sparsity methods score them jointly but are biased (think L1)
Copyright © 2014 Criteo
Feature selection at scale
• Greedy methods score the variables independently
• Group sparsity methods score them jointly but are biased (think L1)
• Let us use greedy methods to provide a good initial set of features
• Refine with group sparsity
Copyright © 2014 Criteo
Learning the parameters
• n = 10^9, p = 10^7
• Theory tells us that stochastic gradient methods should be used
Copyright © 2014 Criteo
Learning the parameters - The actual situation
• Tens of models are trained several times a day
• Warm starts favor batch methods
• Not all points are equal
• Stochastic methods are harder to parallelize.
Copyright © 2014 Criteo
Criteo's optimizer
• Batch optimizer
• Distributed computation of the gradients (107 examples/s)
• Update computation on a single node
Copyright © 2014 Criteo
Criteo's optimizer
• Batch optimizer
• Distributed computation of the gradients (107 examples/s)
• Update computation on a single node
• Robustness trumps accuracy
Copyright © 2014 Criteo
Dealing with spurious clicks
• Some clicks do not bring sales
• Some clicks are pure fake
• We can build a fraud detection system
Copyright © 2014 Criteo
Dealing with spurious clicks
• Some clicks do not bring sales
• Some clicks are pure fake
• We can build a fraud detection system
• What is the real issue here?
Copyright © 2014 Criteo
Dealing with spurious clicks
• The real goal is to only buy clicks which bring sales
Copyright © 2014 Criteo
Dealing with spurious clicks
• The real goal is to only buy clicks which bring sales
• We have labeled data
Copyright © 2014 Criteo
Dealing with spurious clicks
• The real goal is to only buy clicks which bring sales
• We have labeled data
• Let's build a sale prediction model.
Copyright © 2014 Criteo
Are we done yet?
• We know how to build a good model
• We know how to train it
• We are predicting a meaningful target
Copyright © 2014 Criteo
Simpson’s paradox
CTR
Overall
First position
60/9000 (0.67%)
Second position
50/7000 (0.71%)
Copyright © 2014 Criteo
Simpson’s paradox
CTR
Overall
Low-value users
High-value users
First position
60/9000 (0.67%)
48/8000 (0.6%)
12/1000 (1.2%)
Second position
50/7000 (0.71%)
2/1000 (0.2%)
48/6000 (0.8%)
Copyright © 2014 Criteo
Simpson’s paradox
CTR
Overall
Low-value users
High-value users
First position
60/9000 (0.67%)
48/8000 (0.6%)
12/1000 (1.2%)
Second position
50/7000 (0.71%)
2/1000 (0.2%)
48/6000 (0.8%)
• Because of the underprediction, we might not detect our error.
Copyright © 2014 Criteo
Offline model evaluation
• The data collection process depends on the current algorithm
• Changing the predictor changes the input distribution
• How can we predict what will happen?
Copyright © 2014 Criteo
Evaluating the model
• A/B test
• Slow
• Costly
Copyright © 2014 Criteo
Evaluating the model
• A/B test
• Slow
• Costly
• Counterfactual analysis: “What would have happened if…?”
• Based around importance sampling
Copyright © 2014 Criteo
Counterfactual analysis
• Requires randomization in real time
• Requires the ability to replay what happened
• “Would I have taken another decision?”
• Extensive logging: ALL campaigns
Copyright © 2014 Criteo
Using side information
• There is never enough data
• With more information, we could learn finer behaviours
• But there are only so many sales, can we do better?
Copyright © 2014 Criteo
From unsupervised to weakly supervised learning
• Unsupervised learning tries to learn about X
• Weakly supervised learning uses related tasks
• Long visits on the website
• Sales which do not follow a click
• Big data: unstructured targets rather than inputs
Copyright © 2014 Criteo
Summary for the prediction
• The technical constraints shape our solution
• Accuracy is not the only goal
• Scaling is much more than the quantity of data
Copyright © 2014 Criteo
Summary for the prediction
• The technical constraints shape our solution
• Accuracy is not the only goal
• Scaling is much more than the quantity of data
• Which latest developments can we incorporate and benefit from?
Copyright © 2014 Criteo
Product recommendation
• We have the list of products seen
• A catalog can contain 105 products
• How do we choose the right products to show?
Copyright © 2014 Criteo
Product recommendation
• We have the list of products seen
• A catalog can contain 105 products
• How do we choose the right products to show?
• In less than 20ms
Copyright © 2014 Criteo
Two-stage approach
• Stage 1: Product preselection based on:
• Popularity
• Browsing history of the user
Copyright © 2014 Criteo
Two-stage approach
• Stage 1: Product preselection based on:
• Popularity
• Browsing history of the user
• Stage 2: Exact scoring using a prediction model
Copyright © 2014 Criteo
Major hurdles for product recommendation
• Products come and go
• There might be a sale (Black Friday)
• Complementary vs. similar products
Copyright © 2014 Criteo
Other challenges
• Multiple products in a banner
• Interaction between products and layout
• Different timeframes for different products
Copyright © 2014 Criteo
A first recipe for success
• There are many sources of success/failure
• It is often suboptimal to focus on one
• The first step to address each source is often manual
Copyright © 2014 Criteo
Conclusion
• Prediction is at the core of our business
• Huge engineering constraints
• Different bottlenecks than in academia
• Build from the ground up, not the other way around
Copyright © 2014 Criteo
This is the end
Thank you!
Questions?
Copyright © 2014 Criteo
Download