Estimating Rates of Rare
Events at Multiple Resolutions
Deepak Agarwal
Andrei Broder
Deepayan Chakrabarti
Dejan Diklic
Vanja Josifovski
Mayssam Sayyadian
1
Estimation in the “tail”

Contextual Advertising



Show an ad on a webpage (“impression”)
Revenue is generated if a user clicks
Problem: Estimate the click-through rate (CTR) of
an ad on a page
Most (ad, page) pairs have very few impressions, if any,
and even fewer clicks
→ Severe data sparsity

2
Estimation in the “tail”

Use an existing, well-understood hierarchy
Categorize ads and webpages to leaves of the hierarchy
→ CTR estimates of siblings are correlated
The hierarchy allows us to aggregate data
Coarser resolutions provide reliable estimates for rare events,
which then influence estimation at finer resolutions
3
System overview
Retrospective data [URL, ad, isClicked]
→ Crawl a sample of URLs
→ Classify pages and ads
→ Impute impressions, fix sampling bias
→ Rare event estimation using hierarchy
4
Sampling of webpages

Naïve strategy: sample at random from the
set of URLs
Sampling errors in impression volume AND click
volume

Instead, we propose:
Crawl all URLs with at least one click, and
a sample of the remaining URLs
Variability is then only in impression volume
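
The proposed crawl selection can be sketched as follows. This is a minimal illustration, not the system described in the talk: the function name `choose_urls_to_crawl`, the `clicks_by_url` mapping, and the `sample_rate` parameter are all assumptions made for the example.

```python
import random

def choose_urls_to_crawl(clicks_by_url, sample_rate, seed=0):
    """Return (clicked, sampled): every URL with at least one click,
    plus a uniform random sample of the URLs with zero clicks.

    clicks_by_url: hypothetical dict mapping URL -> click count.
    sample_rate:   hypothetical fraction of non-clicked URLs to crawl.
    """
    rng = random.Random(seed)
    clicked = [u for u, c in clicks_by_url.items() if c > 0]
    unclicked = [u for u, c in clicks_by_url.items() if c == 0]
    k = round(sample_rate * len(unclicked))
    sampled = rng.sample(unclicked, k)
    return clicked, sampled

clicks = {"u1": 3, "u2": 0, "u3": 0, "u4": 1, "u5": 0, "u6": 0}
clicked, sampled = choose_urls_to_crawl(clicks, sample_rate=0.5)
```

Because every clicked URL is crawled, click counts carry no sampling error; only the impression volume of the non-clicked pool has to be scaled up afterwards.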

5
Imputation of impression volume
Table of impression counts: rows are page classes, columns are ad classes
Each cell: #impressions = nij + mij + xij
nij: clicked pool
mij: sampled non-clicked pool
xij: excess impressions (to be imputed)
[row constraint] each row sums to ∑nij + K·∑mij
[column constraint] each column sums to #impressions on ads of this ad class
All cells together sum to the total impressions (known)
6
Imputation of impression volume
Region = (page node, ad node)
Region Hierarchy
→ A cross-product of the page hierarchy and the ad hierarchy
[Figure: regions at Level 0 and Level i of the region hierarchy]
7
Imputation of impression volume
[block constraint] The regions in each block at Level i+1 sum to their parent region at Level i
8
Imputing xij
Iterative Proportional Fitting [Darroch+/1972]
• Initialize xij = nij + mij
• Iteratively scale xij values to match the row/col/block constraints
• Ordering of constraints: top-down, then bottom-up, and repeat
[Figure: a block at Level i+1 and its parent region at Level i]
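
The scaling step can be illustrated on a single table. The sketch below is a simplified, single-level version of iterative proportional fitting that handles only the row and column constraints; the block constraints and the top-down/bottom-up ordering across levels are left out, and the function name `ipf` and the toy numbers are assumptions made for the example.

```python
def ipf(seed, row_totals, col_totals, iters=100, tol=1e-9):
    """Scale a non-negative table until its rows and columns match the targets.

    seed: initial table as a list of lists (on the slide, xij = nij + mij).
    Assumes sum(row_totals) == sum(col_totals).
    """
    x = [row[:] for row in seed]  # work on a copy
    for _ in range(iters):
        # Scale each row to its target total.
        for i, target in enumerate(row_totals):
            s = sum(x[i])
            if s > 0:
                x[i] = [v * target / s for v in x[i]]
        # Scale each column to its target total.
        for j, target in enumerate(col_totals):
            s = sum(row[j] for row in x)
            if s > 0:
                for row in x:
                    row[j] *= target / s
        # Columns are exact after the last step; stop once rows agree too.
        err = max(abs(sum(x[i]) - t) for i, t in enumerate(row_totals))
        if err < tol:
            break
    return x

fitted = ipf([[1.0, 2.0], [3.0, 4.0]], row_totals=[40.0, 60.0], col_totals=[30.0, 70.0])
```

Each pass only rescales existing cells, so the relative pattern of the seed table is preserved while the margins are forced to agree with the known totals.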
9
Imputation: Summary

Given




nij (impressions in clicked pool)
mij (impressions in sampled non-clicked pool)
# impressions on ads of each ad class in the ad
hierarchy
We get

Estimated impression volume
Ñij = nij + mij + xij
in each region ij of every level
10
System overview
Retrospective data [page, ad, isClicked]
→ Crawl a sample of pages
→ Classify pages and ads
→ Impute impressions, fix sampling bias
→ Rare event estimation using hierarchy
11
Rare rate modeling
1. Freeman-Tukey transform:
yij = F-T(clicks and impressions at ij) ≈ transformed-CTR
Variance stabilizing transformation: Var(y) is independent of E[y]
→ needed in further modeling
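
A small sketch of the transform follows. The exact formula shown on the slide did not survive extraction, so the code assumes a commonly used form of the Freeman-Tukey transform for rates, y = ½(√(c/N) + √((c+1)/N)), with c clicks and N impressions; the function name `freeman_tukey` is hypothetical.

```python
import math

def freeman_tukey(clicks, impressions):
    """Assumed Freeman-Tukey form for a rate: average of sqrt(c/N) and sqrt((c+1)/N)."""
    return 0.5 * (math.sqrt(clicks / impressions)
                  + math.sqrt((clicks + 1) / impressions))

# Two regions with zero clicks but different impression volumes
# get different transformed values.
y_small = freeman_tukey(0, 100)
y_large = freeman_tukey(0, 10_000)
```

Under this form a zero-click region with N impressions maps to y = ½·√(1/N), so y² = 1/(4N): more impressions without a click give a lower transformed CTR.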
12
Rare rate modeling
2. Generative Model (Tree-structured Markov Model)
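[Figure: tree-structured model linking region ij to parent(ij)]
Unobserved "state" Sij at each region, observed value yij
State transition Sparent(ij) → Sij has variance Wij
Observation yij depends on Sij and covariates βij, with variance Vij
Same structure at the parent: yparent(ij), βparent(ij), Vparent(ij), Wparent(ij)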
13
Rare rate modeling

Model fitting with a 2-pass
Kalman filter:



Filtering: Leaf to root
Smoothing: Root to leaf
Linear in the
number of regions
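
The two passes can be sketched on a toy tree. This is a simplified illustration, not the model from the talk: it assumes scalar states, known variances V and W (no EM step), no covariates, a zero-mean prior on the root with variance W, and the model S_child = S_parent + noise(W), y = S + noise(V). The `Node` class and function names are hypothetical. Messages are kept in precision form, which gives the same result as a Kalman filtering pass followed by a smoothing pass for this model.

```python
from dataclasses import dataclass, field
from typing import List, Optional

@dataclass
class Node:
    y: Optional[float] = None   # observed transformed CTR, if any
    V: float = 1.0              # observation variance
    W: float = 1.0              # variance of (state - parent state); root prior variance
    children: List["Node"] = field(default_factory=list)
    prec: float = 0.0           # evidence from this subtree, precision form
    info: float = 0.0
    msg_prec: float = 0.0       # that evidence as seen by the parent
    msg_info: float = 0.0
    mean: float = 0.0           # smoothed state estimate

def filter_up(node):
    """Leaf to root: collect evidence about each state from its subtree."""
    node.prec = 0.0 if node.y is None else 1.0 / node.V
    node.info = 0.0 if node.y is None else node.y / node.V
    for c in node.children:
        filter_up(c)
        node.prec += c.msg_prec
        node.info += c.msg_info
    shrink = 1.0 + node.prec * node.W   # passing through the transition adds variance W
    node.msg_prec = node.prec / shrink
    node.msg_info = node.info / shrink

def smooth_down(node, in_prec, in_info):
    """Root to leaf: combine subtree evidence with evidence from the rest of the tree."""
    post_prec = node.prec + in_prec
    post_info = node.info + in_info
    node.mean = post_info / post_prec
    for c in node.children:
        cav_prec = post_prec - c.msg_prec   # remove the child's own contribution
        cav_info = post_info - c.msg_info
        shrink = 1.0 + cav_prec * c.W
        smooth_down(c, cav_prec / shrink, cav_info / shrink)

def fit(root):
    filter_up(root)
    smooth_down(root, 1.0 / root.W, 0.0)   # root prior N(0, W_root)

a = Node(y=2.0)          # region with data
b = Node()               # region with no data
root = Node(children=[a, b])
fit(root)
```

Each node is visited once per pass, so the cost is linear in the number of regions. In the toy tree, the region without data inherits its parent's smoothed estimate, while the region with data is pulled only partly toward it.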
14
Experiments



503M impressions
7-level hierarchy, of which the top 3 levels were used
Zero clicks in
76% of regions in level 2
95% of regions in level 3
Full dataset DFULL, and a 2/3 sample DSAMPLE
15
Experiments



Estimate CTRs for all regions R in level 3
with zero clicks in DSAMPLE
Some of these regions R>0 get clicks in
DFULL
A good model should predict higher CTRs for
R>0 as against the other regions in R
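
One way to score such a comparison is sketched below. The metric is a hypothetical stand-in, not necessarily the one used in the talk: it ranks the zero-click regions R by predicted CTR and reports the fraction of the top |R>0| that really belong to R>0. The function name and the toy values are assumptions.

```python
def hits_in_top_k(predicted_ctr, regions_with_clicks):
    """Fraction of the top-|R>0| regions by predicted CTR that are in R>0.

    predicted_ctr:       dict mapping region -> predicted CTR, for all regions in R.
    regions_with_clicks: set of regions in R that get clicks in the full data (R>0).
    """
    k = len(regions_with_clicks)
    if k == 0:
        return 0.0
    ranked = sorted(predicted_ctr, key=predicted_ctr.get, reverse=True)
    return len(set(ranked[:k]) & set(regions_with_clicks)) / k

predicted = {"a": 0.003, "b": 0.001, "c": 0.002, "d": 0.0005}
score = hits_in_top_k(predicted, {"a", "d"})
```

A model that ranks every member of R>0 above the rest scores 1.0; the random-membership baseline scores about |R>0| / |R| on average.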
16
Experiments

We compared 4 models




TS: our tree-structured model
LM (level-mean): each level smoothed
independently
NS (no smoothing): CTR proportional to 1/Ñ
Random: Assuming |R>0| is given, randomly
predict the membership of R>0 out of R
17
Experiments
TS
18
Experiments
Few impressions → estimates depend more on siblings
Enough impressions → little “borrowing” from siblings
19
Related Work

Multi-resolution modeling
studied in time series modeling and spatial statistics [Openshaw+/79, Cressie/90, Chou+/94]
Imputation
studied in statistics [Darroch+/1972]
Application of such models to estimation of such rare events (rates of ~10^-3) is novel
20
Conclusions

We presented a method to estimate




rates of extremely rare events
at multiple resolutions
under severe sparsity constraints
Our method has two parts
Imputation → incorporates hierarchy, fixes sampling bias
Tree-structured generative model → extremely fast parameter fitting
21
Rare rate modeling
1. Freeman-Tukey transform
[Equation: y for region r, computed from the # clicks in region r and the # impressions in region r]
Distinguishes between regions with zero clicks based on the number of impressions
Variance stabilizing transformation: Var(y) is independent of E[y]
→ needed in further modeling
22
Rare rate modeling

Generative Model

Sij values can be quickly
estimated using a Kalman
filtering algorithm

Kalman filter requires knowledge of β, V, and W
→ EM wrapped around the Kalman filter
23
Rare rate modeling

Fitting using a Kalman
filtering algorithm



Filtering: Recursively aggregate
data from leaves to root
Smoothing: Propagate
information from root to leaves
Complexity: linear in the
number of regions, for both
time and space
24
Rare rate modeling

Fitting using a Kalman
filtering algorithm



Filtering: Recursively aggregate
data from leaves to root
Smoothing: Propagates
information from root to leaves
Kalman filter requires knowledge of β, V, and W
→ EM wrapped around the Kalman filter
25
Imputing xij
Iterative Proportional Fitting
[Darroch+/1972]
Initialize xij = nij + mij
[Figure: a block at level Z(i+1) and its parent region at level Z(i)]
Top-down:
• Scale all xij in every block in Z(i+1) to sum to its parent in Z(i)
• Scale all xij in Z(i+1) to sum to the row totals
• Scale all xij in Z(i+1) to sum to the column totals
Repeat for every level Z(i)
Bottom-up: Similar
26