Estimating Rates of Rare Events at Multiple Resolutions

Deepak Agarwal, Andrei Broder, Deepayan Chakrabarti, Dejan Diklic, Vanja Josifovski, Mayssam Sayyadian

Estimation in the “tail”
• Contextual advertising: show an ad on a webpage (“impression”)
• Revenue is generated if a user clicks
• Problem: estimate the click-through rate (CTR) of an ad on a page
• Most (ad, page) pairs have very few impressions, if any, and even fewer clicks
• Severe data sparsity

Estimation in the “tail”
• Use an existing, well-understood hierarchy
• Categorize ads and webpages to leaves of the hierarchy
• CTR estimates of siblings are correlated
• The hierarchy allows us to aggregate data
• Coarser resolutions provide reliable estimates for rare events, which then influence estimation at finer resolutions

System overview
• Retrospective data: [URL, ad, isClicked]
• Crawl a sample of URLs
• Classify pages and ads
• Rare event estimation using hierarchy
• Impute impressions, fix sampling bias

Sampling of webpages
• Naïve strategy: sample at random from the set of URLs
• This gives sampling errors in impression volume AND click volume
• Instead, we propose: crawl all URLs with at least one click, and a sample of the remaining URLs
• Variability is then only in impression volume

Imputation of impression volume
• Table of page classes × ad classes
• In each cell: #impressions = n_ij + m_ij + x_ij
• n_ij: clicked pool
• m_ij: sampled non-clicked pool
• x_ij: excess impressions (to be imputed)
• Each row sums to ∑n_ij + K·∑m_ij [row constraint]
• Each column sums to #impressions on ads of this ad class [column constraint]
• The whole table sums to total impressions (known)

Imputation of impression volume
• Region = (page node, ad node)
• Region hierarchy: a cross-product of the page hierarchy and the ad hierarchy
[Figure: region hierarchy, from Level 0 down to Level i]

Imputation of impression volume
• The regions in a Level i+1 block sum to their parent region at Level i [block constraint]

Imputing x_ij
• Iterative Proportional Fitting [Darroch+/1972]
• Initialize x_ij = n_ij + m_ij
• Iteratively scale x_ij values to match the row/col/block constraints
• Ordering of constraints: top-down, then bottom-up, and repeat
[Figure: a Level i region and its block at Level i+1]

Imputation: Summary
Given:
• n_ij (impressions in clicked pool)
• m_ij (impressions in sampled non-clicked pool)
• # impressions on ads of each ad class in the ad hierarchy
We get:
• Estimated impression volume Ñ_ij = n_ij + m_ij + x_ij in each region ij of every level

System overview
• Retrospective data: [page, ad, isClicked]
• Crawl a sample of pages
• Classify pages and ads
• Rare event estimation using hierarchy
• Impute impressions, fix sampling bias

Rare rate modeling
1. Freeman-Tukey transform: y_ij = F-T(clicks and impressions at ij) ≈ transformed-CTR
• Variance stabilizing transformation: Var(y) is independent of E[y]
• Needed in further modeling

Rare rate modeling
2. Generative Model (Tree-structured Markov Model)
[Figure: graphical model over a region ij and its parent, labeled with unobserved “state” S_ij and S_parent(ij); covariates β_ij and β_parent(ij); variances W_ij and W_parent(ij); variances V_ij and V_parent(ij); y_ij and y_parent(ij)]

Rare rate modeling
Model fitting with a 2-pass Kalman filter:
• Filtering: leaf to root
• Smoothing: root to leaf
• Linear in the number of regions

Experiments
• 503M impressions
• 7-level hierarchy, of which the top 3 levels were used
• Zero clicks in 76% of regions in level 2 and 95% of regions in level 3
• Full dataset D_FULL, and a 2/3 sample D_SAMPLE

Experiments
• Estimate CTRs for all regions R in level 3 with zero clicks in D_SAMPLE
• Some of these regions, R_{>0}, get clicks in D_FULL
• A good model should predict higher CTRs for R_{>0} than for the other regions in R

Experiments
We compared 4 models:
• TS: our tree-structured model
• LM (level-mean): each level smoothed independently
• NS (no smoothing): CTR proportional to 1/Ñ
• Random: assuming |R_{>0}| is given, randomly predict the membership of R_{>0} out of R

Experiments
[Figure: results plot, labeled “TS”]

Experiments
• Few impressions: estimates depend more on siblings
• Enough impressions: little “borrowing” from siblings

Related Work
• Multi-resolution modeling: studied in time series modeling and spatial statistics [Openshaw+/79, Cressie/90, Chou+/94]
• Imputation: studied in statistics [Darroch+/1972]
• Application of such models to estimation of such rare events (rates of ~10^-3) is novel

Conclusions
• We presented a method to estimate rates of extremely rare events at multiple resolutions under severe sparsity constraints
• Our method has two parts
• Imputation: incorporates hierarchy, fixes sampling bias
• Tree-structured generative model: extremely fast parameter fitting

Rare rate modeling
1. Freeman-Tukey transform, computed from # clicks in region r and # impressions in region r
• Distinguishes between regions with zero clicks based on the number of impressions
• Variance stabilizing transformation: Var(y) is independent of E[y]
• Needed in further modeling

Rare rate modeling
Generative Model
• S_ij values can be quickly estimated using a Kalman filtering algorithm
• Kalman filter requires knowledge of β, V, and W
• EM wrapped around the Kalman filter

Rare rate modeling
Fitting using a Kalman filtering algorithm
• Filtering: recursively aggregate data from leaves to root
• Smoothing: propagate information from root to leaves
• Complexity: linear in the number of regions, for both time and space

Imputing x_ij
• Iterative Proportional Fitting [Darroch+/1972]
• Initialize x_ij = n_ij + m_ij
Top-down:
• Scale all x_ij in every block in Z^(i+1) to sum to its parent in Z^(i)
• Scale all x_ij in Z^(i+1) to sum to the row totals
• Scale all x_ij in Z^(i+1) to sum to the column totals
• Repeat for every level Z^(i)
Bottom-up: similar
[Figure: level Z^(i) and a block in Z^(i+1)]
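
The scaling step on the "Imputing xij" slides can be illustrated with a minimal Python sketch of Iterative Proportional Fitting. It is a simplification under stated assumptions, not the authors' implementation: it handles a single level with row and column constraints only, and omits the block constraint and the top-down/bottom-up ordering across levels. The function name `ipf`, its parameters, and all numbers are hypothetical and chosen only for illustration.

```python
def ipf(seed, row_targets, col_targets, iters=100, tol=1e-9):
    """Scale a positive matrix so its row and column sums match the targets.

    Single-level sketch: the block constraint between hierarchy levels
    described on the slides is NOT modeled here.
    """
    x = [row[:] for row in seed]  # work on a copy of the starting values
    n_rows, n_cols = len(x), len(x[0])
    for _ in range(iters):
        # Row step: rescale each row to its target sum.
        for i in range(n_rows):
            s = sum(x[i])
            if s > 0:
                f = row_targets[i] / s
                x[i] = [v * f for v in x[i]]
        # Column step: rescale each column to its target sum.
        for j in range(n_cols):
            s = sum(x[i][j] for i in range(n_rows))
            if s > 0:
                f = col_targets[j] / s
                for i in range(n_rows):
                    x[i][j] *= f
        # Columns match after the column step; stop once rows match too.
        err = max(abs(sum(x[i]) - row_targets[i]) for i in range(n_rows))
        if err < tol:
            break
    return x


# Hypothetical 2x2 example: rows are page classes, columns are ad classes.
seed = [[4.0, 1.0], [2.0, 3.0]]  # made-up starting values
rows = [60.0, 40.0]              # made-up row totals
cols = [70.0, 30.0]              # made-up column totals (same grand total)
fitted = ipf(seed, rows, cols)
```

The row and column targets must share the same grand total (here 100), otherwise the two scaling steps keep undoing each other and the loop cannot satisfy both constraints.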