Dynamic Covering for Recommendation Systems
Ioannis Antonellis, Anish Das Sarma, Shaddin Dughmi

Outline
• Covering & Recommendations
• Succinct Dynamic Covering
• Results:
  o Upper Bounds
  o Lower Bounds

Max k-cover Problem
• Input:
  o integer k
  o items: X = {1, 2, ..., n}
  o sets: I = {S1, ..., Sm}, each Si a subset of X
• Output: a subset of I of size at most k that maximizes the number of items covered
[Figure: sets A, B, C linked to items 1-5. k=1: solution A (size 3). k=2: solutions {A,C}, {A,B}, {B,C} (size 4 each).]

Max k-cover Problem
• NP-hard
• Greedy Algorithm
  o pick the set that covers the most uncovered items
  o iterate
• 1 - ((k-1)/k)^k >= 1 - 1/e ≈ 0.63 approximation
[Figure: sets A, B, C linked to items 1-5. k=1: solution A (size 3). k=2: solutions {A,C}, {A,B}, {B,C} (size 4 each).]

Max k-cover in Recommendations
• Alice views and rates movies
• Netflix would like to recommend new movies for Alice to watch
• Important problem:
  o find users "similar" to Alice
  o find users who cover a large set of Alice's likes and dislikes

Netflix example
• Each user is identified by the subset of movies they like or have viewed
• Alice likes {A, B, C}
• Fred likes {A, D}
• Bob likes {B, E}
• Ben likes {C, F}
• Jim likes {A, B, F}
• James likes {A, B, F}
Ben and Jim together cover all of Alice's likes
Fred, Bob and Ben together cover all of Alice's likes
Jim and James add the same value

k-covering vs nearest neighbor
• for k=1, equivalent (dot product similarity)
• covering allows for diversifying recommendations
• want to cover all genres liked by a user
  o consider a user who likes 100 thriller movies and 10 comedies
  o want "similar" users to cover as many movies as possible
  o k-nearest neighbor attempts to find many similar users, not to cover as many movies as possible

oDesk example
• Online labor marketplace
• clients post jobs and/or invite contractors
• contractors apply to jobs
• Contractor recommendations for clients
  o Bob invites/interviews/hires contractors
  o find clients "similar" to Bob
• Job recommendations for
contractors
  o Alice applies to jobs
  o find contractors "similar" to Alice

Succinct Dynamic Covering (SDC)
• Input:
  o integer k
  o items: X = {1, 2, ..., n}
  o sets: I = {S1, ..., Sm}, each Si a subset of X
  o query Q, a subset of X
• Output: a subset of I of size at most k that maximizes the number of items of Q covered
• However, we further constrain the problem:
  o space-constrained: statically preprocess (X, I) and store a small sketch, much smaller than O(mn)
  o dynamic: Q is not known a priori when the sketch is created

Notice two twists
• dynamic
  o for each user, the set of movies that need to be covered is different
  o covering is not static
• space-constrained
  o real-time, interactive recommendations
  o the whole Netflix graph is huge
  o 10 million users, 100k movies
  o popular movies have been viewed many times
  o cannot process the entire graph at query time

Ad serving
• online advertisers
  o bid on webpages matching relevancy criteria
  o target certain user demographics
• When a user visits a page, ad servers:
  o have some (imprecise) idea about the demographic of the user (e.g.
from click logs)
  o try to pick a set of ads that cover many user demographics
  o need to solve the SDC problem

Ad serving
• space-constrained:
  o set system consists of users, webpages and clicks
• dynamic:
  o each user view of each page is associated with a different user demographic
[Figure: ads A, B, C linked to webpages 1-5; legend: user visited pages.]

Coverage Oracle
• Offline stage:
  o Input: integer k; items X = {1, 2, ..., n}; sets I = {S1, ..., Sm}, each Si a subset of X
  o Output: data structure D
• Dynamic stage:
  o Input: query Q, a subset of X
  o Output: use D to find a subset of I of size at most k that maximizes the number of items of Q covered

Outline
• Covering & Recommendations
• Succinct Dynamic Covering
• Results:
  o Upper Bounds
  o Lower Bounds

Results
• given space limitations, we are interested in approximate solutions for SDC
• space vs. approximation ratio tradeoffs
• ε ∈ [0, 1/2]
• δ1, δ2: non-negative integers, not both zero

Simple Deterministic Algorithm
• For every item, "remember" one set
• break ties arbitrarily
• m/k approximation, linear space
[Figure: sets and items example. k=2: OPT = 16, APPROX = 8, ratio = 16/8 = 2.]

Better Deterministic Algorithm
• Find the unchosen set containing the most uncovered items. Iterate.
• similar to the previous algorithm; order is fixed
• sqrt(n/k) approximation, linear space
[Figure: sets and items example. k=2: OPT = 16, APPROX = 16, ratio = 16/16 = 1.]

Randomized Algorithm
• m^ε/sqrt(k) approximation
• n·m^(1-2ε) space
• Find an unchosen set containing at least n/(m^ε·sqrt(k)) items. Choose it and iterate.
• For every remaining unchosen set, choose n/m^(2ε) items uniformly at random from the uncovered items

Lower Bound
• holds for deterministic oracles only
• proof somewhat involved, uses the probabilistic method
• matches the randomized upper bound
• Open problem: randomized lower bound

Related work
• distance oracles in graphs (Thorup and Zwick)
• set cover in the streaming model (sets are streams or items are streams)
• nearest neighbor (NN) search:
  o for k=1, SDC and NN are equivalent using the dot product similarity
  o no locality-sensitive hashing for dot product (Charikar), so no hope for signature schemes for SDC

Summary
• Introduced the Succinct Dynamic Covering problem
• Applications in many real-world recommendation systems
• approximation ratio and space tradeoffs
• Deterministic and randomized upper bounds
• Deterministic lower bound

Thank you!
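
Backup: greedy covering on the Netflix example
A minimal Python sketch of the greedy heuristic from the Max k-cover slides, restricted to a query Q as in the SDC problem statement. It assumes the full set system is held in memory, so it ignores the space constraint that SDC imposes and is not one of the sketch-based oracles from the talk. The names greedy_cover, users and alice are illustrative only.

```python
def greedy_cover(sets, query, k):
    """Greedily pick at most k sets covering as much of `query` as possible.

    `sets` maps a set name to its items; `query` is the item set Q.
    Returns (chosen names in pick order, covered query items).
    """
    uncovered = set(query)
    chosen = []
    for _ in range(k):
        # Unchosen set that covers the most still-uncovered query items.
        # Ties are broken by dictionary order (first maximum wins).
        best = max(
            (name for name in sets if name not in chosen),
            key=lambda name: len(sets[name] & uncovered),
            default=None,
        )
        if best is None or not (sets[best] & uncovered):
            break  # no remaining set adds coverage
        chosen.append(best)
        uncovered -= sets[best]
    return chosen, set(query) - uncovered


# Data from the "Netflix example" slide.
users = {
    "Fred": {"A", "D"},
    "Bob": {"B", "E"},
    "Ben": {"C", "F"},
    "Jim": {"A", "B", "F"},
    "James": {"A", "B", "F"},
}
alice = {"A", "B", "C"}

chosen, covered = greedy_cover(users, alice, k=2)
print(chosen, sorted(covered))
```

With k=2 the sketch picks Jim and then Ben, which together cover all of Alice's likes, as stated on the example slide. James is never picked after Jim because he adds no new coverage.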