Dynamic Covering for Recommendation Systems

Ioannis Antonellis
Anish Das Sarma
Shaddin Dughmi
Outline
• Covering & Recommendations
• Succinct Dynamic Covering
• Results:
o Upper Bounds
o Lower Bounds
Max k-cover Problem
• Input:
o integer k
o items: X = {1,2, ..., n}
o sets: I = {S1, ..., Sm}, Si subset of X
• Output: Find subset of I with size at most k that maximizes cover of items
[Figure: diagram linking sets A, B, C to items 1-5]
k=1, Solution: A (size=3)
k=2, Solutions: A,C (size=4); A,B (size=4); B,C (size=4)
Max k-cover Problem
• NP-hard
• Greedy Algorithm
o pick the set that covers the most uncovered items
o iterate
• 1 - ((k-1)/k)^k >= 1 - 1/e ≈ 0.63 approximation
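The greedy rule on this slide can be sketched in a few lines of Python. This is a minimal illustration, not the presenters' code: the function name `greedy_max_k_cover` and the toy instance `toy_sets` are assumptions made for the example, and the instance is not the one drawn in the figure.

```python
def greedy_max_k_cover(sets, k):
    """Pick up to k sets, each time taking the one that covers
    the most items that are still uncovered."""
    covered = set()
    chosen = []
    for _ in range(k):
        candidates = [name for name in sets if name not in chosen]
        if not candidates:
            break
        best = max(candidates, key=lambda name: len(sets[name] - covered))
        if not sets[best] - covered:
            break  # no remaining set adds anything new
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered


# Hypothetical toy instance (not the instance from the figure).
toy_sets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5}}
chosen, covered = greedy_max_k_cover(toy_sets, k=2)
print(chosen, sorted(covered))
```

On this toy instance greedy first takes A (three new items) and then C (two new items), so two sets cover all five items.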
Max k-cover in Recommendations
• Alice views and rates movies
• Netflix would like to recommend new movies for Alice to watch
• Important problem:
o Find users "similar" to Alice
o Find users who cover a large set of Alice's likes and dislikes
Netflix example
• Each user is identified by the subset of movies they like or have viewed
• Alice likes {A, B, C}
• Fred likes {A, D}
• Bob likes {B, E}
• Ben likes {C, F}
• Jim likes {A, B, F}
• James likes {A, B, F}
• Ben and Jim in conjunction cover all of Alice's likes
• Fred, Bob and Ben in conjunction cover all of Alice's likes
• Jim and James add the same value
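The three observations above can be checked directly with set operations. The sketch below uses the users and movies from this slide; the helper name `covered_likes` is an assumption introduced for the illustration.

```python
# Likes taken from the slide.
likes = {
    "Fred": {"A", "D"},
    "Bob": {"B", "E"},
    "Ben": {"C", "F"},
    "Jim": {"A", "B", "F"},
    "James": {"A", "B", "F"},
}
alice = {"A", "B", "C"}


def covered_likes(users, query):
    """Items of `query` liked by at least one of `users`."""
    union = set().union(*(likes[u] for u in users))
    return union & query


print(sorted(covered_likes(["Ben", "Jim"], alice)))
print(sorted(covered_likes(["Fred", "Bob", "Ben"], alice)))
print(sorted(covered_likes(["Jim", "James"], alice)))
```

Ben with Jim, and Fred with Bob and Ben, each cover all three of Alice's likes, while Jim with James cover only A and B because their sets are identical.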
k-covering vs nearest neighbor
• for k=1, equivalent (dot product similarity)
• covering allows for diversifying recommendations
• want to cover all genres liked by a user
o consider a user that likes 100 thriller movies and 10 comedies
o want "similar" users to cover as many movies as possible
o k-nearest neighbor attempts to find many similar users, not cover as many movies as possible
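The difference between the two objectives can be seen on the Netflix example data. The sketch below is an illustration under assumptions: `top_k_by_overlap` stands in for nearest neighbor search with dot product similarity on 0/1 vectors, and `greedy_cover` is a simple greedy heuristic for covering, not an exact solver. Both function names are introduced here for the example.

```python
likes = {
    "Fred": {"A", "D"},
    "Bob": {"B", "E"},
    "Ben": {"C", "F"},
    "Jim": {"A", "B", "F"},
    "James": {"A", "B", "F"},
}
alice = {"A", "B", "C"}


def top_k_by_overlap(query, k):
    """k users most similar to the query. |S & Q| is the dot product
    of the 0/1 indicator vectors of S and Q."""
    return sorted(likes, key=lambda u: len(likes[u] & query), reverse=True)[:k]


def greedy_cover(query, k):
    """Greedily pick k users so that together they cover the query."""
    chosen, uncovered = [], set(query)
    for _ in range(min(k, len(likes))):
        candidates = [u for u in likes if u not in chosen]
        best = max(candidates, key=lambda u: len(likes[u] & uncovered))
        chosen.append(best)
        uncovered -= likes[best]
    return chosen


def coverage(users, query):
    return set().union(*(likes[u] for u in users)) & query


neighbors = top_k_by_overlap(alice, 2)
cover = greedy_cover(alice, 2)
print(neighbors, len(coverage(neighbors, alice)))
print(cover, len(coverage(cover, alice)))
```

The two most similar users are Jim and James, who together cover only two of Alice's likes, while the covering choice (Jim and Ben) covers all three.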
oDesk example
• Online labor marketplace
• clients post jobs and/or invite contractors
• contractors apply to jobs
• Contractor recommendations for clients
o Bob invites/interviews/hires contractors
o find clients "similar" to Bob
• Job recommendations for contractors
o Alice applies to jobs
o find contractors "similar" to Alice
Succinct Dynamic Covering (SDC)
• Input:
o integer k
o items: X = {1,2, ..., n}
o sets: I = {S1, ..., Sm}, Si subset of X
o query Q subset of X
• Output: Find subset of I with size at most k that maximizes cover of items in query Q
• However we further constrain the problem:
o space constrained: statically preprocess (X,I) and store a small sketch, much smaller than O(mn)
o dynamic: Q is not known a priori during the sketch creation
Notice two twists
• dynamic
o for each user the set of movies that need to be covered is different
o covering is not static
• space-constrained
o real time, interactive recommendations
o the whole Netflix graph is huge: 10 million users, 100k movies, popular movies have been viewed many times
o cannot process over the entire graph at query time
Ad serving
• online advertisers
o bid on webpages matching relevancy criteria
o target certain user demographics
• When a user visits a page, ad servers:
o have some (not precise) idea about the demographic of the user (e.g. from click logs)
o try to pick a set of ads that cover many user demographics
o need to solve the SDC problem
Ad serving
• space-constrained:
o set system consists of users, webpages and clicks
• dynamic:
o each user view of each page is associated with a different user demographic
[Figure: diagram linking ads A, B, C to webpages 1-5; label: user visited pages]
Coverage Oracle
• Offline stage:
o Input: integer k; items X = {1,2, ..., n}; sets I = {S1, ..., Sm}, Si subset of X
o Output: Data Structure D
• Dynamic stage:
o Input: Query Q subset of X
o Output: use D to find subset of I with size at most k that maximizes cover of items in query Q
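The two-stage interface can be illustrated with a deliberately naive oracle. This is a baseline sketch only: the class name `ExactCoverageOracle` is an assumption, the data structure D is simply a copy of the whole set system (so it uses O(mn) space and is not succinct), and queries are answered by brute force over all choices of at most k sets.

```python
from itertools import combinations


class ExactCoverageOracle:
    def __init__(self, sets, k):
        # Offline stage: here D is just the full set system (not succinct).
        self.k = k
        self.data = {name: frozenset(items) for name, items in sets.items()}

    def query(self, q):
        # Dynamic stage: try every choice of at most k sets against Q.
        q = set(q)
        best, best_cover = (), 0
        for size in range(1, min(self.k, len(self.data)) + 1):
            for combo in combinations(self.data, size):
                union = frozenset().union(*(self.data[s] for s in combo))
                cover = len(q & union)
                if cover > best_cover:
                    best, best_cover = combo, cover
        return list(best), best_cover


# Hypothetical toy instance.
oracle = ExactCoverageOracle({"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5}}, k=2)
print(oracle.query({3, 4, 5}))
print(oracle.query({1, 2}))
```

The same preprocessed object answers different queries, which is the point of the split. A succinct oracle would keep the `query` interface but store far less than the full set system.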
Results
• given space limitations, interested in approximate solutions for SDC
• space vs approximation ratio tradeoffs
• ε in [0, 1/2]
• δ1, δ2: non-negative integers, not both zero
Simple Deterministic Algorithm
• For every item, "remember" one set
• break ties arbitrarily
• m/k approximation, linear space
[Figure: sets and items example]
k=2: OPT = 16, APPROX = 8, ratio = 16/8 = 2
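A possible reading of the offline step is sketched below: one remembered set per item, which takes one entry per item and hence linear space. The slide does not say how a query is answered, so the query rule here (return the k remembered sets that the query's items point to most often) is an assumption, as are the names `build_item_to_set` and `answer_query`.

```python
from collections import Counter


def build_item_to_set(sets):
    """Offline: for every item remember one set containing it.
    Ties are broken arbitrarily; here the first set seen wins."""
    remembered = {}
    for name, items in sets.items():
        for item in items:
            remembered.setdefault(item, name)
    return remembered


def answer_query(remembered, q, k):
    """Dynamic (assumed rule, not from the slide): return the k
    remembered sets hit most often by the items of q."""
    votes = Counter(remembered[item] for item in q if item in remembered)
    return [name for name, _ in votes.most_common(k)]


# Hypothetical toy instance.
toy_sets = {"A": {1, 2, 3}, "B": {3, 4}, "C": {4, 5}}
remembered = build_item_to_set(toy_sets)
print(remembered)
print(answer_query(remembered, {1, 2, 4}, k=2))
```

Item 3 is remembered only under A and item 4 only under B, so the sketch loses the information that B also contains 3 and C also contains 4. That loss is the price of storing one entry per item.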
Better Deterministic Algorithm
• Find unchosen set containing the most uncovered items. Iterate.
• similar to previous algorithm, order is fixed
• sqrt(n/k) approximation, linear space
[Figure: sets and items example]
k=2: OPT = 16, APPROX = 16, ratio = 16/16 = 1
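One way to read these bullets is that the offline stage runs the greedy order once over the full set system and remembers, for every item, the first set in that order that covers it. The sketch below shows only this offline step under that assumption; the function name `build_greedy_assignment` and the toy instance are introduced for illustration, and the query-time rule is not spelled out on the slide.

```python
def build_greedy_assignment(sets):
    """Offline: repeatedly take the unchosen set with the most uncovered
    items, and remember that set for each item it newly covers."""
    remaining = dict(sets)
    covered = set()
    order, remembered = [], {}
    while remaining:
        best = max(remaining, key=lambda s: len(remaining[s] - covered))
        new_items = remaining.pop(best) - covered
        if not new_items:
            break  # every remaining set is already fully covered
        order.append(best)
        for item in new_items:
            remembered[item] = best
        covered |= new_items
    return order, remembered


# Hypothetical toy instance where first-seen tie-breaking would split
# items 1-4 between A and B, while the greedy order keeps them under B.
toy_sets = {"A": {1, 2}, "B": {1, 2, 3, 4}, "C": {5}}
order, remembered = build_greedy_assignment(toy_sets)
print(order)
print(remembered)
```

The stored map still has one entry per item, so space stays linear; only the choice of which set to remember changes.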
Randomized Algorithm
• m^ε/sqrt(k) approximation
• n m^(1-2ε) space
• Find unchosen set containing at least n/(m^ε sqrt(k)) uncovered items. Choose and Iterate.
• For every remaining unchosen set, choose n/m^(2ε) items uniformly at random from the uncovered items
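The two offline steps can be sketched as follows. Several details are assumptions rather than statements from the slide: the function name `build_randomized_sketch`, picking the largest qualifying set at each step, storing for each chosen set only the items it newly covers, rounding the sample size to an integer, and sampling from each remaining set's own uncovered items. How queries are answered from the sketch is not covered here.

```python
import math
import random


def build_randomized_sketch(sets, n, k, eps, seed=0):
    rng = random.Random(seed)
    m = len(sets)
    threshold = n / (m ** eps * math.sqrt(k))         # n / (m^eps * sqrt(k))
    sample_size = max(1, round(n / m ** (2 * eps)))   # n / m^(2*eps), rounded

    remaining = dict(sets)
    covered = set()
    large = {}
    # Step 1: choose sets that still contain at least `threshold` uncovered items.
    while remaining:
        best = max(remaining, key=lambda s: len(remaining[s] - covered))
        new_items = remaining[best] - covered
        if len(new_items) < threshold:
            break
        large[best] = new_items
        covered |= new_items
        del remaining[best]

    # Step 2: for every remaining set, keep a uniform random sample
    # of its uncovered items.
    samples = {}
    for name, items in remaining.items():
        uncovered = sorted(items - covered)
        samples[name] = set(rng.sample(uncovered, min(sample_size, len(uncovered))))
    return large, samples


# Hypothetical toy instance with n = 8 items and m = 4 sets.
toy_sets = {"A": {1, 2, 3, 4, 5}, "B": {5, 6}, "C": {6, 7, 8}, "D": {1, 8}}
large, samples = build_randomized_sketch(toy_sets, n=8, k=1, eps=0.5)
print(large)
print(samples)
```

With these toy parameters the threshold is 4 and the sample size is 2, so only A is chosen in the first step and each of B, C, D keeps at most two of its uncovered items.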
Lower Bound
• holds for deterministic oracles only
• proof somewhat involved, uses the probabilistic
method
• matches randomized upper bound
• Open problem: randomized lower bound
Related work
• distance oracles in graphs, Thorup and Zwick
• set cover in streaming model (sets are streams or items are streams)
• nearest neighbor (NN) search:
o for k=1, SDC and NN are equivalent using the dot product similarity
o no locality sensitive hashing for dot product (Charikar). So, no hope for signature schemes for SDC.
Summary
• Introduced Succinct Dynamic Covering problem
• Applications in many real-world recommendation systems
• approximation ratio and space tradeoffs
• Deterministic and Randomized upper bounds
• Deterministic lower bound
Thank you!