Slide - PersDB 2009

Recommender Systems Revisited
– From Items to Transactions
Laks V.S. Lakshmanan
University of British Columbia
Vancouver, Canada
http://www.cs.ubc.ca/~laks
Joint work with Zeinab Abbassi.
PersDB'09, Lyon.
3/12/2016
Why Recommendations?
 The Recommendation Paradigm
 Suggest content (in most cases, items) to a user based on her profile
and past activities.
 Why Recommendation?
 Search queries can be generic: e.g., >90% of Yahoo! Travel queries
are general descriptions like "family trip".
 More so for Social Content Sites ...
PersDB'09, Lyon, August 2009.
Why Recommendations?
 Social Content Sites
 Sites where users make friends and share content
 E.g., Facebook, del.icio.us, Flickr, etc.
 Content sites letting you share info with your social buddies
 E.g., nytimes.com, indiatimes.com, youtube.com
 Recommendation is an indispensable information
exploration paradigm on social content sites.
 The rich activities and user connections provide many
opportunities for generating recommendations.
Overview of RecSys
 Item-Based Strategies
 Estimate the rating of an unrated item (i) by the user
(u) based on its similarity to items already rated and
how u rated those items.
 Collaborative Filtering Strategies
 Estimate the rating of i by u based on how u’s similarity
network (either explicit or implicit) rated i.
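A minimal sketch of the collaborative filtering estimate described above, assuming cosine similarity over co-rated items and a similarity-weighted average of ratings. The toy `ratings` data and the function names are hypothetical and not taken from the talk.

```python
from math import sqrt

# Hypothetical toy data: user -> {item: rating}.
ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 4, "b": 2, "c": 5, "d": 4},
    "u3": {"a": 1, "b": 5, "d": 2},
}

def cosine(r1, r2):
    """Cosine similarity restricted to the items both users rated."""
    common = set(r1) & set(r2)
    if not common:
        return 0.0
    dot = sum(r1[i] * r2[i] for i in common)
    n1 = sqrt(sum(r1[i] ** 2 for i in common))
    n2 = sqrt(sum(r2[i] ** 2 for i in common))
    return dot / (n1 * n2)

def predict_user_based(user, item):
    """Estimate `user`'s rating of `item` from similar users who rated it."""
    num = den = 0.0
    for other, r in ratings.items():
        if other == user or item not in r:
            continue
        s = cosine(ratings[user], r)
        num += s * r[item]
        den += abs(s)
    return num / den if den else None  # None: nobody comparable rated it

estimate = predict_user_based("u1", "d")
```

Here u1 is more similar to u2 than to u3, so the estimate for item d lands closer to u2's rating than to u3's.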
Overview of RecSys
[Figure: user-item rating matrix with users U1, U2, …, Ui, … and items I1, I2, …, Ij, …; X marks the unknown rating to estimate. Item-based.]
Overview of RecSys
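[Figure: user-item rating matrix with users U1, U2, …, Ui, … and items I1, I2, …, Ij, …; X marks the unknown rating to estimate. User-based: Collaborative Filtering.]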
Overview of RecSys
 Fusion Strategies
 Model/Machine Learning-based approaches.
 What’s common among all RecSys algorithms?
 Recommend items to users.
 What if we want to recommend transactions instead?
 Motivating Apps
 Users exchanging items
 Offline (one-shot) exchanges
 Asynchronous exchanges
 Users buying/selling items for a price
Talk Outline
 Motivation
 Problem Definition
 Related Work
 Our Approach
 Experimental Results
 Summary & Future Work.
Motivation
 Online social networks – emergence and rapid
growth.
 Users spend more time on online social networks.
 MySpace and Facebook are among the top 10
websites.
Motivation – Exchange Markets
 There are exchange markets on the Web today:
 OddShoe.org
 Peerflix.com (movie
exchange)
 ReadItSwapIt.co.uk
 JoeBarter.com
 Intervac
Motivation
 Need “Matching” Algorithms
 Enhance quality of user experience.
 Keep more people engaged in the system, more of the time.
 Monetization.
 Lack of comprehensive study of “matching”.
 Need efficient recommendation algorithms.
Some Related Problems
 Chinese Postman Problem:
 Collect mail from postal station (s).
 Deliver mail on all streets (edges).
 Minimize distance covered (fuel consumed).
Some Related Problems
 Cycle Covers:
?
 Let’s try again!
?
Some Related Problems
 Cycle Covers:
[Figure: example graph with numeric labels 12, 6, 7, 15, 20, 16.]
Can we cover all vertices with edge-disjoint cycles?
What is the minimum weight of such a cover?
Vertex-disjoint, edge-disjoint, vertex/edge cover,
bounded length, etc. – variants.
Related Work
 Graphs – Cycle Cover Problem:
 Polytime algorithm for Chinese Postman problem on
undirected graphs [Edmonds & Johnson 73].
 CPP is NP-hard for mixed graphs [Papadimitriou 76] but
admits a 3/2-approx. [Raghavachari & Veerasamy 99].
 Cycle Cover -- cover given set of nodes/edges with set of
min. length cycles.
 Min. Weight Cycle Covers – a variant of CPP; NP-hard in
general [Thomassen 97].
 CC w/ bounds on cycle length (heuristic) [Hochbaum and
Olinick 01].
 Approximation algorithms when length is bounded
[Immorlica+ 05].
Related Work
 Recommender Systems:
 Management science perspective [Murthi & Sarkar 03].
 Collaborative filtering [Resnick+ 94, Shani+ 02].
 Survey of item-based, collaborative filtering, fusion-based,
and model-based [Adomavicius & Tuzhilin 05].
Related Work
 Kidney Exchange problem:
4,000 deaths/yr in US.
70,000 waiting for a cadaver kidney.
Related Work
 Kidney Exchange problem:
 In kidney transplants, the donor’s kidney is frequently not
compatible with the patient’s.
 Example: A’ is willing to donate her kidney to A, and B’ to B,
but each pair is incompatible. However, the kidney of B’ is compatible
with A, and that of A’ with B.
 Motivation: Find feasible exchanges and save more
people’s lives.
 Medical constraints: no cycle
longer than 3!
Related Work
 Bi-cycles in this case: perfect matching,
therefore polynomial!
 Cycles of length 3 or more: NP-complete.
 [Abraham+ 07] solve it by Integer Linear
Programming for the United States
kidney exchange, with real data!
 We will look at a more general problem than KE.
 Incentivizing exchanges in P2P file-sharing
systems [Anagnostakis & Greenwald 04].
A Model
 Set of users U and a set of items I.
 Two lists for each user u in U – item list Su,
items u is willing to give away; wish list Wu,
items u is looking for.
 Network – nodes = users; u → v iff there is a
feasible transaction from u to v (some item in Su is also in Wv).
 Edges labeled with item.
[Figure: users u and v connected by edges labeled with items i and j.]
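A minimal sketch of how the exchange network could be built from the two lists, assuming an edge u → v labeled i exists whenever item i is in the item list of u and in the wish list of v. The user names, item names, and `build_edges` are hypothetical.

```python
# Hypothetical toy data: item lists S_u and wish lists W_u.
S = {"alice": {"b1"}, "bob": {"b2"}, "amy": {"b3"}}
W = {"alice": {"b2"}, "bob": {"b1", "b3"}, "amy": {"b1"}}

def build_edges(S, W):
    """Return labeled edges (giver, receiver, item) of the exchange network."""
    edges = set()
    for u, items in S.items():
        for v, wishes in W.items():
            if u == v:
                continue  # a user does not trade with herself
            for i in items & wishes:
                edges.add((u, v, i))
    return edges

edges = build_edges(S, W)
```

Representing each edge as a (giver, receiver, item) triple keeps parallel edges for different items distinct, which matters once conflicts between cycles are checked per item.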
Example
[Figure: example exchange network, not recoverable from the extracted slide.]
Different Models
 One-shot exchange Markets:
 Simple exchange markets (swaps).
 Exchange markets through short cycles.
 Probabilistic exchange markets.
 Wish List as Query List.
 Exchange markets over time.
Simple exchange markets
 Only one-by-one transactions.
 The problem is to find a set of pairs
[(u,i), (v,j)], where i ∈ Su, j ∈ Wu, i ∈ Wv and j ∈ Sv,
that form 2-cycles (swaps).
 Typically each user has one instance of any item and also
wants one instance of an item in his wish list ⇒ [(u,i), (*,*)]
should not appear more than once for each user u, i.e., we are
looking for a set of conflict-free cycles.
[Figure: users u, v and w connected by edges labeled with items i, j and k.]
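A minimal sketch of swap discovery under the definition above, followed by a simple first-come selection of conflict-free swaps. The selection step is only a heuristic chosen for illustration (the talk later states that the optimal SimpleMarket problem is NP-complete); the data and function names are hypothetical.

```python
# Hypothetical toy data: u can swap item i for j with either v or w.
S = {"u": {"i", "k"}, "v": {"j"}, "w": {"j"}}
W = {"u": {"j"}, "v": {"i"}, "w": {"i"}}

def find_swaps(S, W):
    """All pairs [(u,i), (v,j)] with i in S_u, i in W_v, j in S_v, j in W_u."""
    users = sorted(S)
    swaps = []
    for idx, u in enumerate(users):
        for v in users[idx + 1:]:
            for i in sorted(S[u] & W.get(v, set())):
                for j in sorted(S[v] & W.get(u, set())):
                    swaps.append(((u, i), (v, j)))
    return swaps

def pick_conflict_free(swaps):
    """Keep a swap only if no user gives or gets the same item twice."""
    used, chosen = set(), []
    for (u, i), (v, j) in swaps:
        els = {("gives", u, i), ("gets", v, i), ("gives", v, j), ("gets", u, j)}
        if used.isdisjoint(els):
            chosen.append(((u, i), (v, j)))
            used |= els
    return chosen

swaps = find_swaps(S, W)
chosen = pick_conflict_free(swaps)
```

Both candidate swaps need u's single copy of item i, so only one of them can be recommended.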
Exchange markets through cycles.
 We look for cycles of length more than 2 in the system.
 The goal is to find cycles
[(u_1,i_1), (u_2,i_2), (u_3,i_3), …, (u_k,i_k)], where
i_1 in S_u1, i_1 in W_u2, i_2 in S_u2, i_2 in W_u3, ….
[Figure: example cycle with users Alice, Bob and Amy and labels B7, B4 and B8.]
Exchange markets through short cycles
 Note: A cycle can happen if and only if all the
participating edges are realized.
 ⇒ Discover short cycles and solve the short
cycle cover problem for cycles of length <= k,
where k = 3, 4, 5, …
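A minimal sketch of short-cycle discovery, assuming the network is given as (giver, receiver, item) triples and that a cycle visits each user at most once. Each cycle is reported once, rooted at its smallest user name; `short_cycles` and the toy edges are hypothetical.

```python
def short_cycles(edges, k):
    """Enumerate simple cycles with at most k edges."""
    out = {}
    for u, v, i in edges:
        out.setdefault(u, []).append((v, i))
    cycles = []

    def extend(start, node, path, visited):
        for nxt, item in out.get(node, []):
            step = path + [(node, nxt, item)]
            if nxt == start:
                cycles.append(step)          # closed a cycle of length len(step)
            elif nxt > start and nxt not in visited and len(step) < k:
                extend(start, nxt, step, visited | {nxt})

    for s in sorted(out):
        extend(s, s, [], {s})
    return cycles

# Hypothetical toy network.
edges = {
    ("alice", "bob", "b1"),
    ("alice", "amy", "b1"),
    ("bob", "alice", "b2"),
    ("amy", "bob", "b3"),
}
swaps_only = short_cycles(edges, 2)
up_to_three = short_cycles(edges, 3)
```

Raising the bound from 2 to 3 exposes the cycle alice → amy → bob → alice in addition to the alice/bob swap.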
Probabilistic Exchange Markets
 Each edge in the graph has a probability
indicating the likelihood of it being realized.
 The probability of realizing each edge is
independent of the other edges.
 Two kinds of probabilities are of interest:
 Pu(v): what’s the probability u is willing to perform a
transaction with v?
 Pu(i,j): what’s the probability u is willing to exchange item i
for j?
Query List as Wish List
 Wish list only contains “predicates” instead of
items.
 E.g., horror movie, science fiction, eastern philosophy,
home hardware, …
 Item list as before.
 Users may rate/review items.
 Matchmaking has to factor in ratings, i.e.,
matching has to use RecSys technology.
-- Will focus on simple exchange, short cycles, and
prob. markets in this talk.
Goal
 Generate recommendations that maximize the
(expected) number of items exchanged through
the network.
 Each user u gets a reco.:
 Gives = {(give i to v), …}
 Gets = {(get j from w), …}
 The reco’s together constitute a set of
conflict-free cycles that maximize the above metric.
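A minimal sketch of how a chosen set of cycles could be turned into per-user Gives/Gets lists, assuming each cycle is a list of (giver, receiver, item) triples. The `recommendations` helper and the sample cycle are hypothetical.

```python
def recommendations(cycles):
    """Map each user to the items she should give and get, with the partner."""
    recos = {}
    for cycle in cycles:
        for giver, receiver, item in cycle:
            recos.setdefault(giver, {"gives": [], "gets": []})
            recos.setdefault(receiver, {"gives": [], "gets": []})
            recos[giver]["gives"].append((item, receiver))   # give item to receiver
            recos[receiver]["gets"].append((item, giver))    # get item from giver
    return recos

# Hypothetical swap between alice and bob.
recos = recommendations([[("alice", "bob", "b1"), ("bob", "alice", "b2")]])
```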
SimpleMarket Problem
 Theorem: Even the SimpleMarket problem is NP-complete.
(Contrast with one-by-one kidney exchange.)
 Reduction from four-cycle partitioning of 4-partite graphs to our problem.
 Reduction from three-cycle partitioning of 3-partite graphs to 4-cycle partitioning of 4-partite
graphs.
 3-cycle partitioning of 3-partite graphs is NP-complete [Holyer 81, Abraham+ 06].
ProbMarket
Lemma: The kidney exchange version of
ProbMarket can be solved in polynomial time.
Idea: Maximum weighted perfect matching
continues to work.
Algorithms
 Maximal set of Cycles
 Greedy.
 Local Search.
 Greedy/Local Search.
Maximal Algorithm
 Initialize the set of cycles CFSC = empty.
 At each step,
 Find an exchange cycle C.
 Add C to the set of cycles CFSC.
 Remove all edges in G in conflict with this cycle.
 Terminate if there is no remaining cycle.
 Find an exchange cycle C:
 Run a DFS or BFS algorithm until you find a backward
edge.
 BFS tends to find short cycles.
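A minimal sketch of the Maximal procedure above using BFS, assuming edges are (giver, receiver, item) triples and that two edges conflict when they share a (giver, item) or a (receiver, item) pair. That conflict rule is an interpretation of "one instance per user", and all names below are hypothetical.

```python
from collections import deque

def find_cycle(edges):
    """Return some cycle as a list of edges, or None if no cycle remains."""
    out = {}
    for e in sorted(edges):
        out.setdefault(e[0], []).append(e)
    for start in sorted(out):
        parent = {}                      # node -> edge used to reach it
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for e in out.get(node, []):
                nxt = e[1]
                if nxt == start:         # backward edge closes a cycle
                    cycle = [e]
                    while node != start:
                        cycle.append(parent[node])
                        node = parent[node][0]
                    return cycle[::-1]
                if nxt not in parent:
                    parent[nxt] = e
                    queue.append(nxt)
    return None

def conflicts(e, f):
    """Same user would give, or get, the same item twice."""
    return (e[0], e[2]) == (f[0], f[2]) or (e[1], e[2]) == (f[1], f[2])

def maximal(edges):
    edges, cfsc = set(edges), []
    while (cycle := find_cycle(edges)) is not None:
        cfsc.append(cycle)
        edges = {f for f in edges if not any(conflicts(e, f) for e in cycle)}
    return cfsc

# Hypothetical toy network.
result = maximal({
    ("alice", "bob", "b1"), ("alice", "amy", "b1"),
    ("bob", "alice", "b2"), ("amy", "bob", "b3"),
})
```

On this input BFS finds the alice/bob swap first; removing the conflicting edge for alice's item b1 leaves no further cycle.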
Greedy Algorithm
 Initialize the set of cycles CFSC = empty.
 At each step,
 Find the best exchange cycle C.
 Add C to the set of cycles CFSC.
 Remove all edges in conflict with this cycle.
 Terminate if there is no remaining cycle.
 Find the best exchange cycle C:
 Try all short cycles and find the cycle with maximum
weight.
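A minimal sketch of the greedy selection above, assuming the candidate short cycles have already been enumerated and that a cycle's weight defaults to twice its number of exchanges. Scanning candidates in decreasing weight and skipping conflicting ones is used here as an equivalent of "pick the best cycle, then remove conflicting edges"; the names are hypothetical.

```python
def elements(cycle):
    """(user gives item) and (user gets item) facts used by a cycle."""
    els = set()
    for giver, receiver, item in cycle:
        els.add(("gives", giver, item))
        els.add(("gets", receiver, item))
    return els

def greedy(cycles, weight=lambda c: 2 * len(c)):
    """Repeatedly take the heaviest cycle that conflicts with none chosen so far."""
    cfsc, used = [], set()
    for c in sorted(cycles, key=weight, reverse=True):
        els = elements(c)
        if used.isdisjoint(els):
            cfsc.append(c)
            used |= els
    return cfsc

# Hypothetical candidates: a swap and a 3-cycle that both use bob's item b2.
c2 = [("alice", "bob", "b1"), ("bob", "alice", "b2")]
c3 = [("alice", "amy", "b1"), ("amy", "bob", "b3"), ("bob", "alice", "b2")]
picked = greedy([c2, c3])
```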
Intermediate Maximal/Greedy
 Improve Running Time.
 Find the best exchange cycle C:
 Run BFS from each node v and find a cycle Cv.
 Find the cycle Cv with the maximum weight and add it.
Local search algorithm
 Initialize the set of cycles CFSC = empty.
 At each step,
 Let the current set of cycles be CFSC.
 For any exchange cycle C that is not already picked,
try to add C and remove all cycles in CFSC in conflict
with C.
 If the total weight of CFSC increases, add C to CFSC
and remove all conflicting cycles from CFSC.
 If no local improvement is possible, output CFSC and
terminate.
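A minimal sketch of the local search step above, assuming the candidate cycles are given as lists of (giver, receiver, item) triples and a cycle's weight defaults to twice its number of exchanges. The optional `start` argument (hypothetical) lets the search begin from an existing set of cycles instead of the empty set.

```python
def elements(cycle):
    """(user gives item) and (user gets item) facts used by a cycle."""
    els = set()
    for giver, receiver, item in cycle:
        els.add(("gives", giver, item))
        els.add(("gets", receiver, item))
    return els

def local_search(cycles, start=(), weight=lambda c: 2 * len(c)):
    cfsc = list(start)
    improved = True
    while improved:
        improved = False
        for c in cycles:
            if c in cfsc:
                continue
            els = elements(c)
            clash = [d for d in cfsc if elements(d) & els]
            # Swap in c only if it outweighs everything it displaces.
            if weight(c) > sum(weight(d) for d in clash):
                cfsc = [d for d in cfsc if d not in clash] + [c]
                improved = True
    return cfsc

# Hypothetical candidates: a swap and a 3-cycle that both use bob's item b2.
c2 = [("alice", "bob", "b1"), ("bob", "alice", "b2")]
c3 = [("alice", "amy", "b1"), ("amy", "bob", "b3"), ("bob", "alice", "b2")]
result = local_search([c2, c3])
```

The total weight strictly increases with every accepted move, so the loop terminates. Here the swap is added first and then replaced by the heavier 3-cycle.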
Greedy/Local Search
 First, run the greedy algorithm to find a set of
cycles CFSC.
 Then, run the local search algorithm starting
from the set CFSC.
 How good are these algorithms?
Set Packing
 Our problem is a special case of the weighted k-set
packing problem:
 Given a collection of sets, each of which has an
associated real weight and contains at most k
elements drawn from a finite base set, find a
collection of disjoint sets of maximum total
weight.
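A minimal brute-force sketch of weighted set packing as defined above, useful only for checking heuristics on tiny instances since it tries every subset of the collection. The function name and sample sets are hypothetical.

```python
from itertools import combinations

def best_packing(sets, weights):
    """Return (indices, total weight) of a maximum-weight disjoint subcollection."""
    best, best_w = [], 0.0
    for r in range(1, len(sets) + 1):
        for combo in combinations(range(len(sets)), r):
            chosen = [sets[i] for i in combo]
            # Pairwise disjoint iff no element is counted twice in the union.
            if sum(len(s) for s in chosen) == len(set().union(*chosen)):
                w = sum(weights[i] for i in combo)
                if w > best_w:
                    best, best_w = list(combo), w
    return best, best_w

# Hypothetical instance with k = 2: the middle set overlaps both neighbours.
packing, total = best_packing([{1, 2}, {2, 3}, {3, 4}], [3, 4, 3])
```

Taking the single heaviest set first would yield weight 4 here, while the two outer sets together give 6.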
[Figure: set packing input and output. Source: http://www.cs.sunysb.edu/~algorith/files/set-packing.shtml]
Relation to set packing
 Elements ↔ (User u gives item i), (User v gets
item j).
 Sets ↔ Cycles of exchanges.
 Weights of sets:
 Short cycle case: weight is 2k for k item exchanges.
 Probabilistic: weight is [\prod_{e \in C} p(e)] * 2k.
 Main difference: Sets are not given explicitly.
 Sets are cycles (given implicitly) and we have to
discover them.
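A minimal sketch of the two weight definitions above, assuming a cycle is a list of edges, k is its number of exchanges, and edge probabilities are independent and supplied in a dictionary keyed by edge. The `cycle_weight` name and the sample values are hypothetical.

```python
from math import prod

def cycle_weight(cycle, p=None):
    """2k for a cycle with k exchanges, scaled by the product of edge
    probabilities when a probability table p is given."""
    k = len(cycle)
    if p is None:
        return 2 * k
    return prod(p[e] for e in cycle) * 2 * k

# Hypothetical 3-cycle whose edges are each realized with probability 0.5.
cycle = [("alice", "amy", "b1"), ("amy", "bob", "b3"), ("bob", "alice", "b2")]
p = {e: 0.5 for e in cycle}
plain = cycle_weight(cycle)
expected = cycle_weight(cycle, p)
```

With three edges at probability 0.5 the whole cycle is realized with probability 0.125, so the expected weight drops from 6 to 0.75.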
Quality of Algorithms
 Maximal: no guaranteed quality; O((|V| + |E|)|B|)
time.
 Greedy: 2k-approximation [Chandra & Halldorsson
99]; O(|V|^2k |B|) time.
 Local Search: (2k-1)-approximation [Arkin & Hassin
97]; O(|V|^2k |E| log OPT) time.
 Local Greedy: 2(2k+1)/3-approximation [Arkin &
Hassin 97].
 More details: [Abbassi & Lakshmanan 09].
Experiments
 Algorithms implemented in MATLAB and run on a
2.16 GHz Intel Core 2 Duo CPU with 1 GB of RAM
under Windows XP.
 Goals:
 Extent to which allowing cycles of length > 2 increases
coverage of users/items.
 Quality of results of algorithms (Recall: Maximal has no
theoretical guarantees).
 Scalability.
 Synthetic data: Structure as well as user
activities follow power law [Newman 03].
Takeaways: Allowing Larger Cycles
%Increase       2→3     3→4     4→5
Maximal         5.56    3.40    2.70
Greedy          7.33    3.81    3.12
Local Search    7.71    3.85    3.35
Experimental Results
[Figures: result plots, not recoverable from the extracted slides.]
Takeaways: Maximal vs Approximation Algorithms
 Skew factor = 1.0, cycle length bound = 4,
#users = 25K.
Algorithm               #items exchanged
Maximal                 60,000
Greedy, Local Search    65,000
 Skew factor = 1.5, cycle length bound = 4,
#users = 25K.
Algorithm               #items exchanged
Maximal                 35,000
Greedy, Local Search    41,000
Summary & Future Work
 Market exchanges over online social nets – simple,
short cycles, probabilistic.
 Related kidney exchange problem – polytime for
swaps and NP-complete for k > 2.
 Even swaps NP-complete for market exchange.
 Reduction to weighted k-set packing ⇒
approximation algorithms and Maximal (heuristic).
 Experiments: “Diminishing returns” as k goes up.
 Maximal – more than one order of magnitude more
efficient and comparable quality!
 More empirical analysis needed.
Summary & Future Work
 Experiments on Real data sets.
 More efficient approximation algorithms?
 Randomization?
 Exchange Markets over time?
 Many different objectives: e.g., #items exchanged,
fairness, average waiting time, …
 Market Price, Buy/Sell.
 Connection with game theory.
 Query List as Wish List (think movie in place of
kidney!)
Other Projects in Social Networks and A Shameless Ad
 Mining/Analysis of Social Networks (e.g., for viral
marketing).
 Network Evolution.
 Diversification in RecSys.
 Network-aware Search.
 Social Search – SocialScope.
 Opportunities for grad students and postdocs.
 See http://www.cs.ubc.ca/~laks and UBC CS Grad
Programs
References Cited
 N. Immorlica, V. S. Mirrokni, and M. Mahdian, “Cycle cover with
short cycles,” in Symposium on Theoretical Aspects of Computer
Science (STACS), 2005.
 J. D. Hartline, V. S. Mirrokni, and M. Sundararajan, “Optimal
marketing strategies over social networks,” in WWW’08.
 J. Edmonds and E. Johnson, “Matching Euler tours and the Chinese
postman problem,” Mathematical Programming, vol. 5, pp. 88–124,
1973.
 C. H. Papadimitriou, “On the complexity of edge traversing,” Journal
of the ACM, vol. 23, no. 3, pp. 544–554, 1976.
 B. Raghavachari and J. Veerasamy, “A 3/2-approximation algorithm
for the mixed postman problem,” SIAM Journal on Discrete Math.,
vol. 12, pp. 425–433, 1999.
 C. Thomassen, “On the complexity of finding a minimum cycle cover
of a graph,” SIAM J. Comput., vol. 26, no. 3, pp. 675–677, 1997.
References Cited
 D. Hochbaum and E. Olinick, “The bounded cycle-cover problem,”
INFORMS Journal on Computing, vol. 13, no. 2, pp. 104–109, 2001.
 B. Murthi and S. Sarkar, “The role of the management sciences in
research on personalization,” Management Science, vol. 49, no. 10,
pp. 1344–1362, 2003.
 P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl,
“GroupLens: An open architecture for collaborative filtering of
netnews,” in Computer Supported Cooperative Work Conference,
1994.
 G. Shani, R. Brafman, and D. Heckerman, “An MDP-based
recommender system,” in 18th Conference on Uncertainty in Artificial
Intelligence, August 2002.
 D. J. Abraham, A. Blum, and T. Sandholm, “Clearing algorithms for
barter exchange markets: Enabling nationwide kidney exchanges,”
in ACM Conference on Electronic Commerce, June 13-16, 2007, pp.
295–304.
References Cited
 K. G. Anagnostakis and M. B. Greenwald, “Exchange-based incentive
mechanisms for peer-to-peer file sharing,” in 24th International
Conference on Distributed Computing Systems, 2004, pp. 524–533.
 I. Holyer, “The NP-completeness of some edge-partition problems,”
SIAM Journal on Computing, vol. 10, no. 4, pp. 713–717, Nov. 1981.
 D. J. Abraham, N. Chen, V. Kumar, and V. Mirrokni, “Assignment
problems in rental markets,” in WINE 2006, pp. 198–213.
 B. Chandra and M. Halldorsson, “Greedy local improvement and
weighted set packing approximation,” in SODA 1999, pp. 169–176.
 E. M. Arkin and R. Hassin, “On local search for weighted k-set
packing,” in ESA 1997, pp. 13–22.
 Z. Abbassi and L. V. S. Lakshmanan, “On efficient
recommendations for online exchange markets,” in ICDE 2009, pp. 712–723.