Recommender Systems Revisited – From Items to Transactions
Laks V.S. Lakshmanan
University of British Columbia, Vancouver, Canada
http://www.cs.ubc.ca/~laks
Joint work with Zeinab Abbassi.
PersDB'09, Lyon, August 2009.

Why Recommendations?
The recommendation paradigm: suggest content (in most cases, items) to a user based on her profile and past activities.
Why recommendation? Search queries can be generic: e.g., >90% of Yahoo! Travel queries are general descriptions such as "family trip". This is even more so on social content sites.

Why Recommendations?
Social content sites: sites where users make friends and share content, e.g., Facebook, del.icio.us, Flickr.
Content sites that let you share information with your social buddies, e.g., nytimes.com, indiatimes.com, youtube.com.
Recommendation is an indispensable information-exploration paradigm on social content sites. The rich user activities and connections provide many opportunities for generating recommendations.

Overview of RecSys
Item-based strategies: estimate the rating of an unrated item i by user u from i's similarity to the items u has already rated, and from how u rated those items.
Collaborative filtering strategies: estimate the rating of i by u from how u's similarity network (explicit or implicit) rated i.

Overview of RecSys
[Figure: user-item rating matrix (users U1, U2, …, Ui; items I1, I2, …, Ij) with an unknown rating X; item-based strategy.]

Overview of RecSys
[Figure: the same user-item rating matrix; user-based strategy (collaborative filtering).]

Overview of RecSys
Fusion strategies.
Model-based / machine-learning-based approaches.
What is common among all RecSys algorithms? They recommend items to users.
What if we want to recommend transactions instead?
Motivating applications: users exchanging items, either in offline (one-shot) exchanges or in asynchronous exchanges; users buying/selling items for a price.

Talk Outline
Motivation. Problem Definition. Related Work. Our Approach. Experimental Results. Summary & Future Work.
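The item-based and user-based strategies from the RecSys overview can both be illustrated as similarity-weighted averages. The sketch below is a generic textbook illustration, not the method of this talk: it assumes cosine similarity over sparse rating vectors (missing ratings count as 0), and the names `cosine`, `predict_item_based`, and `predict_user_based` are hypothetical.

```python
from math import sqrt
from typing import Dict, Optional

Ratings = Dict[str, Dict[str, float]]  # ratings[user][item]; missing key = unrated


def cosine(a: Dict[str, float], b: Dict[str, float]) -> float:
    """Cosine similarity of two sparse vectors (missing keys count as 0)."""
    dot = sum(a[k] * b[k] for k in a.keys() & b.keys())
    na = sqrt(sum(x * x for x in a.values()))
    nb = sqrt(sum(x * x for x in b.values()))
    return dot / (na * nb) if na and nb else 0.0


def item_vectors(ratings: Ratings) -> Ratings:
    """Transpose user -> item ratings into item -> user rating vectors."""
    items: Ratings = {}
    for user, row in ratings.items():
        for item, r in row.items():
            items.setdefault(item, {})[user] = r
    return items


def predict_item_based(ratings: Ratings, u: str, i: str) -> Optional[float]:
    """Item-based: u's own ratings, weighted by each rated item's similarity to i."""
    items = item_vectors(ratings)
    num = den = 0.0
    for j, r in ratings[u].items():
        if j == i:
            continue
        s = cosine(items.get(i, {}), items[j])
        num += s * r
        den += abs(s)
    return num / den if den else None  # None: no similar items to draw on


def predict_user_based(ratings: Ratings, u: str, i: str) -> Optional[float]:
    """User-based (CF): other users' ratings of i, weighted by their similarity to u."""
    num = den = 0.0
    for v, row in ratings.items():
        if v != u and i in row:
            s = cosine(ratings[u], row)
            num += s * row[i]
            den += abs(s)
    return num / den if den else None  # None: nobody else rated i
```

With non-negative ratings every similarity is non-negative, so each prediction is a convex combination of observed ratings and stays within their range.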
Motivation
Online social networks: emergence and rapid growth.
Users spend more time on online social networks. MySpace and Facebook are among the top 10 websites.

Motivation: Exchange Markets
There are exchange markets on the Web today: OddShoe.org, Peerflix.com (movie exchange), ReadItSwapIt.co.uk, JoeBarter.com, Intervac.

Motivation
We need "matching" algorithms to enhance the quality of the user experience, to engage more people in the system for more of the time, and for monetization.
There is a lack of comprehensive study of "matching".
We need efficient recommendation algorithms.

Some Related Problems
Chinese Postman Problem: collect mail from the postal station (s), deliver mail on all streets (edges), and minimize the distance covered (fuel consumed).

Some Related Problems
Cycle covers:
[Figure: example graph with edge weights 12, 6, 7, 15, 20, 16.]
Can we cover all vertices with edge-disjoint cycles? What is the minimum weight of such a cover?
Variants: vertex-disjoint vs. edge-disjoint cycles, vertex vs. edge cover, bounded cycle length, etc.

Related Work
Graphs: the cycle cover problem.
Polytime algorithm for the Chinese Postman Problem (CPP) on undirected graphs [Edmonds & Johnson 73].
CPP is NP-hard on mixed graphs [Papadimitriou 76] but admits a 3/2-approximation [Raghavachari & Veerasamy 99].
Cycle cover: cover a given set of nodes/edges with a set of minimum-length cycles.
Minimum-weight cycle cover: a variant of CPP; NP-hard in general [Thomassen 97].
Cycle cover with bounds on cycle length (heuristic) [Hochbaum & Olinick 01].
Approximation algorithms when cycle length is bounded [Immorlica+ 05].

Related Work
Recommender systems:
Management-science perspective [Murthi & Sarkar 03].
Collaborative filtering [Resnick+ 94, Shani+ 02].
Survey of item-based, collaborative filtering, fusion-based, and model-based approaches [Adomavicius & Tuzhilin 05].

Related Work
Kidney exchange problem: 4,000 deaths per year in the US; 70,000 people waiting for a cadaver kidney.
In kidney transplants, the donor's kidney is frequently incompatible with the patient. Example: A' is willing to donate her kidney to A, and B' to B, but both pairs are incompatible. However, the kidney of B' is compatible with A, and the kidney of A' with B.
Motivation: find feasible exchanges and save more people's lives.
Medical constraint: no cycle longer than 3!

Related Work
Bi-cycles (swaps) only: perfect matching, therefore polynomial!
Cycles of length 3 or more: NP-complete.
[Abraham+ 07] solved the United States kidney exchange problem by integer linear programming, with real data!
We will look at a more general problem than kidney exchange (KE).
Incentivizing exchanges in P2P file-sharing systems [Anagnostakis & Greenwald 04].

A Model
A set of users U and a set of items I.
Two lists for each user u in U: the item list S_u, the items u is willing to give away, and the wish list W_u, the items u is looking for.
Network: nodes are users; there is an edge u → v iff there is a feasible transaction from u to v. Edges are labeled with items.
[Figure: users u and v connected by edges labeled with items i and j.]

Example
[Figure: example exchange network.]

Different Models
One-shot exchange markets:
Simple exchange markets (swaps).
Exchange markets through short cycles.
Probabilistic exchange markets.
Wish list as query list.
Exchange markets over time.

Simple Exchange Markets
Only one-by-one transactions.
The problem is to find a set of pairs [(u, i), (v, j)] where i ∈ S_u, j ∈ W_u, i ∈ W_v, and j ∈ S_v; such pairs form 2-cycles (swaps).
[Figure: users u, v, w exchanging items i, j, k.]
Typically each user has one instance of any item and also wants one instance of an item in his wish list, so [(u, i), (*, *)] should not appear more than once for each user u; i.e., we are looking for a set of conflict-free cycles.

Exchange Markets Through Cycles
We look for cycles of length more than 2 in the system.
[Figure: exchange cycle among Alice, Bob, and Amy with items B4, B7, B8.]
The goal is to find cycles [(u_1, i_1), (u_2, i_2), (u_3, i_3), …, (u_k, i_k)] where i_1 ∈ S_{u_1}, i_1 ∈ W_{u_2}, i_2 ∈ S_{u_2}, i_2 ∈ W_{u_3}, …, and, closing the cycle, i_k ∈ S_{u_k}, i_k ∈ W_{u_1}.

Exchange Markets Through Short Cycles
Note: a cycle can happen if and only if all the participating edges are realized.
Goal: discover short cycles and solve the short cycle cover problem for cycles of length ≤ k, where k = 3, 4, 5, …

Probabilistic Exchange Markets
Each edge in the graph has a probability indicating the likelihood of it being realized. Edges are realized independently of one another.
Two kinds of probabilities are of interest:
P_u(v): the probability that u is willing to perform a transaction with v.
P_u(i, j): the probability that u is willing to exchange item i for j.

Query List as Wish List
The wish list contains only "predicates" instead of items, e.g., horror movie, science fiction, eastern philosophy, home hardware, …
The item list is as before. Users may rate/review items.
Matchmaking has to factor in ratings, i.e., matching has to use RecSys technology.
This talk focuses on simple exchange, short cycles, and probabilistic markets.

Goal
Generate recommendations that maximize the (expected) number of items exchanged through the network.
Each user u gets a recommendation: Gives = {(give i to v), …} and Gets = {(get j from w), …}.
Together, the set of recommendations constitutes a set of conflict-free cycles that maximizes the above metric.

SimpleMarket Problem
Theorem: Even the SimpleMarket problem is NP-complete.
(Contrast with one-by-one kidney exchange, which is polynomial.)
Proof sketch: a reduction from 4-cycle partitioning of 4-partite graphs to our problem, and a reduction from 3-cycle partitioning of 3-partite graphs to 4-cycle partitioning of 4-partite graphs. 3-cycle partitioning of 3-partite graphs is NP-complete [Holyer 81, Abraham+ 06].

ProbMarket
Lemma: The kidney exchange version of ProbMarket can be solved in polynomial time.
Idea: maximum weighted perfect matching continues to work.

Algorithms
Maximal set of cycles.
Greedy.
Local search.
Greedy/local search.

Maximal Algorithm
Initialize the set of cycles CFSC = ∅.
At each step: find an exchange cycle C, add C to CFSC, and remove all edges in G that conflict with C.
Terminate when no cycle remains.
To find an exchange cycle C: run DFS or BFS until a back edge is found. BFS tends to find short cycles.

Greedy Algorithm
Initialize the set of cycles CFSC = ∅.
At each step: find the best exchange cycle C, add C to CFSC, and remove all edges that conflict with C.
Terminate when no cycle remains.
To find the best exchange cycle C: try all short cycles and pick the one with maximum weight.

Intermediate Maximal/Greedy
Goal: improve the running time.
To find the best exchange cycle C: run BFS from each node v to find a cycle C_v, then add the cycle C_v with the maximum weight.

Local Search Algorithm
Initialize the set of cycles CFSC = ∅.
At each step, let the current set of cycles be CFSC. For any exchange cycle C that is not already picked, try adding C and removing all cycles in CFSC that conflict with C. If the total weight of CFSC increases, add C to CFSC and remove all conflicting cycles from CFSC.
If no local improvement is possible, output CFSC and terminate.
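The exchange network of the model, and the short exchange cycles that the algorithms above operate on, can be sketched as follows. This is a minimal sketch under stated assumptions, not the authors' implementation: item lists S_u and wish lists W_u are Python sets keyed by user; there is one edge (u, i, v) per item i ∈ S_u ∩ W_v; cycles are enumerated exhaustively by DFS, starting each cycle at its smallest user id so it is listed once. Exhaustive enumeration is exponential in k and only suits toy inputs. The function names are hypothetical, and the weights follow the set-packing weights used later in the talk (2k for k exchanges, times the product of independent edge probabilities in the probabilistic market).

```python
from typing import Dict, List, Optional, Set, Tuple

Edge = Tuple[str, str, str]  # (giver, item, receiver)


def build_exchange_graph(give: Dict[str, Set[str]], wish: Dict[str, Set[str]]) -> List[Edge]:
    """One edge (u, i, v) per item i that u can give and v wants (i in S_u and in W_v)."""
    edges = []
    for u, s_u in give.items():
        for v, w_v in wish.items():
            if u != v:
                for item in sorted(s_u & w_v):
                    edges.append((u, item, v))
    return edges


def short_cycles(give: Dict[str, Set[str]], wish: Dict[str, Set[str]], k: int) -> List[Tuple[Edge, ...]]:
    """All exchange cycles with 2..k edges through distinct users, each reported once."""
    out: Dict[str, List[Tuple[str, str]]] = {}
    for u, item, v in build_exchange_graph(give, wish):
        out.setdefault(u, []).append((item, v))
    cycles: List[Tuple[Edge, ...]] = []

    def dfs(start: str, node: str, path: List[Edge], visited: Set[str]) -> None:
        for item, nxt in out.get(node, []):
            edge = (node, item, nxt)
            if nxt == start:
                cycles.append(tuple(path + [edge]))  # closing edge back to the start user
            elif nxt > start and nxt not in visited and len(path) + 1 < k:
                visited.add(nxt)  # only larger ids, so each cycle starts at its smallest user
                dfs(start, nxt, path + [edge], visited)
                visited.remove(nxt)

    for start in sorted(out):
        dfs(start, start, [], {start})
    return cycles


def cycle_weight(cycle: Tuple[Edge, ...], p: Optional[Dict[Edge, float]] = None) -> float:
    """2k for a cycle of k exchanges; if p is given, times the product of edge probabilities."""
    w = 2.0 * len(cycle)
    if p is not None:
        for e in cycle:
            w *= p[e]  # edges assumed to be realized independently
    return w
```

For example, with a hypothetical assignment inspired by the Alice/Bob/Amy figure (Alice offers B7 wanted by Bob, Bob offers B4 wanted by Amy, Amy offers B8 wanted by Alice), `short_cycles(give, wish, 2)` finds no swap, while `short_cycles(give, wish, 3)` finds the single 3-cycle.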
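The Greedy and Local Search algorithms above can be sketched on an explicit list of candidate cycles. This is a minimal sketch under stated assumptions, not the authors' code: candidate cycles are assumed to be already enumerated as tuples of (giver, item, receiver) edges; two cycles conflict when they share a "user u gives item i" or "user v gets item j" element (one instance per item, as the slides assume); the default weight is 2k for k exchanges. Sorting once by weight and skipping conflicting cycles is equivalent to repeatedly picking the heaviest remaining cycle and removing its conflicts. The names `elements`, `greedy`, and `local_search` are hypothetical.

```python
from typing import Callable, Iterable, List, Sequence, Set, Tuple

Edge = Tuple[str, str, str]  # (giver, item, receiver)
Cycle = Tuple[Edge, ...]


def elements(cycle: Cycle) -> Set[Tuple[str, str, str]]:
    """Conflict elements of a cycle: ('gives', u, i) and ('gets', v, i) for each edge."""
    out = set()
    for u, item, v in cycle:
        out.add(("gives", u, item))
        out.add(("gets", v, item))
    return out


def default_weight(cycle: Cycle) -> float:
    return 2.0 * len(cycle)  # 2k for a cycle of k item exchanges


def greedy(cycles: Sequence[Cycle],
           weight: Callable[[Cycle], float] = default_weight) -> List[Cycle]:
    """Take cycles in decreasing weight, skipping any that conflict with those already taken."""
    chosen: List[Cycle] = []
    used: Set[Tuple[str, str, str]] = set()
    for c in sorted(cycles, key=weight, reverse=True):
        ec = elements(c)
        if not (ec & used):
            chosen.append(c)
            used |= ec
    return chosen


def local_search(cycles: Sequence[Cycle], start: Iterable[Cycle] = (),
                 weight: Callable[[Cycle], float] = default_weight) -> List[Cycle]:
    """Add an unpicked cycle and evict its conflicts whenever that raises the total weight."""
    chosen = list(start)
    improved = True
    while improved:  # terminates: every accepted move strictly increases total weight
        improved = False
        for c in cycles:
            if c in chosen:
                continue
            ec = elements(c)
            clash = [d for d in chosen if ec & elements(d)]
            if weight(c) > sum(weight(d) for d in clash):
                chosen = [d for d in chosen if d not in clash] + [c]
                improved = True
    return chosen
```

Calling `local_search(cycles, start=greedy(cycles))` corresponds to the combined Greedy/Local Search variant; passing a probability-weighted function as `weight` gives the probabilistic-market objective.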
Greedy/Local Search
First, run the greedy algorithm to find a set of cycles CFSC.
Then, run the local search algorithm starting from CFSC.
How good are these algorithms?

Set Packing
Our problem is a special case of the weighted k-set packing problem: given a collection of sets, each with an associated real weight and containing at most k elements drawn from a finite base set, find a collection of disjoint sets of maximum total weight.
[Figure: set packing input and output; source: http://www.cs.sunysb.edu/~algorith/files/set-packing.shtml]

Relation to Set Packing
Elements: (user u gives item i), (user v gets item j).
Sets: cycles of exchanges.
Weights of sets: in the short-cycle case, the weight is 2k for a cycle of k item exchanges; in the probabilistic case, the weight is $2k \cdot \prod_{e \in C} p(e)$.
Main difference: the sets are not given explicitly. They are cycles (given implicitly), and we have to discover them.

Quality of Algorithms
Maximal: no quality guarantee; O((|V| + |E|)|B|) time.
Greedy: 2k-approximation [Chandra & Halldorsson 99]; O(|V|^2k |B|) time.
Local Search: (2k-1)-approximation [Arkin & Hassin 97]; O(|V|^2k |E| log OPT) time.
Local Greedy: 2(2k+1)/3-approximation [Arkin & Hassin 97].
More details in [Abbassi & L 09].

Experiments
The algorithms were implemented in MATLAB and run on a 2.16 GHz Intel Core 2 Duo CPU with 1 GB of RAM under Windows XP.
Goals: the extent to which allowing cycles of length > 2 increases the coverage of users/items; the quality of the algorithms' results (recall: Maximal has no theoretical guarantees); scalability.
Synthetic data: both the network structure and user activities follow a power law [Newman 03].

Takeaways: Allowing Larger Cycles
% increase when raising the cycle-length bound:
Algorithm      2→3    3→4    4→5
Maximal        5.56   3.40   2.70
Greedy         7.33   3.81   3.12
Local Search   7.71   3.85   3.35

Experimental Results
[Figures: experimental result plots.]

Takeaways: Maximal vs Approximation Algorithms
Skew factor = 1.0, cycle length bound = 4, #users = 25K.
Algorithm              #items exchanged
Maximal                60,000
Greedy, Local Search   65,000

Skew factor = 1.5, cycle length bound = 4, #users = 25K.
Algorithm              #items exchanged
Maximal                35,000
Greedy, Local Search   41,000

Summary & Future Work
Market exchanges over online social networks: simple, short cycles, probabilistic.
Related kidney exchange problem: polytime for swaps and NP-complete for k > 2.
Even swaps are NP-complete for market exchange.
Reduction to weighted k-set packing yields approximation algorithms; Maximal is a heuristic.
Experiments: "diminishing returns" as k goes up. Maximal is more than one order of magnitude more efficient, with comparable quality!
More empirical analysis is needed.

Summary & Future Work
Experiments on real data sets.
More efficient approximation algorithms? Randomization?
Exchange markets over time?
Many different objectives: e.g., #items exchanged, fairness, average waiting time, …
Market price, buy/sell.
Connection with game theory.
Query list as wish list (think movie in place of kidney!).

Other Projects in Social Networks and a Shameless Ad
Mining/analysis of social networks (e.g., for viral marketing).
Network evolution.
Diversification in RecSys.
Network-aware search.
Social search: SocialScope.
Opportunities for grad students and postdocs. See http://www.cs.ubc.ca/~laks and UBC CS Grad Programs.

References Cited
N. Immorlica, V. S. Mirrokni, and M. Mahdian, "Cycle cover with short cycles," in Symposium on Theoretical Aspects of Computer Science (STACS), 2005.
J. D. Hartline, V. S. Mirrokni, and M. Sundararajan, "Optimal marketing strategies over social networks," in WWW'08, 2008.
J. Edmonds and E. Johnson, "Matching, Euler tours and the Chinese postman," Mathematical Programming, vol. 5, pp. 88–124, 1973.
C. H. Papadimitriou, "On the complexity of edge traversing," Journal of the ACM, vol. 23, no. 3, pp. 544–554, 1976.
B. Raghavachari and J. Veerasamy, "A 3/2-approximation algorithm for the mixed postman problem," SIAM Journal on Discrete Mathematics, vol. 12, pp. 425–433, 1999.
C. Thomassen, "On the complexity of finding a minimum cycle cover of a graph," SIAM Journal on Computing, vol. 26, no. 3, pp. 675–677, 1997.
D. Hochbaum and E. Olinick, "The bounded cycle-cover problem," INFORMS Journal on Computing, vol. 13, no. 2, pp. 104–109, 2001.
B. Murthi and S. Sarkar, "The role of the management sciences in research on personalization," Management Science, vol. 49, no. 10, pp. 1344–1362, 2003.
P. Resnick, N. Iacovou, M. Suchak, P. Bergstrom, and J. Riedl, "GroupLens: An open architecture for collaborative filtering of netnews," in Computer Supported Cooperative Work Conference, 1994.
G. Shani, R. Brafman, and D. Heckerman, "An MDP-based recommender system," in 18th Conference on Uncertainty in Artificial Intelligence, August 2002.
D. J. Abraham, A. Blum, and T. Sandholm, "Clearing algorithms for barter exchange markets: Enabling nationwide kidney exchanges," in ACM Conference on Electronic Commerce, June 13–16, 2007, pp. 295–304.
K. G. Anagnostakis and M. B. Greenwald, "Exchange-based incentive mechanisms for peer-to-peer file sharing," in 24th International Conference on Distributed Computing Systems, 2004, pp. 524–533.
I. Holyer, "The NP-completeness of some edge-partition problems," SIAM Journal on Computing, vol. 10, no. 4, pp. 713–717, Nov. 1981.
D. J. Abraham, N. Chen, V. Kumar, and V. Mirrokni, "Assignment problems in rental markets," in WINE 2006, pp. 198–213.
B. Chandra and M. Halldorsson, "Greedy local improvement and weighted set packing approximation," in SODA 1999, pp. 169–176.
E. M. Arkin and R. Hassin, "On local search for weighted k-set packing," in ESA 1997, pp. 13–22.
Z. Abbassi and L. V. S. Lakshmanan, "On efficient recommendations for online exchange markets," in ICDE 2009, pp. 712–723.