HyPER: A Flexible and Extensible Probabilistic Framework for Hybrid Recommender Systems
Pigi Kouki, Shobeir Fakhraei, James Foulds, Magdalini Eirinaki, Lise Getoor
University of California, Santa Cruz
University of Maryland, College Park
San Jose State University

Motivation
• Increasing amount of data useful for recommendations
  – ratings
  – social
  – content
  – demographic

Multiple Data Sources
Combining ratings with other data sources improves performance
• Content
  – [Gunawardana and Meek, RecSys 2009]
  – [Forbes and Zhu, RecSys 2011]
  – [de Campos et al., IJAR 51(7) 2010]
• Social relationships
  – [Ma et al., WSDM 2011]
  – [Liu et al., DSS 55(3) 2013]
• Review text
  – [McAuley & Leskovec, RecSys 2013]
  – [Ling et al., RecSys 2014]
• Tags and labels (e.g. #cool, #neat, #ok, #sucks)
  – [Guy et al., SIGIR 2010]
• Feedback
  – [Sedhain et al., RecSys 2014]

Multiple Recommenders
Combining predictions of multiple recommenders also improves performance
“Predictive accuracy is substantially improved when blending multiple predictors”
  – [Bell et al., The BellKor Solution to the Netflix Prize, 2007]
See also:
• [Jahrer et al., KDD 2010]
• [Burke, In The Adaptive Web, 2007]

Desiderata for Hybrid Systems
• To get the best performance, we should make use of all available data sources and algorithms
• We need a framework that is:
  – General
    • Combines arbitrary data modalities
    • Combines multiple recommenders
    • Problem- and data-agnostic
  – Extensible to new information sources/recommenders
  – Scalable to large data sets

General Hybrid Recommenders in the Literature
• Existing hybrid systems, though powerful, typically fall short on generality, extensibility, or scalability
  – Often combine collaborative and/or content-based methods with each other or with just one other data modality (cf. previous slides)
  – Some systems can leverage heterogeneous data [Gemmell et al. 2012, Burke et al. 2014, Yu et al. 2014]
• Probabilistic graphical modeling approaches are typically more general but less scalable
  – Bayesian networks [de Campos et al., IJAR 51(7) 2010]
  – Markov logic networks [Hoxha & Rettinger, ICMLA 2013]

Our Approach
We propose HyPER: Hybrid Probabilistic Extensible Recommender
• A general, extensible, scalable recommender framework
• Leverages advances in statistical relational learning
  – Probabilistic soft logic [Bach et al., UAI 2013, ArXiv 2015]
• Inspired by recent work in drug-target interaction prediction [Fakhraei et al., Transactions on Computational Biology and Bioinformatics 11(5) 2014]

Hybrid Modeling with HyPER
[Figure: Data Source 1 … Data Source N and Recommender 1 … Recommender M feed into HyPER, which outputs predicted ratings]

HyPER: High-Level Approach
• User-item ratings viewed as a weighted bipartite graph
• Build hybrid model by adding links to encode additional information
  – multiple user and item similarities, social information, …
• Predict ratings by reasoning over the graph, via a graphical model

Extended Recommendation Graph
[Figure: the user-item rating graph, with known ratings (e.g. 5, 4, 3, 2) and unknown ratings marked "?", extended with user-similarity links (user-similarity1), item-similarity links (item-similarity1, item-similarity2), friendship links, additional information sources, and existing recommenders]

Modeling and Reasoning over the Graph
• Hinge-loss Markov random fields (HL-MRFs) [Bach et al., UAI 2013]
  – Exact, efficient, and scalable inference
  – Continuous random variables
  – Models defined by PSL programs
• Probabilistic Soft Logic (PSL) [Bach et al., ArXiv 2015]
  – Statistical relational learning system
  – Logical probabilistic programming interface
  – Templating language for HL-MRFs

Hinge-loss Markov Random Fields
• Conditional random field over continuous random variables between 0 and 1
• Feature functions are hinge-loss functions: the hinge of a linear function of the variables, optionally squared
  P(Y | X) ∝ exp(−Σj wj · max{ℓj(Y, X), 0}^pj), where each ℓj is a linear function and pj ∈ {1, 2}
• Hinge losses encode the distance to satisfaction for each instantiated rule

Efficient Inference in HL-MRFs
• Energy function is convex, so a global MAP state can be found
• The alternating direction method of multipliers (ADMM) is used for efficient and scalable inference

Probabilistic Soft Logic
• Statistical relational learning language
• Uses first-order logical rules
• Templates HL-MRFs
  w : LikesGenre(U, G) && IsGenre(M, G) → Rating(U, M)
  – w is the rule weight; LikesGenre, IsGenre, and Rating are predicates; && and → are logical operators
• Converts rules to hinge-loss potentials
  Rule: LikesGenre(U, G) && IsGenre(M, G) → Rating(U, M)
  Hinge loss: max{LikesGenre(U, G) + IsGenre(M, G) − Rating(U, M) − 1, 0}
• PSL program = rules + data
• Open source: http://psl.umiacs.umd.edu

Recommendations with HyPER
• Similar items get similar ratings from a user
  – e.g. cosine, adjusted cosine, Pearson, content
  SimilarItems_sim(i1, i2) && Rating(u, i1) → Rating(u, i2)
  Illustration: Rating(u, i1) = 5, SimilarItems(i1, i2), Rating(u, i2) = ?
• Similar users give similar ratings to an item
  – e.g. cosine, Pearson
  SimilarUsers_sim(u1, u2) && Rating(u1, i) → Rating(u2, i)
  Illustration: Rating(u1, i) = 4, SimilarUsers(u1, u2), Rating(u2, i) = ?
• Mean-centering priors
  AverageUserRating(u) → Rating(u, i)
  AverageItemRating(i) → Rating(u, i)
• Social network links
  Friends(u1, u2) && Rating(u1, i) → Rating(u2, i)
• Leveraging existing recommenders
  – e.g. matrix factorization, item-based
  RatingRecommender(u, i) → Rating(u, i)
• Extensible to new data/algorithms – just add rules!
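
The conversion from a logical rule to a hinge-loss potential described above can be sketched in a few lines of Python. This is a minimal illustration, not PSL's actual implementation: the function names and the truth values are hypothetical, and it assumes the Lukasiewicz relaxation of conjunction and implication on which the max{…} expression above is based.

```python
from typing import Sequence


def distance_to_satisfaction(body: Sequence[float], head: float) -> float:
    """Hinge loss of one ground rule  b1 && ... && bn -> head.

    All truth values lie in [0, 1]. The conjunction uses the Lukasiewicz
    t-norm, max{b1 + ... + bn - (n - 1), 0}; the rule is fully satisfied
    when the head is at least as true as the body.
    """
    body_truth = max(sum(body) - (len(body) - 1), 0.0)
    return max(body_truth - head, 0.0)


def weighted_potential(weight: float, body: Sequence[float], head: float,
                       squared: bool = False) -> float:
    """Weighted hinge-loss potential; `squared` selects exponent 2."""
    d = distance_to_satisfaction(body, head)
    return weight * (d * d if squared else d)


# Hypothetical grounding of
#   LikesGenre(U, G) && IsGenre(M, G) -> Rating(U, M)
# with made-up truth values.
likes_genre, is_genre = 0.9, 1.0
low = distance_to_satisfaction([likes_genre, is_genre], head=0.4)
high = distance_to_satisfaction([likes_genre, is_genre], head=1.0)
print(round(low, 3), round(high, 3))
```

With these values, a low predicted rating violates the rule by about 0.5, while a rating of 1.0 satisfies it completely. MAP inference trades off such penalties across all ground rules, scaled by their weights.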

Balancing the Rules
• Balancing done through weights wj
• Higher wj indicates a more important rule
• Weight learning by approximating a gradient step in the conditional log-likelihood
[Equation: gradient of the conditional log-likelihood with respect to wj]

Experimental Validation
• Yelp academic dataset (https://www.yelp.com/academic_dataset)
  – ~34k users, ~3.6k items, ~99k ratings
  – ~81k friendships
  – 514 business categories
• Last.fm (http://grouplens.org/datasets/hetrec-2011/)
  – ~1.8k users, ~17k items, ~92k ratings
  – ~12k friendships
  – ~9.7k artist tags
• Evaluation metrics: RMSE, MAE

Baselines
• Collaborative filtering systems
  – Item-based, cf. [Ning et al., In Recommender Systems Handbook, 2015]
  – Matrix factorization (MF), cf. [Koren et al., IEEE Computer 42(8) 2009]
  – Bayesian probabilistic matrix factorization (BPMF) [Salakhutdinov & Mnih, ICML 2008]
• Hybrid systems
  – Naïve hybrid (averaged predictions)
  – BPMF with social relations and content (BPMF-SRIC) [Liu et al., DSS 55(3) 2013]

HyPER vs Baselines
• HyPER outperforms all other models on both datasets
• Results are statistically significant

HyPER Submodels: Mean-centering
• HyPER combined model beats individual rules

HyPER Submodels: User-based
• HyPER combined model beats/matches best individual rules
• Similar story for item-based, content & social

Combining the Baselines
• HyPER can combine different recommenders effectively
• Results are statistically significantly better

HyPER (All Rules)
• Combining all rules achieves the best performance on both datasets

Scaling to Large Datasets
• Parallel implementation for inference and learning based on ADMM [Bach et al., UAI 2013]
• Scaling to big-data applications:
  – perform inference in parallel on densely connected subgraphs of the original graph
  – fully distributed implementation of ADMM

Conclusions
• HyPER is a general-purpose, extensible framework for hybrid recommender systems
• With HyPER, practitioners can define custom hybrid models that use all available data/algorithms, via logical rules in PSL
• HyPER outperforms existing techniques on two popular datasets

Thank you for your attention!

HyPER Submodels – Item-based, Content & Social

References
X. Ning, C. Desrosiers, and G. Karypis. A comprehensive survey of neighborhood-based recommendation methods. In Recommender Systems Handbook, 2nd edition. Springer, 2015.
S. Fakhraei, B. Huang, L. Raschid, and L. Getoor. Network-based drug-target interaction prediction with probabilistic soft logic. Transactions on Computational Biology and Bioinformatics, 11(5), 2014.
J. Liu, C. Wu, and W. Liu. Bayesian probabilistic matrix factorization with social relations and item contents for recommendation. Decision Support Systems, 55(3), 2013.
R. Salakhutdinov and A. Mnih. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In ICML, 2008.
Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. IEEE Computer, 42(8), 2009.
A. Gunawardana and C. Meek. A unified approach to building hybrid recommender systems. In RecSys, 2009.
R. Burke. Hybrid web recommender systems. In The Adaptive Web. Springer, 2007.
L. de Campos, J. Fernandez-Luna, J. Huete, and M. Rueda-Morales. Combining content-based and collaborative recommendations: A hybrid approach based on Bayesian networks. International Journal of Approximate Reasoning, 51(7), 2010.
M. Jahrer, A. Töscher, and R. Legenstein. Combining predictions for accurate recommender systems. In KDD, 2010.
J. Hoxha and A. Rettinger. First-order probabilistic model for hybrid recommendations. In ICMLA, 2013.
S. H. Bach, B. Huang, B. London, and L. Getoor. Hinge-loss Markov random fields: Convex inference for structured prediction. In UAI, 2013.
S. H. Bach, M. Broecheler, B. Huang, and L. Getoor. Hinge-loss Markov random fields and probabilistic soft logic. arXiv:1505.04406 [cs.LG], 2015.
A. P. Forbes and M. Zhu. Content-boosted matrix factorization for recommender systems: Experiments with recipe recommendation. In RecSys, 2011.
J. Chen, G. Chen, H. Zhang, J. Huang, and G. Zhao. Social recommendation based on multi-relational analysis. In WI-IAT, 2012.
R. Burke, F. Vahedian, and B. Mobasher. Hybrid recommendation in heterogeneous networks. In User Modeling, Adaptation, and Personalization. Springer, 2014.
J. Gemmell, T. S., B. Mobasher, and R. Burke. Resource recommendation in social annotation systems: A linear-weighted hybrid approach. Journal of Computer and System Sciences, 78(4), 2012.
X. Yu, X. Ren, Y. Sun, Q. Gu, B. Sturt, U. Khandelwal, B. Norick, and J. Han. Personalized entity recommendation: A heterogeneous information network approach. In WSDM, 2014.
H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King. Recommender systems with social regularization. In WSDM, 2011.
J. McAuley and J. Leskovec. Hidden factors and hidden topics: Understanding rating dimensions with review text. In RecSys, 2013.
G. Ling, M. R. Lyu, and I. King. Ratings meet reviews, a combined approach to recommend. In RecSys, 2014.
I. Guy, N. Zwerdling, I. Ronen, D. Carmel, and E. Uziel. Social media recommendation based on people and tags. In SIGIR, 2010.
S. Sedhain, S. Sanner, D. Braziunas, L. Xie, and J. Christensen. Social collaborative filtering for cold-start recommendations. In RecSys, 2014.