HyPER: A Flexible and Extensible Probabilistic Framework for Hybrid Recommender Systems
Pigi Kouki, Shobeir Fakhraei, James Foulds, Magdalini Eirinaki, Lise Getoor
University of California, Santa Cruz
University of Maryland, College Park
San Jose State University
Motivation
• A growing amount of data is useful for recommendations:
– ratings
– social
– content
– demographic
Multiple Data Sources
Combining ratings with other
data sources improves performance
• Content
– [Gunawardana and Meek, RecSys 2009]
– [Forbes and Zhu, RecSys 2011]
– [de Campos et al., IJAR 51(7) 2010]
• Social relationships
– [Ma et al., WSDM 2011]
– [Liu et al., DSS 55(3) 2013]
Multiple Data Sources
Combining ratings with other
data sources improves performance
• Review text
– [McAuley & Leskovec, RecSys 2013]
– [Ling et al., RecSys 2014]
• Tags and labels
– [Guy et al., SIGIR 2010]
– e.g. #cool #neat #ok #sucks
• Feedback
– [Sedhain et al., RecSys 2014]
Multiple Recommenders
Combining predictions of multiple recommenders also improves performance
“Predictive accuracy is substantially improved when blending multiple predictors”
– [Bell et al., The BellKor Solution to the Netflix Prize, 2007]
See also:
• [Jahrer et al., KDD 2010]
• [Burke, In The Adaptive Web, 2007]
Desiderata for Hybrid Systems
• To get the best performance, we should make use of
all available data sources and algorithms
• We need a framework that is:
– General
• Combines arbitrary data modalities
• Combines multiple recommenders
• Problem- and data-agnostic
– Extensible to new information sources/recommenders
– Scalable to large data sets
General Hybrid Recommenders in the Literature
• Existing hybrid systems, though powerful, typically fall
short on generality, extensibility, or scalability
– Often combine collaborative and/or content-based methods
with each other or just one other data modality
(cf. previous slides)
– Some systems can leverage heterogeneous data
• [Gemmell et al. 2012, Burke et al. 2014, Yu et al. 2014]
• Probabilistic graphical modeling approaches are
typically more general but less scalable
– Bayesian networks [de Campos et al., IJAR 51(7) 2010]
– Markov logic networks [Hoxha & Rettinger, ICMLA 2013]
Our Approach
We propose HyPER:
Hybrid Probabilistic Extensible Recommender
• A general, extensible, scalable recommender framework
• Leverages advances in statistical relational learning
– Probabilistic soft logic [Bach et al., UAI 2013, ArXiv 2015]
• Inspired by recent work in drug-target interaction prediction
[Fakhraei et al., Transactions on Computational Biology and Bioinformatics 11(5) 2014]
Hybrid Modeling with HyPER
[Figure: a single recommender turns one data source into predicted ratings; HyPER instead combines Data Sources 1 … N and Recommenders 1 … M into predicted ratings]
HyPER: High-Level Approach
• User-item ratings viewed as a
weighted bipartite graph
• Build hybrid model by adding links
to encode additional information
– multiple user and item similarities,
social information,…
• Predict ratings by reasoning over
the graph, via a graphical model
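To make the graph view concrete, here is a minimal Python sketch of a rating graph extended with typed links. It is a hypothetical illustration rather than HyPER's own data model: the class `RecommendationGraph`, its methods, and the sample users, items, and link types are assumptions chosen to mirror the slide.

```python
from collections import defaultdict


class RecommendationGraph:
    """Hypothetical sketch: user-item ratings as a weighted bipartite graph,
    extended with extra typed links (similarities, friendships)."""

    def __init__(self):
        # (user, item) -> observed rating; these are the bipartite edges
        self.ratings = {}
        # link type -> {(node_a, node_b): weight}, stored as directed pairs
        self.links = defaultdict(dict)

    def add_rating(self, user, item, value):
        self.ratings[(user, item)] = value

    def add_link(self, link_type, a, b, weight=1.0):
        # e.g. "item-similarity1", "user-similarity1", "friendship"
        self.links[link_type][(a, b)] = weight

    def neighbors(self, link_type, node):
        # nodes reached from `node` by a link of the given type
        return [(b, w) for (a, b), w in self.links[link_type].items() if a == node]

    def unknown(self, users, items):
        # user-item pairs whose rating must be predicted (the "?" edges)
        return [(u, i) for u in users for i in items if (u, i) not in self.ratings]


g = RecommendationGraph()
g.add_rating("alice", "item1", 5)
g.add_rating("alice", "item2", 4)
g.add_rating("bob", "item1", 3)
g.add_link("item-similarity1", "item1", "item3", 0.8)
g.add_link("friendship", "alice", "bob")
print(g.unknown(["alice", "bob"], ["item1", "item2", "item3"]))
# [('alice', 'item3'), ('bob', 'item2'), ('bob', 'item3')]
print(g.neighbors("item-similarity1", "item1"))
# [('item3', 0.8)]
```

Keeping each link type separate leaves the structure open-ended: a new information source becomes a new link type without touching the rating edges.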
19
HyPER: High-Level Approach
• User-item ratings viewed as a
weighted bipartite graph
• Build hybrid model by adding links
to encode additional information
– multiple user and item similarities,
social information,…
• Predict ratings by reasoning over
the graph, via a graphical model
20
HyPER: High-Level Approach
• User-item ratings viewed as a
weighted bipartite graph
• Build hybrid model by adding links
to encode additional information
– multiple user and item similarities,
social information,…
• Predict ratings by reasoning over
the graph, via a graphical model
21
Extended Recommendation Graph
[Figure: the user-item rating graph (observed ratings such as 5, 4, 3, 2 and unknown ratings "?") extended with user similarities (user-similarity1, friendship), item similarities (item-similarity1, item-similarity2), additional information sources, and existing recommenders]
Modeling and Reasoning over the Graph
• Hinge-loss Markov random fields (HL-MRFs) [Bach et al., UAI 2013]
– Exact, efficient, and scalable inference
– Continuous random variables
– Models defined by PSL programs
• Probabilistic Soft Logic (PSL) [Bach et al., ArXiv 2015]
– Statistical relational learning system
– Logical probabilistic programming interface
– Templating language for HL-MRFs
Hinge-loss Markov Random Fields
Conditional random field over continuous random variables between 0 and 1:
$$P(\mathbf{Y} \mid \mathbf{X}) = \frac{1}{Z} \exp\Big(-\sum_{j=1}^{m} w_j \, \phi_j(\mathbf{Y}, \mathbf{X})\Big)$$
Feature functions are hinge-loss functions of a linear function \ell_j, optionally squared:
$$\phi_j(\mathbf{Y}, \mathbf{X}) = \big(\max\{\ell_j(\mathbf{Y}, \mathbf{X}), 0\}\big)^{p_j}, \qquad p_j \in \{1, 2\}$$
Hinge losses encode the distance to satisfaction for each instantiated rule
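A hedged Python sketch of how an HL-MRF scores an assignment, assuming the standard form from Bach et al.: P(Y | X) is proportional to exp(-Σ_j w_j φ_j) with φ_j = (max{ℓ_j, 0})^{p_j} and p_j ∈ {1, 2}. The function names and the two example potentials are illustrative assumptions, and the partition function Z is not computed.

```python
import math

# Minimal sketch of HL-MRF scoring. Each potential is (w_j, p_j, ell_j):
# a non-negative weight, an exponent in {1, 2}, and a linear function ell_j
# of the assignment y (a dict of variable values in [0, 1]).


def hinge_potential(ell_value, p):
    # phi_j = (max{ell_j, 0})^p_j; zero when the instantiated rule is satisfied
    return max(ell_value, 0.0) ** p


def energy(potentials, y):
    # sum_j w_j * phi_j(y); lower energy means a more probable assignment
    return sum(w * hinge_potential(ell(y), p) for w, p, ell in potentials)


def unnormalized_density(potentials, y):
    # exp(-energy); dividing by the partition function Z would give P(Y | X)
    return math.exp(-energy(potentials, y))


# Hypothetical potentials over a single variable "r":
potentials = [
    (2.0, 1, lambda y: 0.8 - y["r"]),   # linear hinge: penalize r below 0.8
    (1.0, 2, lambda y: y["r"] - 0.3),   # squared hinge: penalize r above 0.3
]
print(energy(potentials, {"r": 0.5}))   # 2 * 0.3 + 1 * 0.2**2, about 0.64
```

The squared variant (p_j = 2) penalizes small violations less and large violations more than the linear one, which is why HL-MRFs allow both.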
Efficient Inference in HL-MRFs
• The energy function is convex, so a global MAP state
can be found
• The alternating direction method of multipliers
(ADMM) is used for efficient and scalable inference
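The sketch below illustrates consensus ADMM for MAP inference in the simplest case, where every potential is a linear hinge (p_j = 1). It follows the general scheme of splitting the energy into per-potential subproblems over local copies of the variables, averaging those copies into a consensus clipped to [0, 1], and updating scaled dual variables. It is not PSL's implementation; the encoding of a potential as `(w_j, coeffs_j, b_j)`, the function names, the penalty `rho`, and the fixed iteration count are assumptions.

```python
# Minimal sketch of consensus ADMM for MAP inference in an HL-MRF whose
# potentials are all linear hinges (p_j = 1). Each potential j is a tuple
# (w_j, coeffs_j, b_j) with ell_j(y) = sum_i coeffs_j[i] * y[i] + b_j;
# every potential is assumed to have at least one non-zero coefficient.


def linear(coeffs, b, y):
    # ell_j evaluated at the (partial) assignment y
    return sum(a * y[i] for i, a in coeffs.items()) + b


def local_step(w, coeffs, b, v, rho):
    # closed-form argmin_y  w * max(ell(y), 0) + rho / 2 * ||y - v||^2
    if linear(coeffs, b, v) <= 0:
        return dict(v)                       # hinge inactive: stay at v
    y = {i: v[i] - (w / rho) * a for i, a in coeffs.items()}
    if linear(coeffs, b, y) >= 0:
        return y                             # minimizer lies on the linear side
    gap = linear(coeffs, b, v)               # otherwise project v onto ell(y) = 0
    norm_sq = sum(a * a for a in coeffs.values())
    return {i: v[i] - gap / norm_sq * a for i, a in coeffs.items()}


def admm_map(n_vars, potentials, rho=1.0, iters=500):
    z = [0.5] * n_vars                                         # consensus values
    dual = [{i: 0.0 for i in c} for _, c, _ in potentials]     # scaled duals
    for _ in range(iters):
        # 1) each potential optimizes its own copy of its variables
        local = [local_step(w, c, b, {i: z[i] - dual[k][i] for i in c}, rho)
                 for k, (w, c, b) in enumerate(potentials)]
        # 2) consensus: average copies plus duals, then clip to [0, 1]
        sums, counts = [0.0] * n_vars, [0] * n_vars
        for k, y in enumerate(local):
            for i, val in y.items():
                sums[i] += val + dual[k][i]
                counts[i] += 1
        z = [min(1.0, max(0.0, sums[i] / counts[i])) if counts[i] else z[i]
             for i in range(n_vars)]
        # 3) dual update pushes the local copies toward agreement
        for k, y in enumerate(local):
            for i in y:
                dual[k][i] += y[i] - z[i]
    return z


# Hypothetical example: minimize 2*max(0.8 - y, 0) + 1*max(y - 0.3, 0) over
# y in [0, 1]; the exact minimizer is y = 0.8.
potentials = [(2.0, {0: -1.0}, 0.8), (1.0, {0: 1.0}, -0.3)]
print(admm_map(1, potentials))
```

Because each local subproblem has a closed-form solution and touches only its own variables, step 1 can run in parallel across potentials.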
Probabilistic Soft Logic
• Statistical relational learning language
• Uses first-order logical rules
• Templates HL-MRFs
w : LikesGenre(U, G) && IsGenre(M, G) → Rating(U, M)
(w: rule weight; &&, →: logical operators; LikesGenre, IsGenre, Rating: predicates)
Probabilistic Soft Logic
• Converts rules to hinge-loss potentials
LikesGenre(U, G) && IsGenre(M, G) → Rating(U, M)
hinge-loss: max{LikesGenre(U, G) + IsGenre(M, G) - Rating(U, M) - 1, 0}
• PSL program = rules + data
• Open source: http://psl.umiacs.umd.edu
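The conversion above follows the Lukasiewicz relaxation: a conjunction of soft truth values b_1 to b_n becomes max{Σ b_k - (n - 1), 0}, and a rule body → head is penalized by how far the head falls below the body. A minimal Python sketch, with hypothetical function names and truth values:

```python
# Minimal sketch: Lukasiewicz relaxation used to turn a grounded rule
# "b1 && b2 && ... -> head" into its distance to satisfaction.


def soft_and(*values):
    # Lukasiewicz t-norm: max{sum(values) - (n - 1), 0}
    return max(sum(values) - (len(values) - 1), 0.0)


def distance_to_satisfaction(body_values, head_value):
    # body -> head is satisfied when head >= truth(body);
    # otherwise the shortfall is the hinge loss
    return max(soft_and(*body_values) - head_value, 0.0)


# Grounding of LikesGenre(U, G) && IsGenre(M, G) -> Rating(U, M)
likes_genre, is_genre, rating = 0.9, 1.0, 0.4   # hypothetical soft truth values
print(distance_to_satisfaction([likes_genre, is_genre], rating))  # about 0.5
```

For two body atoms this reduces to max{LikesGenre + IsGenre - Rating - 1, 0}, the hinge-loss shown on the slide.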
42
Probabilistic Soft Logic
• Converts rules to hinge-loss potentials
LikesGenre(U, G) && IsGenre(M, G) 
Rating(U, M)
max{LikesGenre(U, G) + IsGenre(M, G) Rating(U, M) -1, 0}
hinge-loss
• PSL program = rules + data
• Open source: http://psl.umiacs.umd.edu
43
Recommendations with HyPER
• Similar items get similar ratings from a user
– e.g. cosine, adjusted cosine, Pearson, content
SimilarItems_sim(i1, i2) && Rating(u, i1) → Rating(u, i2)
[Figure: the known Rating(u, i1) = 5 and SimilarItems(i1, i2) inform the unknown Rating(u, i2) = ?]
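A hedged Python sketch of one way to ground the item-similarity rule, using plain cosine similarity (one of the measures listed above). The ratings, the function names, and the scaling of ratings to [0, 1] by dividing by 5 are assumptions for illustration.

```python
import math

# Hypothetical ratings on a 1-5 scale: user -> {item: rating}
ratings = {
    "u1": {"i1": 5, "i2": 4},
    "u2": {"i1": 4, "i2": 5, "i3": 2},
    "u3": {"i1": 1, "i3": 5},
}


def item_cosine(i1, i2):
    # cosine similarity of two items over the users who rated both
    common = [u for u in ratings if i1 in ratings[u] and i2 in ratings[u]]
    if not common:
        return 0.0
    dot = sum(ratings[u][i1] * ratings[u][i2] for u in common)
    norm1 = math.sqrt(sum(ratings[u][i1] ** 2 for u in common))
    norm2 = math.sqrt(sum(ratings[u][i2] ** 2 for u in common))
    return dot / (norm1 * norm2)


def body_truth(user, i_known, i_target):
    # soft truth of SimilarItems(i_known, i_target) && Rating(user, i_known)
    # under the Lukasiewicz conjunction; the grounded rule is fully satisfied
    # only if Rating(user, i_target) is at least this value
    sim = item_cosine(i_known, i_target)
    rating = ratings[user][i_known] / 5.0   # assumed scaling to [0, 1]
    return max(sim + rating - 1.0, 0.0)


# u1 has not rated i3; the rule grounded via i1 bounds Rating(u1, i3) from below
print(round(body_truth("u1", "i1", "i3"), 3))
```

With positive ratings, cosine similarity already lies in [0, 1], so it can serve directly as the soft truth value of SimilarItems.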
Recommendations with HyPER
• Similar users give similar ratings to an item
– e.g. cosine, Pearson
SimilarUsers_sim(u1, u2) && Rating(u1, i) → Rating(u2, i)
[Figure: the known Rating(u1, i) = 4 and SimilarUsers(u1, u2) inform the unknown Rating(u2, i) = ?]
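Similarly, a minimal sketch of a Pearson-based SimilarUsers value. Pearson correlation lies in [-1, 1], so this sketch assumes negative correlations are clipped to 0 to obtain a soft truth value; that mapping, the ratings, and the function names are hypothetical.

```python
import math

# Hypothetical ratings: user -> {item: rating}
ratings = {
    "u1": {"a": 5, "b": 3, "c": 4},
    "u2": {"a": 4, "b": 2, "c": 5, "d": 3},
    "u3": {"a": 1, "b": 5, "d": 2},
}


def user_pearson(u1, u2):
    # Pearson correlation over co-rated items (means taken over those items)
    common = [i for i in ratings[u1] if i in ratings[u2]]
    if len(common) < 2:
        return 0.0
    m1 = sum(ratings[u1][i] for i in common) / len(common)
    m2 = sum(ratings[u2][i] for i in common) / len(common)
    cov = sum((ratings[u1][i] - m1) * (ratings[u2][i] - m2) for i in common)
    s1 = math.sqrt(sum((ratings[u1][i] - m1) ** 2 for i in common))
    s2 = math.sqrt(sum((ratings[u2][i] - m2) ** 2 for i in common))
    if s1 == 0 or s2 == 0:
        return 0.0
    return cov / (s1 * s2)


def similar_users_truth(u1, u2):
    # assumption: clip negative correlations to 0 so the result is in [0, 1]
    return max(user_pearson(u1, u2), 0.0)


print(round(user_pearson("u1", "u2"), 3), similar_users_truth("u1", "u3"))
```

Clipping means strongly disagreeing users simply contribute no evidence through this rule; a separate rule would be needed to exploit negative correlation.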
Recommendations with HyPER
• Mean-centering priors
AverageUserRating(u) → Rating(u, i)
AverageItemRating(i) → Rating(u, i)
• Social network links
Friends(u1, u2) && Rating(u1, i) → Rating(u2, i)
• Leveraging existing recommenders
– e.g. matrix factorization, item-based
RatingRecommender(u, i) → Rating(u, i)
Extensible to new data/algorithms – just add rules!
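To illustrate the "just add rules" point, the sketch below treats a hybrid model as a list of weighted grounded rules over a single unknown, y = Rating(u2, i), and evaluates its total weighted hinge-loss energy for a candidate value of y. All weights, evidence values, and names are hypothetical, and no inference is performed.

```python
# Minimal sketch (hypothetical names and values): a hybrid model as a list of
# (weight, grounded rule) pairs over one unknown y = Rating(u2, i). Each rule
# returns its hinge-loss distance to satisfaction for a candidate y.

evidence = {
    "avg_user": 0.7,       # AverageUserRating(u2)
    "avg_item": 0.6,       # AverageItemRating(i)
    "friends": 1.0,        # Friends(u1, u2)
    "friend_rating": 0.9,  # Rating(u1, i)
}

model = [
    # AverageUserRating(u) -> Rating(u, i)
    (1.0, lambda y, e: max(e["avg_user"] - y, 0.0)),
    # AverageItemRating(i) -> Rating(u, i)
    (1.0, lambda y, e: max(e["avg_item"] - y, 0.0)),
    # Friends(u1, u2) && Rating(u1, i) -> Rating(u2, i)
    (2.0, lambda y, e: max(e["friends"] + e["friend_rating"] - 1.0 - y, 0.0)),
]


def energy(model, y, evidence):
    # total weighted distance to satisfaction of all grounded rules
    return sum(w * rule(y, evidence) for w, rule in model)


# A new source (an existing recommender's prediction) is just one more rule:
evidence["mf_prediction"] = 0.8   # RatingRecommender(u2, i), e.g. from MF
model.append((3.0, lambda y, e: max(e["mf_prediction"] - y, 0.0)))

print(energy(model, 0.75, evidence))  # approximately 2*0.15 + 3*0.05 = 0.45
```

Adding the recommender changed only the model list; the energy function and the existing rules are untouched.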
Balancing the Rules
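• Balancing is done through the rule weights w_j
• A higher w_j indicates a more important rule
• Weights are learned by approximating a gradient
step in the conditional log-likelihood:
$$\frac{\partial \log P(\mathbf{Y} \mid \mathbf{X})}{\partial w_j} = \mathbb{E}_{\mathbf{w}}\big[\phi_j(\mathbf{Y}, \mathbf{X})\big] - \phi_j(\mathbf{Y}, \mathbf{X})$$
where \phi_j is the hinge-loss potential of rule j and the expectation is approximated by \phi_j evaluated at the MAP state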
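A toy sketch of this learning scheme: the gradient for w_j is approximated as φ_j(MAP state) - φ_j(observed), and weights are kept non-negative. The two rules, the training label, the learning rate, and the grid-search MAP (used here instead of ADMM only because the example has a single variable) are assumptions.

```python
# Minimal sketch: learn rule weights by gradient ascent on the conditional
# log-likelihood, approximating E_w[phi_j] by phi_j at the MAP state.
# Hypothetical setup: one target variable y and two grounded rules.

rules = [
    lambda y: max(0.9 - y, 0.0),   # phi_A: evidence suggesting y >= 0.9
    lambda y: max(y - 0.2, 0.0),   # phi_B: evidence suggesting y <= 0.2
]
y_observed = 0.9                   # training label


def map_state(weights, grid=101):
    # MAP by grid search over [0, 1]; a stand-in for ADMM in this 1-D toy
    candidates = [k / (grid - 1) for k in range(grid)]
    return min(candidates,
               key=lambda y: sum(w * phi(y) for w, phi in zip(weights, rules)))


def learn_weights(weights, eta=1.0, epochs=10):
    weights = list(weights)
    for _ in range(epochs):
        y_map = map_state(weights)
        for j, phi in enumerate(rules):
            # d log P / d w_j  ~=  phi_j(y_map) - phi_j(y_observed)
            grad = phi(y_map) - phi(y_observed)
            weights[j] = max(weights[j] + eta * grad, 0.0)  # keep weights >= 0
    return weights


w = learn_weights([1.0, 2.0])
print(w, map_state(w))
```

The rule that agrees with the observed label (φ_A) ends up with the higher weight; once the MAP state matches the label, the gradient vanishes and the weights stop changing.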
Experimental Validation
• Yelp academic dataset
– ~34k users, ~3.6k items, ~99k ratings
– ~81k friendships
– 514 business categories
• Last.fm
– ~1.8k users, ~17k items, ~92k ratings
– ~12k friendships
– ~9.7k artist tags
• Evaluation metrics: RMSE, MAE
https://www.yelp.com/academic_dataset
http://grouplens.org/datasets/hetrec-2011/
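For reference, the two evaluation metrics in a minimal Python form; the predicted and held-out ratings are hypothetical.

```python
import math


def rmse(predicted, actual):
    # root mean squared error
    return math.sqrt(sum((p - a) ** 2 for p, a in zip(predicted, actual)) / len(actual))


def mae(predicted, actual):
    # mean absolute error
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


pred = [4.0, 3.5, 2.0, 5.0]            # hypothetical predicted ratings
actual_ratings = [4.0, 3.0, 3.0, 4.0]  # hypothetical held-out ratings
print(rmse(pred, actual_ratings), mae(pred, actual_ratings))  # 0.75 0.625
```

RMSE squares the errors before averaging, so it penalizes large mistakes more heavily than MAE.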
Baselines
• Collaborative filtering systems
– Item-based cf. [Ning et al., In Recommender Systems Handbook, 2015]
– Matrix factorization (MF) cf. [Koren et al., IEEE Computer 42(8) 2009]
– Bayesian probabilistic matrix factorization (BPMF) [Salakhutdinov & Mnih, ICML 2008]
• Hybrid Systems
– Naïve hybrid (averaged predictions)
– BPMF with social relations and content (BPMF-SRIC)
[Liu et al., DSS 55(3) 2013]
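The naïve hybrid baseline can be sketched as an unweighted average of the available predictions for each user-item pair. The input layout, the recommender names, and the choice to average only over recommenders that produced a prediction are assumptions.

```python
def naive_hybrid(predictions):
    # predictions: {recommender_name: {(user, item): predicted rating}}
    # average every available prediction for each (user, item) pair
    pairs = set().union(*(p.keys() for p in predictions.values()))
    hybrid = {}
    for pair in pairs:
        scores = [p[pair] for p in predictions.values() if pair in p]
        hybrid[pair] = sum(scores) / len(scores)
    return hybrid


# Hypothetical predictions from three recommenders
preds = {
    "item_based": {("u1", "i1"): 4.0, ("u1", "i2"): 2.0},
    "mf": {("u1", "i1"): 3.0, ("u1", "i2"): 3.0},
    "bpmf": {("u1", "i1"): 5.0},
}
print(naive_hybrid(preds))
```

Unlike this fixed equal weighting, HyPER learns a separate weight for each recommender's rule.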
HyPER vs Baselines
• HyPER outperforms all other models on both datasets
• Results are statistically significant
HyPER Submodels: Mean-centering
• The combined HyPER model beats the individual rules
HyPER Submodels: User-based
• The combined HyPER model beats or matches the
best individual rules
• Similar results hold for the item-based, content, and social rules
Combining the Baselines
• HyPER can combine different recommenders effectively
• Results are statistically significantly better
HyPER (All Rules)
• Combining all rules achieves the best performance on
both datasets
Scaling to Large Datasets
• Parallel implementation for inference and
learning based on ADMM [Bach et al., UAI 2013]
• Scaling to big-data applications:
– perform inference in parallel on densely
connected subgraphs of the original graph
– fully distributed implementation of ADMM
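As a simplified illustration of splitting inference into independent pieces, the sketch below groups grounded potentials into connected components of the variable graph using union-find. Potentials in different components share no variables, so their MAP problems can be solved in parallel. This is a simpler stand-in for the densely connected subgraphs mentioned above, and the function name and input format are assumptions.

```python
from collections import defaultdict


def components(n_vars, potentials_vars):
    # potentials_vars: for each grounded potential, a non-empty list of the
    # variable indices it touches; returns groups of potential indices
    parent = list(range(n_vars))

    def find(x):
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    # union all variables that appear together in some potential
    for vars_ in potentials_vars:
        for v in vars_[1:]:
            ra, rb = find(vars_[0]), find(v)
            if ra != rb:
                parent[rb] = ra

    # potentials whose variables share a root belong to the same subproblem
    groups = defaultdict(list)
    for k, vars_ in enumerate(potentials_vars):
        groups[find(vars_[0])].append(k)
    return list(groups.values())


# Hypothetical ground model: 5 variables, 4 potentials
print(components(5, [[0, 1], [1, 2], [3], [4, 3]]))  # [[0, 1], [2, 3]]
```

Each returned group could then be handed to a separate worker.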
Conclusions
• HyPER is a general-purpose, extensible
framework for hybrid recommender systems
• With HyPER, practitioners can define
custom hybrid models that use all available
data and algorithms, via logical rules in PSL
• HyPER outperforms existing techniques on
two popular datasets
Thank you for your attention!
HyPER Submodels – Item-based, Content & Social
References
X. Ning, C. Desrosiers and G. Karypis. A comprehensive survey of neighborhood-based recommendation
methods. In Recommender Systems Handbook. 2nd edition, Springer, 2015
S. Fakhraei, B. Huang, L. Raschid, and L. Getoor. Network-based drug-target interaction prediction with
probabilistic soft logic. Transactions on Computational Biology and Bioinformatics, 11(5), 2014.
J. Liu, C. Wu, and W. Liu. Bayesian probabilistic matrix factorization with social relations and item contents for
recommendation. Decision Support Systems, 55(3), 2013.
R. Salakhutdinov and A. Mnih. Bayesian probabilistic matrix factorization using Markov chain Monte Carlo. In
ICML, 2008.
Y. Koren, R. Bell, and C. Volinsky. Matrix factorization techniques for recommender systems. IEEE Computer,
42(8), 2009.
A. Gunawardana and C. Meek. A unified approach to building hybrid recommender systems. In RecSys, 2009.
R. Burke. Hybrid web recommender systems. In The Adaptive Web. Springer, 2007.
L. de Campos, J. Fernandez-Luna, J. Huete, and M. Rueda-Morales. Combining content-based and collaborative
recommendations: A hybrid approach based on Bayesian networks. International Journal of Approximate
Reasoning, 51(7), 2010.
M. Jahrer, A. Töscher, and R. Legenstein. Combining predictions for accurate recommender systems. In KDD,
2010.
J. Hoxha and A. Rettinger. First-order probabilistic model for hybrid recommendations. In ICMLA, 2013.
S. H. Bach, B. Huang, B. London, and L. Getoor. Hinge-loss Markov random fields: Convex inference for structured
prediction. In UAI, 2013.
S. H. Bach, M. Broecheler, B. Huang, and L. Getoor. Hinge-loss Markov random fields and probabilistic soft logic.
ArXiv:1505.04406 [cs.LG], 2015.
A. P. Forbes and M. Zhu. Content-boosted matrix factorization for recommender systems: Experiments with recipe
recommendation. In RecSys, 2011.
J. Chen, G. Chen, H. Zhang, J. Huang, and G. Zhao. Social recommendation based on multi-relational analysis. In WI-IAT,
2012.
R. Burke, F. Vahedian, and B. Mobasher. Hybrid recommendation in heterogeneous networks. In User Modeling,
Adaptation, and Personalization. Springer, 2014.
J. Gemmell, T. S., B. Mobasher, and R. Burke. Resource recommendation in social annotation systems: A linear-weighted
hybrid approach. Journal of Computer and System Sciences, 78(4), 2012.
X. Yu, X. Ren, Y. Sun, Q. Gu, B. Sturt, U. Khandelwal, B. Norick, and J. Han. Personalized entity recommendation: A
heterogeneous information network approach. In WSDM, 2014.
H. Ma, D. Zhou, C. Liu, M. R. Lyu, and I. King. Recommender systems with social regularization. In WSDM, 2011.
J. McAuley and J. Leskovec. Hidden factors and hidden topics: Understanding rating dimensions with review text. In
RecSys, 2013.
G. Ling, M. R. Lyu, and I. King. Ratings meet reviews, a combined approach to recommend. In RecSys, 2014.
I. Guy, N. Zwerdling, I. Ronen, D. Carmel, and E. Uziel. Social media recommendation based on people and tags. In
SIGIR, 2010.
S. Sedhain, S. Sanner, D. Braziunas, L. Xie, and J. Christensen. Social collaborative filtering for cold-start
recommendations. In RecSys, 2014.