State of the Art Recommendation Systems: the Good, the Bad and the Ugly

Collaborative Filtering:
Tuck Siong Chung
Roland Rust
Michel Wedel
Choice Conference 2007
Outline
  Collaborative Filtering in Practice
  Ratings: Do they work?
  A Scalable Recommendation System
Collaborative Filtering
  Recommendation systems predict items of interest based on user information and/or product characteristics.
  Collaborative filtering systems predict which items interest a user by using information from other users.
  Origin: Information Tapestry project at Xerox PARC.
  System input:
    Active: ratings by users, text comments, expert opinions
    Passive: purchase data, usage data, browsing data
  Taxonomy:
    attribute-based (this author also wrote …)
    item-to-item (people who bought this item also bought …)
    people-to-people (users like you …)
  Method:
    Memory-based: use past data and matching heuristics
    Model-based: use models to make predictions
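
As a rough illustration of the memory-based, people-to-people idea, the sketch below predicts a rating as a similarity-weighted average of other users' ratings. The toy data, the choice of cosine similarity as the matching heuristic, and the function names are all assumptions made for this example; they are not taken from the talk.

```python
from math import sqrt

# Toy ratings: user -> {item: rating on a 1-5 scale}. All values are made up.
ratings = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5, "C": 2, "D": 5},
    "cat": {"A": 1, "B": 2, "C": 5, "D": 1},
}


def cosine(u, v):
    """Cosine similarity computed over the items both users have rated."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[i] * v[i] for i in shared)
    norm_u = sqrt(sum(u[i] ** 2 for i in shared))
    norm_v = sqrt(sum(v[i] ** 2 for i in shared))
    return dot / (norm_u * norm_v)


def predict(user, item):
    """Similarity-weighted average of the other users' ratings for `item`.

    Returns None when no other user has rated the item.
    """
    num = den = 0.0
    for other, their in ratings.items():
        if other == user or item not in their:
            continue
        w = cosine(ratings[user], their)
        num += w * their[item]
        den += w
    return num / den if den else None


# ann has not rated D; bob (similar tastes) liked it, cat (dissimilar) did not.
print(predict("ann", "D"))
```

Because ann's past ratings resemble bob's far more than cat's, the prediction for item D is pulled toward bob's rating. A model-based method would instead fit a statistical model to the ratings and predict from its parameters.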
Patents Filed 1995-2005
  [Chart: # of patents by year, 1996-2005. Total: 128]
  [Chart: frequency of patents by company]
    Microsoft Corporation        17
    Amazon.com, Inc.              8
    Sony                          4
    International Business Ma     3
    Clix Network, Inc.            3
    FOLIOfn.inc                   3
    Koninklijke Philips           3
    Rosetta Marketing Strateg     2
    Q-Tec Systems LLC             2
    Nokia Corporation             2
    MusicGenome.Com               2
Patents by Product and Medium
  CATEGORY                                      #CASES  %CASES  #MENTIONS
  SERVICE
    MEDIA                                           37  43.50%        885
    MUSIC                                           33  38.80%        367
    SONG                                            27  31.80%        321
    ENTERTAINMENT                                   16  18.80%         48
    MOVIE                                            7   8.20%         26
    ADVERTISEMENT                                    7   8.20%         35
    RESTAURANT                                       5   5.90%         10
    COUPON                                           4   4.70%         56
  PRODUCT
    BOOK                                            14  16.50%         43
    CD                                              13  15.30%         24
    ELECTRONICS                                      2   2.40%          6
    DVD                                              2   2.40%         11
  MEDIUM: MOBILE&COMMUNICATION
    PHONE                                           19  22.40%         68
    PDA                                             15  17.60%         27
    MOBILE                                          12  14.10%         86
    GPS                                              2   2.40%          7
  MEDIUM: MEDIA
    TELEVISION                                      20  23.50%        328
    RADIO                                           16  18.80%        270
    BROADCAST                                       16  18.80%        225
  MEDIUM: INTERNET BASED
    COMPUTER                                        65  76.50%        754
    WEB                                             55  64.70%        530
    E-MAIL                                           9  10.60%         39
Patents by Data and Engine
  CATEGORY                                      #CASES  %CASES  #MENTIONS
  DATA
    USER PROFILE                                    40  47.10%        619
    RATING                                          25  29.40%        142
    VOICE                                           19  22.40%        182
    REVIEW                                          19  22.40%        245
    FEEDBACK                                        14  16.50%         72
  MECHANICS
    FREQUENCY                                       23  27.10%        107
    CORRELATION                                     13  15.30%         72
    CLUSTERING                                      11  12.90%         54
    ARTIFICIAL INTELLIGENCE                         10  11.80%         20
    BAYESIAN                                         8   9.40%         73
    FUZZY                                            4   4.70%          5
    REGRESSION                                       3   3.50%          4
    LOGISTIC REGRESSION                              1   1.20%          6
Some Examples
  Pandora: customizes web broadcasts based on song attributes
  MSNBC's Newsbot: most-popular list and recommendations for news items
  Findory: news item recommendations based on user clickstream
  StoryCode: book recommendations based on user reviews
  MovieLens: movie recommendations based on user ratings
  Epinions: user reviews in many categories and user profiles
Developments in Practice
  Massive Data:
    Amazon: over 6 million product reviews
    TiVo: 100 million ratings of 30,000 TV shows
    Google News: millions of news items from 4500 sources, updated minute-by-minute
  Shifts:
    from collaborative filtering to hybrid systems
    from ratings data to purchase/usage data
    from e-tailer systems to stand-alone services
    to integration with social network sites
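
To make the shift from ratings to purchase data concrete, here is a minimal sketch of item-to-item recommendation ("people who bought this item also bought …") driven purely by co-purchase counts. The baskets, function names and tie-breaking rule are hypothetical and only meant to illustrate the idea; production systems typically normalize these counts rather than using them raw.

```python
from collections import Counter, defaultdict
from itertools import permutations

# Toy purchase histories: customer -> set of purchased items (made-up data).
baskets = {
    "u1": {"book", "cd"},
    "u2": {"book", "cd", "dvd"},
    "u3": {"book", "dvd"},
    "u4": {"cd"},
}


def co_purchase_counts(baskets):
    """For every ordered item pair, count the customers who bought both."""
    counts = defaultdict(Counter)
    for items in baskets.values():
        for a, b in permutations(items, 2):
            counts[a][b] += 1
    return counts


def also_bought(item, counts, k=2):
    """Top-k items most often bought together with `item`.

    Ties are broken alphabetically so the result is deterministic.
    """
    ranked = sorted(counts[item].items(), key=lambda kv: (-kv[1], kv[0]))
    return [name for name, _ in ranked[:k]]


counts = co_purchase_counts(baskets)
print(also_bought("cd", counts))
```

No ratings are needed here: the only input is passive purchase data, which every customer generates simply by buying.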
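Eye-Tracking Analysis of Ratings-Usage
  [Figure: eye-tracking measure by page element (Product Photo, Price, Recommendation, Other Attributes), shown for Chosen vs. Not Chosen. Plotted values, in extraction order: 7.292, 4.237, 1.589, 1.286, 1.286, 0.829, 0.563, 0.363]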
Some Problems with Ratings
  Cold Start. Before an individual has interacted with the recommendation system, no information is available that enables the system to generate useful recommendations. That makes these systems unsuitable for customer retention.
  Missingness. Customers rate only a very small subset of all available items, perhaps only those they like or dislike, so the ratings history of any particular customer is extremely sparse. In addition, the product rating data are missing non-randomly (Ying, Feinberg and Wedel 2006).
  Scale Usage. Many recommendation systems ask customers to award products 1-5 stars, but people use scales differently. Recommendations based on ratings may reflect scale usage behavior rather than product preference (Rossi, Gilula and Allenby 2001).
  Shilling. Users (human or agent) may provide specially crafted ratings that cause the recommendation system to make the desired recommendations. Shilling attacks have been shown to be effective, in particular for infrequently recommended items (Lam and Riedl 2004).
  Endogeneity. Customers' choice behavior is constrained by the recommendations, based on purchase/usage, that they received in the past. For model-based approaches, biases will accumulate and the quality of the recommendations will decline (Ebbes, Wedel, Bockenholt and Steerneman 2005).
  Scalability. Model-based recommendation systems proposed in the academic literature are estimated with MCMC algorithms that are not scalable to datasets with the number of individuals and attributes encountered in practice (Ridgeway and Madigan 2002).
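
The scale usage problem can be illustrated with a small sketch. Below, two hypothetical users rank three items identically but use different parts of the 1-5 scale; standardizing each user's ratings by their own mean and standard deviation removes that difference. This per-user z-score is a simple stand-in chosen for illustration, not the approach of the work cited above, and the data and function name are made up.

```python
from statistics import mean, pstdev

# Two made-up users with the same preferences but different scale usage.
ratings = {
    "lenient": {"A": 5, "B": 4, "C": 5},
    "harsh": {"A": 3, "B": 2, "C": 3},
}


def standardize(user_ratings):
    """Convert one user's raw ratings to z-scores on that user's own scale."""
    values = list(user_ratings.values())
    mu = mean(values)
    sd = pstdev(values)
    if sd == 0:
        # A user who gives every item the same rating carries no preference signal.
        return {item: 0.0 for item in user_ratings}
    return {item: (r - mu) / sd for item, r in user_ratings.items()}


for user, user_ratings in ratings.items():
    print(user, standardize(user_ratings))
```

On the raw scale, the lenient user's least-liked item (4 stars) outranks the harsh user's favorites (3 stars). After standardization the two profiles coincide, so a similarity measure computed on them reflects preference rather than scale usage.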
Studies have shown that
  Recommendation agents may reduce the prices paid (Diehl, Kornish, and Lynch 2003), improve decision quality and efficiency (Ariely, Lynch, and Aparicio 2004; Häubl and Trifts 2000; West 1996), and influence user opinions (Cosley et al. 2003; Häubl and Murray 2003). Agents and collaborative filtering learn at different rates (Ariely, Lynch, and Aparicio 2004), and their effectiveness depends on their similarity to the users (Aksoy et al. 2006).
  Model-based methods, including Bayes net (Breese, Heckerman, and Kadie 1998), Nearest Neighbor (Herlocker, Konstan, and Riedl 2002), Tree-based (Breese, Heckerman, and Kadie 1998), Mixture (Chien and George 1999), Dual Mixture (Bodapati 2007), HB models (Ansari, Essegaier, and Kohli 2000), and HB selection models (Ying, Feinberg, and Wedel 2004), in most cases show substantial improvements in the quality of recommendations on test datasets.
  However, the models in the academic literature are mostly estimated with MCMC algorithms and are not scalable.
A Music Recommendation System
  Model-Based Play-lists generated
  Hybrid System
    Combines recommendation agent and collaborative filtering
  Scalable:
    Problems with scale usage, missing data and shilling are alleviated
    Large n, large p
  Sequential Recommendations
    Alleviates endogeneity
  Tuck Siong Chung, Ph.D. Thesis
Conclusions
  Massive data pose challenges in collaborative filtering
  Other problems relate to the use of ratings
  We proposed and tested a method that
    Utilizes usage data
    Is a hybrid agent/collaborative filtering approach
    Yields impressive recommendation performance