Overview of KDDCUP 2011
Nathan Liu
nliu@cse.ust.hk

KDDCUP 2011 Music Recommendation
• KDDCUP is the most prominent data mining competition.
• In recent years, there have been a number of contests related to movie recommendation:
  – Netflix 2006: predict future ratings
  – KDDCUP 2007: how many ratings and who rated what
  – CAMRA 2010: context-aware movie recommendation
• KDDCUP 2011 is organized by Yahoo and provides the first and largest music ratings datasets.

Yahoo Music KDDCUP 2011
• There are three types of items: songs, artists, and albums.
• Songs and albums are annotated with genres.
• You are given the date, time, and score of each user's ratings of these different items.
• Challenges:
  – Scale: the largest public ratings dataset to date, with 1 million users, 0.6 million items, and 300 million ratings
  – Hierarchical item relations: songs belong to albums, and albums belong to artists. All of them are annotated with genre tags.
  – Rich metadata: over 900 genres
  – Fine temporal resolution: no previous challenge provided the time of day in addition to the date.
• For the project, you will be provided with a small subset of the data, and we will hold a mini internal competition to determine which group obtains the best results.

KDDCUP 2011: Task 1
• The test set consists of held-out ratings from users in the training set. Each rating is time-stamped.
• In the test set, you are given who rated which items at what time.
• You are asked to predict the rating scores.
• Closely related to the Netflix competition, but may require modeling time-of-day effects.
• References:
  – Koren. Matrix Factorization Techniques for Recommender Systems (IEEE Computer 2009)
  – Koren. Collaborative Filtering with Temporal Dynamics (KDD'09)
  – Xiong. Time-Evolving Collaborative Filtering (SDM'10)
  – Liu. Online Evolutionary Collaborative Filtering (RECSYS'10)

KDDCUP 2011: Task 2
• The test set consists of held-out ratings from users in the training set. The time stamps have been removed.
• In the test set, you are given 6 items for each user.
• You are asked to predict which 3 of the 6 were actually rated by the user.
• Closely related to the KDDCUP 2007 "who rated what" task and the CAMRA 2010 weekly recommendation track.
• References:
  – Hu. Collaborative Filtering for Implicit Feedback Datasets (ICDM'08)
  – Rendle. Bayesian Personalized Ranking from Implicit Feedback (UAI'09)
  – Cremonesi. Performance of Recommender Algorithms on Top-N Recommendation Tasks (RECSYS'10)
  – Steck. Training and Testing of Recommender Systems on Data Missing Not at Random (KDD'10)

For The Project
• We will extract a subset for you to work on.
• We will provide some basic algorithms.
• You can choose to work on one of the two tasks.
• The minimum requirement is to run thorough experiments with the provided algorithms and write a report on your findings about the different algorithms.
• There are also new things to try...

Things to Try (1): Ensemble
• Same algorithm, different parameter settings
• Different algorithms
• Stacking:
  – Which meta-learner? Gradient Boosted Decision Trees, Linear Regression
  – Any meta-features? Tail vs. head segmentation strategy
• References:
  – Bao et al. Stacking Recommendation Engines with Additional Meta-Features (RECSYS'09)
  – Jahrer et al. Combining Predictions for Accurate Recommender Systems (KDD'10)

Things to Try (3): Exploiting Item Relations and Genres
• From social networks of users to networks of items.
• Combine collaborative filtering with genre-based prediction to alleviate sparseness.
• References:
  – Ma. Recommender Systems with Social Regularization (WSDM'11)
  – Agarwal. Regression-based Latent Factor Models (KDD'09)
  – Popescul. Probabilistic Models for Unified Collaborative and Content-based Recommendation in Sparse-data Environments (UAI'01)
  – Gunawardana. Tied Boltzmann Machines for Cold Start Recommendations (RecSys'08)

Things to Try (2): Temporal Dynamics
• Various possible types of temporal dynamics:
  – Long-term effect: people getting pickier over time
  – Short-term effect: festival mood
  – Time-of-day effect: daytime vs. nighttime preference
  – Periodicity: every Friday night is party time
• References:
  – Koren. Collaborative Filtering with Temporal Dynamics (KDD'09)
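
The time-of-day effect listed under temporal dynamics can be illustrated with a small baseline predictor for Task 1. This is a minimal sketch, not the model from the Koren reference: it fits a global mean, a per-user bias, and a per-user day/night bias from shrunken residual averages, and it leaves out item biases and latent factors. The tuple layout `(user, item, hour, score)`, the 06:00/18:00 cut, and the function names are all assumptions made for illustration.

```python
from collections import defaultdict
from math import sqrt


def time_bin(hour):
    """Map an hour of day (0-23) to a coarse bin; the 06:00/18:00 cut is an assumption."""
    return "day" if 6 <= hour < 18 else "night"


def shrunk_means(residuals, reg):
    """Average (key, value) residuals per key, shrunk toward zero by `reg`."""
    sums, counts = defaultdict(float), defaultdict(int)
    for key, value in residuals:
        sums[key] += value
        counts[key] += 1
    return {key: sums[key] / (reg + counts[key]) for key in sums}


def fit_baseline(ratings, reg=5.0):
    """Fit mu + b_user + b_(user, bin) from (user, item, hour, score) tuples."""
    mu = sum(score for _, _, _, score in ratings) / len(ratings)
    # Per-user bias: how far a user's scores sit from the global mean.
    b_user = shrunk_means([(u, s - mu) for u, _, _, s in ratings], reg)
    # Per-user time-of-day bias, fitted on what the user bias leaves unexplained.
    b_bin = shrunk_means(
        [((u, time_bin(h)), s - mu - b_user[u]) for u, _, h, s in ratings], reg
    )
    return mu, b_user, b_bin


def predict(model, user, hour):
    """Predict a score; unseen users or bins fall back to the terms that exist."""
    mu, b_user, b_bin = model
    return mu + b_user.get(user, 0.0) + b_bin.get((user, time_bin(hour)), 0.0)


def rmse(model, ratings):
    """Root mean squared error of the baseline on (user, item, hour, score) tuples."""
    errors = [(predict(model, u, h) - s) ** 2 for u, _, h, s in ratings]
    return sqrt(sum(errors) / len(errors))
```

Comparing `rmse` with and without the `b_bin` term on a validation split is one simple way to check whether time of day carries any signal in the project subset.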
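
The Task 2 setup (6 candidate items per user, 3 of them actually rated) reduces to ranking each user's candidates by some score and labeling the top 3 as rated. The sketch below assumes the scores come from any model of your choice, for example an implicit-feedback factorization as in the references. The function names and the error definition (fraction of candidate items labeled incorrectly) are assumptions for illustration, not the official evaluation code.

```python
def predict_rated(candidate_scores, k=3):
    """Return the k candidate items with the highest scores for one user.

    candidate_scores: dict mapping item -> model score (higher = more likely rated).
    """
    ranked = sorted(candidate_scores, key=candidate_scores.get, reverse=True)
    return set(ranked[:k])


def error_rate(predicted, actual, candidates):
    """Fraction of candidate items whose rated / not-rated label is wrong.

    predicted, actual: dict user -> set of items labeled as rated.
    candidates: dict user -> list of that user's candidate items.
    """
    wrong = 0
    total = 0
    for user, items in candidates.items():
        # An item is misclassified if it is in exactly one of the two sets.
        wrong += len(predicted[user] ^ actual[user])
        total += len(items)
    return wrong / total
```

For a user with scores `{"a": 0.9, "b": 0.8, "c": 0.1, "d": 0.7, "e": 0.2, "f": 0.3}`, `predict_rated` selects `a`, `b`, and `d`. If the user actually rated `a`, `b`, and `c`, two of the six labels are wrong.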
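
For the ensemble idea, the simplest form of stacking is a linear blend of two predictors. The sketch below is a deliberately restricted stand-in for the linear-regression meta-learner mentioned on the ensemble slide: it fits a single weight `w` by least squares so that `w * p1 + (1 - w) * p2` best matches the true scores. The names are hypothetical, and the weight should be fitted on held-out validation predictions rather than on the data the base models were trained on.

```python
def fit_blend_weight(p1, p2, y):
    """Least-squares weight w for the blend w * p1 + (1 - w) * p2.

    p1, p2: predictions of two base models on a validation set.
    y: the true scores for the same validation ratings.
    """
    # Minimizing sum((w * (a - b) + b - t)^2) over w gives this closed form.
    num = sum((t - b) * (a - b) for a, b, t in zip(p1, p2, y))
    den = sum((a - b) ** 2 for a, b in zip(p1, p2))
    # If both models agree everywhere, any weight is equivalent; use 0.5.
    return num / den if den > 0 else 0.5


def blend(p1, p2, w):
    """Apply a fitted blend weight to two lists of predictions."""
    return [w * a + (1 - w) * b for a, b in zip(p1, p2)]
```

The weight is left unconstrained, so it can fall outside [0, 1] when one model is much worse than the other. A full linear regression or gradient boosted trees with meta-features would replace `fit_blend_weight` in a richer stacking setup.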