IDEAS 2011, Lisbon, 21-23 September

MINING SEMANTIC DATA FOR SOLVING FIRST-RATER AND COLD-START PROBLEMS IN RECOMMENDER SYSTEMS

María N. Moreno, Saddys Segrera, Vivian F. López, M. Dolores Muñoz and Ángel Luis Sánchez
Data Mining Research Group, http://mida.usal.es
Department of Computing and Automatic
CEDI 2010

Contents
- Introduction
- Recommender Systems
- Recommendation framework
- Case Study
- Conclusions

Introduction: Recommender systems
[Diagram; surviving labels: commerce, Server, Catalog, Client]
- Recommender systems provide users with intelligent mechanisms to find products to purchase
- Applications: e-commerce, e-learning, tourism, news pages...
- Drawbacks: low performance, low reliability of recommendations...

Introduction: Proposal
- Objective: overcome critical drawbacks in recommender systems
- Methodology: semantic-based Web mining
  - Associative classification (Web Mining): machine learning technique that combines concepts from classification and association
  - Domain-specific ontology (Semantic Web): enrichment of the data to be mined with semantic annotations
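To make the idea of associative classification concrete, here is a minimal sketch, assuming a toy data set: it mines association rules whose consequent is a class label, filters them by support and confidence, and classifies a new case with the best matching rule. It is only loosely in the spirit of methods such as CBA and is not the authors' implementation; the records, thresholds and function names are all illustrative assumptions.

```python
from collections import Counter
from itertools import combinations

# Toy training records: (set of attribute=value items, class label).
# The data, thresholds and names below are illustrative assumptions.
RECORDS = [
    ({"gender=F", "age=[25,34]", "genre=Drama"}, "recommended"),
    ({"gender=F", "age=[25,34]", "genre=Drama"}, "recommended"),
    ({"gender=M", "age=[18,24]", "genre=Drama"}, "not_recommended"),
    ({"gender=M", "age=[18,24]", "genre=Action"}, "recommended"),
    ({"gender=F", "age=[25,34]", "genre=Action"}, "not_recommended"),
    ({"gender=M", "age=[18,24]", "genre=Action"}, "recommended"),
]


def mine_class_rules(records, min_support=0.3, min_confidence=0.9, max_len=2):
    """Return rules (antecedent, label, support, confidence), best first."""
    n = len(records)
    antecedent_counts = Counter()
    rule_counts = Counter()
    for items, label in records:
        for size in range(1, max_len + 1):
            # Sorting makes each antecedent a canonical, hashable tuple.
            for antecedent in combinations(sorted(items), size):
                antecedent_counts[antecedent] += 1
                rule_counts[(antecedent, label)] += 1
    rules = []
    for (antecedent, label), count in rule_counts.items():
        support = count / n
        confidence = count / antecedent_counts[antecedent]
        if support >= min_support and confidence >= min_confidence:
            rules.append((frozenset(antecedent), label, support, confidence))
    # Rank by confidence, then support, then shorter antecedent.
    rules.sort(key=lambda r: (-r[3], -r[2], len(r[0])))
    return rules


def classify(rules, items, default="not_recommended"):
    """Predict with the first (best-ranked) rule whose antecedent matches."""
    for antecedent, label, _support, _confidence in rules:
        if antecedent <= items:
            return label
    return default


rules = mine_class_rules(RECORDS)
prediction = classify(rules, {"gender=M", "age=[18,24]", "genre=Action"})
```

The association part is the rule mining by support and confidence; the classification part is the restriction of consequents to class labels and the ranked rule matching at prediction time.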
Recommender Systems: Classification of recommendation methods
- Content-based: compare text documents to user profiles
- Collaborative filtering: based on the opinions of other users (ratings)
  - Memory-based (User-based): find users with similar preferences (neighbors) by means of statistical techniques
  - Model-based (Item-based): use data mining techniques to develop a model of user ratings

Recommender Systems: Critical drawbacks
- Sparsity: the number of ratings needed for prediction is greater than the number of ratings obtained from users
- Scalability: performance problems, mainly in memory-based methods, where computation time grows linearly with both the number of customers and the number of products in the site
- First-rater problem: new products have never been rated, so they cannot be recommended
- Cold-start problem: new users cannot receive recommendations because they have not yet rated any products

Recommendation framework
- Associative classification (Web Mining)
  - Sparsity: only slightly sensitive to sparse data
  - Scalability: model-based approach
- Domain-specific ontology (Semantic Web)
  - First-rater problem: use of taxonomies to classify products; induction of abstract patterns which relate user profiles with categories of products
  - Cold-start problem: recommendations based on user profiles
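The role of the taxonomy in the first-rater problem can be sketched as follows: if patterns relate user profiles to product categories rather than to individual products, a product with no ratings can still be matched through its category. This is a minimal sketch under stated assumptions; the taxonomy entries, the rules and the function names are hypothetical and do not come from the authors' ontology.

```python
# Hypothetical taxonomy: concrete genre -> more abstract category.
TAXONOMY = {
    "Thriller": "Suspense",
    "Mystery": "Suspense",
    "Comedy": "Light",
    "Musical": "Light",
}

# Hypothetical abstract patterns:
# (user profile conditions, product category, predicted class).
HIGH_LEVEL_RULES = [
    ({"age": "[18,24]", "occupation": "student"}, "Light", "recommend"),
    ({"age": "[35,44]"}, "Suspense", "recommend"),
]


def categories_of(genres):
    """Map a product's genres to the taxonomy categories they belong to."""
    return {TAXONOMY[g] for g in genres if g in TAXONOMY}


def recommend_unrated(profile, movie_genres):
    """Apply abstract patterns to a product that has no ratings yet.

    Only the user's profile and the product's category are needed,
    so neither past ratings of the product nor of the user are required.
    """
    categories = categories_of(movie_genres)
    for conditions, category, label in HIGH_LEVEL_RULES:
        profile_matches = all(profile.get(k) == v for k, v in conditions.items())
        if category in categories and profile_matches:
            return label
    return None  # no abstract pattern applies


new_movie_genres = ["Mystery"]  # a just-added movie, never rated
decision = recommend_unrated(
    {"age": "[35,44]", "occupation": "engineer"}, new_movie_genres
)
```

Because the rules mention only profile attributes and categories, the same lookup also serves a newly registered user, which is how the cold-start case is covered in this sketch.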
Recommendation framework
[Diagram; recoverable flows]
Off-line process
- Historical data -> Data mining algorithms -> Low level model
- Historical data + Domain ontology (provides annotations) -> Historical data with semantic annotations -> Data mining algorithms -> High level model
On-line process
- [new user] Registration -> Check high level model -> Recommendations
- [old user] Active user, recommendation request:
  - new products -> Check high level model -> Recommendations
  - old products -> Check low level model -> Recommendations

Case Study: MovieLens Data
User Data
- ID (Num.)
- Gender (Binary)
- Age (Num.)
- Occupation (String)
- Zip (Num.)
Movies Data
- ID (Num.)
- Title (String)
- Genre, 19 attributes (Binary)
Ratings Data
- ID (Num.)
- User ID (Num.)
- Movie ID (Num.)
- Rating (Num., 1 - 5)
Other labels on the slide: score, rating_bin

Case Study: MovieLens Data
- ID (Num.)
- User Gender (Binary)
- *User Age: < 18, [18, 24], [25, 34], [35, 44], [45, 49], [50, 55], > 55
- User Occupation (String)
- Movie Title (String)
- *Movie Genre (String)

Case Study: Ontology definition
[Figure: ontology definition]

Case Study: Results
- Associative classification methods (CBA, CMAR, FOIL and CPAR) were compared to non-associative classification algorithms
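The attribute transformations listed in the case study can be sketched as below. The age intervals are the ones shown on the slide. The rating binarization is an assumption: the slides only show the label rating_bin, so the two class names and the threshold of 4 are hypothetical choices made for illustration, as are the function names.

```python
import bisect

# Inclusive upper bounds of the first six age intervals from the slide.
AGE_UPPER_BOUNDS = [17, 24, 34, 44, 49, 55]
AGE_LABELS = [
    "< 18", "[18, 24]", "[25, 34]", "[35, 44]", "[45, 49]", "[50, 55]", "> 55",
]


def discretize_age(age: int) -> str:
    """Map an integer age to one of the seven intervals from the slide."""
    # bisect_left returns the index of the first bound that is >= age,
    # which is also the index of the interval containing that age.
    return AGE_LABELS[bisect.bisect_left(AGE_UPPER_BOUNDS, age)]


def binarize_rating(rating: int, threshold: int = 4) -> str:
    """Turn a 1-5 rating into a two-valued class.

    ASSUMPTION: the threshold and class names are hypothetical; the slides
    do not state how rating_bin is derived from the rating.
    """
    if not 1 <= rating <= 5:
        raise ValueError("rating must be between 1 and 5")
    return "recommended" if rating >= threshold else "not_recommended"


example = {
    "user_age": discretize_age(29),
    "rating_bin": binarize_rating(5),
}
```

Discretizing age and collapsing the rating into a small set of classes gives the categorical attributes that rule-based classifiers such as the associative methods named in the results operate on.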
Conclusions
- A framework for recommender systems is proposed to overcome some critical drawbacks
- The proposal combines web mining methods and domain-specific ontologies to induce models at two abstraction levels:
  - The low-level model relates users, movies and ratings for making recommendations
  - The high-level model is used for recommending unrated movies and for making recommendations to new users, which overcomes the first-rater and cold-start problems
- Off-line model induction avoids scalability problems at recommendation time
- Associative classification methods provide a way to deal with the sparsity problem

THANKS FOR YOUR ATTENTION!

María N. Moreno*, Saddys Segrera, Vivian F. López, M. Dolores Muñoz & Ángel Luis Sánchez
*mmg@usal.es
Department of Computing and Automatic