MRI: Meaningful Interpretations of Collaborative Ratings Mahashweta Das Gautam Das Sihem Amer-Yahia Cong Yu 37th International Conference on Very Large Data Bases, 2011 @ Seattle Roadmap Introduction Motivation Problem: MRI Sub problem: DEM Sub problem: DIM Data Model Algorithms Experiments Quantitative Qualitative Conclusion & Future Work VLDB 2011 2 Roadmap Introduction Motivation Problem: MRI Sub problem: DEM Sub problem: DIM Data Model Algorithms Experiments Quantitative Qualitative Conclusion & Future Work VLDB 2011 3 Motivation VLDB 2011 4 Motivation VLDB 2011 5 Motivation VLDB 2011 6 Motivation Examining reviews vs. trusting overall aggregate rating IMDB ratings demographic breakdown not meaningful enough VLDB 2011 7 MRI Problem Examining reviews vs. trusting overall aggregate rating IMDB ratings demographic breakdown not meaningful enough Novel and powerful third option: Meaningful Rating Interpretation VLDB 2011 Explain ratings by leveraging user and item attribute information 8 MRI Problem Examining reviews vs. trusting overall aggregate rating IMDB ratings demographic breakdown not meaningful enough Novel and powerful third option: Meaningful Rating Interpretation VLDB 2011 Explain ratings by leveraging user and item attribute information Example: 9 MRI Problem Examining reviews vs. trusting overall aggregate rating IMDB ratings demographic breakdown not meaningful enough Novel and powerful third option: Meaningful Rating Interpretation VLDB 2011 Explain ratings by leveraging user and item attribute information Example: 10 MRI Sub-problem DEM: Meaningful Description Mining Identify groups of reviewers who consistently share similar ratings on items VLDB 2011 11 MRI Sub-problem DEM: Meaningful Description Mining Identify groups of reviewers who consistently share similar ratings on items VLDB 2011 12 MRI Sub-problem DIM: Meaningful Difference Mining Identify groups of reviewers who consistently disagree on item ratings VLDB 2011 13 MRI Sub-problem DIM: Meaningful Difference Mining Identify groups of reviewers who consistently disagree on item ratings VLDB 2011 14 Roadmap Introduction Motivation Problem: MRI Sub problem: DEM Sub problem: DIM Data Model Algorithms Experiments Quantitative Qualitative Conclusion & Future Work VLDB 2011 15 Data Model Collaborative rating site: <Set of Items, Set of Users, Ratings> Rating tuple: <item attributes, user attributes, rating> ID Title Genre Director Name Gender Location Rating 1 Titanic Drama James Cameron Amy Female New York 8.5 2 Schindler’s List Drama Steven Speilberg John Male New York 7.0 Group: Set of ratings describable by a set of attribute values Notion of group based on data cube OLAP literature for mining multidimensional data VLDB 2011 16 Data Model Notion of group based on data cube lattice Each node in lattice is a data cube/cuboid Query condition on database Figure: 4-Dimensional Data Cube Lattice VLDB 2011 17 Data Model Notion of group based on data cube lattice Each node in lattice is a data cube/cuboid Query condition on database A = Gender B = Age C = Location D = Occupation Figure: 4-Dimensional Data Cube Lattice VLDB 2011 18 Data Model Each node/data cube/ cuboid in lattice is a group Selection Query Condition A = Gender: Male B = Age: Young C = Location: CA D = Occupation: Student Figure: Partial Rating Lattice for a Movie (M:Male, Y:Young, CA:California, S:Student) VLDB 2011 19 Data Model Each node/data cube/ cuboid in lattice is a group Selection Query Condition A = Gender: Male B = Age: Young C = Location: CA D = Occupation: Student Figure: Partial Rating Lattice for a Movie (M:Male, Y:Young, CA:California, S:Student) VLDB 2011 20 Data Model Task Quickly indentify “good” groups in the lattice that help users understand ratings effectively Figure: Partial Rating Lattice for a Movie (M:Male, Y:Young, CA:California, S:Student) VLDB 2011 21 Roadmap Introduction Motivation Problem: MRI Sub problem: DEM Sub problem: DIM Data Model Algorithms Experiments Quantitative Qualitative Conclusion & Future Work VLDB 2011 22 DEM: Meaningful Description Mining For an input item covering RI ratings, return set C of cuboids, such that: description error is minimized, subject to: |C| ≤ k; coverage ≥a Description Error Measures how well a cuboid average rating approximates the numerical score of each individual rating belonging to it Coverage Measures the percentage of ratings covered by the returned cuboids DEM is NP-Hard: Proof details in paper VLDB 2011 23 DEM Algorithms Exact Algorithm (E-DEM) Brute-force enumerating all possible combinations of cuboids in lattice to return the exact (i.e., optimal) set as rating descriptions Random Restart Hill Climbing Algorithm Often fails to satisfy Coverage constraint; Large number of restarts required Need an algorithm that optimizes both Coverage and Description Error constraints simultaneously Randomized Hill Exploration Algorithm (RHE-DEM) VLDB 2011 24 RHE-DEM Algorithm Satisfy Coverage Minimize Error C= {Male, Student} {California, Student} Figure: Partial Rating Lattice for a Movie; k=2, a=80% (M:Male, Y:Young, CA:California, S:Student) VLDB 2011 25 RHE-DEM Algorithm Satisfy Coverage Minimize Error C= {Male, Student} {California, Student} Say,C does not satisfy Coverage Constraint Figure: Partial Rating Lattice for a Movie; k=2, a=80% (M:Male, Y:Young, CA:California, S:Student) VLDB 2011 26 RHE-DEM Algorithm Satisfy Coverage Minimize Error C= {Male, Student} {California, Student} C= {Male} {California,Student} C= {Student} {California,Student} Figure: Partial Rating Lattice for a Movie; k=2, a=80% (M:Male, Y:Young, CA:California, S:Student) VLDB 2011 27 RHE-DEM Algorithm Satisfy Coverage √ Minimize Error C= {Male} {California, Student} Say, C satisfies Coverage Constraint Figure: Partial Rating Lattice for a Movie; k=2, a=80% (M:Male, Y:Young, CA:California, S:Student) VLDB 2011 28 RHE-DEM Algorithm Satisfy Coverage √ Minimize Error C= {Male} {California, Student} Figure: Partial Rating Lattice for a Movie; k=2, a=80% (M:Male, Y:Young, CA:California, S:Student) VLDB 2011 29 RHE-DEM Algorithm Satisfy Coverage √ Minimize Error C= {Male} {California, Student} Figure: Partial Rating Lattice for a Movie; k=2, a=80% (M:Male, Y:Young, CA:California, S:Student) VLDB 2011 30 RHE-DEM Algorithm Satisfy Coverage √ Minimize Error √ C= {Male} {Student} Figure: Partial Rating Lattice for a Movie; k=2, a=80% (M:Male, Y:Young, CA:California, S:Student) VLDB 2011 31 DIM: Meaningful Difference Mining For an input item covering RI+ RI- ratings, return set C of cuboids, such that: difference balance is minimized, subject to: |C| ≤ k; ≥a∩ ≥a Difference Balance Measures whether the positive and negative ratings are “mingled together" (high balance) or “separated apart" (low balance) Coverage Measures the percentage of +, - ratings covered by the returned cuboids DIM is NP-Hard: Proof details in paper VLDB 2011 32 DIM Algorithms Exact Algorithm (E-DIM) Randomized Hill Exploration Algorithm (RHE-DIM) Unlike DEM “error”, DIM “balance” computation is expensive Quadratic computation scanning all possible positive and negative ratings for each set of cuboids Introduce the concept of Fundamental Regions to aid faster balance computation Partition space of all ratings and aggregate rating tuples in each region VLDB 2011 33 DIM Algorithms: Fundamental Region C1 = {Male, Student} C2 = {California, Student} Balance = Figure: Computing Balance using Fundamental Region Set of k=2 cuboids having 75 ratings (44+, 31-),10 ratings (6+, 4-) VLDB 2011 34 Roadmap Introduction Motivation Problem: MRI Sub problem: DEM Sub problem: DIM Data Model Algorithms Experiments Quantitative Qualitative Conclusion & Future Work VLDB 2011 35 Experiments Dataset MovieLens:100,000 ratings for 1682 movies by 943 users Each user has 4 attributes: Gender, Age, Occupation, Location Binning the movies: Order movies according to number of ratings and then partition into 6 bins Bin 1: movies with fewest ratings, Bin 6: movies with highest ratings Evaluation Quantitative Indicator: Efficiency, Quality and Scalability Qualitative Indicator: Mechanical Turk User Study VLDB 2011 36 Quantitative Experiments: DEM VLDB 2011 37 Quantitative Experiments: DEM VLDB 2011 38 Qualitative Experiments: User Study Amazon Mechanical Turk study Two sets: one for description mining, one for difference mining Each set: 4 randomly chosen movies, 30 independent singleuser tasks Study 1: Users prefer simple aggregate ratings over rating interpretations Study 2: Users prefer rating interpretations by exact algorithm or heuristic randomized hill exploration algorithm VLDB 2011 39 Qualitative Experiments: User Study VLDB 2011 40 Roadmap Introduction Motivation Problem: MRI Sub problem: DEM Sub problem: DIM Data Model Algorithms Experiments Quantitative Qualitative Conclusion & Future Work VLDB 2011 41 Conclusion and Future Work Novel problem of meaningful rating interpretation (MRI) in collaborative rating sites Meaningful Description Mining Meaningful Difference Mining Heuristic algorithmic solutions that generate equally good rating interpretations as exact brute-force with much less execution time Meaningful interpretations of ratings by reviewers of interest Additional constraints such as diversity of rating explanations VLDB 2011 42 Related Work Data Cubes Clustering & Dimensionality Reduction Gray et. al, A relational aggregation operator generalizing group-by, cross-tab, and sub-totals, ICDE 1996 Sathe et. al, Intelligent rollups in multidimensional olap data, VLDB 2001 Lakshmanan et. al, Quotient cube: how to summarize the semantics of a data cube, VLDB 2002 Ramakrishnan et. al, Exploratory mining in cube space, ICDM 2006 Wu et. al, Promotion analysis in multi-dimensional space, VLDB 2009 Agrawal et. al, Automatic subspace clustering of high dimensional data for data mining applications, SIGMOD 1998 Recommendation Explanation Herlocker et. al, Explaining collaborative filtering recommendations, CSCW 2000 Bilgic et. al, Explaining recommendations: Satisfaction vs. promotion, IUI 2005 VLDB 2011 43 Thank You Questions Quantitative Experiments: DIM VLDB 2011 45 Quantitative Experiments: DEM, DIM VLDB 2011 46