Slides

advertisement
MRI: Meaningful Interpretations
of Collaborative Ratings
Mahashweta Das
Gautam Das
Sihem Amer-Yahia
Cong Yu
37th International Conference on Very Large Data Bases, 2011 @ Seattle
Roadmap





Introduction
 Motivation
 Problem: MRI
 Sub problem: DEM
 Sub problem: DIM
Data Model
Algorithms
Experiments
 Quantitative
 Qualitative
Conclusion & Future Work
VLDB 2011
2
Roadmap





Introduction
 Motivation
 Problem: MRI
 Sub problem: DEM
 Sub problem: DIM
Data Model
Algorithms
Experiments
 Quantitative
 Qualitative
Conclusion & Future Work
VLDB 2011
3
Motivation
VLDB 2011
4
Motivation
VLDB 2011
5
Motivation
VLDB 2011
6
Motivation


Examining reviews vs. trusting overall aggregate rating
IMDB ratings demographic breakdown not meaningful
enough
VLDB 2011
7
MRI Problem


Examining reviews vs. trusting overall aggregate rating
IMDB ratings demographic breakdown not meaningful
enough
 Novel and powerful third option: Meaningful Rating Interpretation

VLDB 2011
Explain ratings by leveraging user and item attribute information
8
MRI Problem


Examining reviews vs. trusting overall aggregate rating
IMDB ratings demographic breakdown not meaningful
enough
 Novel and powerful third option: Meaningful Rating Interpretation


VLDB 2011
Explain ratings by leveraging user and item attribute information
Example:
9
MRI Problem


Examining reviews vs. trusting overall aggregate rating
IMDB ratings demographic breakdown not meaningful
enough
 Novel and powerful third option: Meaningful Rating Interpretation


VLDB 2011
Explain ratings by leveraging user and item attribute information
Example:
10
MRI Sub-problem

DEM: Meaningful Description Mining
 Identify groups of reviewers who consistently share similar
ratings on items
VLDB 2011
11
MRI Sub-problem

DEM: Meaningful Description Mining
 Identify groups of reviewers who consistently share similar
ratings on items
VLDB 2011
12
MRI Sub-problem

DIM: Meaningful Difference Mining
 Identify groups of reviewers who consistently disagree on item
ratings
VLDB 2011
13
MRI Sub-problem

DIM: Meaningful Difference Mining
 Identify groups of reviewers who consistently disagree on item
ratings
VLDB 2011
14
Roadmap





Introduction
 Motivation
 Problem: MRI
 Sub problem: DEM
 Sub problem: DIM
Data Model
Algorithms
Experiments
 Quantitative
 Qualitative
Conclusion & Future Work
VLDB 2011
15
Data Model

Collaborative rating site: <Set of Items, Set of Users, Ratings>
 Rating tuple:
<item attributes, user attributes, rating>
ID
Title
Genre
Director
Name
Gender
Location
Rating
1
Titanic
Drama
James
Cameron
Amy
Female
New York
8.5
2
Schindler’s
List
Drama
Steven
Speilberg
John
Male
New York
7.0

Group: Set of ratings describable by a set of attribute values

Notion of group based on data cube
 OLAP literature for mining multidimensional data
VLDB 2011
16
Data Model

Notion of group based on data cube lattice
Each node in lattice
is a data cube/cuboid
Query condition on
database
Figure: 4-Dimensional Data Cube Lattice
VLDB 2011
17
Data Model

Notion of group based on data cube lattice
Each node in lattice
is a data cube/cuboid
Query condition on
database
A = Gender
B = Age
C = Location
D = Occupation
Figure: 4-Dimensional Data Cube Lattice
VLDB 2011
18
Data Model
Each node/data cube/
cuboid in lattice is a group
Selection Query Condition
A = Gender: Male
B = Age: Young
C = Location: CA
D = Occupation: Student
Figure: Partial Rating Lattice for a Movie
(M:Male, Y:Young, CA:California, S:Student)
VLDB 2011
19
Data Model
Each node/data cube/
cuboid in lattice is a group
Selection Query Condition
A = Gender: Male
B = Age: Young
C = Location: CA
D = Occupation: Student
Figure: Partial Rating Lattice for a Movie
(M:Male, Y:Young, CA:California, S:Student)
VLDB 2011
20
Data Model
Task
Quickly indentify
“good” groups in the
lattice that help users
understand ratings
effectively
Figure: Partial Rating Lattice for a Movie
(M:Male, Y:Young, CA:California, S:Student)
VLDB 2011
21
Roadmap





Introduction
 Motivation
 Problem: MRI
 Sub problem: DEM
 Sub problem: DIM
Data Model
Algorithms
Experiments
 Quantitative
 Qualitative
Conclusion & Future Work
VLDB 2011
22
DEM: Meaningful Description Mining

For an input item covering RI ratings, return set C of cuboids, such that:
 description error
is minimized, subject to:
 |C| ≤ k;
 coverage
≥a
Description Error
Measures how well a cuboid average rating approximates the numerical
score of each individual rating belonging to it
Coverage
Measures the percentage of ratings covered by the returned cuboids

DEM is NP-Hard: Proof details in paper
VLDB 2011
23
DEM Algorithms

Exact Algorithm (E-DEM)
 Brute-force enumerating all possible combinations of cuboids in
lattice to return the exact (i.e., optimal) set as rating descriptions

Random Restart Hill Climbing Algorithm
 Often fails to satisfy Coverage constraint; Large number of
restarts required
 Need an algorithm that optimizes both Coverage and Description
Error constraints simultaneously

Randomized Hill Exploration Algorithm (RHE-DEM)
VLDB 2011
24
RHE-DEM Algorithm
Satisfy Coverage
Minimize Error
C= {Male, Student}
{California, Student}
Figure: Partial Rating Lattice for a Movie; k=2, a=80%
(M:Male, Y:Young, CA:California, S:Student)
VLDB 2011
25
RHE-DEM Algorithm
Satisfy Coverage
Minimize Error
C= {Male, Student}
{California, Student}
Say,C does not satisfy
Coverage Constraint
Figure: Partial Rating Lattice for a Movie; k=2, a=80%
(M:Male, Y:Young, CA:California, S:Student)
VLDB 2011
26
RHE-DEM Algorithm
Satisfy Coverage
Minimize Error
C= {Male, Student}
{California, Student}
C= {Male}
{California,Student}
C= {Student}
{California,Student}
Figure: Partial Rating Lattice for a Movie; k=2, a=80%
(M:Male, Y:Young, CA:California, S:Student)
VLDB 2011
27
RHE-DEM Algorithm
Satisfy Coverage √
Minimize Error
C= {Male}
{California, Student}
Say, C satisfies
Coverage Constraint
Figure: Partial Rating Lattice for a Movie; k=2, a=80%
(M:Male, Y:Young, CA:California, S:Student)
VLDB 2011
28
RHE-DEM Algorithm
Satisfy Coverage √
Minimize Error
C= {Male}
{California, Student}
Figure: Partial Rating Lattice for a Movie; k=2, a=80%
(M:Male, Y:Young, CA:California, S:Student)
VLDB 2011
29
RHE-DEM Algorithm
Satisfy Coverage √
Minimize Error
C= {Male}
{California, Student}
Figure: Partial Rating Lattice for a Movie; k=2, a=80%
(M:Male, Y:Young, CA:California, S:Student)
VLDB 2011
30
RHE-DEM Algorithm
Satisfy Coverage √
Minimize Error
√
C= {Male}
{Student}
Figure: Partial Rating Lattice for a Movie; k=2, a=80%
(M:Male, Y:Young, CA:California, S:Student)
VLDB 2011
31
DIM: Meaningful Difference Mining

For an input item covering RI+ RI- ratings, return set C of cuboids, such that:
 difference balance
is minimized, subject to:
 |C| ≤ k;

≥a∩
≥a
Difference Balance
Measures whether the positive and negative ratings are “mingled together"
(high balance) or “separated apart" (low balance)
Coverage
Measures the percentage of +, - ratings covered by the returned cuboids

DIM is NP-Hard: Proof details in paper
VLDB 2011
32
DIM Algorithms

Exact Algorithm (E-DIM)

Randomized Hill Exploration Algorithm (RHE-DIM)


Unlike DEM “error”, DIM “balance” computation is expensive
 Quadratic computation scanning all possible positive and
negative ratings for each set of cuboids
Introduce the concept of Fundamental Regions to aid faster
balance computation
 Partition space of all ratings and aggregate rating tuples in
each region
VLDB 2011
33
DIM Algorithms: Fundamental Region
C1 = {Male, Student}
C2 = {California, Student}
Balance =
Figure: Computing Balance using Fundamental Region
Set of k=2 cuboids having 75 ratings (44+, 31-),10 ratings (6+, 4-)
VLDB 2011
34
Roadmap





Introduction
 Motivation
 Problem: MRI
 Sub problem: DEM
 Sub problem: DIM
Data Model
Algorithms
Experiments
 Quantitative
 Qualitative
Conclusion & Future Work
VLDB 2011
35
Experiments


Dataset
MovieLens:100,000 ratings for 1682 movies by 943 users
Each user has 4 attributes: Gender, Age, Occupation, Location
Binning the movies: Order movies according to number of
ratings and then partition into 6 bins






Bin 1: movies with fewest ratings, Bin 6: movies with highest ratings
Evaluation
Quantitative Indicator: Efficiency, Quality and Scalability
Qualitative Indicator: Mechanical Turk User Study
VLDB 2011
36
Quantitative Experiments: DEM
VLDB 2011
37
Quantitative Experiments: DEM
VLDB 2011
38
Qualitative Experiments: User Study

Amazon Mechanical Turk study
 Two sets: one for description mining, one for difference mining
 Each set: 4 randomly chosen movies, 30 independent singleuser tasks


Study 1: Users prefer simple aggregate ratings over rating
interpretations
Study 2: Users prefer rating interpretations by exact algorithm or
heuristic randomized hill exploration algorithm
VLDB 2011
39
Qualitative Experiments: User Study
VLDB 2011
40
Roadmap





Introduction
 Motivation
 Problem: MRI
 Sub problem: DEM
 Sub problem: DIM
Data Model
Algorithms
Experiments
 Quantitative
 Qualitative
Conclusion & Future Work
VLDB 2011
41
Conclusion and Future Work

Novel problem of meaningful rating interpretation (MRI) in
collaborative rating sites
 Meaningful Description Mining
 Meaningful Difference Mining

Heuristic algorithmic solutions that generate equally good rating
interpretations as exact brute-force with much less execution time

Meaningful interpretations of ratings by reviewers of interest

Additional constraints such as diversity of rating explanations
VLDB 2011
42
Related Work

Data Cubes






Clustering & Dimensionality Reduction


Gray et. al, A relational aggregation operator generalizing group-by, cross-tab,
and sub-totals, ICDE 1996
Sathe et. al, Intelligent rollups in multidimensional olap data, VLDB 2001
Lakshmanan et. al, Quotient cube: how to summarize the semantics of a data
cube, VLDB 2002
Ramakrishnan et. al, Exploratory mining in cube space, ICDM 2006
Wu et. al, Promotion analysis in multi-dimensional space, VLDB 2009
Agrawal et. al, Automatic subspace clustering of high dimensional data for data
mining applications, SIGMOD 1998
Recommendation Explanation


Herlocker et. al, Explaining collaborative filtering recommendations, CSCW 2000
Bilgic et. al, Explaining recommendations: Satisfaction vs. promotion, IUI 2005
VLDB 2011
43
Thank You
Questions
Quantitative Experiments: DIM
VLDB 2011
45
Quantitative Experiments: DEM, DIM
VLDB 2011
46
Download